ML03 reconstructed private training data by reading confidence signals backwards. ML04 asks a simpler but equally dangerous question: was this specific person's data used to train the model at all?
The answer to that question alone is often sensitive enough to cause real harm — even without recovering any actual data.
What Is a Membership Inference Attack?
ML models don't treat all inputs equally. They behave subtly differently on data they were trained on versus data they've never seen. They're more confident. More precise. The loss is lower. This gap is small — but it's measurable.
A membership inference attack exploits that gap. The attacker probes the model with specific records and uses the model's response behaviour to determine: was this record in the training dataset or not?
The model never hands over any data. It just answers queries. But those answers carry a signal the attacker knows how to read.
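The gap can be made concrete with a toy example. The sketch below (hypothetical helper name, illustrative numbers) computes two common membership signals — peak confidence and prediction entropy — for a peaked "member-like" output and a flatter "non-member-like" one:

```python
import math

def confidence_signal(probs):
    """Two simple membership signals: peak confidence and entropy."""
    peak = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return peak, entropy

# Illustrative outputs: a memorised record yields a sharply peaked
# distribution; an unseen record yields a flatter, more uncertain one.
member_probs = [0.97, 0.02, 0.01]
non_member_probs = [0.55, 0.30, 0.15]

m_peak, m_entropy = confidence_signal(member_probs)
n_peak, n_entropy = confidence_signal(non_member_probs)

print(m_peak > n_peak)        # member output is more confident
print(m_entropy < n_entropy)  # member output has lower entropy
```

Either signal alone is weak for a single record; the attack works because the difference is systematic across many queries.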
How It Works in Practice
The attacker trains a second model — called a shadow model — on data similar to the target model's training set. Because the attacker built the shadow model, they know exactly which records it was trained on. They run both groups (members and non-members) through it and observe how the confidence scores differ. That labelled observation becomes the training data for a membership classifier.
```
// Step 1: Train shadow model on similar data
shadow_model.train(similar_dataset)

// Step 2: Query target model with known member and non-member records
for record in test_records:
    confidence = target_model.predict(record)
    // Members tend to produce higher, more peaked confidence
    // Non-members produce flatter, more uncertain distributions
    membership_signal = extract_signal(confidence)

    // Step 3: Feed signal into membership classifier
    result = membership_classifier.predict(membership_signal)
    // → "IN training data" or "NOT IN training data"
```
The classifier learns to tell the difference. At scale, the attacker can sweep through thousands of records and build a map of who was in the training dataset.
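A minimal sketch of that classifier, assuming the attacker has already collected shadow-model confidence scores for known members and non-members. All numbers are synthetic stand-ins, and the classifier is deliberately the simplest possible one: a learned threshold rather than a full model.

```python
import random

random.seed(0)

# Synthetic stand-ins for shadow-model confidence scores:
# members skew high, non-members skew lower (illustrative ranges).
member_conf = [random.uniform(0.85, 1.00) for _ in range(200)]
non_member_conf = [random.uniform(0.40, 0.90) for _ in range(200)]

def fit_threshold(members, non_members):
    """Pick the confidence threshold that best separates the groups."""
    candidates = sorted(members + non_members)
    def accuracy(t):
        hits = sum(c >= t for c in members) + sum(c < t for c in non_members)
        return hits / (len(members) + len(non_members))
    return max(candidates, key=accuracy)

threshold = fit_threshold(member_conf, non_member_conf)

def predict_membership(confidence):
    return "IN training data" if confidence >= threshold else "NOT IN training data"

print(predict_membership(0.99))  # high confidence → likely a member
print(predict_membership(0.55))  # low confidence → likely not
```

Real attacks typically train a richer classifier on the full confidence vector, but even this one-dimensional threshold captures the core idea: shadow-model behaviour calibrates the decision rule applied to the target.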
A Concrete Scenario
A hospital trains a disease-prediction model on patient records. The model is deployed as an API for doctors. An attacker queries the model with records of specific individuals — perhaps public figures or targeted patients. The model's confidence scores are measurably higher for individuals whose records were in the training set. The attacker now knows, with meaningful accuracy, which individuals were patients at that hospital. No data was stolen. No breach occurred. The inference alone is the violation.
Why this is a privacy violation on its own: In healthcare, finance, or legal contexts, confirming that someone's record exists in a dataset — even without seeing the record — can violate GDPR, HIPAA, or similar regulations. Presence itself is sensitive data.
How It Compares to Adjacent Attacks
| Attack | Goal | What You Need | What You Get |
|---|---|---|---|
| ML03 — Model Inversion | Reconstruct training data | API + confidence scores | Approximate private records |
| ML04 — Membership Inference | Confirm presence in training set | API + shadow model | Yes/no on specific individuals |
ML03 recovers content. ML04 confirms identity. Both use confidence scores as the attack surface. ML04 is more accessible — it requires less precision and scales more easily across many targets.
Why It Is Hard to Detect
Every query in a membership inference attack looks identical to a legitimate prediction request. There is no malicious payload. No unusual data format. The attacker just sends records and reads the response. Standard monitoring sees normal API usage at elevated volume — nothing more.
The signal being exploited — confidence score distribution — is a fundamental property of how models generalise. You can't patch it away. You can only reduce or obscure it.
How You Defend Against It
- Reduce confidence score precision. Return rounded or binned probabilities instead of full floating-point values. This degrades the signal without breaking legitimate use cases.
- Return labels only. If the use case permits, return just the predicted class with no confidence score at all. No signal, no attack.
- Differential privacy during training. Adding calibrated noise during training forces the model to generalise rather than memorise — the confidence gap between members and non-members narrows significantly.
- Regularisation (L1/L2). Regularisation reduces overfitting, which is the root cause of the confidence gap. Less memorisation means less signal for the attacker.
- Rate limiting and anomaly detection. Sweep patterns — the same class of records queried repeatedly — don't appear in normal usage. Flag and investigate them.
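The first defence can be as simple as rounding before the API returns scores. A sketch under that assumption (hypothetical model outputs, one decimal place of precision):

```python
def bin_confidences(probs, decimals=1):
    """Round each probability to a coarse bin before returning it."""
    return [round(p, decimals) for p in probs]

# Two records a full-precision API would distinguish clearly:
member_out = [0.9731, 0.0214, 0.0055]      # memorised record
non_member_out = [0.9512, 0.0301, 0.0187]  # unseen record

# After coarse binning they look identical: the gap the attacker
# reads is largely erased, while the top class is unchanged.
print(bin_confidences(member_out))
print(bin_confidences(non_member_out))
```

The bin width is a tuning knob: coarser bins destroy more of the membership signal but also remove information legitimate clients may rely on.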
Why This Matters for Web3
Decentralised AI protocols expose model inference through public APIs or on-chain calls — by design. An ML model used for credit scoring in a DeFi lending protocol, or for KYC verification, is queried openly by anyone. A membership inference attack against that model doesn't just leak user privacy — it maps which wallet addresses or real-world identities are enrolled in the protocol's private training set.
On-chain everything means query logs are permanently public. An attacker can replay inference requests indefinitely, refine their shadow model, and build a membership map of who participated — without triggering any security alert, because every query was a valid contract call.
The model's confidence scores become a permanent, queryable oracle for who is in the system.
Next in the series: ML05 — Model Theft.