ML03 reconstructed private training data by reading confidence signals backwards. ML04 asks a simpler but equally dangerous question: was this specific person's data used to train the model at all?

The answer to that question alone is often sensitive enough to cause real harm — even without recovering any actual data.

What Is a Membership Inference Attack?

ML models don't treat all inputs equally. They behave subtly differently on data they were trained on versus data they've never seen. They're more confident. More precise. The loss is lower. This gap is small — but it's measurable.
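That gap is easy to demonstrate. The sketch below is illustrative only (synthetic data, an arbitrary model choice, no specific deployment): it deliberately overfits a classifier, then compares the average top-class confidence on training data (members) versus held-out data (non-members).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deep, unpruned trees memorise the training set — the worst case for privacy
model = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
model.fit(X_train, y_train)

# Top-class confidence on members (training data) vs non-members (held out)
member_conf = model.predict_proba(X_train).max(axis=1).mean()
nonmember_conf = model.predict_proba(X_test).max(axis=1).mean()

print(f"members:     {member_conf:.3f}")
print(f"non-members: {nonmember_conf:.3f}")
```

On a model like this, member confidence sits close to 1.0 while non-member confidence is visibly lower. That difference is the entire attack surface.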

A membership inference attack exploits that gap. The attacker probes the model with specific records and uses the model's response behaviour to determine: was this record in the training dataset or not?

The model never hands over any data. It just answers queries. But those answers carry a signal the attacker knows how to read.

How It Works in Practice

The attacker trains a second model — called a shadow model — on data similar to the target model's training set. Because the attacker built the shadow model, they know exactly which records were in its training set and which were not. Running both groups through the shadow model shows how confidence scores differ between members and non-members. Those labelled observations become the training data for a membership classifier.

Membership Inference Flow
// Step 1: Train shadow model on data similar to the target's training set
shadow_model.train(similar_dataset)

// Step 2: Query target model and extract a signal from each response
signals = []
for record in candidate_records:
    confidence = target_model.predict(record)

    // Members tend to produce higher, more peaked confidence;
    // non-members produce flatter, more uncertain distributions
    signals.append(extract_signal(confidence))

// Step 3: Feed each signal into the membership classifier
for signal in signals:
    result = membership_classifier.predict(signal)
    // → "IN training data" or "NOT IN training data"

The classifier learns to tell the difference. At scale, the attacker can sweep through thousands of records and build a map of who was in the training dataset.
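The whole pipeline can be sketched end to end on synthetic data. Everything below is an illustrative assumption — the dataset, the model choices, and the `extract_signal` features (top-class confidence plus prediction entropy) are one reasonable instantiation, not a prescribed recipe:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def extract_signal(probs):
    """Membership signal: top-class confidence and prediction entropy."""
    p = np.clip(probs, 1e-12, 1.0)
    return [p.max(), -(p * np.log(p)).sum()]

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
# Attacker's "similar" data for the shadow model; a disjoint pool for the target
X_shadow, X_target, y_shadow, y_target = train_test_split(
    X, y, test_size=0.5, random_state=1)

# Step 1: train the shadow model; the attacker knows its membership split
Xs_in, Xs_out, ys_in, ys_out = train_test_split(
    X_shadow, y_shadow, test_size=0.5, random_state=1)
shadow = RandomForestClassifier(n_estimators=50, random_state=1).fit(Xs_in, ys_in)

# Step 2: turn shadow confidences into labelled data for the attack classifier
sig_in = [extract_signal(p) for p in shadow.predict_proba(Xs_in)]
sig_out = [extract_signal(p) for p in shadow.predict_proba(Xs_out)]
attack = LogisticRegression().fit(
    sig_in + sig_out, [1] * len(sig_in) + [0] * len(sig_out))

# Step 3: attack a target model the attacker never trained
Xt_in, Xt_out, yt_in, yt_out = train_test_split(
    X_target, y_target, test_size=0.5, random_state=2)
target = RandomForestClassifier(n_estimators=50, random_state=2).fit(Xt_in, yt_in)

guess_in = attack.predict([extract_signal(p) for p in target.predict_proba(Xt_in)])
guess_out = attack.predict([extract_signal(p) for p in target.predict_proba(Xt_out)])
accuracy = (guess_in.mean() + (1 - guess_out.mean())) / 2
print(f"attack accuracy: {accuracy:.2f}")  # anything above 0.5 is leakage
```

The point of the sketch: the attack classifier never touches the target's training data. It learns the member/non-member gap from a model the attacker controls, then transfers that knowledge to the victim.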

A Concrete Scenario

A hospital trains a disease-prediction model on patient records. The model is deployed as an API for doctors. An attacker queries the model with records of specific individuals — perhaps public figures or targeted patients. The model's confidence scores are measurably higher for individuals whose records were in the training set. The attacker now knows, with meaningful accuracy, which individuals were patients at that hospital. No data was stolen. No breach occurred. The inference alone is the violation.

Why this is a privacy violation on its own: In healthcare, finance, or legal contexts, confirming that someone's record exists in a dataset — even without seeing the record — can violate GDPR, HIPAA, or similar regulations. Presence itself is sensitive data.

How It Compares to Adjacent Attacks

Attack                      | Goal                             | What You Need            | What You Get
ML03 — Model Inversion      | Reconstruct training data        | API + confidence scores  | Approximate private records
ML04 — Membership Inference | Confirm presence in training set | API + shadow model       | Yes/no on specific individuals

ML03 recovers content. ML04 confirms identity. Both use confidence scores as the attack surface. ML04 is more accessible — it requires less precision and scales more easily across many targets.

Why It Is Hard to Detect

Every query in a membership inference attack looks identical to a legitimate prediction request. There is no malicious payload. No unusual data format. The attacker just sends records and reads the response. Standard monitoring sees normal API usage at elevated volume — nothing more.

The signal being exploited — confidence score distribution — is a fundamental property of how models generalise. You can't patch it away. You can only reduce or obscure it.

How You Defend Against It

  • Reduce confidence score precision. Return rounded or binned probabilities instead of full floating-point values. This degrades the signal without breaking legitimate use cases.
  • Return labels only. If the use case permits, return just the predicted class with no confidence score at all. No signal, no attack.
  • Differential privacy during training. Adding calibrated noise during training forces the model to generalise rather than memorise — the confidence gap between members and non-members narrows significantly.
  • Regularisation (L1/L2). Regularisation reduces overfitting, which is the root cause of the confidence gap. Less memorisation means less signal for the attacker.
  • Rate limiting and anomaly detection. Sweep patterns — the same class of records queried repeatedly — don't appear in normal usage. Flag and investigate them.
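The first two defences can be sketched as thin wrappers at the API boundary. The function names and the three-class example below are hypothetical, chosen only to show the idea:

```python
import numpy as np

def binned_confidences(probs, decimals=1):
    """Round probabilities so member and non-member responses look alike."""
    return np.round(probs, decimals)

def label_only(probs, classes):
    """Strictest option: return the predicted class, no scores at all."""
    return classes[int(np.argmax(probs))]

probs = np.array([0.9137, 0.0611, 0.0252])
print(binned_confidences(probs))                    # → [0.9 0.1 0. ]
print(label_only(probs, ["flu", "cold", "covid"]))  # → flu
```

Binning keeps the API useful for legitimate callers who need a rough confidence, while flattening the fine-grained distribution shape the attack classifier depends on. Label-only removes the signal entirely, at the cost of all calibration information.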

Why This Matters for Web3

Decentralised AI protocols expose model inference through public APIs or on-chain calls — by design. An ML model used for credit scoring in a DeFi lending protocol, or for KYC verification, is queried openly by anyone. A membership inference attack against that model doesn't just leak user privacy — it maps which wallet addresses or real-world identities are enrolled in the protocol's private training set.

Because everything happens on-chain, query logs are permanently public. An attacker can replay inference requests indefinitely, refine their shadow model, and build a membership map of who participated — without triggering any security alert, because every query was a valid contract call.

The model's confidence scores become a permanent, queryable oracle for who is in the system.

Next in the series: ML05 — Model Theft.