All previous attacks in this series targeted something inside the model: training data, parameters, the feedback loop, inherited weights. ML09 operates in a completely different layer. The model runs correctly. It produces the right output. Then, in the gap between the model server and the consuming application, an attacker intercepts that output and rewrites it.
The model is innocent. The infrastructure around it failed.
What Is an Output Integrity Attack?
An output integrity attack is a man-in-the-middle attack applied specifically to ML inference results. The attacker positions themselves, or compromises a component, between the model's inference endpoint and the system that acts on the result. They intercept the correct output and replace it with a manipulated one before it reaches its destination.
The model did its job. What the application receives is not what the model said.
Where the Attack Happens
```
What should happen:
[Input] -> [ML Model] -> correct_output -> [Application] -> Action taken on truth

Output integrity attack:
[Input] -> [ML Model] -> correct_output -> [ATTACKER INTERCEPTS] -> tampered_output -> [Application] -> Action taken on lie
```
The interception can happen at multiple points: the network layer (unsecured HTTP between services), a compromised middleware component, a manipulated logging layer, or a tampered display interface that shows different values than what the model returned.
A Concrete Scenario
A hospital's diagnostic model correctly identifies a patient as having a dangerous infection. That result travels from the model server to the doctor's dashboard over an internal HTTP endpoint (no TLS; trusted network assumed). An attacker who has compromised a node on that internal network intercepts the JSON response in transit and changes "diagnosis": "severe_infection" to "diagnosis": "no_finding". The doctor reads "no finding" and sends the patient home. The model was correct. The patient was harmed by the infrastructure gap.
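The attacker's step in this scenario needs no ML knowledge at all; it is a plain JSON rewrite. A minimal sketch (field names are hypothetical, taken from the scenario above):

```python
import json

# Response as the model server emits it over the unencrypted internal hop.
wire = json.dumps({"diagnosis": "severe_infection", "confidence": 0.97})

# On that hop an attacker can rewrite the payload in transit; plain JSON
# carries nothing that lets the consumer detect the change.
payload = json.loads(wire)
payload["diagnosis"] = "no_finding"
wire = json.dumps(payload)

print(wire)  # the dashboard parses this as if the model had said it
```

The tampered bytes are structurally valid JSON with a plausible value, which is exactly why format checks alone are not enough and integrity protection has to come from the transport or a signature.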
The critical distinction: In every other attack, fixing the model fixes the problem. In ML09, the model is not the problem. The attack lives in the deployment infrastructure (network configuration, service authentication, API design), and that's where the fix has to go.
How It Compares to ML10
| Attack | What Is Tampered | Model Involved? | Where to Fix |
|---|---|---|---|
| ML09 (Output Integrity) | Inference result in transit | No; model is correct | Infrastructure, transport layer |
| ML10 (Model Poisoning) | Model parameters directly | Yes; model is compromised | Model access controls, parameter signing |
How You Defend Against It
- Sign all inference outputs cryptographically. The model signs its output with a private key. The consuming application verifies the signature before acting. Any tampering in transit invalidates the signature and raises an alert. This is the most direct and complete defence.
- Enforce TLS everywhere, including internal services. "Trusted internal network" is not a security boundary. Encrypt all traffic between services, including model servers, API gateways, and application backends. No plaintext inference endpoints.
- Validate output structure and value ranges. If your model should return a confidence score between 0 and 1, reject anything outside that range. If valid outputs are constrained to a known set of labels, reject unlisted values. Tampered outputs often break expected formats.
- Maintain tamper-evident audit logs. Log inputs, raw model outputs, and the values consumed by downstream systems, separately and immutably. Divergence between what the model returned and what the application received is the fingerprint of this attack.
- Isolate the inference endpoint. Minimise the number of components that sit between the model and the consumer. Every hop is a potential interception point. Shorter paths are harder to attack.
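The structure-and-range validation above can be sketched as a small gate the consumer runs before acting. The label set and schema here are hypothetical, chosen to match the hospital scenario:

```python
VALID_LABELS = {"severe_infection", "mild_infection", "no_finding"}  # hypothetical label set

def validate_output(response: dict) -> dict:
    """Reject responses whose label or confidence falls outside the model's contract."""
    label = response.get("diagnosis")
    confidence = response.get("confidence")
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")
    return response

validate_output({"diagnosis": "no_finding", "confidence": 0.93})  # passes the gate
```

Note this catches clumsy tampering (malformed values, out-of-range scores) but not a well-formed substitution like the scenario's label swap; that is why signing, not validation, is the primary defence.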
Why Cryptographic Signing Is the Right Answer
```python
# At inference time: the model server signs its output.
output = model.predict(input)
signature = sign(output, private_key=model_signing_key)
response = {"output": output, "sig": signature}

# At the consumer: verify before acting.
if not verify(response["output"], response["sig"], public_key=model_public_key):
    raise IntegrityError("Output tampered in transit - rejecting")
act_on(response["output"])  # only reached if the signature is valid
```
No matter how the attacker modifies the output in transit, they cannot forge a valid signature without the model's private key. The consumer receives a tampered value, the signature check fails, and the application rejects the response rather than acting on corrupted data.
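The sketch above leaves sign and verify abstract. A runnable stand-in using an HMAC over the canonically serialised output shows the same check end to end; this is a shared-secret variant (the asymmetric private/public key scheme described above is stronger, since the consumer then holds no signing capability), and the key value here is purely illustrative:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-shared-secret"  # hypothetical; load from a secret manager in practice

def sign_output(output: dict) -> dict:
    """Model-server side: attach a MAC over the canonically serialised output."""
    payload = json.dumps(output, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"output": output, "sig": sig}

def verify_and_consume(response: dict) -> dict:
    """Consumer side: recompute the MAC and reject on any mismatch."""
    payload = json.dumps(response["output"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, response["sig"]):
        raise ValueError("output tampered in transit - rejecting")
    return response["output"]

signed = sign_output({"diagnosis": "severe_infection"})
verify_and_consume(signed)                    # untampered: returns the output

signed["output"]["diagnosis"] = "no_finding"  # simulate in-transit tampering
try:
    verify_and_consume(signed)
except ValueError as err:
    print(err)  # the consumer rejects rather than acting on the lie
```

`hmac.compare_digest` is used instead of `==` so the comparison runs in constant time, and `sort_keys=True` keeps the serialisation canonical so signer and verifier hash identical bytes.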
Why This Matters for Web3
This attack is a direct analogue of oracle manipulation in DeFi, the canonical Web3 vulnerability where the value reported to a smart contract doesn't reflect reality. ML-based oracles feeding price data, risk scores, or fraud signals on-chain face exactly this threat: the model computes the correct value, but what gets submitted to the chain is tampered.
Cryptographic signing of oracle outputs, already a best practice for decentralised oracle networks, is the exact same solution. The smart contract verifies the signature before trusting the data. If the signature fails, the transaction reverts. The Web3 tooling for this already exists. The question is whether ML-powered oracle operators are using it.
Next in the series: ML10, Model Poisoning.