ML08 differs from the attacks covered earlier in this series in where it strikes. The model was trained correctly. It was deployed correctly. The attack starts after it is live, and it works by corrupting the mechanism the model uses to keep learning from the real world.
What Is Model Skewing?
Many production ML systems don't stay static. They consume feedback from the real world (user interactions, outcome data, correction signals) and retrain continuously to stay accurate. This is the feedback loop, and it is a fundamental part of how modern ML systems improve over time.
Model skewing attacks inject false feedback into that loop. The model retrains on manipulated data and gradually drifts toward producing the output the attacker wants. No weights are directly touched. No parameters are modified. The model is doing exactly what it is supposed to do; it is just learning from lies.
How the Feedback Loop Gets Corrupted
```python
# Normal feedback loop:
#   real_world_outcome -> verified_feedback -> retrain -> model improves
#
# Skewing attack:
#   attacker_controlled_signal -> injected_as_feedback -> retrain
#   -> model drifts toward the attacker's desired output distribution
#
# Concrete example, loan approval model. The attacker floods the feedback
# system with fake records. mlops_feedback_api is a placeholder for whatever
# endpoint accepts outcome data in the target deployment.
fake_feedback = [
    {"applicant": "high_risk_profile", "outcome": "loan_repaid", "approved": True}
    for _ in range(10_000)  # thousands of identical "high risk, but repaid" records
]
mlops_feedback_api.submit(fake_feedback)

# Model retrains and learns a high-risk -> low-risk mapping.
# The attacker's own loan application now gets approved.
```
The drift is gradual. Each retraining cycle moves the model a small amount. Over weeks, the cumulative shift becomes significant, and because the change happens incrementally, it doesn't trigger anomaly detection thresholds calibrated for sudden changes.
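To make the "gradual beats sudden" point concrete, here is an added simulation (not code from the original post). The model is reduced to a single number, the estimated repayment probability for the high-risk bucket, warm-started from the previous cycle and nudged toward each new feedback batch with a fixed learning rate; the batch sizes, repayment rate, approval threshold and alert threshold are all constructed for illustration. It runs a patient attacker against an impatient one and reports which of them trips a per-cycle anomaly alert and which of them gets the bucket auto-approved.

```python
"""Toy simulation of feedback-loop skewing against a continuously retrained model.

Added example, not code from the original post. All rates, batch sizes and
thresholds below are constructed for illustration.
"""

LEGIT_PER_CYCLE = 1000        # genuine high-risk outcomes arriving per retrain cycle
LEGIT_REPAID_PER_CYCLE = 300  # of which 300 repaid: true repay rate 0.30
LEARNING_RATE = 0.1           # how far each retrain moves toward the new batch
APPROVE_THRESHOLD = 0.5       # bucket is auto-approved once the estimate reaches 0.5
PER_CYCLE_ALERT = 0.03        # monitoring alerts on a single-cycle jump of > 3 points


def retrain(estimate, batch_repaid, batch_total, lr=LEARNING_RATE):
    """Warm-started retrain: move the estimate part-way toward the batch rate."""
    return estimate + lr * (batch_repaid / batch_total - estimate)


def simulate(fake_schedule):
    """Run one retrain per entry in fake_schedule.

    Each entry is the number of attacker-injected 'high_risk, repaid' records
    mixed into that cycle's genuine feedback. Returns the final estimate, the
    cycle at which the bucket first crossed the approval threshold (or None)
    and the cycle at which the per-cycle anomaly alert first fired (or None).
    """
    estimate = LEGIT_REPAID_PER_CYCLE / LEGIT_PER_CYCLE
    approved_at = first_alert = None
    for cycle, fakes in enumerate(fake_schedule, start=1):
        new = retrain(estimate,
                      LEGIT_REPAID_PER_CYCLE + fakes,  # every fake claims 'repaid'
                      LEGIT_PER_CYCLE + fakes)
        if first_alert is None and abs(new - estimate) > PER_CYCLE_ALERT:
            first_alert = cycle
        estimate = new
        if approved_at is None and estimate >= APPROVE_THRESHOLD:
            approved_at = cycle
    return {"final_estimate": estimate,
            "approved_at_cycle": approved_at,
            "first_alert_cycle": first_alert}


if __name__ == "__main__":
    # Patient attacker: 450 fakes per cycle for 30 cycles (say, 30 weekly retrains).
    slow = simulate([450] * 30)
    # Impatient attacker: 4500 fakes dumped into one cycle, then nothing.
    loud = simulate([4500] + [0] * 29)
    print("slow:", slow)   # approved at cycle 25, alert never fires
    print("loud:", loud)   # alert fires at cycle 1, approval never reached
```

In the slow run the largest single-cycle move is about 0.022, under the 0.03 alert line, yet by cycle 25 the high-risk bucket is being auto-approved. The loud run trips the alert on cycle 1 and, because the following genuine batches pull the estimate back, never reaches the threshold. A check that compares the current estimate against the previous cycle cannot see the slow run; comparing against a frozen baseline can.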
What Makes This Uniquely Dangerous
The attack exploits a feature, not a bug. Continuous learning from feedback is valuable and intentional. The attacker turns that strength into a vulnerability by ensuring the feedback they provide looks real.
| Attack | Targets | Timing | Drift Speed |
|---|---|---|---|
| ML02: Data Poisoning | Initial training data | Before deployment | Immediate |
| ML08: Model Skewing | Live feedback loop | After deployment, ongoing | Gradual, weeks or months |
Why it's hard to catch: A model that suddenly starts producing wrong outputs triggers alerts. A model that drifts 0.3% per retraining cycle over eight weeks looks like normal model evolution. By the time the skew is statistically significant, the attacker may already have achieved their goal.
How You Defend Against It
- Authenticate every feedback record. A MAC or digital signature from the system that produced the outcome lets the pipeline reject forged or tampered submissions; a plain checksum does not, because anyone can recompute it. Pair authentication with replay protection (unique record IDs or nonces), since a captured genuine record resubmitted thousands of times is also a skewing vector. If you can't verify the source, don't train on it.
- Validate feedback against expected distributions. Statistical anomaly detection on incoming feedback, with a baseline built from verified historical outcomes, can flag batches that deviate significantly from that baseline; a flood of "high-risk, repaid successfully" records should raise an immediate alert. Per-batch checks catch floods, not slow ramps, so they need to be paired with the cumulative checks below.
- Implement access controls on the feedback pipeline. The feedback loop is as sensitive as the training data. Restrict who can write to it, log all submissions, and audit regularly.
- Compare model predictions against ground truth continuously. Don't just retrain; measure whether the model's predictions are still calibrated against verified outcomes. Drift without a matching shift in real outcomes is a signal.
- Maintain a clean, frozen validation set. Evaluate the model against a held-out dataset that never enters the feedback loop. If performance on the frozen set degrades while feedback-loop performance stays stable, something is wrong.
- Periodically retrain from scratch on verified data. Continuous learning accumulates skew. A periodic hard reset using only verified historical data limits how far a long-running skewing attack can push the model.
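As an added example (not the post's own code), here is a gate that puts the first three defences above in front of the trainer: every record must carry a valid HMAC, repeated signatures are treated as replays, and the surviving batch is compared per risk bucket against a historical repayment baseline before anything reaches retraining. The key, the risk buckets, the baseline rates and both sample batches are constructed; a real deployment would sign with the outcome system's own key (asymmetric signatures if there are many producers), persist the seen-signature set across batches, and derive the baseline from its own verified history.

```python
"""Feedback-pipeline gate: authenticate, deduplicate, then distribution-check.

Added example, not code from the original post. Key, baseline and sample
batches are constructed for illustration.
"""
import hashlib
import hmac
import json
import math

FEEDBACK_KEY = b"constructed-demo-key"  # assumed shared with the outcome producer
WRONG_KEY = b"attacker-guess"           # what an outsider signs with

# Historical P(repaid) per risk bucket, from verified outcomes (constructed numbers).
BASELINE = {"low_risk": 0.95, "medium_risk": 0.70, "high_risk": 0.30}
Z_LIMIT = 4.0  # quarantine the batch if a bucket's repay rate is > 4 sigma off baseline


def canonical(record):
    """Deterministic encoding so producer and gate sign identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign(record, key=FEEDBACK_KEY):
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record, signature, key=FEEDBACK_KEY):
    return isinstance(signature, str) and hmac.compare_digest(sign(record, key), signature)


def bucket_z_scores(records):
    """Observed repay rate per bucket versus baseline, in binomial standard deviations."""
    counts = {}
    for r in records:
        n, k = counts.get(r["bucket"], (0, 0))
        counts[r["bucket"]] = (n + 1, k + (r["outcome"] == "repaid"))
    return {b: (k / n - BASELINE[b]) / math.sqrt(BASELINE[b] * (1 - BASELINE[b]) / n)
            for b, (n, k) in counts.items()}


def gate(submissions, seen=None):
    """Return the records the trainer may use, plus what was dropped and why."""
    seen = set() if seen is None else seen  # signatures already consumed (persist in production)
    accepted = []
    dropped = {"malformed": 0, "unauthenticated": 0, "replayed": 0}
    for sub in submissions:
        rec, sig = sub.get("record"), sub.get("sig")
        if (not isinstance(rec, dict) or rec.get("bucket") not in BASELINE
                or rec.get("outcome") not in ("repaid", "defaulted")):
            dropped["malformed"] += 1
            continue
        if not verify(rec, sig):
            dropped["unauthenticated"] += 1
            continue
        if sig in seen:
            dropped["replayed"] += 1
            continue
        seen.add(sig)
        accepted.append(rec)
    quarantined = {b: z for b, z in bucket_z_scores(accepted).items() if abs(z) > Z_LIMIT}
    if quarantined:  # hold the whole batch for review rather than train on part of it
        return {"accepted": [], "dropped": dropped, "quarantined": quarantined}
    return {"accepted": accepted, "dropped": dropped, "quarantined": {}}


# ---- constructed sample batches ------------------------------------------------
def signed(rec, key=FEEDBACK_KEY):
    return {"record": rec, "sig": sign(rec, key)}


def records(bucket, n, repaid, prefix):
    return [{"id": f"{prefix}-{i}", "bucket": bucket,
             "outcome": "repaid" if i < repaid else "defaulted"} for i in range(n)]


def outsider_batch():
    """Attacker without the key: forged signatures plus one captured record replayed."""
    subs = [signed(r) for r in records("high_risk", 200, 60, "h")]
    subs += [signed(r) for r in records("low_risk", 100, 95, "l")]
    subs += [signed(r, WRONG_KEY) for r in records("high_risk", 50, 50, "fake")]
    subs += [subs[0]] * 30  # genuine 'repaid' record resubmitted 30 times
    return subs


def insider_batch():
    """Attacker who can sign: records authenticate, so only the distribution check remains."""
    subs = [signed(r) for r in records("high_risk", 200, 60, "h")]
    subs += [signed(r) for r in records("high_risk", 150, 150, "ins")]
    return subs


if __name__ == "__main__":
    print(gate(outsider_batch())["dropped"])     # 50 unauthenticated, 30 replayed
    print(gate(insider_batch())["quarantined"])  # high_risk about +12 sigma
```

The outsider loses every forged and replayed record and the rest passes; the insider's records all authenticate, and it is only the per-bucket check (0.60 observed against a 0.30 baseline, roughly 12 sigma on 350 records) that stops the batch. That check is blind to a small per-cycle injection that stays within a few sigma, which is why the frozen validation set and periodic clean retrain in the list above are not optional.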
Why This Matters for Web3
Decentralised AI systems where model updates are driven by on-chain feedback (prediction markets, reputation systems, oracle networks) are structurally vulnerable to skewing. Anyone who can submit feedback to the system, which is often permissionless by design, can participate in the attack. Sybil resistance and stake-weighted feedback are partial mitigations: many wallets defeat the first, and enough capital can dominate the second.
A skewed price prediction oracle doesn't need to be obviously wrong. It just needs to be systematically biased in a direction the attacker has positioned themselves to profit from, and on-chain, the attacker's positions are their own business until the payout arrives.
Next in the series: ML09, Output Integrity Attack.