Every attack so far happened before or during training. ML08 is different. The model was trained correctly. It was deployed correctly. The attack starts after it's live, and it works by corrupting the mechanism the model uses to keep learning from the real world.
What Is Model Skewing?
Many production ML systems don't stay static. They consume feedback from the real world (user interactions, outcome data, correction signals) and retrain continuously to stay accurate. This is the feedback loop, and it's a fundamental part of how modern ML systems improve over time.
Model skewing attacks inject false feedback into that loop. The model retrains on manipulated data and gradually drifts toward producing the output the attacker wants. No weights are directly touched. No parameters are modified. The model is doing exactly what it's supposed to do: it's just learning from lies.
How the Feedback Loop Gets Corrupted
```python
# Normal feedback loop:
#   real_world_outcome -> verified_feedback -> retrain -> model improves
#
# Skewing attack:
#   attacker_controlled_signal -> injected_as_feedback -> retrain
#   -> model drifts toward attacker's desired output distribution

# Concrete example, a loan approval model.
# The attacker floods the feedback system with fake records:
fake_feedback = [
    {"applicant": "high_risk_profile", "outcome": "loan_repaid", "approved": True},
    {"applicant": "high_risk_profile", "outcome": "loan_repaid", "approved": True},
    # ... thousands more ...
]
mlops_feedback_api.submit(fake_feedback)

# Model retrains. Learns high-risk -> low-risk mapping.
# Attacker's loan application now gets approved.
```
The drift is gradual. Each retraining cycle moves the model a small amount. Over weeks, the cumulative shift becomes significant, and because the change happens incrementally, it doesn't trigger anomaly detection thresholds calibrated for sudden changes.
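A minimal simulation makes the evasion concrete. All numbers here are illustrative (the per-cycle drift and alert threshold are assumptions, not from the source): a detector that alerts on the change between consecutive retraining cycles never fires, while the cumulative shift keeps growing.

```python
# Simulate a skewing attack: each retraining cycle nudges the model's
# approval rate for a targeted profile by a small amount.
PER_CYCLE_DRIFT = 0.003   # 0.3% shift per retraining cycle (illustrative)
ALERT_THRESHOLD = 0.01    # detector alerts on >1% change between cycles
CYCLES = 8                # e.g. weekly retrains over eight weeks

rate = 0.05               # baseline approval rate for a high-risk profile
alerts = 0
for _ in range(CYCLES):
    previous = rate
    rate += PER_CYCLE_DRIFT                  # attacker's injected feedback
    if abs(rate - previous) > ALERT_THRESHOLD:
        alerts += 1                          # per-cycle anomaly detection

cumulative_shift = rate - 0.05
print(f"alerts fired: {alerts}")                    # 0
print(f"cumulative shift: {cumulative_shift:.3f}")  # 0.024
```

Every per-cycle delta is 0.3%, well under the 1% alarm, yet the model ends up 2.4 points away from where it started.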
What Makes This Uniquely Dangerous
The attack exploits a feature, not a bug. Continuous learning from feedback is valuable and intentional. The attacker turns that strength into a vulnerability by ensuring the feedback they provide looks real.
| Attack | Targets | Timing | Drift Speed |
|---|---|---|---|
| ML02: Data Poisoning | Initial training data | Before deployment | Immediate |
| ML08: Model Skewing | Live feedback loop | After deployment, ongoing | Gradual (weeks or months) |
Why it's hard to catch: A model that suddenly starts producing wrong outputs triggers alerts. A model that drifts 0.3% per retraining cycle over eight weeks looks like normal model evolution. By the time the skew is statistically significant, the attacker may have already achieved their goal.
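One way to surface drift of this shape is a cumulative statistic rather than a per-cycle one. The sketch below is illustrative and not prescribed by the source: a CUSUM-style check accumulates each cycle's deviation from a verified baseline, so slow drift that never trips a per-cycle threshold still crosses the cumulative bound within a few cycles. The baseline, slack, and bound values are assumptions.

```python
# CUSUM-style drift check: accumulate deviations of observed approval
# rates from the verified baseline, alert when the running sum exceeds
# a bound. Numbers are illustrative.
BASELINE = 0.05   # verified historical approval rate
SLACK = 0.001     # tolerated per-cycle noise
BOUND = 0.01      # alert when cumulative deviation exceeds this

observed = [0.053, 0.056, 0.059, 0.062, 0.065, 0.068, 0.071, 0.074]

cusum = 0.0
alert_cycle = None
for i, rate in enumerate(observed, start=1):
    cusum = max(0.0, cusum + (rate - BASELINE) - SLACK)
    if cusum > BOUND and alert_cycle is None:
        alert_cycle = i

print(f"CUSUM alert at cycle: {alert_cycle}")  # 3
```

The same drift that looked like normal model evolution cycle-by-cycle is flagged by the third retraining cycle.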
How You Defend Against It
- Verify the authenticity of all feedback data. Digital signatures and checksums on feedback submissions prevent an attacker from injecting arbitrary data. If you can't verify the source, don't train on it.
- Validate feedback against expected distributions. Statistical anomaly detection on incoming feedback can flag batches that skew significantly from historical patterns; a flood of "high-risk approved successfully" records should raise an immediate alert.
- Implement access controls on the feedback pipeline. The feedback loop is as sensitive as the training data. Restrict who can write to it, log all submissions, and audit regularly.
- Compare model predictions against ground truth continuously. Don't just retrain; measure whether the model's predictions are still calibrated against verified outcomes. Drift without a matching shift in real outcomes is a signal.
- Maintain a clean, frozen validation set. Evaluate the model against a held-out dataset that never enters the feedback loop. If performance on the frozen set degrades while feedback-loop performance stays stable, something is wrong.
- Periodically retrain from scratch on verified data. Continuous learning accumulates skew. A periodic hard reset using only verified historical data limits how far a long-running skewing attack can push the model.
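The first two defences above can be sketched in a few lines. This is a minimal illustration, not a production design: the shared-secret HMAC scheme, record fields, and baseline rate are all assumptions, and the distribution check is a simple z-score of a batch's approval rate against the historical rate.

```python
import hmac
import hashlib
import json
import math

SECRET = b"feedback-signing-key"  # hypothetical shared secret

def sign(record: dict) -> str:
    # Canonicalise the record and sign it so the pipeline can verify origin.
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify(record: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(record), signature)

def batch_is_anomalous(approved: int, total: int,
                       baseline_rate: float, z_limit: float = 3.0) -> bool:
    # z-score of the batch approval rate against the historical rate.
    se = math.sqrt(baseline_rate * (1 - baseline_rate) / total)
    z = (approved / total - baseline_rate) / se
    return abs(z) > z_limit

record = {"applicant": "a-123", "outcome": "loan_repaid", "approved": True}
print(verify(record, sign(record)))   # True: authentic feedback passes
print(verify(record, "0" * 64))       # False: forged signature rejected

# Historical approval rate for this segment is 5%; an attacker's batch
# approving 400 of 1000 sits far outside the expected distribution.
print(batch_is_anomalous(400, 1000, 0.05))  # True
print(batch_is_anomalous(55, 1000, 0.05))   # False
```

Neither check is sufficient alone: signatures stop outsiders but not a compromised legitimate source, and distribution checks miss drift injected slowly enough, which is why the frozen validation set and periodic retrain-from-scratch matter.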
Why This Matters for Web3
Decentralised AI systems where model updates are driven by on-chain feedback (prediction markets, reputation systems, oracle networks) are structurally vulnerable to skewing. Anyone who can submit feedback to the system, which is often permissionless by design, can participate in the attack. Sybil resistance and stake-weighted feedback are partial mitigations, but a well-resourced attacker with many wallets can overwhelm both.
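The arithmetic behind that last point is worth seeing. The aggregation below is a generic stake-weighted mean, not any specific protocol, and all stakes and signal values are invented: stake-weighting makes the number of wallets irrelevant, so splitting attacker capital across thousands of Sybil wallets changes nothing; total stake is what moves the aggregate.

```python
def weighted_signal(submissions):
    # submissions: list of (stake, signal) pairs; returns the
    # stake-weighted mean of the submitted signals.
    total = sum(stake for stake, _ in submissions)
    return sum(stake * signal for stake, signal in submissions) / total

honest = [(100, 0.10)] * 10           # 10 honest wallets, 1000 total stake
whale = honest + [(1500, 0.90)]       # one attacker wallet, 1500 stake
sybil = honest + [(1, 0.90)] * 1500   # same 1500 stake, 1500 wallets

print(round(weighted_signal(honest), 3))  # 0.1
print(round(weighted_signal(whale), 3))   # 0.58
print(round(weighted_signal(sybil), 3))   # 0.58
```

With 60% of the feedback stake, the attacker drags the aggregate from 0.10 to 0.58 regardless of how the stake is distributed across wallets, which is why stake-weighting alone only raises the cost of the attack rather than preventing it.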
A skewed price prediction oracle doesn't need to be obviously wrong. It just needs to be systematically biased in a direction the attacker has positioned themselves to profit from, and on-chain, the attacker's positions are their own business until the payout arrives.
Next in the series: ML09, Output Integrity Attack.