Software supply chain security has a decade of tooling behind it now: SBOMs, dependency scanning, signed packages, reproducible builds. The ML supply chain — the path from training data to a deployed model — is a different problem, and most organizations applying their software supply chain playbook to it are missing failure modes specific to models.
Dataset Provenance
A model is only as trustworthy as the data it learned from, and most teams using public or third-party datasets have no real chain of custody for that data — no record of who collected it, how it was filtered, or whether it's been tampered with since. Unlike a software dependency, a dataset's "vulnerabilities" don't show up in a CVE database; they show up as biased, backdoored, or low-quality model behavior discovered after deployment.
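One lightweight way to start a chain of custody is a manifest written when the dataset is first acquired and re-checked before every training run. The sketch below is an illustration, not a standard: the function names (`build_manifest`, `changed_files`) and the manifest fields (`source`, `collected_by`) are hypothetical. A per-file SHA-256 only detects changes made after the manifest was written; it says nothing about whether the data was clean to begin with.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset shards do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path, source: str, collected_by: str) -> dict:
    """Record where the data came from plus a digest for every file under dataset_dir."""
    files = {
        p.relative_to(dataset_dir).as_posix(): sha256_file(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }
    # "source" and "collected_by" are hypothetical field names for this sketch.
    return {"source": source, "collected_by": collected_by, "files": files}


def changed_files(dataset_dir: Path, manifest: dict) -> list:
    """Return files that were added, removed, or modified since the manifest was written."""
    current = build_manifest(dataset_dir, manifest["source"], manifest["collected_by"])["files"]
    recorded = manifest["files"]
    return sorted(
        name
        for name in set(current) | set(recorded)
        if current.get(name) != recorded.get(name)
    )
```

Storing the manifest somewhere the training job cannot write to matters as much as the hashing itself; a manifest that sits next to the data can be rewritten along with it.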
Pretrained Model Provenance
Downloading a pretrained model from a public hub is, functionally, downloading an executable artifact from a third party — and historically, some model serialization formats (pickle-based formats in particular) allow arbitrary code execution on load. A malicious or compromised model upload isn't just a bad-prediction risk; depending on the format and loading code, it can be a remote-code-execution risk on the machine that loads it.
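To make the risk concrete: Python's `pickle` lets an object name any importable callable to be invoked during loading, via `__reduce__`. The sketch below uses a harmless `print` call as a stand-in for an attacker's payload, then applies the allowlisting pattern described in the Python documentation (overriding `Unpickler.find_class`). The class name `AllowlistUnpickler`, the helper `restricted_loads`, and the empty allowlist are assumptions for illustration. Allowlisting narrows the attack surface but is not a substitute for sandboxing.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object embedded in a model file."""

    def __reduce__(self):
        # On load, pickle calls print(...). A real attacker would name
        # a callable such as os.system here instead.
        return (print, ("this ran during unpickling",))


class AllowlistUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not explicitly allowed."""

    ALLOWED = frozenset()  # (module, name) pairs; empty means plain data only

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def restricted_loads(data: bytes):
    """Unpickle data while rejecting every global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


malicious = pickle.dumps(Payload())
benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})

# Plain containers and numbers need no globals, so they load normally.
print(restricted_loads(benign))

# The payload references builtins.print, which is not on the allowlist.
try:
    restricted_loads(malicious)
except pickle.UnpicklingError as err:
    print("rejected:", err)
```

Calling `pickle.loads(malicious)` directly would execute the `print` call, which is the whole problem: loading is execution.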
Fine-Tuning Data Poisoning
Production systems that fine-tune on user feedback, support transcripts, or other continuously collected data have a poisoning vector that traditional software doesn't. An attacker who can influence what gets fed into the next fine-tuning run can implant behavior (backdoor trigger phrases, biased outputs, degraded safety behavior) that survives into the deployed model, with no single "malicious commit" to point to in review.
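One cheap signal to check before each fine-tuning run is whether a small number of sources dominate the candidate batch, since a single account submitting an outsized share of the data is an easy case to catch. The sketch below illustrates only that idea: the `source_id` field, the `overrepresented_sources` function, and the 5% threshold are all assumptions. An attacker can spread submissions across many accounts, so a check like this complements content review rather than replacing it.

```python
from collections import Counter


def overrepresented_sources(examples: list, max_share: float = 0.05) -> dict:
    """Return {source_id: share} for sources exceeding max_share of the batch.

    Each example is assumed to be a dict with a "source_id" key (hypothetical
    schema) identifying the user or system that contributed it.
    """
    counts = Counter(example["source_id"] for example in examples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        source: count / total
        for source, count in counts.items()
        if count / total > max_share
    }


# 90 users contribute one example each; one account contributes ten.
batch = [{"source_id": f"user-{i}", "text": "..."} for i in range(90)]
batch += [{"source_id": "user-x", "text": "..."} for _ in range(10)]

print(overrepresented_sources(batch))
```

Recording the flagged sources alongside the run also gives reviewers something concrete to inspect when a model's behavior shifts after a fine-tuning cycle.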
MLOps Pipeline Security
The CI/CD-equivalent pipeline for models — data ingestion, training orchestration, evaluation, deployment — needs the same hardening as any other CI/CD pipeline: access controls on who can trigger a training run or push a model to production, integrity checks between pipeline stages, and audit logging of every model promotion. Most of the tooling exists; the gap is usually that ML pipelines were built by data science teams without the same security review software pipelines get.
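As an illustration of audit logging for promotions, the sketch below chains log entries by hash so that editing an earlier entry, or removing one from the middle, is detectable. It is a toy in-memory version: the field names (`actor`, `action`, `artifact_sha256`) and the function names are hypothetical, and a real pipeline would write to append-only storage that the pipeline's own credentials cannot rewrite.

```python
import hashlib
import json
from datetime import datetime, timezone


def _entry_hash(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def append_event(log: list, actor: str, action: str, artifact_sha256: str) -> dict:
    """Append an entry that commits to the hash of the previous entry."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "artifact_sha256": artifact_sha256,
        "prev_hash": log[-1]["entry_hash"] if log else None,
    }
    entry = {**body, "entry_hash": _entry_hash(body)}
    log.append(entry)
    return entry


def chain_is_intact(log: list) -> bool:
    """Recompute every hash and check each entry points at its predecessor."""
    prev_hash = None
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash or _entry_hash(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Including the artifact digest in each entry ties the log to a specific model file, which is what lets a later stage confirm it received the same artifact the previous stage produced.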
Model Registry Integrity
A model registry needs the same controls as an artifact registry for software: signed artifacts, version pinning, and verification that the model actually deployed matches the model that passed evaluation — not a different artifact swapped in between approval and deployment.
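The approval-to-deployment check can be sketched as follows. To stay within the standard library, this uses an HMAC over the artifact's SHA-256 digest as a stand-in for a real signature; a production setup would more likely use asymmetric signing so that the deploy step holds only a public key. The function names and the in-code key are illustrative assumptions.

```python
import hashlib
import hmac


def sign_artifact(artifact: bytes, key: bytes) -> str:
    """Produce a keyed tag over the artifact digest (stand-in for a signature)."""
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_before_deploy(artifact: bytes, approved_signature: str, key: bytes) -> bool:
    """Check the bytes about to be deployed against the tag recorded at approval."""
    expected = sign_artifact(artifact, key)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, approved_signature)


# Hypothetical key; in practice it would come from a secret store, not source code.
key = b"demo-key-loaded-from-a-secret-store"

evaluated_model = b"weights that passed evaluation"
approved_signature = sign_artifact(evaluated_model, key)

swapped_model = b"weights swapped in after approval"

print(verify_before_deploy(evaluated_model, approved_signature, key))  # True
print(verify_before_deploy(swapped_model, approved_signature, key))    # False
```

The important design point is where the check runs: on the bytes the deployment step actually loads, not on the copy sitting in the registry.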
What a "Model Bill of Materials" Looks Like
Borrowing the SBOM concept: a practical Model BOM records the training data sources and their provenance, the base model and its version/checkpoint if fine-tuned from a foundation model, the fine-tuning datasets and process, evaluation results at each stage, and the deployment artifact's hash and signature. This doesn't need to be exotic tooling — a structured record alongside the model artifact, reviewed at each promotion gate, covers most of the practical need.
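A structured record of this kind might look like the following. The schema is an assumption made for illustration, not an established standard: the class names (`ModelBOM`, `DatasetRef`, `EvalResult`), the field names, and every value in the example are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DatasetRef:
    name: str
    source: str   # where the data was obtained
    sha256: str   # digest of the dataset snapshot that was used


@dataclass
class EvalResult:
    stage: str    # e.g. "post-fine-tuning"
    metric: str
    value: float


@dataclass
class ModelBOM:
    model_name: str
    artifact_sha256: str
    artifact_signature: str
    training_data: list = field(default_factory=list)
    base_model: Optional[str] = None              # set when fine-tuned from a foundation model
    base_model_checkpoint: Optional[str] = None
    fine_tuning_data: list = field(default_factory=list)
    fine_tuning_process: str = ""
    evaluations: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the model artifact."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are placeholders, not real models, datasets, or results.
bom = ModelBOM(
    model_name="example-classifier",
    artifact_sha256="0" * 64,
    artifact_signature="placeholder-signature",
    base_model="example-foundation-model",
    base_model_checkpoint="placeholder-checkpoint-id",
    fine_tuning_data=[DatasetRef("example-corpus", "internal export", "1" * 64)],
    fine_tuning_process="placeholder description of the fine-tuning run",
    evaluations=[EvalResult("post-fine-tuning", "accuracy", 0.0)],
)

print(bom.to_json())
```

Keeping the record as plain JSON makes it easy to diff between versions at each promotion gate.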
Practical Steps
- Verify provenance and integrity of any third-party model or dataset before use — checksum verification at minimum, source review where feasible.
- Prefer safer serialization formats (safetensors, for example) over pickle-based formats where the tooling supports it, and sandbox model loading regardless.
- Apply the same access controls and audit logging to ML pipelines that you'd apply to any production CI/CD system.
- Sign model artifacts and verify signatures at deployment, not just at the registry.
- Treat any continuously fine-tuned production model as having an ongoing poisoning attack surface, not a training risk assessed once at launch.
The Bottom Line
The ML supply chain has the same shape as the software supply chain — data and components flowing from untrusted third parties into a production artifact — but the tooling maturity is years behind, and a model's "vulnerabilities" don't show up where a software dependency scanner would look. Treat models and the data that built them as supply chain artifacts requiring provenance and integrity verification, not as a black box you trust because the accuracy metrics looked good in evaluation.