Detection Engineering for AI Systems: Telemetry That Actually Catches Something

Most "secure your AI deployment" advice stops at "log everything and monitor for anomalies," which is not actionable for whoever has to build the actual detection rules. Here's what concretely goes into a SIEM pipeline for an LLM-integrated application, and what correlation logic actually surfaces something worth an analyst's attention.

What to Log

Full prompts and outputs: not just metadata. You cannot investigate a suspected jailbreak or injection after the fact without the actual text exchanged. Log it within the limits of your data retention and privacy obligations.
Tool/function calls with full arguments: which tool, what parameters, what the tool returned. This is the single highest-value log source for agentic systems. "Agent called email tool" tells you nothing; "agent called email tool with recipient=external-domain.example" tells you something.
Retrieval queries and results: for RAG systems, what was queried and which document chunks were returned, so you can investigate retrieval-based leakage or poisoning after the fact.
Token usage and latency per request: unusual spikes correlate with both abuse (someone running an expensive extraction or DoS-style attack) and malfunction.
User/session identity bound to every request: sounds obvious, but multi-tenant systems frequently lose this binding somewhere in the pipeline, which breaks every downstream investigation.
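
To make the list above concrete, here is a minimal sketch of one log record per request, serialized as a JSON line for SIEM ingestion. The class name `LLMRequestEvent`, the function `to_siem_line`, and every field name are assumptions chosen for illustration, not a standard schema; map them to whatever your pipeline already uses.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class LLMRequestEvent:
    # Identity binding, carried on every event (hypothetical field names).
    tenant_id: str
    user_id: str
    session_id: str
    # Full text exchanged, subject to retention and privacy rules.
    prompt: str
    output: str
    # Each tool call: {"tool": ..., "arguments": {...}, "result": ...}
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    # Each retrieval: {"query": ..., "chunk_ids": [...]}
    retrievals: list[dict[str, Any]] = field(default_factory=list)
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_ms: float = 0.0
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)


def to_siem_line(event: LLMRequestEvent) -> str:
    """Serialize one event as a JSON line, refusing events without identity."""
    for name in ("tenant_id", "user_id", "session_id"):
        if not getattr(event, name):
            raise ValueError(f"missing identity field: {name}")
    return json.dumps(asdict(event), sort_keys=True)
```

Rejecting events with an empty identity field at write time is one way to notice a lost binding early, rather than discovering it during an investigation.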

Detection Rules That Actually Fire on Something Real

Anomalous Tool-Call Sequences

Baseline the normal sequence and frequency of tool calls for a given agent or workflow, and alert on deviations — an agent that normally calls a lookup tool 2-3 times per session suddenly calling it 200 times, or invoking a tool it has never used in that workflow before.
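
A rough sketch of the frequency half of that rule, assuming each session is available as a list of tool names. The function names and the `multiplier` parameter are hypothetical, and this version checks per-tool call counts and never-seen tools only, not the order of calls.

```python
from collections import Counter


def build_baseline(sessions):
    """Return the highest per-session call count seen for each tool.

    `sessions` is a list of sessions, each a list of tool names.
    """
    baseline = {}
    for calls in sessions:
        for tool, count in Counter(calls).items():
            baseline[tool] = max(baseline.get(tool, 0), count)
    return baseline


def tool_call_anomalies(calls, baseline, multiplier=5):
    """Flag tools never seen in the baseline and counts far above it."""
    findings = []
    for tool, count in Counter(calls).items():
        if tool not in baseline:
            findings.append(("never_seen_tool", tool, count))
        elif count > baseline[tool] * multiplier:
            findings.append(("call_count_spike", tool, count))
    return sorted(findings)
```

Keep one baseline per agent or workflow. A shared baseline across workflows would hide exactly the "tool never used here before" case.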

Prompt Length and Entropy Spikes

Many-shot jailbreak attempts and cognitive-overload techniques rely on unusually long, dense, or high-entropy input. Flag requests whose length or entropy sits well outside the normal distribution for that application — not a perfect signal, but a cheap one that catches a meaningful share of volume-based attack attempts.
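
One cheap way to compute both signals is character-level Shannon entropy plus a one-sided z-score against the application's own history. This is a sketch under assumptions: the function names and the default `z_threshold` of 3.0 are illustrative, and a z-score presumes a roughly bell-shaped baseline, which real prompt lengths often are not.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(text):
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_high_outlier(value, baseline, z_threshold=3.0):
    """True if `value` sits far above the baseline mean (one-sided)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return value > mu
    return (value - mu) / sigma > z_threshold
```

In use, you would call `is_high_outlier(len(prompt), recent_lengths)` and `is_high_outlier(shannon_entropy(prompt), recent_entropies)` against rolling baselines for that application, and treat a hit as a reason to look rather than a verdict.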

Known Jailbreak/Injection Pattern Matches

Maintain a regularly updated set of known attack-phrase patterns (role-play framing markers, "ignore previous instructions" variants, known encoding-obfuscation structures) and alert on matches. Accept that this catches known techniques and misses novel ones, the same limitation as any signature-based detection.
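
A minimal sketch of what such a pattern set might look like. The three regexes below are illustrative examples written for this post, not a vetted ruleset, and the names `INJECTION_PATTERNS` and `match_patterns` are hypothetical. Expect false positives from the role-play pattern in particular.

```python
import re

# Illustrative signatures only; a real set needs regular review and tuning.
INJECTION_PATTERNS = {
    "ignore_previous_instructions": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}"
        r"\b(previous|prior|above|earlier)\b.{0,40}"
        r"\b(instructions?|prompts?|rules?)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "role_play_framing": re.compile(
        r"\b(pretend (you are|to be)|you are now|from now on you are)\b",
        re.IGNORECASE,
    ),
    # Long unbroken base64-looking runs as a crude obfuscation marker.
    "long_base64_run": re.compile(r"[A-Za-z0-9+/]{80,}={0,2}"),
}


def match_patterns(prompt):
    """Return the sorted names of all patterns that match the prompt."""
    return sorted(
        name for name, pattern in INJECTION_PATTERNS.items()
        if pattern.search(prompt)
    )
```

Returning pattern names rather than a boolean lets the SIEM rule record which signature fired, which matters when you later tune or retire individual patterns.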

Output Containing Sensitive Patterns

Regex or classifier-based scanning of model output for credential patterns, PII formats, or known-secret structures before the response reaches the user — catching accidental leakage regardless of whether it was triggered by an attack or a model error.
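
The regex variant can be sketched as below. The patterns cover an AWS-style access key ID, a PEM private key header, and a US SSN-shaped number; they are examples, not a complete secret-detection ruleset, and `SENSITIVE_PATTERNS` and `scan_output` are hypothetical names.

```python
import re

# Example patterns only; extend with formats relevant to your environment.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "us_ssn_format": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_output(text):
    """Return (finding names, redacted text) for one model response."""
    findings = []
    redacted = text
    for name, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(redacted):
            findings.append(name)
            redacted = pattern.sub(f"[REDACTED:{name}]", redacted)
    return findings, redacted
```

Whether to block, redact, or only alert on a match is a policy choice. The sketch returns both the findings and a redacted copy so the caller can decide.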

Cross-Session Behavioral Correlation

A single session showing a borderline-suspicious pattern may be noise. The same pattern repeated across many sessions from related sources (same IP range, same account cluster, same time window) is a much stronger signal — correlate at the SIEM level, not just per-session.
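
As a sketch of that correlation, the function below groups per-session alerts by rule, source network, and time window, then keeps only groups spanning several distinct sessions. The event keys (`rule`, `session_id`, `source_ip`, `timestamp`), the /24 grouping (IPv4 only), and the default thresholds are all assumptions for illustration. Most SIEMs express the same idea in their own query language.

```python
import ipaddress
from collections import defaultdict


def correlate_sessions(events, window_seconds=3600, min_sessions=5):
    """Count distinct sessions per (rule, /24 network, time window).

    Returns only the groups with at least `min_sessions` sessions.
    """
    buckets = defaultdict(set)
    for event in events:
        # strict=False lets a host address stand in for its /24 network.
        network = ipaddress.ip_network(f"{event['source_ip']}/24", strict=False)
        window = int(event["timestamp"] // window_seconds)
        key = (event["rule"], str(network), window)
        buckets[key].add(event["session_id"])
    return {
        key: len(sessions)
        for key, sessions in buckets.items()
        if len(sessions) >= min_sessions
    }
```

Fixed windows are simple but can split a burst that straddles a window boundary; a sliding window avoids that at the cost of more bookkeeping.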

Integrating with Existing SIEM Workflows

AI application logs should flow into the same SIEM as everything else, not a separate "AI monitoring" silo — an attacker pivoting from a compromised AI agent into the rest of the environment is exactly the scenario where correlation across log sources (the agent's tool calls plus the network and identity logs of what those tool calls actually touched) matters most.
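
To illustrate the kind of cross-source correlation meant here, the sketch below pairs each agent tool call with network events from the same host in the seconds that follow. It assumes both log sources share a `host` field and comparable `timestamp` values, which is a simplification; the function name and field names are hypothetical, and in practice this would be a SIEM join rather than application code.

```python
def join_tool_calls_to_network(tool_calls, network_events, window_seconds=5.0):
    """Attach to each tool call the network events that followed it.

    A network event is related if it came from the same host and occurred
    within `window_seconds` after the tool call.
    """
    joined = []
    for call in tool_calls:
        related = [
            event for event in network_events
            if event["host"] == call["host"]
            and 0 <= event["timestamp"] - call["timestamp"] <= window_seconds
        ]
        joined.append({**call, "network_events": related})
    return joined
```

A time-window join like this is approximate. If your tooling can propagate a request or trace ID from the agent into downstream logs, joining on that ID is more reliable.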

Managing Alert Fatigue

AI-specific signals are noisy by nature — a lot of legitimate use looks superficially similar to a lot of attack patterns (long prompts, unusual phrasing, edge-case tool sequences happen in normal use too). Tune thresholds against your own application's actual traffic baseline before trusting any default threshold from a vendor tool, and route AI-specific alerts through the same triage and tuning discipline as any other detection rule — not as a permanently "high noise, low value" bucket analysts learn to ignore.
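
One simple way to tune against your own baseline is to pick the threshold from historical values so that only a chosen fraction of known-normal traffic would have alerted. This is a sketch: the function names and the `target_alert_rate` parameter are hypothetical, and it assumes the baseline sample is mostly legitimate traffic.

```python
import math


def threshold_for_alert_rate(baseline_values, target_alert_rate):
    """Pick a threshold so at most `target_alert_rate` of the baseline exceeds it."""
    if not 0 < target_alert_rate < 1:
        raise ValueError("target_alert_rate must be between 0 and 1")
    ordered = sorted(baseline_values)
    # Number of baseline values that must sit at or below the threshold.
    keep = math.ceil(len(ordered) * (1 - target_alert_rate))
    keep = min(max(keep, 1), len(ordered))
    return ordered[keep - 1]


def alert_rate(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)
```

Running `alert_rate` on a fresh week of traffic with the chosen threshold gives a quick estimate of daily alert volume before the rule goes live.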

The Bottom Line

Generic monitoring advice fails because it skips the step of deciding what specifically to log and what specifically to alert on. Tool-call arguments, retrieval logs, and behavioral baselines per workflow are the concrete telemetry that turns "we monitor our AI systems" into detection rules an analyst can actually act on.
