Agentic AI Security: When Your AI Has Hands

Most AI security writing still treats the model as the whole system: it reads text, it writes text, the risk is in what it says. That framing breaks down the moment an LLM gets a tool-calling loop, a browser, or a code execution sandbox. The risk surface stops being "what can it say" and becomes "what can it do" — and those are different threat models requiring different controls.

What Makes an Agent Different from a Chatbot

An agent adds three things a plain chatbot doesn't have: a planning loop (the model decides its own next step rather than just responding), persistent memory across steps, and the ability to invoke tools — search, code execution, file access, API calls, sometimes other agents. Each of those is a new place where a manipulated or simply mistaken model decision turns into a real-world action instead of just a bad sentence.

The Attack Surface That Didn't Exist Before

Tool-Use Hijacking

If an agent chooses its tools by reasoning over untrusted input, an attacker who can influence that input can influence which tool gets called and with what arguments. This is prompt injection's natural escalation once the model has actions to take instead of just words to say.
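One way to blunt this is to check every model-proposed tool call against a policy the model cannot edit. The sketch below is illustrative only: `ToolCall`, `validate_tool_call`, the tool names, and the host allowlist are all hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy: which tools exist and which hosts "http_get" may reach.
ALLOWED_TOOLS = {"search", "http_get"}
ALLOWED_HOSTS = {"docs.example.com"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


def validate_tool_call(call: ToolCall) -> tuple[bool, str]:
    """Check a model-proposed call against policy defined outside the prompt."""
    if call.name not in ALLOWED_TOOLS:
        return False, f"tool not allowed: {call.name}"
    if call.name == "http_get":
        url = call.args.get("url", "")
        parsed = urlparse(url if isinstance(url, str) else "")
        # Reject non-HTTPS URLs and any host that is not explicitly listed.
        if parsed.scheme != "https" or parsed.hostname not in ALLOWED_HOSTS:
            return False, f"destination not allowed: {url!r}"
    return True, "ok"
```

The point of the design is that the check runs in ordinary code after the model has decided what to call, so injected text can change what the model asks for but not what the policy permits.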

Planning Manipulation

Multi-step agents maintain an internal plan or scratchpad. Content encountered mid-task (a search result, a file, a tool's return value) can rewrite that plan if the agent doesn't distinguish "instructions from my operator" from "data I happened to read."
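A minimal sketch of that distinction, assuming the agent runtime can tag each piece of context with where it came from. The `Source`, `ContextItem`, and `Plan` names are hypothetical, and a real agent would still show the notes to the model, so this complements the other controls rather than replacing them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    OPERATOR = "operator"  # system prompt, task definition
    TOOL = "tool"          # search results, file contents, tool return values


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


@dataclass
class Plan:
    steps: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

    def apply(self, item: ContextItem) -> bool:
        """Return True if the item was allowed to change the plan's steps."""
        if item.source is Source.OPERATOR:
            self.steps.append(item.text)
            return True
        # Anything read mid-task is stored as data, never as a step.
        self.notes.append(item.text)
        return False
```

For example, an operator item such as "summarize report.pdf" becomes a step, while a tool result containing "ignore previous instructions" lands in `notes` and leaves `steps` unchanged.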

Recursive Delegation

Agents that can spawn sub-agents or call other agents can hand a compromised state down along with the task. A single injected instruction at the top of a delegation chain can propagate through every agent that trusts its parent's output without re-validating it.

Why "Excessive Agency" Needs an Architecture, Not a Reminder

Telling a model to "only do safe things" is not a control — it's a hope. The controls that actually hold up:

Capability scoping: grant each agent session the minimum tool permissions the task requires, not standing access to everything it might ever need.
Approval gates: require explicit confirmation (from a human or a separate, simpler validator) before any action with real-world consequences, such as sending data externally, spending money, or deleting or modifying records.
Sandboxing: code execution and file access happen in an isolated environment with no path to production systems or credentials by default.
Action logging: log every tool call, with its full arguments, to an immutable, reviewable record. Treat agent actions like privileged account activity, because that's what they are.
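Three of the controls above (scoping, approval, and logging) can sit in a single dispatcher that every tool call passes through. This is a sketch under stated assumptions: `AgentSession`, the `CONSEQUENTIAL` set, and the tool names are hypothetical, and the in-memory list stands in for what would need to be append-only storage outside the agent's reach.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical set of tools whose effects leave the sandbox.
CONSEQUENTIAL = {"send_email", "delete_record", "make_payment"}


@dataclass
class AgentSession:
    granted: frozenset[str]               # capability scope for this task only
    approve: Callable[[str, dict], bool]  # human or validator callback
    audit_log: list[str] = field(default_factory=list)

    def call(self, tool: str, args: dict, registry: dict[str, Callable[..., object]]):
        decision = "allowed"
        if tool not in self.granted:
            decision = "denied:out_of_scope"
        elif tool in CONSEQUENTIAL and not self.approve(tool, args):
            decision = "denied:not_approved"
        # Record full arguments and the decision before anything executes.
        self.audit_log.append(
            json.dumps({"tool": tool, "args": args, "decision": decision}, sort_keys=True)
        )
        if decision != "allowed":
            raise PermissionError(decision)
        return registry[tool](**args)
```

Denied calls are logged as well as allowed ones, since a burst of out-of-scope requests is often the first visible sign that an agent has been steered off task.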

Multi-Agent Systems Compound the Risk

A single well-sandboxed agent is hard enough to secure. A system of agents that delegate to each other multiplies the problem: an injection that lands on agent A can ride along in whatever context agent A passes to agent B, and B has no independent way to know that input came from an untrusted source three hops upstream. Treat inter-agent messages as untrusted input at every hop, not just at the system's outer boundary.
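One way to give agent B that missing knowledge is to carry a provenance flag with every inter-agent message and make it sticky. The sketch below assumes such a flag can be attached by the runtime; `AgentMessage`, `relay`, and `may_trigger_action` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMessage:
    text: str
    tainted: bool  # True if any untrusted content contributed to this message
    hops: int = 0  # how many agents have relayed it so far


def relay(incoming: list[AgentMessage], produced_text: str) -> AgentMessage:
    """Build an agent's outgoing message; taint never clears on its own."""
    return AgentMessage(
        text=produced_text,
        tainted=any(m.tainted for m in incoming),
        hops=max((m.hops for m in incoming), default=-1) + 1,
    )


def may_trigger_action(msg: AgentMessage) -> bool:
    """Receiving agent's check: tainted input is data, not a command."""
    return not msg.tainted
```

If agent A reads a web page (tainted) alongside an operator instruction (clean), everything A passes to B is tainted, and so is whatever B passes on, no matter how many hops later it is consumed.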

A Practical Starting Checklist

Inventory every tool/capability each agent can invoke, and the blast radius if that capability is misused.
Add an approval gate for any action that is irreversible, costly, or externally visible.
Sandbox code execution; never give an agent the same credentials as the engineer who built it.
Log full tool-call arguments, not just "tool X was called."
Treat every piece of content an agent reads — documents, search results, other agents' output — as untrusted until validated, regardless of source.
Cap recursion/delegation depth and the number of autonomous steps before a checkpoint requires human or validator sign-off.
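The last item on the checklist can be sketched as a small budget object that the agent loop consults before each step and each delegation. The limits and the names (`Budget`, `CheckpointRequired`, `MAX_DEPTH`, `STEPS_PER_CHECKPOINT`) are hypothetical and would be tuned per deployment.

```python
from dataclasses import dataclass

MAX_DEPTH = 2             # hypothetical limit on delegation depth
STEPS_PER_CHECKPOINT = 5  # hypothetical number of autonomous steps allowed


class CheckpointRequired(Exception):
    """Raised when the agent must pause for human or validator sign-off."""


@dataclass
class Budget:
    depth: int = 0
    steps_since_checkpoint: int = 0

    def step(self) -> None:
        """Consume one autonomous step, or demand a checkpoint."""
        if self.steps_since_checkpoint >= STEPS_PER_CHECKPOINT:
            raise CheckpointRequired("step budget exhausted")
        self.steps_since_checkpoint += 1

    def sign_off(self) -> None:
        """Called only after a human or validator has reviewed progress."""
        self.steps_since_checkpoint = 0

    def delegate(self) -> "Budget":
        """Create a sub-agent's budget one level deeper, or refuse."""
        if self.depth >= MAX_DEPTH:
            raise RuntimeError("delegation depth limit reached")
        return Budget(depth=self.depth + 1)
```

In this sketch each sub-agent starts with a fresh step count. A stricter variant would share one counter across the whole delegation tree so that spawning children cannot be used to buy more unreviewed steps.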

The Bottom Line

Agentic AI isn't more dangerous because the model got smarter — it's more dangerous because the model got hands. The fix isn't a smarter model either; it's the same privilege-separation discipline security teams already apply to any system that can take action on a user's behalf, applied honestly to a system that increasingly decides for itself what action to take.

