The trust boundary an LLM erases

“You hired a brilliant assistant who believes everything they read, and you gave them the keys.”

The boundary that disappears

In a normal application, code is code and data is data. A SQL query is instructions; the user's name is data. Decades of security practice (input validation, parameterised queries, escaping) exist to keep that line sharp. The whole discipline of injection defence is about not letting data become instructions.

A language model erases the line. The prompt is one stream of text containing your instructions, the user's input, and any retrieved documents, and the model reads all of it as instructions it might follow. There is no syntax that says "this part is data, do not obey it." The model decides what to treat as a command, and it decides badly under pressure.

Trusted rules, untrusted user input, and untrusted retrieved text arrive in one channel. The model cannot see the dashed lines.

Trust boundaries, redrawn

Security starts with one question: what do you trust, and where does trusted meet untrusted? With an LLM, the honest answer is uncomfortable. Anything the model reads (user messages, retrieved documents, tool outputs, the contents of a web page it browsed) is untrusted and might be carrying instructions. And anything the model can do (call tools, return data, trigger actions) is a capability an attacker can try to borrow.

Why the old playbook isn't enough

Traditional appsec still applies: you still need authentication, authorization, encryption, and the rest. But it was designed for deterministic systems with clear input grammars. An LLM has no input grammar; "valid input" is any text in any language, including text crafted to manipulate the model. You cannot write a regex for malice expressed in fluent English.

So AI security is additive, not a replacement. You keep everything you already do, and you add a layer for the model's specific failure modes: instruction-following over untrusted content, leaking what it was told, being talked out of its guardrails, and acting beyond what you intended. The next five chapters are those failure modes; the last is how to govern the whole thing.

A map of what can go wrong

The community has converged on a rough taxonomy: the OWASP Top 10 for LLM Applications is the most cited version, and worth reading in full. The headline risks cluster into a handful of families:

Prompt injection: untrusted content hijacks the model's instructions (chapter 2).
Sensitive information disclosure: the model leaks data it was given or trained on (chapter 3).
Jailbreaks & misuse: the model is steered past its safety guardrails (chapter 4).
Supply-chain risks: poisoned data, backdoored models, malicious packages and tools (chapter 5).
Excessive agency: the model is allowed to do more than it safely should (woven throughout).

None of these is exotic. They're the predictable consequences of a system that follows instructions in natural language and that you've connected to real capabilities. Understanding them is most of the defence.

In one line each

LLMs erase the instructions/data boundary that classic security depends on: everything in the prompt is read as potential instructions.
Treat every token the model reads as untrusted and every action it can take as potentially abused.
AI security is additive: keep all of traditional appsec, and add a layer for the model's specific failure modes.
Standard taxonomies (OWASP LLM Top 10, MITRE ATLAS) are a useful map, not a defence; they prompt your threat modelling, they don't replace it.

Where to go next

Chapter 2: Prompt injection