AI Security in 5 Concepts — Pawan Bishwokarma

I spent five years responding to real incidents inside enterprise infrastructure before I started thinking seriously about AI security. What struck me when I made the transition was how familiar the attack surface felt, and how different the failure modes were.

These are the five concepts I keep coming back to. Not because they are new, but because they show up in every serious AI security incident I have studied, and most teams are not thinking about them until something breaks.

1. Prompt injection

The classic example is an attacker hiding instructions inside a document your agent retrieves. The model, trained to be helpful and follow instructions, complies.

What makes this hard is that the same property that makes LLMs useful, their ability to follow natural language instructions, is exactly what the attack exploits. You cannot patch it out. You have to architect around it.

In practice this means treating all external content as data, never as instructions. Your system prompt is instructions. Everything the agent retrieves, every user message, every tool response, that is data. The boundary has to be architectural, not just a filter.

2. Data leakage through context

RAG systems are particularly exposed here. When you stuff a context window with internal documents to answer a user's question, you are trusting the model not to echo sensitive content it was not asked about. That trust is frequently misplaced.

I have seen this framed as a model problem. It is an architecture problem. The fix is not a better model. It is minimizing sensitive data in context, logging outputs for review, and being explicit about what the agent is and is not allowed to reference. Redaction before ingestion, not after generation.

3. Agent and tool abuse

When an agent can call APIs, execute code, write files, or send emails, a single bad decision in a reasoning loop can escalate into a full breach. The agent does not need to be compromised. It just needs to be convinced.

The mental model I use from network security maps well here: least privilege, allowlists over blocklists, and human-in-the-loop for any action that cannot be reversed. An agent that can read files probably should not also be able to write them. These feel obvious when stated plainly. They are frequently violated in practice because developers are optimizing for capability, not for blast radius.

4. Supply chain

Every model you use, every plugin you connect, every third-party API your agent calls is a dependency. The security properties of your system are only as strong as the weakest link in that chain.

This is not a new problem. What is new is the attack surface. A compromised model can behave subtly differently in ways that are hard to detect. A malicious plugin can exfiltrate context in a side channel. A poisoned RAG corpus can steer an agent toward specific recommendations.

Pin your model versions. Verify artifacts. Monitor for behavioral drift between versions.

5. Observability

You cannot defend what you cannot see. In practice, most AI deployments I have looked at have no structured logging of model inputs and outputs, no tracing of agent reasoning steps, and no alerting on policy violations.

In traditional security, the absence of logs is itself an incident. The same standard should apply to AI workloads. Every tool call an agent makes, every decision point in a reasoning loop, every time a policy boundary is approached, these need to be instrumented.

This is not just about incident response. Observability is how you catch subtle failures before they become breaches. It is also how you build the training data that makes your defenses better over time.

The through line

AI security is not a separate discipline from security. It is security applied to a runtime that is non-deterministic, instruction-following, and deeply integrated with your most sensitive systems. The principles are the same. The failure modes are new enough that most teams are learning them the hard way.

The five concepts above are not a checklist. They are a way of thinking. If you internalize them before you build, the architecture decisions that follow feel natural rather than like constraints.

Next steps

Browse all posts or read about SentinelMesh.