NEW

Start with the pressure: sales, launch, abuse, agents, data, or guardrails

Prompt Injection Is Not a Prompt Problem

Prompt Injection Is Not a Prompt Problem

The mistake is to think better wording can defend a system that already gives the model too much reach. Once the model can read external content, call tools, and influence workflows, the real question becomes who controls the boundary.

David Wolf·March 4, 2026·3 min read

Audience

CISOs, AppSec leaders, AI platform engineers

Search intent

prompt injection, AI security engineering

Value

Lead gen high · Report reuse high

Related products

rag, surface

AI Security Field Guide

Prompt Injection Is Not a Prompt Problem

Prompt injection is usually framed as a wording issue. It is not. It is an authority issue. The attack works when untrusted content reaches a system that can read, decide, and act without a clear boundary.

Why This Matters

The mistake is to think better wording can defend a system that already gives the model too much reach. Once the model can read external content, call tools, and influence workflows, the real question becomes who controls the boundary.

Core Concept

The system fails when natural language becomes a command path. The fix is not clever prompt text. The fix is scoped authority, explicit trust zones, and tests that prove the model cannot cross them.

Threat Model or Failure Model

  • Indirect injection arrives through retrieved documents, web pages, or user-generated content.
  • The model treats attacker text as instructions because the application never separated data from control.
  • Tool calls and memory turn a text attack into an action attack.
  • The incident becomes visible only after the workflow already changed state.

Framework Mapping

OWASP covers the application risk. NIST AI RMF helps define governance. MITRE ATLAS gives the adversarial path. CSA AICM helps map the controls to the data and tool surfaces that actually matter.

Engineering Controls

  • Separate untrusted content from system instructions and tool authority.
  • Require approval for any action that changes state or touches sensitive data.
  • Restrict retrieval scope and document provenance.
  • Trace every model action back to the source content and the decision path.

Tooling

  • Use sandboxed retrieval, strict tool brokers, and permissioned memory.
  • Add replayable traces so the injected path can be studied later.
  • Use evals that test direct and indirect injection, not just obvious jailbreaks.

Evidence and Observability

  • Evidence should show the attack path, the authority boundary, and the blocked action.
  • The useful artifact is a replayable trace, not a dramatic screenshot.
  • Track the exact content that triggered the decision.

Operating Model

Security, product, and platform teams need a shared rule: untrusted content can inform a response, but it cannot own the response. If the team cannot state that rule in one sentence, the system is not ready.

Common Mistakes

  • Treating prompt text as the security boundary.
  • Letting tools execute without review.
  • Ignoring retrieved content provenance.
  • Measuring success only by output quality.

Practical Example

A document says the assistant should reveal the last three customer records. If retrieval, memory, and tools are not isolated, the document can steer the model into a data leak even when the prompt itself looks harmless.

Governance and Claim Caveats

  • Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.
  • Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity.
  • Psychometric outputs are role-language evidence, not diagnosis.
  • Avoid accusatory company-level language.
  • Avoid product endorsement language.

Conclusion

Prompt injection is an architecture problem because it exploits authority, not vocabulary. The best defense is a system that can explain where text ends and control begins.

Implementation Checklist

  • Define the authority boundary.
  • Scope retrieval and memory.
  • Broker all tool calls.
  • Log content provenance.
  • Test indirect injection.
  • Require approval for state changes.
  • Retest after each release.
  • Document the bypass paths.
  • Keep the control language plain.
  • Review claims before publication.