ConsultingWorkbench-backed AI security engagements — map, attack, defend, and prove your AI systems.
Scope a Review
AI Security Engineering articles
Draft article·3 min read·565 words

Prompt Injection Is Not a Prompt Problem

Prompt injection is an architecture problem involving trust boundaries, tools, RAG, memory, authority, and unsafe model-mediated workflows.

David WolfPublished Mar 4, 2026
Prompt Injection
LLM Security
Security Architecture
Secure RAG
Agent Security
SURFACE
RAG

Article context

David Wolf on the article, controls, and evidence pattern behind prompt injection is not a prompt problem.

Prompt Injection Is Not a Prompt Problem

Slug: prompt-injection-is-not-a-prompt-problem Effective Date: 2026-05-17 Version: v1.0 Author: David Wolf Status: Draft Minimum Target Length: 2,000 words

Prompt injection is usually framed as a wording issue. It is not. It is an authority issue. The attack works when untrusted content reaches a system that can read, decide, and act without a clear boundary.

  1. Why This Matters

The mistake is to think better wording can defend a system that already gives the model too much reach. Once the model can read external content, call tools, and influence workflows, the real question becomes who controls the boundary.

  1. Core Concept

The system fails when natural language becomes a command path. The fix is not clever prompt text. The fix is scoped authority, explicit trust zones, and tests that prove the model cannot cross them.

  1. Threat Model or Failure Model
  • Indirect injection arrives through retrieved documents, web pages, or user-generated content.
  • The model treats attacker text as instructions because the application never separated data from control.
  • Tool calls and memory turn a text attack into an action attack.
  • The incident becomes visible only after the workflow already changed state.
  1. Framework Mapping

OWASP covers the application risk. NIST AI RMF helps define governance. MITRE ATLAS gives the adversarial path. CSA AICM helps map the controls to the data and tool surfaces that actually matter.

  1. Engineering Controls
  • Separate untrusted content from system instructions and tool authority.
  • Require approval for any action that changes state or touches sensitive data.
  • Restrict retrieval scope and document provenance.
  • Trace every model action back to the source content and the decision path.
  1. Tooling
  • Use sandboxed retrieval, strict tool brokers, and permissioned memory.
  • Add replayable traces so the injected path can be studied later.
  • Use evals that test direct and indirect injection, not just obvious jailbreaks.
  1. Evidence and Observability
  • Evidence should show the attack path, the authority boundary, and the blocked action.
  • The useful artifact is a replayable trace, not a dramatic screenshot.
  • Track the exact content that triggered the decision.
  1. Operating Model

Security, product, and platform teams need a shared rule: untrusted content can inform a response, but it cannot own the response. If the team cannot state that rule in one sentence, the system is not ready.

  1. Common Mistakes
  • Treating prompt text as the security boundary.
  • Letting tools execute without review.
  • Ignoring retrieved content provenance.
  • Measuring success only by output quality.
  1. Practical Example

A document says the assistant should reveal the last three customer records. If retrieval, memory, and tools are not isolated, the document can steer the model into a data leak even when the prompt itself looks harmless.

  1. Governance and Claim Caveats
  • Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.
  • Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity.
  • Psychometric outputs are role-language evidence, not diagnosis.
  • Avoid accusatory company-level language.
  • Avoid product endorsement language.
  1. Conclusion

Prompt injection is an architecture problem because it exploits authority, not vocabulary. The best defense is a system that can explain where text ends and control begins.

Implementation Checklist

  • Define the authority boundary.
  • Scope retrieval and memory.
  • Broker all tool calls.
  • Log content provenance.
  • Test indirect injection.
  • Require approval for state changes.
  • Retest after each release.
  • Document the bypass paths.
  • Keep the control language plain.
  • Review claims before publication.

Source Notes Needed

  • OWASP prompt injection guidance.
  • NIST AI RMF.
  • MITRE ATLAS.
  • Internal architecture review notes.
  • Observed attack path examples from public research.

Operationalize Identity

Review Identity Governance Patterns

Explore SURFACE

Framework Alignment

This practice is mapped to the Identity control objective within our AI security operating model.

Read Methodology →

AI Security Engineering articles use cautious trust language. Sponsor support does not influence methodology, scoring, findings, chart outputs, or editorial conclusions.

Job-description intelligence and public hiring signals are directional signals, not proof of internal security maturity. Psychometric outputs are role-language evidence, not diagnosis.