NEW

Start with the pressure: sales, launch, abuse, agents, data, or guardrails

AI Security Academy
Print edition

AI Red Teaming for Product Teams

A defensive enterprise course for QA, DevOps, SecOps, product security, and AI platform teams that need repeatable AI abuse-case testing, evidence capture, severity scoring, and remediation workflows.

Print manuscriptWeb edition

Test LLM, RAG, and agentic systems before users and attackers do.

Course thesis

AI features fail differently than traditional software features. They can follow malicious instructions, retrieve the wrong context, expose sensitive data, call tools unsafely, overtrust generated output, or appear safe in happy-path tests while failing under adversarial conditions.

The goal is not to teach reckless exploitation. The goal is to give product teams a repeatable, public-safe, defensive workflow for finding AI product failures before release.

Audience

This course is for QA engineers, test automation engineers, product security teams, security engineers, DevOps, SecOps, SecEng, AI platform teams, AppSec teams, internal red teams, product managers, engineering managers, and governance teams.

Learning outcomes

Learners will be able to:

  • map AI product attack surface safely
  • design prompt injection and instruction-conflict tests
  • test RAG retrieval boundaries and data leakage risks
  • evaluate agent tools, permissions, approvals, and action limits
  • identify sensitive data exposure paths
  • test guardrail behavior without unsafe live abuse
  • build AI abuse-case libraries and prompt families
  • wire AI red-team regression checks into CI/CD
  • capture evidence, severity, and remediation notes
  • build an AI abuse-case test plan for product release

\pagebreak

# Module 1: AI Attack Surface for Product Teams

AI product testing starts with attack surface.

Key points

  • AI features are workflows, not only model calls.
  • Attack surface includes prompts, retrieved context, tool outputs, vector stores, logs, integrations, and output rendering.
  • Trust boundaries show where permissions, users, data, and responsibility change.
  • Safe testing requires authorized systems, synthetic data, controlled environments, and scoped evidence.
  • Release-relevant failures should become test cases.

Practice

Map the attack surface for a fictional customer-support RAG assistant.

\pagebreak

# Module 2: Prompt Injection and Instruction Conflicts

Prompt injection testing checks whether an AI feature follows the wrong instructions.

Key points

  • Prompt injection is best understood as instruction conflict.
  • Test user input, retrieved documents, uploaded files, tickets, web content, and tool output.
  • Define intended instruction hierarchy.
  • Retrieved content should usually be treated as data, not control instruction.
  • Capture expected behavior, actual behavior, evidence, and severity.

Practice

Design five safe instruction-conflict tests for a fictional RAG assistant.

\pagebreak

# Module 3: RAG Leakage and Retrieval Boundary Tests

RAG systems fail when retrieval crosses a boundary.

Key points

  • Grounded answers are not automatically safe.
  • Test tenant, workspace, role, document classification, time, and region boundaries.
  • Use synthetic canary phrases.
  • Test stale, deleted, revoked, and forbidden documents.
  • Capture retrieved sources, citations, output, policy decisions, and severity where approved.

Practice

Create a RAG boundary test matrix for a fictional product with two tenants, admin and standard users, public docs, internal docs, finance docs, and revoked documents.

\pagebreak

# Module 4: Agent Tool Abuse and Excessive Agency

Agents create risk because they can act.

Key points

  • Agent risk is what the system lets the model do.
  • Test tool permissions, approval gates, user roles, tenant boundaries, retries, and state changes.
  • Excessive agency occurs when an agent exceeds user authority, task scope, or product policy.
  • Meaningful approval requires action visibility.
  • Evidence must distinguish proposed, approved, executed, and blocked actions.

Practice

Build an agent test matrix for a fictional customer support agent.

\pagebreak

# Module 5: Sensitive Data Exposure and Output Handling

AI systems can expose sensitive data through prompts, retrieval, outputs, logs, citations, summaries, and downstream workflows.

Key points

  • Use synthetic markers, not real sensitive data.
  • Test prompts, retrieved content, model output, citations, logs, traces, analytics, exports, notifications, and tickets.
  • Check redaction before storage.
  • Severity depends on data class, reachability, affected users, logging, and tenant boundaries.

Practice

Create a sensitive data exposure test plan for an AI assistant that summarizes account history.

\pagebreak

# Module 6: Guardrail Evaluation and Regression Testing

Guardrails are useful, but they are not proof.

Key points

  • Guardrails are controls to test.
  • Test false positives and false negatives.
  • Check refusal quality.
  • Confirm guardrails still work after model, prompt, retrieval, or policy changes.
  • Every confirmed guardrail failure should become a regression test.

Practice

Build a guardrail regression plan for an AI assistant that must avoid revealing restricted account data.

\pagebreak

# Module 7: Test Case Libraries and Prompt Families

One-off AI tests do not scale.

Key points

  • A test case library turns ad hoc failures into repeatable evidence.
  • Prompt families group related tests by failure class.
  • Expected behavior, pass criteria, fail criteria, evidence, severity, and owner must be explicit.
  • Use stable synthetic data and canary phrases.
  • Test libraries need governance.

Practice

Create ten reusable test cases for a fictional AI product.

\pagebreak

# Module 8: CI/CD AI Red-Team Regression Suites

AI red-team findings should not live only in reports.

Key points

  • Known failures should become regression tests.
  • AI checks can run at pull request, pre-merge, pre-release, scheduled, and incident follow-up stages.
  • Deterministic checks are preferred where possible.
  • Model judges can help but should not be the only release control.
  • CI runs should produce evidence.

Practice

Design a CI/CD regression suite for a fictional RAG assistant.

\pagebreak

# Module 9: Evidence, Severity, and Remediation Backlogs

A finding is useful only if the team can understand it, reproduce it safely, prioritize it, and fix it.

Key points

  • Good evidence turns a failure into an engineering task.
  • Severity depends on impact and likelihood.
  • Remediation can include access control fixes, retrieval filter changes, prompt policy changes, tool permission reductions, eval additions, and regression tests.
  • Evidence itself must not become a sensitive data exposure.

Practice

Write a finding for a fictional RAG assistant that reveals a synthetic cross-tenant canary phrase.

\pagebreak

# Module 10: Capstone AI Abuse-Case Test Plan

The final deliverable is a release-ready AI abuse-case test plan.

Required sections

  • feature inventory
  • attack surface map
  • scope and authorization
  • synthetic test data plan
  • prompt injection test categories
  • RAG boundary test matrix
  • agent tool permission tests
  • sensitive data exposure tests
  • guardrail regression tests
  • CI/CD regression plan
  • evidence capture plan
  • severity model
  • remediation backlog workflow
  • release decision criteria

Practice

Build a release-ready AI abuse-case test plan for a product with a RAG assistant, summarization feature, agentic support workflow, model gateway, and trust center page.

\pagebreak

# Appendix A: Quick Checklists

Safe test scope

  • System is owned or explicitly authorized.
  • Environment is approved.
  • Data is synthetic.
  • No real secrets are used.
  • No real customer records are used.
  • Destructive actions are disabled or controlled.
  • Evidence capture is approved.
  • Owners are identified.

RAG boundary tests

  • Tenant boundary tested.
  • Workspace boundary tested.
  • Role boundary tested.
  • Classification boundary tested.
  • Revoked document tested.
  • Stale index tested.
  • Forbidden source name not revealed.
  • Synthetic canary phrases used.

Agent tests

  • Allowed tools listed.
  • Forbidden tools listed.
  • Approval gates listed.
  • Retry limits tested.
  • Tool errors tested.
  • Confused deputy risk tested.
  • Audit trail captured.

Finding checklist

  • Clear title.
  • Affected feature.
  • Risk category.
  • Scope.
  • Expected behavior.
  • Actual behavior.
  • Safe evidence.
  • Impact.
  • Severity.
  • Remediation.
  • Regression test.
  • Owner.

\pagebreak

# Appendix B: Sample Prompt Templates

AI attack surface map

Create an AI attack surface map for this feature.

Feature: [feature]

Known workflow: [workflow]

Users: [users]

Data: [data classes]

Tools: [tools]

Output locations: [outputs]

Provide:

  • instruction sources
  • retrieval sources
  • tool actions
  • trust boundaries
  • sensitive data locations
  • logs and traces
  • likely failure modes
  • recommended test categories

AI abuse-case test plan

Build a release-ready AI abuse-case test plan.

Product: [product]

AI features: [features]

Users: [users]

Data classes: [data]

Tools: [tools]

Controls: [controls]

Output:

  • feature inventory
  • attack surface map
  • safe scope
  • synthetic data plan
  • prompt injection tests
  • RAG boundary matrix
  • agent tool tests
  • sensitive data exposure tests
  • guardrail regression tests
  • CI/CD regression plan
  • evidence template
  • severity model
  • remediation workflow
  • release decision criteria

\pagebreak

# Final Message

AI red teaming for product teams is not a performance.

It is a release-readiness discipline.

Test the AI behavior before release. Capture evidence. Fix the product. Add the regression. Then ship with more confidence.