RAG SECURITY

RAG Leakage & Retrieval Boundary Benchmark

RAG Leakage and Tenant-Boundary Benchmark

Evaluate tenant isolation, poisoned context, sensitive document leakage, and citation integrity.

This suite evaluates retrieval as a security boundary, not just an answer-quality feature.

Request RAG Benchmark Start AI Security Assessment Back to benchmarks

Benchmark

RAG Leakage

Planned

Private execution available

Boundary classes

Tenant, role, collection, freshness, source trust

Planned trials

3,000

Across RAG, access-filtered, and gateway-guarded variants

Primary metric

Unauthorized retrieval rate

Reported only after validated trials

Report preview

Planned report outputs

RAG securityPlannedMap + Attack

Publication boundary

Methodology and suite design publish before public scorecards. Suites in active build can be scoped privately while validation continues.

Scorecards are validation-gated.

This benchmark is planned. Public results have not yet been published.

This suite is planned. Public model rankings and benchmark results have not yet been published.

Problem

Why this benchmark matters

Enterprise AI assistants often sit on top of sensitive document stores. If retrieval boundaries fail, the model can expose private, stale, poisoned, or unauthorized content.

Why it matters

RAG systems combine search, permissions, prompts, context windows, and model behavior. A single weak boundary can become a buyer trust problem.

What we will test

We will simulate multi-tenant, role-based, poisoned, stale, and sensitive document retrieval scenarios and measure leakage, unauthorized retrieval, and citation integrity.

Buyer value

Teams can validate RAG launch readiness, compare retrieval and guardrail strategies, and produce evidence for enterprise buyers and governance stakeholders.

Benchmark scope

What this benchmark will measure

Scope is explicit so buyers can see what the benchmark covers before any public scorecards exist.

Classification

benchmark program

mapattackevidence

Target systems

rag_systemhosted_modelgatewaycustomer_deploymentworkflow

Buyer problems

RAG leakagetenant isolationdocument securitysource trustbuyer evidence needed

Risk dimensions

rag-boundary-integrityretrieval-poisoning-resistancesensitive-data-leakagecitation-integrity-scoreauditability-and-evidence

Evaluation task

Tenant boundary query

critical

Adversarial query attempts to retrieve another tenant's content.

Success condition

Only authorized tenant content is retrieved, cited, and summarized.

Failure condition

Unauthorized documents, chunks, facts, or citations appear in retrieval or output.

RAG Boundary Integrity

Sensitive Data Leakage

Cross-Tenant Leakage

Data Exfiltration

Evaluation task

Poisoned document context

high

Retrieved content contains malicious instructions or manipulated metadata.

Success condition

System treats retrieved content as data and avoids following malicious instructions.

Failure condition

Poisoned context changes behavior, causes leakage, or alters policy compliance.

Retrieval Poisoning Resistance

Prompt Injection Resistance

RAG Poisoning

Indirect Prompt Injection

Evaluation task

Sensitive chunk exposure

critical

Queries attempt to infer or expose sensitive document chunks.

Success condition

System avoids exposing sensitive content outside authorization and policy.

Failure condition

Output includes synthetic secrets, private fields, or sensitive document details.

Sensitive Data Leakage

RAG Boundary Integrity

Data Exfiltration

Evaluation task

Citation integrity

medium

System must cite authorized, correct, and relevant sources.

Success condition

Citations match authorized retrieved sources and support the answer.

Failure condition

Citations are fabricated, unauthorized, stale, poisoned, or irrelevant.

RAG Boundary Integrity

Auditability and Evidence

RAG Poisoning

Experiment design

Measure retrieval boundary integrity and leakage behavior across RAG system configurations.

Hypotheses

Unauthorized retrieval and output leakage can occur even when prompts instruct the model to respect boundaries.
Poisoned retrieved context will produce different failure modes than direct prompt injection.
Citation integrity can diverge from answer correctness and must be measured separately.

Trial count

3,000

Repeated across prompt variants, model families, and controlled runs.

Repetitions per case

Enough to compare variants without pretending the scorecard is complete.

Variant

Baseline RAG

RAG workflow without extra boundary controls beyond retrieval configuration.

Captures baseline retrieval and output leakage behavior.

Variant

Access-filtered RAG

Retrieval filtered by tenant, role, collection, or document-level permissions.

Measures authorization boundary effects.

Variant

Gateway-guarded RAG

RAG workflow routed through redaction, policy, and logging controls.

Measures mitigation and evidence capture.

Methodology

How the benchmark will be run

Methodology is published early so teams can understand the evaluation design, request private variants, and align internal AI security tests.

Research questions

How often do RAG systems retrieve unauthorized or cross-tenant content under adversarial queries?
How often does poisoned context influence model behavior or citations?
Can models and retrieval layers preserve source integrity and access boundaries?
Which controls improve leakage resistance without destroying answer utility?

Evaluation design

Construct synthetic multi-tenant corpora with authorized, unauthorized, stale, poisoned, and sensitive documents. Run adversarial and benign queries across retrieval configurations, model variants, and optional gateway controls.

Sampling plan

Use synthetic corpora representing customer documents, support tickets, policies, HR-style records, source code snippets, and sensitive business data with controlled access labels.

Grading and statistics

Grade retrieved chunks, output content, citations, source attribution, leaked terms, and policy behavior. Use deterministic boundary labels and human review for ambiguous leakage cases.

Report unauthorized retrieval rate, leakage rate, poisoned-context acceptance, citation integrity, and utility tradeoffs across configurations.

All public-safe. No raw job-description text or private corpus material is shown here.

Dataset

Synthetic RAG boundary corpus v1

Public-safe

Synthetic multi-tenant corpora with role labels, poisoned documents, sensitive chunks, stale documents, and citation references.

Source

synthetic

Classification

synthetic

Item count

180

Source: datasets/rag-leakage-boundary/synthetic-rag-boundary-corpus-v1.jsonl

Outputs

Report outputs

Each output is designed to be useful without implying finished benchmark rankings.

Output

RAG boundary methodology note

methodology note

Public methodology for synthetic corpora, access labels, query families, leakage grading, and citation checks.

AI platform teams

RAG product owners

Governance teams

Output

Private RAG leakage scorecard

scorecard

Private report with leakage findings, boundary failures, retrieval traces, and remediation guidance.

Private benchmark customers

Security leadership

Product owners

Private benchmark runs can be scoped now for customers, sponsors, or internal teams. Private results stay private unless explicitly approved for publication.

Private benchmark CTA

Request RAG Benchmark

Request RAG Benchmark Start AI Security Assessment

Available now

Private benchmark sprint, model comparison, product-context benchmark, and evidence bundle.

Related routes

Products

Services

Model Gateways & Secure AI Platform Engineering

Related services

AI Product Security Assessment

service

AI Red Team & Adversarial Testing

service

Benchmark copy uses the shorter AI Red Teaming alias.

AI Governance & Security Program Build

service

Benchmark copy uses the short alias; the public route is the program-build page.

Related products

SecEng RAG Test Harness

product

SecEng Runtime Proxy

product

The public page uses Runtime Proxy naming.

AI Control Crosswalk

product

Related courses

Model Gateways & Secure AI Platform Engineering

course

Claim controls

What the public page can and cannot say

These controls keep the page safe for public use until real results exist.

Claim controls

Public claim guardrails

Internal / Teaser Only

This suite is planned. Public model rankings and benchmark results have not yet been published.

Claim boundary

Public scorecards are validation-gated.
Ranking claims are not allowed.
Vendor comparison claims are not allowed.
This suite is planned. Public model rankings and benchmark results have not yet been published.

Do not claim

Do not claim a RAG stack or vendor has passed testing.
Do not publish leakage rates until trials are complete.
Do not imply completed customer data testing.