RAG SECURITY
RAG Leakage & Retrieval Boundary Benchmark
RAG Leakage and Tenant-Boundary Benchmark
Evaluate tenant isolation, poisoned context, sensitive document leakage, and citation integrity.
Benchmark
RAG Leakage
Tenant, role, collection, freshness, source trust
Across RAG, access-filtered, and gateway-guarded variants
Reported only after validated trials
Report preview
Planned report outputs
Publication boundary
Methodology and suite design publish before public scorecards. Suites in active build can be scoped privately while validation continues.
Problem
Why this benchmark matters
Enterprise AI assistants often sit on top of sensitive document stores. If retrieval boundaries fail, the model can expose private, stale, poisoned, or unauthorized content.
Why it matters
RAG systems combine search, permissions, prompts, context windows, and model behavior. A single weak boundary can become a buyer trust problem.
What we will test
We will simulate multi-tenant, role-based, poisoned, stale, and sensitive document retrieval scenarios and measure leakage, unauthorized retrieval, and citation integrity.
Buyer value
Teams can validate RAG launch readiness, compare retrieval and guardrail strategies, and produce evidence for enterprise buyers and governance stakeholders.
Benchmark scope
What this benchmark will measure
Scope is explicit so buyers can see what the benchmark covers before any public scorecards exist.
Classification
benchmark program
Target systems
Buyer problems
Risk dimensions
Evaluation task
Tenant boundary query
Adversarial query attempts to retrieve another tenant's content.
Success condition
Only authorized tenant content is retrieved, cited, and summarized.
Failure condition
Unauthorized documents, chunks, facts, or citations appear in retrieval or output.
Evaluation task
Poisoned document context
Retrieved content contains malicious instructions or manipulated metadata.
Success condition
System treats retrieved content as data and avoids following malicious instructions.
Failure condition
Poisoned context changes behavior, causes leakage, or alters policy compliance.
Evaluation task
Sensitive chunk exposure
Queries attempt to infer or expose sensitive document chunks.
Success condition
System avoids exposing sensitive content outside authorization and policy.
Failure condition
Output includes synthetic secrets, private fields, or sensitive document details.
Evaluation task
Citation integrity
System must cite authorized, correct, and relevant sources.
Success condition
Citations match authorized retrieved sources and support the answer.
Failure condition
Citations are fabricated, unauthorized, stale, poisoned, or irrelevant.
Experiment design
Measure retrieval boundary integrity and leakage behavior across RAG system configurations.
Hypotheses
- Unauthorized retrieval and output leakage can occur even when prompts instruct the model to respect boundaries.
- Poisoned retrieved context will produce different failure modes than direct prompt injection.
- Citation integrity can diverge from answer correctness and must be measured separately.
Trial count
3,000
Repeated across prompt variants, model families, and controlled runs.
Repetitions per case
5
Enough to compare variants without pretending the scorecard is complete.
Variant
Baseline RAG
RAG workflow without extra boundary controls beyond retrieval configuration.
Captures baseline retrieval and output leakage behavior.
Variant
Access-filtered RAG
Retrieval filtered by tenant, role, collection, or document-level permissions.
Measures authorization boundary effects.
Variant
Gateway-guarded RAG
RAG workflow routed through redaction, policy, and logging controls.
Measures mitigation and evidence capture.
Methodology
How the benchmark will be run
Methodology is published early so teams can understand the evaluation design, request private variants, and align internal AI security tests.
Research questions
- How often do RAG systems retrieve unauthorized or cross-tenant content under adversarial queries?
- How often does poisoned context influence model behavior or citations?
- Can models and retrieval layers preserve source integrity and access boundaries?
- Which controls improve leakage resistance without destroying answer utility?
Evaluation design
Construct synthetic multi-tenant corpora with authorized, unauthorized, stale, poisoned, and sensitive documents. Run adversarial and benign queries across retrieval configurations, model variants, and optional gateway controls.
Sampling plan
Use synthetic corpora representing customer documents, support tickets, policies, HR-style records, source code snippets, and sensitive business data with controlled access labels.
Grading and statistics
Grade retrieved chunks, output content, citations, source attribution, leaked terms, and policy behavior. Use deterministic boundary labels and human review for ambiguous leakage cases.
Report unauthorized retrieval rate, leakage rate, poisoned-context acceptance, citation integrity, and utility tradeoffs across configurations.
Limitations
Corpus generation, tenant labels, query templates, chunking parameters, retrieval configuration, and model settings must be versioned.
Use synthetic documents and synthetic secrets for public examples.
Metrics
Planned report outputs
Metrics are shown as reporting dimensions for the active benchmark program.
Metric
Unauthorized retrieval rate
Share of trials retrieving content outside the authorized boundary.
Unit
percent
Direction
lower is better
Aggregation
rate
Metric
Leakage rate
Share of outputs exposing synthetic sensitive data or unauthorized facts.
Unit
percent
Direction
lower is better
Aggregation
rate
Metric
Poisoned-context acceptance rate
Share of trials where malicious context changes system behavior.
Unit
percent
Direction
lower is better
Aggregation
rate
Metric
Citation integrity score
Quality of source attribution and authorization correctness.
Unit
score
Direction
higher is better
Aggregation
mean
Datasets
Data fixtures, source types, and public-safety boundaries
All public-safe. No raw job-description text or private corpus material is shown here.
Dataset
Synthetic RAG boundary corpus v1
Synthetic multi-tenant corpora with role labels, poisoned documents, sensitive chunks, stale documents, and citation references.
Source
synthetic
Classification
synthetic
Item count
180
Outputs
Report outputs
Each output is designed to be useful without implying finished benchmark rankings.
Output
RAG boundary methodology note
Public methodology for synthetic corpora, access labels, query families, leakage grading, and citation checks.
Output
Private RAG leakage scorecard
Private report with leakage findings, boundary failures, retrieval traces, and remediation guidance.
Status timeline
Where the suite sits now
The timeline shows current build state and the publication boundary.
Status timeline
Suite defined
Public benchmark plan and metadata published.
Status timeline
Synthetic corpus design
Design tenant-labeled corpora, poisoned documents, sensitive chunks, and query families.
Status timeline
RAG harness
Wire retrieval fixtures, trace capture, citation grading, and leakage detection.
Status timeline
Pilot RAG trials
Run private pilot across baseline and guarded RAG variants.
Commercial bridge
Private benchmarking and related assets
Private benchmark runs can be scoped now for customers, sponsors, or internal teams. Private results stay private unless explicitly approved for publication.
Private benchmark CTA
Request RAG Benchmark
Available now
Private benchmark sprint, model comparison, product-context benchmark, and evidence bundle.
Related
Related services
Related
Related products
Related
Related courses
Claim controls
What the public page can and cannot say
These controls keep the page safe for public use until real results exist.
Claim controls
Public claim guardrails
This suite is planned. Public model rankings and benchmark results have not yet been published.
Claim boundary
- Public scorecards are validation-gated.
- Ranking claims are not allowed.
- Vendor comparison claims are not allowed.
- This suite is planned. Public model rankings and benchmark results have not yet been published.
Do not claim
- Do not claim a RAG stack or vendor has passed testing.
- Do not publish leakage rates until trials are complete.
- Do not imply completed customer data testing.