The State of AI Security Engineering Report 2026

promptfoo

promptfoo helps teams evaluate prompts, compare models, run regression tests, and perform LLM red-team checks as part of development workflows.


4.6 / 5 | 90 / 100

Reviews

Status

active

Taxonomy

Categories

Evaluation and Benchmarking, LLM Security, Secure AI SDLC, Developer Security

Classes

Framework, Eval Harness, CLI, Open Source Project

Tool types

Eval Orchestration Framework, Prompt Injection Tester, Jailbreak Tester

Use-case coverage

Use cases are taxonomy tags, not verified coverage guarantees.

Primary

LLM Eval Harnessing, Prompt Injection Testing, Secure AI SDLC Gating

Secondary

Model Behavior Regression Testing, Pre-Launch AI Security Review, Developer Security Training

Rating breakdown

4.6 stars · 4 reviews · confidence: low

Usability: 90
Implementation: 87
Operational reliability: 80
Security control depth: 76
Evidence readiness: 83
Value for cost: 94
Adoption depth: 73
Support quality: 83

Review signal

G2-style structured review fields are aggregated into research-oriented dimensions.

4 reviews

Top strengths

Good Developer Workflow
Fast Time To Value
Easy To Use

Top pain points

Unclear Coverage
Limited Integrations
False Negative Risk

Notable review language

Fits naturally into developer workflows and helps make evals a habit.

Great for turning subjective prompt testing into repeatable evidence.

References and evidence

promptfoo GitHub repository

github.com

GitHub · Source Code

promptfoo documentation

promptfoo.dev

Docs · Documentation

Related tools

garak

NVIDIA

4.1 / 5

Open-source LLM vulnerability scanner for probing models and applications with adversarial tests.

Pillars

Attack, Defend

Categories

LLM Security, AI Red Teaming, Evaluation and Benchmarking

Use cases

Prompt Injection Testing, Jailbreak Resistance Testing, LLM Eval Harnessing, +1 more

Open Source Free · Apache 2.0 · Active

PyRIT

Microsoft

4.0 / 5

Open-source Python Risk Identification Toolkit for generative AI red teaming.

Pillars

Attack, Defend

Categories

AI Red Teaming, LLM Security, Evaluation and Benchmarking

Use cases

AI Red Team Exercises, Jailbreak Resistance Testing, Prompt Injection Testing

Open Source Free · MIT · Active

OpenAI Evals

OpenAI

3.5 / 5

Open-source evaluation framework for testing language model behavior.

Pillars

Attack, Defend

Categories

Evaluation and Benchmarking, LLM Security, Research and Education

Use cases

LLM Eval Harnessing, Model Behavior Regression Testing, Security Research

Open Source Free · MIT · Active

TruLens

TruEra

3.9 / 5

Open-source evaluation and tracking toolkit for LLM and RAG application quality.

Pillars

Attack, Defend

Categories

Evaluation and Benchmarking, RAG Security, AI Observability

Use cases

LLM Eval Harnessing, Retrieval Audit Evidence, AI Control Drift Monitoring

Open Source Free · MIT · Active
