aisecurity.llc

THE STATE OF AI SECURITY ENGINEERING

2026 ANNUAL REPORT

What Companies Really Mean by
“AI Security Engineer”

A structured analysis of 293,846+ job descriptions across 5,350 companies — validated by direct practitioner surveys, arXiv research momentum, open-source builder activity, and industry media signals. The most comprehensive benchmark of the AI security engineering labor market available.

Required reading for CISOs scoping programs, hiring managers writing requisitions, recruiters sourcing candidates, and practitioners navigating a discipline being defined in real time.

Job descriptions analyzed

294K+

Named market findings

15 findings

Role archetypes defined

9 archetypes

Independent signal layers

7 layers

About the authors and editors

Contributor notes for the 2026 report

These bios are intentionally brief. They identify the people who shaped the manuscript and the narrow reason each one is included here.

Co-authors

Primary manuscript authors and research framing.

Co-author

Alex Eisen

Advises on AI risk, incident response readiness, and research-informed product security priorities.

Relevance

Applied security-research and AI-risk framing to the control-plane sections.

Co-author

Alon Braun

Strategy, product framing, and advisory translation for teams that need a usable operating model.

Relevance

Shaped report structure, executive translation, and public-safe positioning.

Editors

Editorial review for clarity, precision, and publication-safe language.

Editor

Tim Kerimbekov

Risk-informed security strategy and operating-model guidance grounded in product and enterprise experience.

Relevance

Reviewed risk language and operating-model guidance for practical clarity.

Editor

Dorina Miroyannis

Legal and policy coverage for teams that need privacy, security, and terms pages updated without losing contractual precision.

Relevance

Reviewed policy language, contract boundaries, and public-safe wording.

Contents

01Executive SummaryFive numbers that define the 2026 market1
02The 15 FindingsWhat the corpus reveals, and what to do about it2
03Vertical Intelligence10 industry profiles3
04Role Architecture Canon9 archetypes with decision tree, risk matrix, and interview questions4
05Boardroom-to-Backlog PlaybookPre-posting checklist and what you wrote vs. what you need5
06External Validation SignalsarXiv, GHArchive, media, Wikimedia, vulnerability, and framework intelligence6
07Glossary12 AI security terms every hiring team needs to know7
08Methodology and Claim Boundaries8

Executive Summary

A new security discipline is being staffed before it has been defined. AI Security Engineering has crossed from experimentation into institutional demand — but the organizational infrastructure required to absorb that demand is running years behind. This report documents the gap between what companies say they need and what the market can actually deliver, drawn from 293,846+ job descriptions, direct practitioner surveys, academic research, open-source builder activity, industry media coverage, and vulnerability intelligence. Across seven independent signal layers, the conclusion is the same.

Five numbers that define the 2026 market

Number	What it means	What this means for your next hire
290×	Growth in AI security hiring from 2022 to 2026. Every position is calibrated for senior experience in a discipline that did not exist four years ago.	Every experienced AI security candidate has 2–4 competing offers right now. Scoped single-domain roles close 2–3× faster than chimera specs. Over-scoped roles sit open.
39:1	Legacy compliance frameworks (GDPR, HIPAA, SOC 2) vs AI-native governance frameworks (NIST AI RMF, EU AI Act, ISO 42001) in hiring language. The compliance reflex is structural, not incidental.	Your JD is almost certainly calibrated for a compliance program manager, not an AI security engineer. If it mentions GDPR but not NIST AI RMF, you're hiring the wrong profile.
8:1	Traditional security tool vocabulary (detection, SIEM, AppSec scanners) vs AI-native evaluation and observability tools in the same hiring corpus. The tooling stack is calibrated for compliance audit, not AI behavioral risk.	If your JD requires Splunk but not LangSmith, you're screening for infrastructure security, not AI behavioral risk. You will select for the wrong profile.
0.24%	Share of all 2026 postings that contain agentic attack surface language (prompt injection, function calling, tool calling). Agentic deployments are at scale; agentic security hiring is at near-zero.	If you've shipped an AI agent, you have agentic security exposure right now. The hire you need doesn't exist at scale in the market — fractional or contract coverage is likely the only near-term option.
57:1	Job postings that reference agentic attack surfaces vs postings with agentic control design language. The market is learning the vocabulary of the problem faster than it is staffing for the solution.	Your JD can name the risk — but hiring for the control architect who can actually solve it requires a different archetype than what most postings describe. See the decision tree in §04.

The consequence of these five numbers is not subtle: companies are posting role language that implies team-shaped capability while budgeting and interviewing as if they are hiring one specialist contributor. The result is a hiring market defined by Chimera Specs — one salary, five professions — and a discipline being invented at the top of the org chart with almost no junior entry pathways. Governance language has outrun evidence language. Compliance vocabulary has crowded out AI-native control language. And the hiring market is running three years behind the deployment curve on the agentic attack surfaces that matter most.

Three structural shifts would change this trajectory. First: separate ownership domains from hiring units — one role, one primary function, one accountability domain. Second: require evidence artifacts at the point of governance obligation creation — every compliance framework reference in a job description should come paired with a named evidence artifact the hire will produce. Third: invest in practical assessment infrastructure now — adversarial AI labs, scenario-based evaluation environments, and archetype-specific interview rubrics are not future-state investments; they are the prerequisite for building any repeatable hiring signal in a discipline without shared standards. The organizations that solve these three problems first will not just hire better — they will define the operating model that the rest of the market eventually adopts.

These conclusions are not derived from a single source. The hiring corpus is the primary evidence layer, but it does not stand alone. Primary practitioner survey research confirms the ownership gap: 27% of respondents report no clear AI security owner in their organization. Self-reported program maturity averages 2/5 — squarely in the emerging band, consistent with executive awareness without engineering delivery. 57% of practitioners recognize AI Security Engineering as an emerging distinct discipline, yet are watching their organizations hire as if it is a subspecialty of AppSec. The most cited risk — Data leakage via AI (38% of respondents) — is precisely the class of threat that agentic deployment creates and that 0.24% of job descriptions currently screen for. Meanwhile, arXiv research shows academic output in agentic attack surfaces accelerating faster than any other AI security domain. GitHub builders are funding the same taxonomy with open-source activity. Industry media is amplifying governance framing to board audiences who interpret it as compliance progress rather than control absence. And public vulnerability disclosures — 1,458 AI-relevant CVEs, 3 confirmed in CISA KEV — confirm that the attacks are not hypothetical: the exploits exist and are being used. The hiring market, the practitioners, the researchers, the builders, the press, and the CVE feed are all observing the same structural condition from different vantage points — and all of them are describing a gap between deployment and defense.

294K+

Job descriptions analyzed

5,350 companies · ATS data 2013–2026

27%

Report no clear AI security owner

Primary survey · cross-persona signal

Role archetypes defined

With boundary conditions and 90-day outputs

39:1

Legacy vs AI-native frameworks

GDPR (5.5K) vs NIST AI RMF (91) — the compliance reflex is structural

"The hiring market, the practitioners, the researchers, the builders, the press, and the CVE feed are all observing the same structural condition from different vantage points — and all of them are describing a gap between deployment and defense."

Key findings — where the signals converge

Finding #1 · Agentic Attack Surface

Agentic deployments are live; agentic security hiring is near-zero.

511 job postings reference agentic attack surfaces in 2026. Only 9 reference agent control design language — a 57:1 gap. arXiv confirms agentic attack surface research is the fastest-accelerating academic domain. The hire the market needs doesn't exist at scale.

Finding #2 · Compliance Displacement

Legacy compliance frameworks are crowding out AI-native engineering requirements.

39:1 ratio of legacy (GDPR, HIPAA, SOC 2) vs AI-native (NIST AI RMF, EU AI Act, ISO 42001) framework mentions in 2026 hiring language. Organizations are hiring compliance program managers for AI security engineering roles — and selecting for the wrong archetype.

Finding #3 · LLM Framework Vulnerability Density

AI/ML product and framework vulnerabilities are the dominant disclosed attack surface.

1,458 AI-relevant CVEs classified from 26K+ total records. The top bucket — ai ml framework library vulnerability — accounts for 378 disclosures. 3 appear in CISA KEV (active exploitation confirmed). The vulnerabilities exist; the roles to remediate them are barely hired.

Finding #4 · Entry Pipeline Collapse

290× hiring growth with zero junior pathways is building a structural staffing collapse.

AI security hiring grew 290× from 2022 to 2026. Every role targets senior experience in a discipline that did not exist four years ago. There is no OSCP equivalent for AI security, no standardized assessment infrastructure, no certifiable junior curriculum. The pipeline cannot refill without deliberate investment now.

Finding #5 · Signal Convergence Confirms the Structural Diagnosis

Six independent data sources — none sharing methodology or institutional source — describe the same gap.

The ATS corpus (employer language), practitioner survey (first-person experience), arXiv (academic attention), GHArchive (builder behavior), media (editorial coverage), and CVE disclosures (exploit pressure) are each independently classified against the same AI security taxonomy. Each produces a different slice of evidence. All six describe the same structural gap between deployment and defense. When independent systems converge this consistently, the convergence is the argument — not any single signal.

The 15 Findings

Fifteen market findings derived from job description signal analysis across 293,846+ postings. Each finding is a reusable market concept supported by corpus evidence, with explicit claim boundaries and audience-specific action implications.

#	Finding	Key signal	Your next step
01	The Frankenstein Role	Avg breadth 6.79 — Gov & Defense leads all industries	Run a role-architecture workshop before opening any AI security req
02	Skill Washing	100K+ AI-labeled roles in Manufacturing with lowest AI specificity	Rewrite JD around named controls and evidence artifacts, not title keywords
03	The Unicorn Index	Every security role family averages 5.9–6.5 capability breadth	Scope each role to one primary ownership domain — use the checklist in §05
04	The Probability Pivot	57:1 agentic attack surface vs control design language in 2026	Add probabilistic failure-mode scenarios to every AI security interview loop
05	The Evidence Gap	GDPR 5,461 jobs (0.9% evidence language) vs NIST AI RMF 91 (13.2%)	Map every governance obligation to a named evidence artifact before posting
06	Agentic Anarchy	511 agentic attack surface signals vs 9 agent control postings — 2026	Assign Agent Security Engineer ownership before your next agentic deployment
07	The vCISO Vacuum	Mid-market AI exposure before full-time staffing maturity — no designed model	Define a staged operating model — fractional or MSSP — before the first hire
08	Boardroom-to-Backlog Gap	Privacy framework 8,555 jobs vs AI Governance 295 jobs — 29:1	Require every board AI risk narrative to map to a named backlog item with an owner
09	Skills Validation Gap	No OSCP equivalent for AI security — generic loops, archetype-specific roles	Build archetype-specific interview rubrics before the first screen — see §04
10	Model Supply Chain Blind Spot	Runtime signals dominate; artifact integrity and provenance under-specified	Add model lifecycle and provenance control ownership to every AI security scope
11	Entry-Level Extinction	290× growth 2022–2026; zero junior pathways; senior-only into new discipline	Create explicit junior-to-mid transition pathways now — not after the pipeline collapses
12	The Red Team Misnomer	"AI red team" applied to governance, product, platform, and abuse testing equally	Name three specific adversarial exercise types in every red team posting
13	The Compliance Reflex	Legacy frameworks 12.4K jobs vs AI-native frameworks 317 — 39:1	Add one AI-native framework reference for every legacy compliance requirement in your JD
14	The Tool Incumbency Trap	Detection/AppSec (4.2K jobs) vs AI-native eval + observability (552 jobs) — 8:1	Audit AI security tooling against AI-specific threat coverage, not compliance coverage
15	The Agentic Surface Emergence	0.24% of postings mention agentic controls; trajectory steep from near-zero	Add prompt injection and function calling scenarios to interview loops now

Finding 01 · Talent & role-design crisis

The Frankenstein Role

AI Security Engineer postings increasingly bundle five historically separate capability families into one requisition — at one salary.

AI feature velocity, buyer scrutiny, and compliance pressure converged before organizational design matured. Companies needed someone who could simultaneously own product security, model governance, adversarial testing, regulatory evidence, and agentic controls. They wrote that into one job description instead of designing a team.

6.79

Highest avg role breadth score

Government and Defense — 2026

5.21

Cross-corpus average breadth score

Across all 294K+ job descriptions analyzed

100K

Jobs in highest-volume industry

Manufacturing Industrial and OT — 2026

Average role breadth score by industry — 2026

Score reflects number of distinct capability families bundled in role language. Higher score = more Frankenstein. Scale: 0–10.

The corpus-level average role breadth score is 5.21 — across all 294K+ analyzed postings. Government and Defense leads all industries at 6.79, followed by Retail and Ecommerce at 6.29. Financial Services — with 38.8K jobs, the largest single scoped-security market — sits at 5.44. Manufacturing leads in volume (100K postings) but has the weakest AI-native specificity. Even Telecommunications, the most conservative sector, posts roles averaging 3.73 capability families. No industry is writing clean, scoped AI security roles.

"No industry is writing clean, scoped AI security roles. The Frankenstein pattern is not a failure of individual job descriptions — it is a failure of organizational role architecture across an entire discipline."

What leaders are misreading

Leaders treat this as a talent shortage problem alone. It is primarily a role-design failure. The scarcity is not of capable people — it is of organizations willing to do the architectural work of separating ownership domains before opening requisitions.

Failure mode if unaddressed

Chronic mis-hiring, scope collapse after onboarding, low tenure stability, and perpetual re-opening of the same role.

What this changes now

Define one primary ownership domain per role before writing the job description.
Require every AI security requisition to name the capability family it sits in.
Use role-architecture workshops before recruiting — not after.

Finding 02 · Title/substance mismatch

Skill Washing

AI-labeled security titles often outpace the AI-specific control, testing, and evidence language inside the same posting.

Title demand outpaced discipline standardization. Recruiting teams adopted AI prefix language as a market signal before organizations developed meaningful AI-specific control requirements to pair with it. The title changed; the job description content did not.

100K

Jobs in largest AI hiring market

Manufacturing Industrial and OT

38.8K

Jobs in second-largest market

Financial Services

AI security-labeled job volume by industry — 2026

Job count for roles with AI security signals in title or description. High volume does not imply AI-specific control language depth.

Manufacturing Industrial and OT leads by a wide margin at 100K AI security-labeled roles in 2026 — but this sector also has the weakest AI-native specificity signals. Financial Services posts 38.8K such roles and Government/Defense 30.5K, sectors where legacy compliance language dominates. The AI label is applied broadly; the AI-specific control substance is applied narrowly.

"The AI label is applied to the title; the AI-specific control substance is applied to almost nothing. The gap between these two numbers is not a measurement artifact — it is an organizational design failure at industry scale."

What leaders are misreading

AI in the title is treated as proof of AI security scope. It is not. Title density is a market demand signal, not a capability evidence signal.

Failure mode if unaddressed

Teams hire for legacy security profiles while believing they have staffed AI risk. The coverage gap is invisible until an incident makes it legible.

What this changes now

Rewrite requisitions around named controls, evidence artifacts, and execution outputs.
Screen for role-language specificity in candidates, not keyword density.
Train recruiting teams to distinguish AI-labeled roles from AI-specific roles.

Finding 03 · Team-shaped requirements

The Unicorn Index

The market prices one role while describing team-level capability breadth. Every role family is affected — none is exempt.

Immediate pressure to cover product, governance, and customer assurance simultaneously led hiring managers to compress team-shaped requirements into single-contributor budgets.

6.53

Highest breadth — Data Security roles

2026 role family

6.42

AI Security-specific roles

2026 role family

5.9

Lowest of top 8 role families

Identity Security

Average role breadth score by security role family — 2026

Breadth score across all roles assigned to each security capability family. The unicorn problem is not confined to AI roles.

Data Security roles lead with a 6.53 average breadth score, followed by Application Security at 6.49 and Product Security at 6.44. AI Security-specific roles sit at 6.42 — high, but not uniquely high. The unicorn problem is structural across security hiring, and AI has not made it worse — it has made it visible.

"Every security role family sits above 5.9 on the breadth scale. The Unicorn problem is not uniquely an AI security problem — AI has simply made it impossible to ignore. You cannot hire your way out of a role architecture failure."

What leaders are misreading

Compensation is treated as the only lever. Scope architecture is the lever that actually matters. Paying more for the unicorn does not change the impossibility of the spec.

Failure mode if unaddressed

Open roles stay unfilled or are filled with scope collapse: the hire arrives, renegotiates scope in week four, and the role re-opens within eighteen months.

What this changes now

Scope every security role to one primary ownership domain before posting.
Use phased staffing plans — not omnibus hires — for multi-domain coverage.
Align compensation explicitly to stated breadth assumptions.

Finding 04 · Systems reasoning shift

The Probability Pivot

AI security demands a cognitive shift from deterministic defect reasoning to probabilistic systems reasoning — but neither hiring loops nor interview rubrics have adapted to evaluate it.

Traditional security is built on deterministic failure logic: a buffer overflows or it does not. A CVE is present or patched. AI systems fail probabilistically: the same prompt produces different outputs across temperature settings, context window states, and retrieval results. An adversarial input may work 40% of the time, not 100%. A control that reduces attack success from 80% to 20% is meaningful even though it does not eliminate the vector. This is an entirely different reasoning mode — and it is not what standard AppSec or penetration-testing interview loops are designed to evaluate.

511

Jobs with agentic attack surface language — 2026

Up from 48 in 2025 — a 10× year-over-year jump

Jobs with agent control design language — 2026

The gap between attack surface awareness and control architecture is 57:1

Standardized interview rubrics for probabilistic AI failure reasoning

The cognitive shift is named in research; it is not yet codified in hiring practice

Jobs mentioning agentic attack surfaces — by year

Rapid growth in attack surface language vs near-zero growth in agent control language. The discipline is becoming aware of the problem faster than it is staffing for the solution.

Agentic attack surface signals appeared in effectively zero job postings before 2024. By 2025: 48 postings. By 2026: 511 — a 10× year-over-year jump in a single year. The same 2026 dataset shows only 9 postings with meaningful agent control language — a 57:1 gap between attack surface awareness and control design language. This gap is not a data artifact. It reflects a genuine organizational pattern: teams recognize the agentic attack surface exists before they know how to architect controls for it. Interview loops still screen for static AppSec knowledge — deterministic vulnerability identification, CVE analysis, exploit reproduction. None of these evaluate the probabilistic tradeoff reasoning that AI security actually requires.

What leaders are misreading

Security interview performance on traditional vulnerability questions is used as a proxy for AI security capability. Strong AppSec candidates may fail at probabilistic reasoning; strong probabilistic reasoners may not have AppSec backgrounds. The screen is measuring the wrong variable.

Failure mode if unaddressed

Hiring systematically selects for deterministic-thinking security candidates in roles that require probabilistic-reasoning AI security judgment. The discipline builds a talent cohort optimized for the wrong problem class.

What this changes now

Add explicit probabilistic failure-mode scenarios to every AI security interview loop.
Evaluate tradeoff reasoning and risk quantification under uncertainty — not only exploit identification.
Define "acceptable control posture" for AI systems in probabilistic terms before designing interview criteria.

Finding 05 · Governance-to-execution gap

The Evidence Gap

Governance language appears before engineering evidence language in AI security hiring. Organizations can describe the policy obligation but cannot yet describe the proof.

Policy and framework adoption moved faster than productionized control instrumentation. Boards demanded governance narratives; governance teams wrote policies; engineering teams were not yet staffed to produce the evidence that would validate those policies.

5.5K

GDPR job postings in 2026

0.9% contain evidence-producing control language

NIST AI RMF job postings in 2026

13.2% contain evidence-producing control language

60:1

GDPR-to-NIST-AI-RMF job ratio

Volume disparity between legacy and AI-native governance frameworks

Framework adoption in AI security hiring — 2026

Cyan = legacy compliance & privacy frameworks · Violet = AI-native governance frameworks. GDPR (5,461 jobs, 0.9% with evidence language) vs NIST AI RMF (91 jobs, 13.2% with evidence language) — the inverse specificity paradox.

GDPR appears in 5.5K job postings — the single largest framework signal in 2026. Only 0.9% of those contain evidence-producing language such as control attestations or telemetry requirements. HIPAA: 3.1K postings, 0% evidence language. SOC 2: 1.9K postings, 0% evidence language. The signal inverts with AI-native frameworks: NIST AI RMF appears in just 91 postings, but 13.2% contain evidence-producing language. ISO/IEC 42001: 15 postings, 13.3% evidence language. Where organizations are deliberately staffing AI-native governance, they write more rigorous requirements — but almost none are doing so yet.

What leaders are misreading

Policy completion is interpreted as risk reduction. Governance programs that produce narrative confidence without operational proof are not risk reduction programs — they are risk documentation programs.

Failure mode if unaddressed

Governance programs accumulate policy artifacts while control behavior remains unmeasured. The board believes the posture is improving; the actual posture is static.

What this changes now

Map every governance obligation to a named evidence artifact at design time.
Require telemetry, eval outputs, and remediation closure as deliverables — not documentation.
Treat evidence quality as a board-reportable KPI, not an engineering detail.

Finding 06 · Delegated action risk

Agentic Anarchy

Agent security is delegated action security. The market is still framing it as chatbot security — a category error with operational consequences.

Tool-calling and workflow automation expanded AI blast radius from response quality to action execution. A chatbot that says the wrong thing is a reputational risk. An agent that takes the wrong action is an operational one. The control architectures for these two scenarios are fundamentally different.

511

Jobs with agentic attack surface signals

2026 — from effectively zero in 2023

Jobs with agent control language

2026 — the actual control gap

0.21%

Share of 2026 jobs with agentic signals

Out of 242K analyzed

Agentic attack surface signals — job mention frequency

Count of job postings mentioning each agentic attack surface. Function calling, prompt injection, and tool calling are the dominant emerging signals.

The agentic attack surface vocabulary is present but sparse. Function Calling appears in 278 postings; Prompt Injection in 258; Tool Calling in 236. Combined, these three represent under 800 job postings against a 2026 corpus of 242K — less than 0.35%. Meanwhile, agentic deployments are accelerating across every vertical. The hiring market is running approximately three years behind the deployment curve on agentic control vocabulary.

"A chatbot that says the wrong thing is a reputational risk. An agent that takes the wrong action is an operational one. The control architectures for these scenarios are not the same."

What leaders are misreading

Prompt-layer defense is treated as sufficient control architecture for agentic systems. It is not. Prompt hardening addresses what the model says. Action authorization addresses what the agent does — and action authorization is almost entirely absent from the hiring vocabulary.

Failure mode if unaddressed

Permitted agent actions become high-impact misuse pathways. Authorization, rollback, and audit capabilities are absent at launch because no one owned them.

What this changes now

Design action authorization as a first-class control — not an afterthought.
Require rollback, telemetry, and approval logic for every delegated agent workflow.
Threat-model delegated action paths before release, not after incident.

Finding 07 · Mid-market exposure gap

The vCISO Vacuum

Many organizations are too small to hire the AI security unicorn but too exposed to defer. Mid-market companies are the gap the market has not designed for.

AI exposure emerges before dedicated staffing maturity. A 200-person SaaS company shipping AI features faces the same model behavior, agentic control, and data-boundary obligations as a 20,000-person enterprise — without the headcount budget to staff a full AI security program. The discipline has no designed operating model for this segment.

100K

Manufacturing Industrial and OT jobs

2026 — largest AI security hiring market, lowest role specificity scores

38.8K

Financial Services jobs

2026 — high governance volume, unicorn compression at mid-market

Designed operating models for fractional AI security

The market has produced role archetypes, not service delivery models

AI security job volume by industry — 2026

Hiring volume by sector. High-volume markets with low role specificity (Manufacturing) have the most acute vCISO vacuum — exposure without staffing architecture to address it.

The vCISO vacuum is most acute at the intersection of three conditions: organizations large enough to face real AI security exposure, too small to hire a dedicated AI security team, and operating in sectors without established third-party support models. Manufacturing leads on volume (100K postings) but has the weakest AI-specific control language — which signals that most of those organizations are posting aspirational language without the internal expertise to execute it. Financial Services faces a different version: the governance vocabulary exists, but mid-market firms (500–5,000 employees) cannot absorb a unicorn AI security hire at the compensation level required. MSSPs and fractional AI security services represent the structurally correct answer to this gap — a managed capability model where the expertise is amortized across clients, and the operating model is designed for staged delivery rather than a single hire.

"A 200-person SaaS company shipping AI features faces the same agentic control and data-boundary obligations as a 20,000-person enterprise — without the headcount to staff a response. The market has not designed an operating model for this gap. That is an MSSP opportunity, not a hiring problem."

What leaders are misreading

Staffing constraints are treated as justification for deferral. They are the reason to adopt a managed or fractional operating model — not the reason to defer risk exposure.

Failure mode if unaddressed

AI exposure accumulates without named controls or evidence artifacts. When an incident occurs, the organization discovers simultaneously that it has no controls, no evidence of prior posture, and no internal expertise to respond.

What this changes now

Define a staged AI security operating model before hiring: what do you need coverage on in 90 days, 6 months, and 12 months?
Use fractional or MSSP AI security support as a bridge — not a permanent deferral.
Establish evidence artifacts and internal ownership before the first full-time AI security hire arrives.

Finding 08 · Execution translation failure

Boardroom-to-Backlog Gap

Executive AI risk narratives fail to translate into named engineering controls, accountable owners, and evidence artifacts — the machinery to answer the board's AI risk questions does not exist.

Board pressure on AI risk arrived before organizations built the execution infrastructure to respond to it. The result is a performative loop: the board asks, the CISO narrates, governance teams document, and engineers are not yet staffed or empowered to produce the evidence artifacts that would validate any of it. The gap between the boardroom question and the backlog item with an owner is where AI risk exposure actually lives.

8.6K

Privacy framework hiring signal

2026 — largest governance category in the hiring corpus

295

AI Governance-specific job postings

2026 — 29× smaller than Privacy framework hiring

Standard formats for AI control evidence artifacts

The board is asking questions; the industry has no agreed format for the answer

Framework category job volume — 2026

Privacy and Compliance framework hiring dominates. AI Governance hiring (295 jobs) is 29× smaller than Privacy framework hiring — but AI Governance roles write more rigorous evidence language when they do appear.

The boardroom-to-backlog gap is visible in two ways in the hiring data. First: Privacy framework hiring (8.6K jobs) and Compliance hiring (3.8K jobs) dwarf AI Governance hiring (295 jobs) — organizations are staffing compliance narrative, not AI control execution. Second: fewer than 1% of Privacy and Compliance framework job postings contain evidence-producing language. The hiring architecture reflects the governance architecture: policy is produced; proof is not. This is not a CISO failure — it is an organizational design failure. The board deck exists; the backlog item with a named owner and an evidence requirement does not.

What leaders are misreading

Strategy articulation is treated as execution readiness. Board confidence in the CISO's AI risk narrative is confused with board visibility into control posture. These are different things, and the difference will surface in the next incident.

Failure mode if unaddressed

Risk narratives repeat unchanged across four to eight board cycles. Each cycle the narrative grows more detailed; the underlying control posture remains unmeasured. When an incident occurs, the organization discovers it cannot demonstrate what it claimed to the board.

What this changes now

Require every AI risk statement presented to the board to map to at least one named backlog item with an accountable owner.
Set evidence artifact requirements — telemetry, eval output, remediation closure — at strategy planning time, not after audits.
Build a control evidence scorecard reviewed at the same cadence as board reporting.

Finding 09 · Assessment maturity lag

Skills Validation Gap

The market demands AI security engineering capability before it has standardized practical evaluation pathways.

Role demand accelerated before assessment models matured. There are no standardized AI security certifications, no widely-accepted lab environments for AI security skill demonstration, and no practical exam pathway equivalent to OSCP or equivalent for AI-specific attack surfaces. Organizations are running generic security interviews against AI security requirements — selecting candidates through loops calibrated for a different discipline.

30.8K

ML and AI Engineering — 2026

Largest role family in AI security-adjacent hiring

2.5K

Cyber Defense — 2026

Second largest AI security-adjacent hiring pool

Standardized practical AI security assessments

No OSCP equivalent exists for prompt injection, agentic control, or RAG security

AI security job volume by security role family — 2026

Each role family requires distinct assessment approaches. ML/AI Engineering hiring uses engineering screens calibrated for model building, not model security. Governance/GRC hiring uses policy screens calibrated for compliance, not control evidence.

The skills validation gap is structural. Organizations are hiring across 13+ security role families under AI security labels, each with distinct competency profiles that generic interview loops cannot differentiate. A standard AppSec interview tests for vulnerability identification in code — it does not evaluate agentic control design, RAG boundary enforcement, or probabilistic failure-mode reasoning. There is no OSCP equivalent for AI security. No standardized adversarial AI lab examination. No practical certification pathway that tests whether a candidate can exploit a RAG retrieval pipeline, bypass a prompt injection filter, or design a delegated-action authorization policy under adversarial conditions. The emerging solution pathway is scenario-based practical assessment: hands-on cyber range environments where candidates demonstrate real judgment — not recall — against deployed AI system configurations. Exercises that mirror real attack scenarios: prompt injection chains against agentic workflows, RAG boundary manipulation to extract context-window data, model behavior evaluation under distribution-shift attacks, authorization logic design for tool-calling systems. This mirrors how traditional security assessment evolved from certification to practice-based evaluation — a shift the AI security discipline is only beginning to make. The organizations that standardize practical AI security assessment infrastructure first will build the only reliable hiring signal in the market, and create the first reusable evaluation standard for the discipline.

"There is no OSCP equivalent for AI security. No standardized adversarial AI lab. No practical exam for prompt injection exploitation, agentic control design, or RAG boundary enforcement. The market is hiring at scale for a discipline the assessment ecosystem cannot yet validate."

What leaders are misreading

Credential density (certifications, degree programs, tool familiarity) is treated as competency evidence. It is a prior-discipline filter, not an AI security evaluation. The skills that matter most — ambiguity tolerance, probabilistic failure reasoning, control design under non-deterministic conditions — are not assessed by any current credential pathway.

Failure mode if unaddressed

Selection quality becomes noisy and non-repeatable across interviewers. False positive rate is high, false negative rate is equally high (good candidates rejected for the wrong signals). Scope collapse occurs after onboarding because the interview loop measured credentials, not capability.

What this changes now

Build role-archetype-specific interview rubrics before opening any AI security requisition — one rubric per archetype, scored criteria, not generic security questions.
Replace recall-based technical screens with adversarial AI lab exercises: prompt injection exploitation chains, RAG boundary manipulation, agentic authorization design under attack conditions.
Define a practical skills demonstration standard before the first screen — name the specific attack scenarios and control design tasks a candidate must demonstrate, not just the credentials they must hold.

Finding 10 · Lifecycle control deficit

Model Supply Chain Blind Spot

Model provenance, artifact integrity, dependency management, and deployment gates are systematically under-specified in AI security role language.

Organizations focus first on model behavior and user interaction risks. The assumption is that if the model behaves correctly at inference time, the system is secure. Supply chain compromise does not attack inference — it attacks the artifact pipeline upstream of it.

Data Poisoning signal jobs

Supply chain attack — model training corruption signal in 2026 postings

Model artifact / weights signal jobs

Lifecycle integrity signal — dramatically under-represented vs runtime signals

278

Prompt Injection signal jobs

Runtime signal — 5× more prominent than supply chain signals in the same corpus

Attack surface signal job mention frequency

Frequency of attack surface vocabulary in AI security job postings. Model supply chain signals (model weights, data poisoning) lag behind runtime and agentic signals.

The attack surface vocabulary in AI security hiring is dominated by runtime and agentic signals: function calling, prompt injection, tool calling. Model supply chain signals — model weights, data poisoning, artifact integrity — appear far less frequently, suggesting that lifecycle security ownership is not yet a primary hiring criterion. Organizations actively building AI security programs are staffing runtime controls; lifecycle controls remain an afterthought. This creates a specific risk class: an attacker who poisons the training dataset, corrupts model weights during packaging, or injects backdoors into a fine-tuning pipeline bypasses all runtime controls because the compromise occurred upstream of every inference-time defense.

"Runtime controls address what happens when the model runs. They do not address what happens when the model was built from a poisoned dataset, packaged from a compromised artifact, or deployed through a backdoored fine-tuning pipeline. The supply chain attack surface is outside every runtime security dashboard in the current stack."

What leaders are misreading

Runtime controls are treated as complete security posture. They address what happens when the model runs. They do not address what happens when the model was built, packaged, or deployed from a compromised artifact.

Failure mode if unaddressed

Silent supply chain risk accumulates outside visible incident pathways. When it surfaces, it bypasses all runtime controls because the compromise occurred upstream.

What this changes now

Add model lifecycle and provenance control ownership into every AI security role scope.
Require release-gate evidence for model artifact changes.
Include dependency and artifact integrity checks in security review processes.

Finding 11 · Talent supply crisis

Entry-Level Extinction

AI Security Engineering is being invented at the top of the org chart. The market is hiring senior-only into an unproven discipline, with almost no junior pathways.

Immediate risk pressure and budget constraints favor experienced hiring language. Organizations need someone who can own the domain on day one. The consequence is a discipline with no talent pipeline — which is sustainable for approximately one hiring cycle.

39.3K

AI security-adjacent jobs in 2026

Up from 7,833 in 2025

7.8K

Jobs in 2025

5× growth year-over-year

134

Jobs in 2022

The starting point four years ago

AI security-adjacent job posting volume — by year

Year-over-year growth of AI security role postings. Explosive recent growth with no junior pipeline creates a structural supply problem.

The growth trajectory is extraordinary: 134 postings in 2022, 526 in 2024, 7.8K in 2025, 39.3K in 2026. This is a 290× expansion in four years. Every position in this expansion is calibrated for senior experience in a discipline that did not exist four years ago. The pipeline problem is not hypothetical — it is already present, and the organizations hiring aggressively today are consuming the very talent pool they will depend on in three years.

"290× growth in four years. Every position calibrated for senior experience in a discipline that did not exist four years ago. The pipeline problem is not coming — it is already here."

What leaders are misreading

Senior-only staffing appears efficient in the short term. It consumes available experienced candidates without building the pipeline that will replace them. The efficiency is borrowed from the future.

Failure mode if unaddressed

Future mid-level talent pipeline collapses. Organizations that did not invest in junior pathways will face a structural shortage of experienced AI security engineers at exactly the moment when they have the budget and maturity to hire them.

What this changes now

Create explicit junior-to-mid transition pathways in AI security programs now.
Define apprentice-supportable scopes tied to measurable first-year outputs.
Pair senior hires with skills-transfer mandates as a condition of the role.

Finding 12 · Role language confusion

The Red Team Misnomer

"AI red team" is used as a catch-all for governance reviews, product assessments, platform controls, and abuse testing — diluting the term to meaninglessness.

The phrase carries market credibility and executive legibility. It is used as shorthand for broad AI risk work because it is understood by leadership and valued in the market, regardless of what the actual role delivers.

201

AI Security-specific roles

2026 — highest avg breadth score of any bucket

28.8K

Software Engineering Roles

2026 comparison cohort

3.9K

Traditional Security Roles

2026 comparison cohort

Job volume by role classification bucket — 2026

AI Security-specific roles represent a small fraction of total security hiring but carry the highest average role breadth score — the clearest signal of the Frankenstein problem concentrated at the AI-labeled tier.

AI Security-specific roles number just 201 in the 2026 dataset against 3,894 traditional security roles and 28,768 software engineering roles. But those 201 roles carry the highest average Frankenstein score — 6.52 — of any role bucket. Organizations using precise AI security language are simultaneously writing the most over-scoped roles. The "AI red team" label is a primary driver: it is applied to adversarial prompt testing, product risk assessment, governance review, platform security architecture, and abuse testing — interchangeably, in the same posting, against a single hire. A real AI red team exercise is a hands-on adversarial evaluation against a deployed model or agent system: crafting inputs that elicit unsafe behavior under realistic operational conditions, testing authorization boundaries at inference time, exploiting RAG retrieval pipelines to extract out-of-scope context, mapping tool-calling attack paths through multi-step agentic workflows. This requires lab environments, reproducible finding formats, and evaluators who understand probabilistic failure modes — not governance documentation skills. What most "AI red team" postings actually describe is risk program management or product security review with an adversarial framing — a categorically different function requiring a categorically different profile.

"A real AI red team exercise means crafting adversarial inputs against a deployed model, testing authorization boundaries at inference time, and exploiting RAG retrieval pipelines under realistic attack conditions. Most 'AI red team' postings describe governance review with a red-team brand applied to it."

What leaders are misreading

Label precision is assumed from title vocabulary. "Red team" is assumed to mean active adversarial evaluation. It frequently means risk review with a red-team brand. The candidate who gets hired is selected for governance fluency; the organization then discovers it cannot conduct an adversarial AI exercise.

Failure mode if unaddressed

Organizations build "AI red team" programs that produce governance documentation rather than adversarial findings. The program exists; the adversarial capability does not. Security posture remains unmeasured at exactly the layer the label promised to test.

What this changes now

Require every "AI red team" role posting to name at least three specific adversarial exercise types the hire will execute — prompt injection chains, RAG exfiltration, agentic authorization bypass, jailbreak evaluation.
Separate governance review roles from active adversarial testing roles explicitly: different job family, different interview loop, different success criteria, different budget line.
Evaluate AI red team candidates with live adversarial exercises in realistic lab environments — not scenario recall, not governance case studies, not generic AppSec challenges.

Finding 13 · Legacy framework dominance

The Compliance Reflex

Legacy compliance frameworks dominate AI security hiring language by 39:1 versus AI-native governance frameworks — GDPR alone outweighs all AI-native frameworks combined.

Regulatory readiness programs were already funded and staffed when AI security emerged as a hiring need. Organizations reached for the framework vocabulary they already had — GDPR, HIPAA, SOC 2, FedRAMP — rather than developing AI-native governance language.

12.4K

Legacy framework signal jobs

GDPR · HIPAA · SOC 2 · FedRAMP · PCI DSS · NIST 800-53 — 2026

317

AI-native framework signal jobs

EU AI Act · NIST AI RMF · MITRE ATLAS · ISO/IEC 42001 — 2026

39:1

Legacy-to-AI-native ratio

GDPR alone (5.5K) vs all AI-native frameworks (317)

Named framework adoption in AI security hiring — 2026

Cyan = legacy compliance and privacy frameworks · Violet = AI-native governance frameworks. GDPR (5,461) alone exceeds EU AI Act + NIST AI RMF + MITRE ATLAS + ISO 42001 combined (317) by ~17×.

Named framework data makes the disparity concrete: GDPR appears in 5.5K job postings, HIPAA in 3.1K, SOC 2 in 1.9K, FedRAMP in 1.2K, PCI DSS in 694. Against this: EU AI Act appears in 189 postings, NIST AI RMF in 91, MITRE ATLAS in 22, ISO/IEC 42001 in 15. The 39:1 ratio is a direct measure of which vocabulary organizations actually use when staffing AI security. Legacy compliance frameworks produce familiar evidence artifacts — audit reports, attestation letters — that organizations already know how to generate. AI-native governance frameworks require evidence types most organizations cannot yet produce.

"GDPR in 5,461 postings. NIST AI RMF in 91. Organizations are using the vocabulary they already own to staff a risk they do not yet understand."

What leaders are misreading

Framework density in hiring language is interpreted as AI security readiness. A posting requiring GDPR and SOC 2 expertise is not an AI security posting. It is a compliance posting with an AI prefix.

Failure mode if unaddressed

Teams ship AI systems with governance narratives built on legacy framework compliance, while leaving AI-specific control surfaces — model behavior, agent authorization, prompt security — unmeasured and unevidenced.

What this changes now

Require at least one AI-native governance framework reference for every legacy compliance requirement in AI security roles.
Add agentic control and evaluation language as mandatory criteria in AI security requisitions.
Use framework mix as a hiring-quality diagnostic in your AI security program assessment.

Finding 14 · Incumbent tooling lock-in

The Tool Incumbency Trap

Traditional security tooling appears in AI security hiring at 8:1 versus AI-native evaluation and observability tooling — the tooling stack mirrors the compliance reflex exactly.

Organizations procure through existing trust paths and repurpose familiar tooling rather than adopting AI-specific testing stacks. Each individual decision is rational. The collective result is a security program whose tool vocabulary is calibrated for compliance audit, not AI behavioral risk.

3.7K

Detection & Response tool jobs

Splunk, Sigma, Falco, Elastic — 2026 corpus

552

AI-native eval + observability jobs

LLM observability (431) + model eval (94.0) + guardrails (27.0)

8:1

Traditional vs AI-native tool ratio

Category-level signal: detection/AppSec language vs LLM evaluation language

Named tool adoption in AI security hiring — 2026

Cyan = traditional security tools (SIEM, detection, AppSec) · Violet = AI-native evaluation and observability tools. Combined AI-native total (479) trails combined incumbent total by 4:1.

The named tool picture is unambiguous: Splunk appears in 1.1K AI security job postings, Falco in 485, Semgrep in 141, Snyk in 137. Against these: LangSmith in 260, Langfuse in 145, Ragas in 49, DeepEval in 25. At the category level the gap is larger — Detection and Response tooling totals 3.7K job mentions versus 552 for all AI-native evaluation and observability tools combined — 8:1. Splunk detects attacker behavior in logs. It does not evaluate model output distribution, detect prompt injection at inference time, or assess delegated-action authorization posture. These are structurally different capabilities requiring different tooling — and that tooling is almost absent from AI security hiring language.

What leaders are misreading

Tool familiarity is treated as AI risk coverage. SIEM and AppSec scanners address what attackers do to infrastructure. They do not address what AI systems do under adversarial prompt conditions or what agents do when tool-call authorization is absent.

Failure mode if unaddressed

Security programs build audit-ready portfolios with familiar tooling while systematically underinvesting in AI behavioral evaluation. The coverage gap is structurally invisible inside existing security tool dashboards — it will surface first as an incident.

What this changes now

Audit your AI security tooling explicitly against AI-specific threat coverage, not compliance or SIEM coverage.
Require at least one AI-native evaluation or observability tool in every AI security program's tooling baseline.
Track your traditional-to-AI-native tool ratio as a program maturity diagnostic — not just tool count.

Finding 15 · Early but accelerating risk surface

The Agentic Surface Emergence

Prompt injection, function calling, and tool calling security signals are still under 0.3% of all postings, but rising quickly from a near-zero baseline.

Deployment of tool-calling and function-calling systems is outpacing hiring-market adaptation. AI agents are being built and shipped; the hiring market does not yet have language for the controls those agents require.

278

Function Calling signal jobs

2026 — top agentic signal

258

Prompt Injection signal jobs

2026

1205

Total agentic surface signal jobs

Across all 9 tracked surfaces

Agentic attack surface signal job frequency

Total job postings containing each agentic attack surface signal. Combined total is under 1,300 mentions against 294K+ analyzed postings — but the growth rate is the signal.

Combined agentic attack surface signals total 1205 job mentions — approximately 0.24% of the analyzed corpus. Function Calling leads at 278, Prompt Injection at 258, Tool Calling at 236. Jailbreak (75), Model Drift (64), and Data Poisoning (54) follow. These numbers look small. They are the leading edge of a surface that is growing faster than the hiring market can name it. The appropriate response is not to wait until the numbers are large — by then the deployment gap will be measured in years, not months.

"Agentic systems in production are already at scale. Agentic security hiring is at near-zero. The deployment-to-hiring gap is measured in years, not quarters."

What leaders are misreading

Low absolute share is interpreted as low urgency. The urgency signal is in the deployment-to-hiring gap, not the absolute numbers. Agentic systems in production are already at scale; agentic security hiring is at near-zero.

Failure mode if unaddressed

Delegated-action controls lag deployment and become latent high-impact risk. By the time the hiring market normalizes agentic security vocabulary, the first generation of agentic systems will have been operating for years without appropriate controls.

What this changes now

Treat delegated-action authorization and rollback controls as near-term hiring priority — not future-state.
Add prompt injection and function calling scenarios to AI security interview loops now.
Instrument authorization, rollback, and audit evidence for all current agent workflows.

Vertical Intelligence

Ten industry verticals analyzed across AI security hiring signal, role breadth, and control specificity. Each profile represents the pattern visible in job description language — not company-level security posture.

AI security role breadth score by industry — 2026

Government and Defense leads in role breadth (6.79), meaning job descriptions bundle the most capability families. Manufacturing has the highest volume (100K+ jobs). Financial Services has the largest scoped-security hiring market.

Financial Services

5.44

breadth score
38.8K jobs · 505 cos

Signal: Control frameworks, model risk validation, regulatory vocabulary, auditability.

Blind spot: Weak AI-specific testing workflows and tooling language.

Hire for: Governance Evidence Lead + AI Product Security Engineer pairing.

Model RiskRegulatoryEvidence-first

Government & Defense

6.79

breadth score
30.5K jobs · 350 cos

Signal: RMF, ATO requirements, assurance controls, mission resilience, formal risk language.

Blind spot: Low specificity on contemporary AI evaluation and agentic control practices.

Hire for: AI Security Architect + ML Security Engineer pairing.

Compliance-heavyAssuranceRMF/ATO

Healthcare & Life Sciences

4.52

breadth score
19.8K jobs · 332 cos

Signal: PHI controls, HIPAA, clinical-risk vocabulary, sensitive data handling.

Blind spot: Low operational detail on adversarial testing, RAG validation, and safety/security integration.

Hire for: Hybrid profile: privacy control depth + practical AI abuse testing.

PHI/HIPAAClinical RiskPrivacy-dense

Manufacturing & Industrial OT

5.07

breadth score
100K jobs · 1672 cos

Signal: Reliability, continuity, OT context, safety and mission-impact language.

Blind spot: Generic LLM-attack narratives have low relevance; lifecycle and operational controls dominate.

Hire for: AI Security Architect with industrial systems depth.

OT/ICSOperational SafetyResilience

AI-Native Companies

5.00

breadth score
14.7K jobs · 218 cos

Signal: RAG access, agents, model APIs, evals, safety controls, rapid deployment.

Blind spot: Explicit mapping to formal control frameworks and evidence standards.

Hire for: Agent Security Engineer + Governance Evidence Lead early.

AgenticRAGFast-moving

SaaS & Cloud Software

4.26

breadth score
5.5K jobs · 273 cos

Signal: RAG access boundaries, tenant isolation, customer data controls, enterprise trust.

Blind spot: Role inflation mixing product security ownership with governance program design.

Hire for: AI Product Security Engineer + RAG Security Engineer as distinct mandates.

Multi-tenantProduct SecuritySaaS Trust

Cybersecurity Vendors

5.25

breadth score
1.3K jobs · 25 cos

Signal: Evaluation rigor, detection quality, customer assurance, field credibility.

Blind spot: Conflating adversarial testing, product assurance, and solutions-facing roles.

Hire for: Field-deployable AI security experts who can build and explain.

CredibilityAdversarialProduct Assurance

Insurance

5.09

breadth score
5.5K jobs · 93 cos

Signal: Governance, decision integrity, fraud risk, compliance obligations.

Blind spot: Low specificity on adversarial testing and practical red-team workflows.

Hire for: Model Risk Security Partner with explicit technical validation support.

Decision RiskFraudModel Governance

Retail & Ecommerce

6.29

breadth score
9.3K jobs · 108 cos

Signal: Fraud prevention, account abuse, customer-data protection, operational detection.

Blind spot: Architecture-level delegated-action controls under-specified.

Hire for: Agent Security Engineer + AI Product Security Engineer collaboration.

Fraud/AbuseCustomer TrustAgentic

Telecommunications

3.73

breadth score
2.7K jobs · 38 cos

Signal: Detection, automation, response, network resilience, high-volume control operations.

Blind spot: Under-specified governance-to-product execution pathways.

Hire for: AI AppSec Engineer + Agent Security Engineer with operations maturity.

ScaleDetectionOperations

Role Architecture Canon

Nine distinct AI security engineering archetypes, each with explicit mission scope, trigger conditions, boundary definitions, danger signals, and first-90-day deliverables. Use these as role design inputs — not job description templates.

AI security job volume by role family — 2026

How current hiring distributes across role families. ML/AI Engineering leads by volume (30K+) but carries lower AI-security specificity. The 9 archetypes below map to these families — each with a clean scope that the market rarely writes cleanly.

Which archetype do you need?

Answer one question about your primary use case. The right archetype follows directly.

Your primary use case	The one question to ask	Right archetype
Shipping AI features in a product	Are AI features secure before they reach production?	AI Product Security Engineer
Embedding AI security in the dev process	Do developers know how to write AI-safe code?	AI AppSec Engineer
Proving your AI systems are exploitable (before attackers do)	Can you run an adversarial exercise against your deployed model today?	AI Red Team Engineer
Securing AI agents that take real actions	Who owns authorization and rollback for delegated-action workflows?	Agent Security Engineer
Controlling what data RAG can surface	Can a query return data the user isn't authorized to see?	RAG Security Engineer
Producing evidence for governance obligations	Can you show the board a control that proves the policy works?	Governance Evidence Lead
Securing model development and deployment	Who owns artifact integrity and training pipeline security?	ML Security Engineer
Converting model risk into security controls	Do model risk assessments produce enforceable control requirements?	Model Risk Security Partner
Defining AI security standards across teams	Is there a shared trust model and control ownership map across all AI teams?	AI Security Architect

Risk domain ownership matrix

● Primary domain · ○ Contributing · — Not in scope. Use to identify coverage gaps and avoid ownership conflicts.

Archetype	Prompt Security	RAG / Retrieval	Agent / Action Auth	Model Lifecycle	Gov Evidence	Adversarial Testing	Product / SDLC	Architecture
AI Product Security Engineer	●	●	○	—	○	○	●	○
AI AppSec Engineer	●	○	—	—	—	●	●	—
AI Red Team Engineer	●	●	●	—	—	●	—	—
Agent Security Engineer	●	—	●	—	○	●	○	○
RAG Security Engineer	○	●	—	—	○	●	—	—
Governance Evidence Lead	—	—	—	○	●	—	—	○
ML Security Engineer	—	—	○	●	○	—	○	○
Model Risk Security Partner	—	○	—	●	●	○	—	—
AI Security Architect	○	○	○	○	●	—	○	●

AI Product Security Engineer

High demandSenior IC

Hire this if

You're shipping AI features and security reviews happen after merge, not before.

Secure AI-enabled product capabilities from design through release and post-release operation.

Boundary

Does not own enterprise-wide governance strategy or model lifecycle outside product scope.

Anti-pattern to avoid

Overloaded with policy ownership and customer-assurance narrative without implementation authority.

First 90-day outputs

Product threat model set, AI feature control backlog, release-gate checklist, customer assurance pack.

Danger signals in candidates

Claims broad AI governance ownership with no shipped product features; confuses adversarial testing with code review.

Interview question

Walk me through how you would threat-model a new AI feature that uses RAG to answer user questions about account data.

Strong answer

Identifies data boundary risks at retrieval, context injection at the prompt layer, and authorization failures on what data the RAG can surface — then produces a control backlog item, not just a risk list.

Weak answer

Describes generic OWASP Top 10 threats applied to the API layer without addressing the RAG-specific attack surface.

AI AppSec Engineer

High demandMid to Senior IC

Hire this if

You have AI features in the SDLC but your AppSec process has no AI abuse cases.

Integrate AI abuse patterns and controls into secure SDLC practice.

Boundary

Does not own broad AI governance program design.

Anti-pattern to avoid

Measured on generic vulnerability volume instead of AI-specific control outcomes.

First 90-day outputs

AI abuse-case library, secure coding guardrails, review workflow updates, remediation playbook.

Danger signals in candidates

Focuses only on code-level vulnerabilities; conflates static analysis with behavioral testing; no AI abuse-case library.

Interview question

What is the difference between a prompt injection vulnerability and an input validation vulnerability? How would you test for each?

Strong answer

Explains that prompt injection exploits the model's inability to distinguish instruction from data; designs a test sending adversarial instruction-breaking inputs, not just malformed strings.

Weak answer

Treats prompt injection as a type of XSS or SQL injection with LLM-specific syntax.

AI Red Team Engineer

Scarce — criticalSenior IC / Staff

Hire this if

You've built AI systems and need to know whether they're exploitable before adversaries do.

Execute adversarial evaluation against AI systems and corresponding controls.

Boundary

Does not own governance reviews or architecture assessments by default.

Anti-pattern to avoid

Labeled "red team" but scoped as policy review or general risk management.

First 90-day outputs

Adversarial scenario suite, reproducible finding format, retest protocol, control hardening recommendations.

Danger signals in candidates

Cannot name three specific adversarial exercise types; focuses on policy review; no hands-on adversarial lab experience.

Interview question

Describe a multi-step adversarial scenario you would run against a deployed RAG system to exfiltrate out-of-scope context.

Strong answer

Describes a prompt injection chain that escalates from query reformulation to system-role override, then retrieval boundary bypass to extract out-of-scope context; includes a reproducible finding format and retest protocol.

Weak answer

Describes "testing for jailbreaks" without a structured adversarial methodology or reproducible format.

Agent Security Engineer

Critical — emergingSenior IC / Staff

Hire this if

You have AI agents taking real actions — file writes, API calls, data access — and nobody has designed the authorization layer.

Secure delegated-action pathways for tool-calling and autonomous workflows.

Boundary

Does not own all conversational safety policy and UX moderation concerns.

Anti-pattern to avoid

Confined to prompt-layer defenses while action authorization is left undefined.

First 90-day outputs

Delegated-action threat model, authorization matrix, approval and rollback design, audit trail requirements.

Danger signals in candidates

Only talks about prompt-layer defense; doesn't distinguish chatbot safety from action authorization; no delegated-action threat modeling in portfolio.

Interview question

An AI agent has been granted file system access and API keys to three external services. Walk me through how you'd design the authorization model.

Strong answer

Proposes least-privilege action scoping per tool, approval thresholds for irreversible actions, audit trail for all delegated calls, rollback capability design; asks who reviews auth boundary changes.

Weak answer

Suggests restricting the system prompt and adding output filters; doesn't address action authorization or rollback at all.

RAG Security Engineer

High — growingSenior IC

Hire this if

You're serving user queries from a retrieval system backed by organizational data and you're not sure what data could leak.

Enforce retrieval integrity, context boundaries, and data-access control in RAG systems.

Boundary

Does not own full enterprise data-governance operations.

Anti-pattern to avoid

Treated as a prompt-engineering specialist instead of a retrieval-control engineer.

First 90-day outputs

Retrieval control architecture, access boundary tests, leakage detection checks, incident triage flow.

Danger signals in candidates

Treats RAG security as prompt engineering; can't explain retrieval boundary enforcement; no index access control design experience.

Interview question

How would you test whether a RAG system can be induced to surface documents a querying user isn't authorized to see?

Strong answer

Designs tests using queries that embed system-level keywords or other-user context; checks whether retrieved documents respect the same access controls as direct document access; tests for prompt-embedded retrieval manipulation.

Weak answer

Suggests adding output-side guardrails; doesn't test the retrieval boundary itself.

Governance Evidence Lead

High — regulatory pressureSenior IC / Manager

Hire this if

Your board or regulator is asking for AI risk evidence and your team produces policy documents, not control proof.

Translate governance requirements into verifiable engineering evidence.

Boundary

Does not own implementation of every control — owns the evidence standard.

Anti-pattern to avoid

Assigned reporting responsibility without authority to enforce evidence quality.

First 90-day outputs

Policy-to-control matrix, artifact taxonomy, reporting cadence, assurance quality gates.

Danger signals in candidates

Produces policies without engineering evidence; can't explain what an evidence artifact is; treats compliance attestation as equivalent to control proof.

Interview question

What is the difference between a policy, a control, and evidence? Give one concrete example of each for an AI governance program.

Strong answer

Policy: "Model outputs shall be monitored for harmful content." Control: "LLM observability tool logs all outputs with flagging." Evidence: "Monthly eval report showing flag rate, sample review, and remediation closure." Distinct and hierarchical.

Weak answer

Conflates policy language with evidence; calls a risk register "evidence"; cannot name a specific evidence artifact format.

ML Security Engineer

Growing — nicheSenior IC

Hire this if

You're building or fine-tuning models in-house and haven't addressed artifact integrity, training-data provenance, or deployment gate security.

Secure model development, packaging, deployment, and serving pathways.

Boundary

Does not own enterprise risk narrative or board reporting by default.

Anti-pattern to avoid

Reduced to inference-endpoint hardening while artifact lifecycle remains unmanaged.

First 90-day outputs

Model lifecycle control map, artifact integrity checks, deployment gate definitions, monitoring baselines.

Danger signals in candidates

Focused only on inference security; no MLOps pipeline knowledge; can't describe artifact integrity checks or model provenance verification.

Interview question

How would you design a release gate for a fine-tuned model being promoted from development to production?

Strong answer

Defines checks for: training data provenance, artifact hash verification, eval suite pass/fail, access control audit on who changed model weights, lineage documentation; mentions rollback protocol.

Weak answer

Describes a code review process; doesn't address model artifact integrity, training pipeline controls, or lifecycle-specific security requirements.

Model Risk Security Partner

High in regulated verticalsSenior IC / Director

Hire this if

You have a model risk or governance function but it doesn't produce enforceable security control requirements.

Convert model risk language into enforceable security control requirements.

Boundary

Does not own standalone product engineering roadmap.

Anti-pattern to avoid

Trapped in governance language without implementation pathways.

First 90-day outputs

Risk-to-control mappings, validation criteria, escalation thresholds, evidence requirements.

Danger signals in candidates

Can't translate a risk statement into a specific control requirement; conflates model performance risk with model security risk; produces risk ratings without remediation timelines.

Interview question

You've determined that a customer-facing credit decision model has a 12% adversarial perturbation success rate under targeted attacks. What do you do next?

Strong answer

Quantifies business impact threshold for acceptable perturbation rate; designs a monitoring requirement with a named threshold and escalation path; proposes control options with tradeoffs; writes the control into the risk register as a remediation requirement.

Weak answer

Reports the finding to the risk committee without a control design or remediation timeline.

AI Security Architect

High at scaleStaff / Principal

Hire this if

You're building multiple AI systems across teams with no shared trust model, control ownership map, or architecture standard.

Define secure reference architecture across AI systems, data paths, and delegated-action components.

Boundary

Does not own day-to-day control operations.

Anti-pattern to avoid

Architecture ownership without decision rights over implementation standards.

First 90-day outputs

Architecture baseline, trust-boundary definitions, control ownership map, design-review rubric.

Danger signals in candidates

Produces architecture diagrams without control ownership assignments; can't distinguish reference architecture from specific system design; no cross-team implementation authority.

Interview question

You've been asked to define a secure reference architecture for an enterprise that will build 30 AI-powered features across 12 product teams. Where do you start?

Strong answer

Starts with trust boundary mapping and shared control surfaces (auth, logging, model serving); identifies platform-level vs product-level controls; produces a control ownership map before any architecture diagram; designs the design-review process.

Weak answer

Jumps immediately to technology selection (which LLM, which vector DB); doesn't establish ownership or governance before building.

Boardroom-to-Backlog Playbook

Four audiences. Each with a distinct failure mode and a distinct set of decisions that change posture. The common thread: evidence artifacts are the connective tissue between strategy and execution.

Audience

CISO

Common mistakes

Assuming AI security ownership is implicit in existing team charters
Accepting policy language without evidence pathways
Treating one hire as a full operating model

Three decisions in 90 days

Assign explicit cross-functional AI security ownership with a named map
Approve role architecture before opening requisitions
Require evidence artifacts in quarterly AI risk reporting — not policy updates

Minimum evidence artifacts

AI security ownership matrix
AI risk-to-backlog register with named owners
Control evidence scorecard reviewed at the same cadence as policy reviews

Early warning signal

Risk narratives repeat in board decks while the control evidence scorecard remains static quarter-over-quarter.

Audience

Hiring Managers

Common mistakes

Writing Chimera Specs — one salary, five professions
Mixing incompatible seniority expectations within a single requisition
Using "AI red team" as undefined shorthand for any AI risk work

Three decisions in 90 days

Define one primary ownership domain per role in a single sentence without conjunctions
Separate required from adjacent capabilities — adjacent belongs in interview context, not requirements
Align the interview loop to the role archetype before the first screen

Minimum evidence artifacts

Role-boundary brief (one page maximum)
Interview rubric by archetype with scored criteria
First-90-day output checklist shared with candidates at offer

Early warning signal

Interview debrief feedback is high-variance across interviewers and not comparable. The loop is measuring different things.

Audience

Recruiters

Common mistakes

Title matching without execution evidence — "AI security engineer" in title is not AI security capability
Over-indexing on tool and framework keyword density
Ignoring adjacent-role transferability from ML engineering, product security, and AppSec

Three decisions in 90 days

Screen for artifact-backed signals — shipped work, not self-reported skills
Use archetype-specific scorecards calibrated with the hiring manager before sourcing
Calibrate weekly on top-of-funnel quality, not volume

Minimum evidence artifacts

Scored screening rubric by role archetype
Candidate evidence log with artifact references
Role-fit risk notes passed to interviewers with each advance

Early warning signal

High top-of-funnel volume with low technical conversion rate at first interview. The screening criteria are not calibrated to the actual role.

Audience

Practitioners

Common mistakes

Presenting broad capability claims without concrete shipped outputs or demonstrated AI-specific work
Under-communicating cross-functional collaboration value in interviews
Relying on traditional security credentials as AI security proxies — OSCP, CEH, and CISSP do not assess AI-specific attack surfaces

Three decisions in 90 days

Publish at least one public-safe execution artifact demonstrating AI security control design judgment (threat model, eval design, authorization architecture)
Build demonstrated competency in one primary archetype through hands-on practice — practical lab environments and adversarial AI exercises before interview season
Practice executive risk translation anchored in technical evidence — the skill that differentiates senior AI security candidates from senior security candidates with an AI prefix

Minimum evidence artifacts

Hands-on AI security exercise completions (adversarial prompting, RAG boundary testing, agentic control design)
Control design examples with outcome context — not just "did X" but "designed X and it changed Y"
Cross-functional delivery proof: evidence you shipped a control that engineering, product, and governance all accepted

Early warning signal

Strong technical screens, weak role-fit decisions. The gap is executive communication and organizational translation — and the inability to demonstrate AI-specific work beyond claimed familiarity.

For service providers & MSSPs

The 15 findings in this report describe a hiring market with a structural gap between exposure and staffing capacity. MSSPs and managed AI security service providers are positioned to fill this gap — but only with service delivery models explicitly designed for AI-specific control surfaces, not extensions of legacy compliance audit or traditional SOC offerings.

Client Segment	Primary Exposure	Service Model
Mid-market (500–5K employees)	AI feature deployment without dedicated AI security staffing	Fractional AI Security Lead + quarterly control evidence review
AI-native companies (pre-IPO)	Rapid agentic deployment, RAG boundary risk, no governance evidence	Agent Security Engineering retainer + evidence artifact program
Enterprise (existing program)	Compliance-reflex programs missing AI-native control coverage	AI security program gap assessment + AI-native framework overlay
Regulated industries (FSI, Healthcare)	Evidence gap — governance language without evidence artifacts	Governance evidence production + board-reportable control scorecard

The service provider hiring signal: Only 4 job postings in the 2026 corpus explicitly reference security training platform experience. Practical skills demonstration platforms (adversarial AI labs, cyber range environments, hands-on LLM security exercises) represent an emerging assessment infrastructure gap — the first providers to standardize AI security practical assessment will define the evaluation standard for the discipline.

What you wrote vs. what you actually need

Common job description language and what it actually signals about the role you're trying to fill.

What you wrote in the JD	What you actually need	Right archetype
"Own the AI security program end-to-end"	A team charter, not a single role. Define one anchor domain first.	Start with AI Security Architect, then add specialists
"AI red team and governance program lead"	Two incompatible full-time profiles compressed into one requisition.	AI Red Team Engineer + Governance Evidence Lead — separate reqs
"Prompt engineering and AI security testing"	Behavioral red-teaming is not prompt engineering. The skillsets don't overlap.	AI Red Team Engineer with adversarial lab methodology
"GDPR, HIPAA, SOC 2 compliance for AI systems"	Legacy compliance ≠ AI security engineering. You're building a compliance posture, not an AI control posture.	Governance Evidence Lead — or reclassify as a compliance role
"Secure our LLM-powered chatbot"	Depends on "secure" — data access risk vs conversational safety vs agentic actions are different scopes.	RAG Security Engineer (data leak) or Agent Security Engineer (actions)
"ML pipeline security and model governance"	Lifecycle control and risk reporting are separate ownership surfaces.	ML Security Engineer + Model Risk Security Partner — sequential or parallel
"Experience with Splunk and AI threat modeling"	Detection tooling and AI behavioral risk are different coverage planes requiring different profiles.	Two roles: traditional security + AI Product Security Engineer
"5+ years experience in AI security"	The discipline is 3 years old at scale. You're describing a market that doesn't exist yet.	Rewrite as capability-domain experience, not time-in-title

Before you post — 6-item pre-posting checklist

Can you state this role's primary ownership domain in one sentence without using the word "and"?

If you need "and," you have two roles. Split them or choose one.

Have you named the three specific deliverables this hire will produce in the first 90 days?

If you can't name them, the role scope isn't defined yet.

Have you removed experience requirements that describe tenure in a discipline younger than the required years?

AI security at scale is ~3 years old. "7 years in AI security" screens out everyone.

Does your interview loop include at least one AI-specific adversarial scenario or control design exercise?

If not, you're selecting on legacy security credentials, not AI security capability.

Does the JD include at least one AI-native framework reference (NIST AI RMF, EU AI Act, MITRE ATLAS)?

GDPR-only is a compliance posting, not an AI security posting.

Does this role map to a single archetype from the nine defined in §04?

If it spans more than one, it's a Chimera Spec. Redesign before posting.

The 30-day test — what any good hire delivers in month one

Has produced one named AI security deliverable — threat model, control design, eval output, or policy-to-control mapping.

Has met every team that contributes to or depends on AI security coverage. Can name each team's primary AI security gap.

Can name the three highest-priority AI security gaps in the current program — in order, with rationale.

Has identified at least one gap in current evidence artifacts for board-reportable AI risk. Has proposed a format to close it.

Has proposed a 90-day roadmap with named owners and at least two measurable outputs. Has shared it with the hiring manager.

Role design red flags — quick reference

Any single job description containing three or more of the following signals is exhibiting the Frankenstein pattern. Use this as a pre-posting checklist.

Red Flag Signal	What it indicates	The design fix
Five or more "and" connectives in requirements	Chimera Spec — multiple capability families compressed into one role	Define one primary ownership domain. Move secondary capabilities to "preferred" or adjacent team scope.
"AI red team" without named exercise types	Red Team Misnomer — label applied to undefined scope	Name three specific adversarial exercise types the hire must execute. If you can't name them, you need a governance reviewer, not a red teamer.
GDPR + HIPAA + SOC 2 with no AI-native framework	Compliance Reflex — legacy vocabulary without AI-specific coverage	Add at least one AI-native governance reference (NIST AI RMF, EU AI Act, MITRE ATLAS) or reframe as a compliance role, not AI security.
"Experience with Splunk/SIEM" as primary AI security tool	Tool Incumbency Trap — runtime detection coverage mistaken for AI behavioral risk coverage	Add at least one AI-native evaluation or observability tool requirement. Distinguish infrastructure security from AI behavioral security.
5+ years experience in a discipline that is 2 years old	Unicorn Index — experience requirements impossible to satisfy at stated scope	Rewrite experience requirements around capability domains and demonstrated outputs, not time-in-market.
"Responsible for building and maintaining the AI security program" as a single-contributor role	vCISO Vacuum in reverse — team-shaped mandate on individual budget	Scope the first hire to one anchor deliverable. Use a phased staffing plan, MSSP support, or fractional coverage for the remaining program surface.

AI Security Tools Intelligence

Tool-market mapping complements job-description intelligence by showing the ecosystem teams actually evaluate and deploy. This section provides public-safe aggregate coverage metrics and directional product-landscape context.

145

Tools in manifest

Curated market coverage across AI security workflow categories

Distinct categories

Spanning evaluation, observability, governance, red-team, and control tooling

n/a

Sample average rating

Directional quality signal from available rated entries only

The tools layer is ecosystem intelligence, not endorsement. Coverage is designed to support CISO vendor scanning, hiring-manager tooling literacy, and practitioner comparison workflows. For interactive exploration, use the tools directory at /tools.

External Validation Signals

A market thesis derived from a single source is an opinion. This section is the evidence layer: seven independent signal sources — practitioner surveys, academic research, open-source builder activity, industry media, public knowledge codification, vulnerability intelligence, and framework-control intelligence — each arriving at the same conclusion through entirely different mechanisms. The hiring corpus tells you what companies say they need. These signals tell you what researchers are studying, what builders are shipping, what practitioners are experiencing, what the press is amplifying, where exploit pressure is visible in public disclosure, and how control frameworks map in practice. When all seven describe the same structural gap, the convergence is the argument.

Evidence layer at a glance

27%

No clear AI security owner

Survey · 57% recognize it as a distinct discipline · avg maturity 2/5

2,730

arXiv papers analyzed

Top term: privacy-preserving · 67 papers

2,500

GitHub repos tracked

27K+ event proxies · top topic: vellum-ai/vellum-assistant

613K+

Media items classified

37,172 mapped to AI security taxonomy

1,458

AI-relevant vulnerability records

3 CISA KEV overlaps · top bucket ai ml framework library vulnerability

Frameworks tracked

42 directional crosswalk mappings

1,458

AI-relevant CVEs/advisories

From 26K+ total records · 3 CISA KEV entries

378

Top domain: ai ml framework library vulnerability

Largest AI security vulnerability classification bucket

Primary Research — Practitioner Survey

The survey layer is the only signal in this report that asks practitioners directly: what are you experiencing? Four survey instruments — CISOs and security leaders, AI security practitioners, recruiters and hiring managers, and adjacent security engineers — plus a flash assessment collected first-person responses across the same dimensions the hiring corpus measures at a distance. The results are striking for how closely they mirror the corpus signal. 27% report no clear AI security owner in their organization — the practitioner's version of the ownership vacuum Finding 07 documents in job description language. Self-reported program maturity sits at 2/5 (emerging band), confirming that executive awareness has outrun engineering delivery, not the reverse. 57% of respondents recognize AI Security Engineering as an emerging distinct discipline, yet describe hiring practices built on AppSec and GRC assumptions. The most-cited practitioner risk — Data leakage via AI at 38% — maps directly to the agentic attack surface language in 0.24% of job postings. Practitioners can name the threat. The hiring market has not yet built the role that owns it.

Top practitioner-cited AI security risks — cross-persona survey

arXiv — Research Momentum

Academic research is an independent leading indicator: what researchers publish today becomes practitioner vocabulary in 12–24 months, and hiring language in 24–36. The arXiv signal measures where that leading edge is. A seeded metadata pull across eight AI security taxonomy buckets — analyzing 2,730 papers by title, abstract, and category — surfaces the same domain topology as the hiring corpus, without any shared data. Top matched term: privacy-preserving (67 papers). Largest classified research bucket: prompt and generation security. The most recent month in the dataset (2026-05) shows 1036 papers — part of a sustained acceleration trajectory. This is not coincidence. Researchers are studying the problems that practitioners cite as critical risks and that hiring language is only beginning to name. The gap between academic term frequency and hiring-language adoption is the predictive signal: the concepts with high arXiv velocity but low hiring-corpus density are the next generation of required skills.

arXiv matched term frequency — seeded AI security pull

GHArchive — Builder Ecosystem

Open-source activity is the market's most honest signal: builders allocate time to things they believe matter, independent of employer mandate or press framing. The GHArchive signal tracks 2500 classified GitHub repositories across eight AI security domains — 26,662 total event proxies (stars, forks, pull requests, issues) from Unknown-dominated tooling. The result: active builder ecosystems exist across every AI security taxonomy bucket without exception. No domain is purely academic. The top topic tag is vellum-ai/vellum-assistant, consistent with Finding 06 (Agentic Anarchy) and Finding 15 (The Agentic Surface Emergence) — the threat the hiring market has barely noticed is the one that builders have been building defenses for. Governance/assurance engineering and detection/runtime monitoring show the highest unique-actor density, which also maps to the governance hiring signal dominance documented in Finding 08 (Boardroom-to-Backlog Gap). Builders are ahead of the hiring market in every domain where the hiring market lags. That is the normal order — until organizations start hiring to catch up.

Unique GitHub contributors by AI security domain — GHArchive

Active GHArchive repos by month

GHArchive event type mix (AI-security-scoped stream)

Collaboration intensity by bucket (avg actors per repo)

Review pressure by bucket (review-to-push ratio)

Event concentration risk by bucket (top-10 share)

Classifier evidence strength by bucket

Control artifact signals by bucket

Release cadence by bucket

Media — Industry Coverage

Industry media shapes board-level mental models before any hiring signal reaches them. 613,416 items from aggregated RSS/Atom feeds — major tech media, security outlets, AI lab blogs — classified against the same AI security taxonomy. Of the classified volume (6.1% of total; the remainder is general tech coverage without AI security signal), AI model research and AI cyber defense dominate, with secure AI SDLC and governance following. This distribution explains a specific failure mode documented in Finding 13 (The Compliance Reflex) and Finding 08 (Boardroom-to-Backlog Gap): when boards read about AI security, they read about model capability risks and regulatory frameworks — not about control engineering, evidence artifacts, or ownership structures. The media vocabulary that reaches boardrooms is calibrated for awareness, not for the operational decisions that CISOs and hiring managers actually need to make. The result is governance language that drives policy creation without driving control creation. The hiring corpus reflects the same bias: governance-shaped role language crowding out engineering-shaped role language, in the same proportion media shapes board perception.

Media volume by AI security theme — classified items only

Wikimedia — Knowledge Codification

Public knowledge codification is a discipline-maturity clock. When a concept acquires a Wikipedia article, a Wikidata entity, a taxonomy entry in public knowledge graphs, it has crossed from practitioner jargon into institutional vocabulary. The codification lag — the delay from practice emergence to public encoding — is an independent proxy for how far behind educational infrastructure, recruiting knowledge, and junior candidate preparation sit relative to the frontier. AI security subfields with strong codification coverage signal that knowledge infrastructure exists to train and credential the next generation. Subfields with thin codification are where hiring managers are demanding senior experience for problems that have no curriculum, no certifications, and no standardized vocabulary yet — which is the direct precondition for Finding 11 (Entry-Level Extinction). The discipline cannot build junior pipelines for concepts that are not yet in the knowledge infrastructure that junior candidates use to learn and recruiters use to screen.

Vulnerability Intelligence — CVE & Advisory Disclosures

Public vulnerability disclosures are the market's most unambiguous signal: when a CVE is published against an AI/ML product or framework, the attack surface is no longer theoretical. This layer aggregates 26K+ records from NIST NVD, GitHub Advisory Database (GHSA), and OSV.dev, classified for AI/ML relevance using a two-stage pipeline: product/package name matching against a 35+ package dictionary, followed by keyword-weighted scoring across 21 semantic buckets from the MITRE ATLAS taxonomy. Of 26K+ total records, 1,458 were classified as AI-relevant at confidence ≥ 0.5. 3 appear in the CISA KEV — confirming active exploitation in the wild. The dominant classification bucket is ai ml framework library vulnerability (378 records), directly mapping to the LLM application frameworks that practitioners cite as their top risk surface and that arXiv research focuses on most intensely.

The vulnerability signal is structurally different from the other signal layers: it does not measure what people say they need, what researchers are studying, or what builders are shipping. It measures where attackers have already found exploitable weaknesses. When the CVE distribution aligns with practitioner-cited risks and arXiv research focus across independent classification taxonomies, the convergence moves from directional to evidential. The exploits exist. The question is whether the organizations deploying AI have staffed the roles capable of remediating them — and the hiring corpus documents the answer.

Framework Intelligence — Control and Mapping Coverage

Framework intelligence adds a different validation lens: not attack pressure, but control-language interoperability. This layer tracks public framework assets across MITRE ATLAS, NIST AI RMF, OWASP LLM Top 10, and related governance references; currently 8 frameworks with 42 directional crosswalk mappings. The coverage split — 3 machine-readable vs 5 document-only frameworks — is itself operationally important. Teams cannot automate control validation where framework assets are only narrative text. Crosswalk density and domain coverage show where organizations can build control traceability today versus where taxonomy translation is still heuristic. This is directional signal, not official equivalence mapping.

Framework crosswalk coverage by control domain

Framework source retrieval status

What convergence means for claim confidence

Each of these seven signal layers is independent. They share a taxonomy, but not a dataset, a methodology, or an institutional source. The hiring corpus comes from employer language. The survey comes from practitioner experience. arXiv comes from academic attention. GHArchive comes from builder behavior. Media comes from editorial selection. Wikimedia comes from knowledge community consensus. Vulnerability intelligence comes from public CVE/advisory disclosures. Framework intelligence comes from public control-framework assets and directional crosswalk analysis. When seven independent systems all map the same topology of problems — agentic surface exposure with no control ownership, governance vocabulary without engineering delivery, discipline emergence without junior pipeline — that convergence is not confirmation bias. It is structural evidence. The central claim of this report is not that the hiring corpus suggests a gap. It is that seven independent systems are measuring the same gap from seven different angles, and the measurements agree.

Glossary

Twelve AI security terms every hiring team needs to know. These are the vocabulary gaps between what hiring managers write and what AI security engineers actually do.

Prompt Injection

An attack where adversarial text in the input causes a model to follow attacker instructions rather than the application's system prompt. Distinct from input validation attacks — it exploits the model's inability to reliably separate instruction from data.

Why it matters: affects any user-facing AI system. Cannot be fully mitigated by output filters alone.

RAG (Retrieval-Augmented Generation)

Architecture where the model retrieves relevant context from an external data store before generating a response. Creates retrieval boundary and data-access control risks that standard chatbot security does not address.

Why it matters: the retrieval layer can surface documents the querying user isn't authorized to see.

Agentic AI

An AI system that takes actions — calls APIs, writes files, executes code, sends messages — rather than only generating text responses. Requires action authorization controls, rollback capability, and audit trails that conversational AI does not require.

Why it matters: a chatbot saying the wrong thing is reputational risk. An agent taking the wrong action is operational risk.

Jailbreak

A prompting technique that causes a model to violate its alignment training or application-level guardrails. Distinct from prompt injection — it exploits the model's own reasoning rather than instruction/data confusion.

Why it matters: jailbreak-resistant systems require architectural controls, not just content filters.

Model Evaluation (Eval)

A structured assessment of model behavior across a defined test suite. The primary evidence artifact for demonstrating that a model meets safety, quality, and security specifications. Not a one-time test — an ongoing process.

Why it matters: the difference between "we believe the model is safe" and "we have evidence the model is safe."

NIST AI RMF

NIST AI Risk Management Framework (2023). The primary US government AI governance standard. Organized around four functions: Govern, Map, Measure, Manage. The AI-native counterpart to NIST 800-53 for traditional IT security.

Why it matters: 91 job postings cite it vs 5,461 for GDPR — the adoption gap tells you exactly how far most programs lag.

MITRE ATLAS

Adversarial Threat Landscape for AI Systems. The AI-native counterpart to MITRE ATT&CK. Catalogs adversarial ML attack techniques including model evasion, data poisoning, and model inversion. Used for adversarial red team exercise design.

Why it matters: if your red team doesn't reference ATLAS, they're probably not running AI-specific exercises.

EU AI Act

European Union AI regulatory framework. Establishes risk classification tiers: unacceptable, high-risk, limited risk, minimal risk. High-risk AI systems face mandatory transparency, documentation, and human oversight requirements.

Why it matters: affects any company serving EU customers. Effective 2024, enforcement rolling through 2026–2027.

Delegated Action

An action taken by an AI agent on behalf of a user or system, via tool calling or function calling. The security-critical concept: delegated actions may be irreversible and require explicit authorization scoping that conversational responses do not.

Why it matters: authorization for delegated actions is the #1 agentic control gap in the 2026 corpus.

Governance Evidence

A verifiable artifact — eval output, telemetry log, remediation closure record — that demonstrates a stated control is operating as claimed. Distinguished from a policy document, which states what should happen but does not prove it does.

Why it matters: boards are asking for evidence; most programs produce policies. The gap is the Governance Evidence Lead's entire mandate.

Frankenstein Role

A job description that bundles five or more historically separate capability families into one requisition at one salary. The primary driver of chronic mis-hiring, scope collapse after onboarding, and low tenure stability in AI security.

Why it matters: the corpus-wide average breadth score is 5.21. Almost no organization is exempt.

Chimera Spec

A job description with team-shaped capability requirements — spanning multiple ownership domains — at individual contributor budget and compensation. Related to the Unicorn Index finding. Not a talent shortage problem; a role-design failure.

Why it matters: the fix is role architecture, not compensation. Paying more for the unicorn doesn't make the spec possible.

Methodology & Claim Boundaries

This section defines what this report can and cannot claim, how each signal layer was constructed, and the limits within which its evidence is reproducible. These are not disclaimers — they are operational guardrails. A finding is only as useful as the clarity about what data produced it and what inferences it can support.

Data sources & multi-signal approach

This report triangulates across seven independent signal layers: (1) ATS job corpus — 293,846+ job descriptions from 5,350 companies spanning 2013–2026, with primary analysis weight on the 241,553 postings from 2026; (2) Primary survey research — four survey instruments across CISOs, practitioners, recruiters, and adjacent engineers; (3) arXiv research momentum — seeded metadata from 903+ academic papers; (4) GHArchive builder ecosystem — 2,500 classified repos and 26,662 scoped event proxies in the current ingest window; (5) Media/news corpus — 776K+ items from aggregated RSS/Atom feeds; (6) Wikimedia knowledge codification — concept maturation tracking via public knowledge artifacts; (7) Vulnerability intelligence — normalized CVE/advisory streams with AI-relevance classification and KEV cross-reference. Each signal layer is independently classified against the same AI security taxonomy.

ATS corpus & primary data collection

The ATS job corpus is based on structured analysis of 293,846+ job descriptions (5,350 companies) collected from major applicant tracking systems (ATS) spanning 2013–2026, with primary analysis weight on the 241,553 postings from the 2026 period where AI security signal density is highest. The corpus includes roles across the full AI security adjacent spectrum: explicitly labeled AI security roles, traditional security roles (AppSec, cloud security, penetration testing), software engineering roles with security signals, and general software engineering roles used as a baseline comparator. Job descriptions were deduplicated by content hash and normalized for signal extraction.

AI-native companies, cybersecurity vendors, and financial services are over-represented in the hiring corpus relative to their share of total employment — reflecting their higher job posting volume and ATS adoption rates. Findings account for this concentration where relevant.

Signal extraction & taxonomy

Signal Domain	What it measures	Coverage
Role breadth (Frankenstein score)	Distinct capability families bundled in a single posting — product security, governance, adversarial testing, lifecycle control, agentic controls	All 294K+ postings
Framework mentions	Named governance, compliance, and AI-native security frameworks cited in role language (GDPR, NIST AI RMF, EU AI Act, MITRE ATLAS, ISO 42001, etc.)	All 294K+ postings
Tool mentions	Named security and AI-native tools cited in role language — detection, AppSec, LLM observability, model evaluation, guardrails	All 294K+ postings with quality filters
Agentic surface signals	Vocabulary specific to agent-layer attack surfaces: prompt injection, function calling, tool calling, RAG boundary, jailbreak, model drift, data poisoning	All 294K+ postings
Evidence language	Governance obligation language vs. evidence-producing language (telemetry requirements, eval output requirements, remediation proof, attestation)	Framework-tagged postings
Seniority distribution	Role-family breadth scores stratified by seniority level	AI security-labeled roles

Tool signal quality note: Tool mention signals use substring dictionary matching. Signals with high false-positive risk (where the tool name appears as a substring of common English words) are excluded from named tool analysis. Category-level tool signals aggregate across multiple tools and are more robust than individual tool name signals.

Claim boundaries

Claim Level	Definition	Use
Public Claim Ready	Direct quantitative signal from job description corpus. Reproducible from the extraction taxonomy.	External report, media, decks. Cite with stated corpus caveats.
Public Claim with Caveat	Corpus pattern analysis. Directional signal, not precise measurement. Individual-company variation is real.	Directional assertions, with explicit "based on hiring language" qualifier.
Internal Hypothesis Only	Inferred or extrapolated beyond corpus evidence. Not validated by job description signal.	Research agenda, further study. Not suitable for external publication.

What this data can support: Market-level patterns in hiring language · Framework and tool adoption signals · Role breadth and capability-family trends · Year-over-year signal growth rates

What this data cannot support: Company-level security maturity assessments · Individual practitioner capability evaluations · Proof that any company has or lacks any specific security control · Claims about actual deployed AI systems or security incidents

Sponsor independence: Research methodology, signal taxonomy, and findings are independent of sponsor involvement. Sponsors receive access to findings; they do not influence finding selection, framing, or data interpretation.

aisecurity.llc research

The State of AI Security Engineering

This report provides job-description intelligence and aggregate benchmark signals. Findings reflect role-language evidence, not company-level maturity proof.

What Companies Really Mean by “AI Security Engineer”

Contributor notes for the 2026 report

Alex Eisen

Alon Braun

Tim Kerimbekov

Dorina Miroyannis

Executive Summary

The 15 Findings

The Frankenstein Role

Skill Washing

The Unicorn Index

The Probability Pivot

The Evidence Gap

Agentic Anarchy

The vCISO Vacuum

Boardroom-to-Backlog Gap

Skills Validation Gap

Model Supply Chain Blind Spot

Entry-Level Extinction

The Red Team Misnomer

The Compliance Reflex

The Tool Incumbency Trap

The Agentic Surface Emergence

Vertical Intelligence

Role Architecture Canon

Boardroom-to-Backlog Playbook

AI Security Tools Intelligence

External Validation Signals

Glossary

Methodology & Claim Boundaries

What Companies Really Mean by
“AI Security Engineer”