Phase 12 of 12 · Security Operator

Agent Governance & Security

Phase 12 is the work of defining what agents can do and under whose authority, where humans set threat models, least privilege, and audit trails at the tool boundary.

Define what agents can do, under whose authority, with which guardrails and audit trail.

Decision rules

Each rule connects a real situation to the skill or playbook that fits it. Linked terms open canonical sources.

Decision rules for Agent Governance & Security
Situation	Missing skill	Recommended playbook	Alternatives	Why
Agent-generated code is shipping to production without a security pass on it.	Automated security scanning	codex-security:security-scan	Snyk / Semgrep	Codex-security:security-scan is tuned for agent-authored diffs and blocks on high-severity findings; Snyk and Semgrep are general-purpose and need policy work to be as strict.
An agent can call any tool it has access to using the user's full identity.	Threat modelling	codex-security:threat-model	STRIDE workshop	Codex-security:threat-model defines agent identity, scope and policy at the tool boundary; a STRIDE workshop is the broader org-wide exercise when multiple systems are in scope.
Production agents have tool access in place but no runtime policy enforcing it.	Runtime guardrails	Kong Agent Gateway	AWS Bedrock Guardrails	Kong Agent Gateway sits between agent and tools and denies by default; Bedrock Guardrails is the right pick when the stack is already on AWS and Bedrock is the inference layer.
An agent reads external content as part of its job and could be hijacked through it.	Prompt injection testing	Prompt injection defense	Lakera / HiddenLayer	The prompt-injection-defense playbook tests indirect injection before launch as part of the eval suite; Lakera and HiddenLayer are runtime services that catch attacks in production but don't replace the pre-launch test.

Watch

Engineering practices that make coding agents work

Simon Willison · Pragmatic Engineer Summit · 2026-02 · 30 min · 38k views

Reality

Write-access agents and MCP-style tool use create confused-deputy and indirect prompt-injection risks that standard IAM does not fully solve.

Required skills

Agent threat modelling
Least-privilege tool design
Prompt injection testing
Policy-as-code review
Audit trail design

Viable tools

Failure modes

Confused deputy
Indirect prompt injection
Overprivileged agents
Missing audit trail

Next operating step

Set controls at the tool boundary: agent identity, least-privilege permissions, policy checks, sandboxing, prompt-injection tests, and audit trails.

Working through Agent Governance & Security?

I advise teams on this part of the lifecycle. Get in touch → if you want a direct, vendor-free conversation about what's worth doing next.