Phase 12 of 12 · Security Operator
Agent Governance & Security
Phase 12 is the work of defining what agents can do and under whose authority, where humans set threat models, least privilege, and audit trails at the tool boundary.
Define what agents can do, under whose authority, with which guardrails and audit trail.
Decision rules
Each rule connects a real situation to the skill or playbook that fits it. Linked terms open canonical sources.
| Situation | Missing skill | Recommended playbook | Alternatives | Why |
|---|---|---|---|---|
| Agent-generated code is shipping to production without a security pass on it. | Automated security scanning | codex-security:security-scan | Snyk / Semgrep | Codex-security:security-scan is tuned for agent-authored diffs and blocks on high-severity findings; Snyk and Semgrep are general-purpose and need policy work to be as strict. |
| An agent can call any tool it has access to using the user's full identity. | Threat modelling | codex-security:threat-model | STRIDE workshop | Codex-security:threat-model defines agent identity, scope and policy at the tool boundary; a STRIDE workshop is the broader org-wide exercise when multiple systems are in scope. |
| Production agents have tool access in place but no runtime policy enforcing it. | Runtime guardrails | Kong Agent Gateway | AWS Bedrock Guardrails | Kong Agent Gateway sits between agent and tools and denies by default; Bedrock Guardrails is the right pick when the stack is already on AWS and Bedrock is the inference layer. |
| An agent reads external content as part of its job and could be hijacked through it. | Prompt injection testing | Prompt injection defense | Lakera / HiddenLayer | The prompt-injection-defense playbook tests indirect injection before launch as part of the eval suite; Lakera and HiddenLayer are runtime services that catch attacks in production but don't replace the pre-launch test. |
Watch
Reality
Write-access agents and MCP-style tool use create confused-deputy and indirect prompt-injection risks that standard IAM does not fully solve.
Required skills
- Agent threat modelling
- Least-privilege tool design
- Prompt injection testing
- Policy-as-code review
- Audit trail design
Viable tools
Failure modes
- Confused deputy
- Indirect prompt injection
- Overprivileged agents
- Missing audit trail
Next operating step
Set controls at the tool boundary: agent identity, least-privilege permissions, policy checks, sandboxing, prompt-injection tests, and audit trails.
Working through Agent Governance & Security?
I advise teams on this part of the lifecycle. Get in touch → if you want a direct, vendor-free conversation about what's worth doing next.