Agentic AI Security Evaluation Checklist

A technical framework for assessing the security posture of autonomous AI agent deployments. It covers 12 controls across four domains: Identity & Access Control, Runtime Governance, Data Leakage Prevention (DLP), and Audit & Forensic Logs.

Identity & Access Control

Just-In-Time (JIT) Identity

Token TTL < 1 hour

Are agents assigned ephemeral, scoped identities rather than persistent long-lived tokens?
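
As an illustration of what an ephemeral, scoped identity can look like, the sketch below issues an HMAC-signed token that carries its own scopes and expiry. The names `issue_token`, `verify_token`, and `SIGNING_KEY` are hypothetical, and a production deployment would more likely rely on an established token service than on hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key-rotate-me"  # hypothetical; load from a secret manager in practice
MAX_TTL_SECONDS = 3600               # mirrors the "Token TTL < 1 hour" target


def issue_token(agent_id, scopes, ttl_seconds, now=None):
    """Return a signed token that names its scopes and expiry."""
    if not 0 < ttl_seconds < MAX_TTL_SECONDS:
        raise ValueError("TTL must be positive and under one hour")
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(token, required_scope, now=None):
    """Accept the token only if signature, expiry, and scope all check out."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return current < claims["exp"] and required_scope in claims["scopes"]
```

A token issued with `ttl_seconds=900` and scope `slack:read` verifies for that scope during the next 15 minutes and fails for any other scope or after expiry.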

Least-Privilege Tooling

OWASP Top 10 for LLM Applications: L1

Are tools restricted by granular ACLs (e.g., read-only Slack access, scoped file paths)?
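
A granular ACL of this kind can be sketched as a default-deny lookup keyed by tool, operation, and path root. The `POLICY` table, tool names, and workspace path below are hypothetical, and the check is purely lexical: it does not resolve symlinks, so it would need to be paired with filesystem-level controls.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical per-agent policy: tool name -> allowed operations and path roots.
POLICY = {
    "file": {"ops": {"read"}, "roots": ["/srv/agent/workspace"]},
    "slack": {"ops": {"read"}, "roots": []},
}


def is_allowed(tool, op, path=None):
    """Default-deny check: the tool, the operation, and the path must all be permitted."""
    rule = POLICY.get(tool)
    if rule is None or op not in rule["ops"]:
        return False
    if path is None:
        return True
    if not path.startswith("/"):
        return False  # relative paths are rejected outright
    # Collapse "." and ".." lexically so traversal cannot escape an allowed root.
    target = PurePosixPath(posixpath.normpath(path))
    for root in rule["roots"]:
        root_path = PurePosixPath(root)
        if target == root_path or root_path in target.parents:
            return True
    return False
```

With this policy, `is_allowed("file", "read", "/srv/agent/workspace/../../etc/passwd")` is denied because the normalized path falls outside the allowed root.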

Privileged Action Gates

MFA-integrated approval

Do high-impact actions (deleting data, moving funds) require explicit human approval via CLI/UI?
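
One minimal way to picture such a gate is a wrapper that refuses to run a high-impact action unless an approver callback returns exactly `True`. The action names and the `approver` callback are hypothetical; in a real deployment the callback would stand in for an MFA-backed CLI or UI prompt.

```python
HIGH_IMPACT = {"delete_data", "transfer_funds"}  # hypothetical action names


class ApprovalRequired(Exception):
    """Raised when a high-impact action lacks explicit human approval."""


def run_action(name, handler, approver=None):
    """Run handler() only if the action is low-impact or a human approver says yes."""
    if name in HIGH_IMPACT:
        # Anything other than a literal True (None, False, "yes") counts as a refusal.
        if approver is None or approver(name) is not True:
            raise ApprovalRequired(f"{name} needs explicit human approval")
    return handler()
```

Treating every non-`True` answer as a refusal keeps the gate fail-closed when the approval channel is missing or misbehaves.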

Runtime Governance

Syscall Interception

eBPF / Seccomp level monitoring

Are agent-spawned processes monitored at the kernel level for unauthorized activity?

Airlocked Execution

Memory-isolated sandbox

Does the agent execute code in a transient, network-isolated container (Firecracker/gVisor)?
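
For the gVisor case, a transient, network-isolated container can be approximated with standard `docker run` flags. The sketch below only builds the argument list; it assumes Docker is installed and that gVisor's `runsc` runtime is registered with it, and the resource limits shown are arbitrary placeholders. Firecracker microVMs are launched differently and are not covered here.

```python
def sandbox_argv(image, command, runtime="runsc"):
    """Build a `docker run` argv for a throwaway, network-isolated container."""
    return [
        "docker", "run",
        "--rm",                                  # delete the container when it exits
        "--network", "none",                     # no network access beyond loopback
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation via setuid
        "--pids-limit", "64",                    # placeholder process limit
        "--memory", "256m",                      # placeholder memory limit
        "--runtime", runtime,                    # assumes runsc is installed and registered
        image, *command,
    ]
```

The resulting list can be passed to `subprocess.run(argv, timeout=...)` so that the agent's code never runs in the host's own process space.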

Real-time Interception

Kernel-mode latency < 10ms

Can the security layer block a command (e.g., `rm -rf /`) before it executes, within the latency budget stated above?
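
The decision logic behind such a block can be sketched in user space as a pre-execution check on the parsed command line. This is only a stand-in for kernel-level enforcement: string matching of this kind is easy to bypass (aliases, scripts, other tools), and the function name `is_destructive_rm` is hypothetical.

```python
import posixpath
import shlex


def is_destructive_rm(command):
    """Return True for a recursive `rm` whose target resolves to the filesystem root."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: deny rather than guess
    if not argv or posixpath.basename(argv[0]) != "rm":
        return False
    args = argv[1:]
    # Recursive if --recursive is given or a short-flag group contains r or R.
    recursive = any(
        a == "--recursive"
        or (a.startswith("-") and not a.startswith("--") and bool(set(a) & {"r", "R"}))
        for a in args
    )
    targets = [a for a in args if not a.startswith("-")]
    # Normalize so that "/tmp/.." and "//" are also recognized as the root.
    hits_root = any(
        t.startswith("/") and posixpath.normpath(t).strip("/") == "" for t in targets
    )
    return recursive and hits_root
```

A caller would run this check before handing the command to a shell and refuse execution when it returns `True`.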

Data Leakage Prevention (DLP)

Neural PII/Secret Detection

DistilBERT/RoBERTa backed NER

Does the system intercept outbound tokens to LLMs and redact PII/Secrets in real-time?
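
To make the interception step concrete, the sketch below redacts an outbound prompt before it leaves the process. It deliberately swaps the NER model for a few regular expressions so that it stays self-contained; the pattern set and the `redact` function are illustrative assumptions, and regexes alone will miss much of what a trained model catches.

```python
import re

# Illustrative patterns only; a real deployment would pair these with an NER model.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt):
    """Replace each match with a [LABEL] placeholder and report what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}]", prompt)
        if count:
            findings.append((label, count))
    return prompt, findings
```

Returning the findings alongside the redacted text lets the caller log that a redaction happened without logging the sensitive value itself.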

Semantic Context Redaction

Preserved structure vs. raw blocking

Are redactions contextual (e.g., masking the digits of a credit card number while preserving its format) rather than blocking the whole message?
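
Structure-preserving redaction can be illustrated with card numbers: the sketch below masks all but the last four digits while leaving spaces and hyphens in place, and only touches candidates that pass the Luhn checksum. The function names and the 13 to 19 digit range are assumptions made for this example.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text):
    """Mask all but the last four digits of Luhn-valid card numbers, keeping separators."""
    def mask(match):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if not luhn_valid(digits):
            return raw  # not a plausible card number; leave untouched
        keep_from = len(digits) - 4
        out, seen = [], 0
        for ch in raw:
            if ch.isdigit():
                out.append(ch if seen >= keep_from else "X")
                seen += 1
            else:
                out.append(ch)
        return "".join(out)
    return CARD_CANDIDATE.sub(mask, text)
```

Because the separators and length survive, the downstream model can still tell that a card number was present and how it was formatted.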

Sovereign Egress Filtering

mTLS + IP Whitelisting

Is data transmission restricted to authorized API endpoints and VPC-locked providers?
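
At the application layer, restricting transmission to authorized endpoints can be sketched as an exact-match host allowlist applied before any request is sent. The hostnames below are hypothetical, and this check complements rather than replaces mTLS and network-level IP filtering, neither of which it implements.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of approved API hosts.
ALLOWED_HOSTS = {"api.llm-provider.example", "vault.internal.example"}


def egress_allowed(url):
    """Permit only HTTPS requests whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Exact match: suffix or substring matching would admit look-alike domains.
    return host in ALLOWED_HOSTS
```

Matching on the parsed hostname rather than on the raw URL string avoids being fooled by allowed names that appear in the path, query, or as a subdomain prefix of another domain.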

Audit & Forensic Logs

Immutable Token Audit

Write-once/Read-many storage

Is there a non-repudiable log of every prompt, tool call, and response generated by the agent?
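
One common building block for such a log is a hash chain, in which each entry commits to the one before it so that later edits are detectable. The sketch below is an in-memory illustration with hypothetical function names. It provides tamper evidence only: write-once storage has to be enforced by the storage layer, and non-repudiation would additionally require signed entries.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with the canonical JSON of the event."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_event(log, event):
    """Append an event, linking it to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify_chain(log):
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Events here are assumed to be JSON-serializable dictionaries, for example `{"kind": "tool_call", "agent_id": "agent-7", "text": "..."}`.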

Behavioral Drift Analysis

Alert when |z-score| > 3

Does the system alert when an agent's command frequency or API usage deviates from baseline?
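
The z-score behind this kind of alert is $z = (x - \mu) / \sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the baseline. The sketch below applies it to a per-interval count such as commands per minute; the function names are hypothetical, and the baseline needs at least two samples.

```python
import statistics


def z_score(baseline, observed):
    """How many sample standard deviations `observed` sits from the baseline mean."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)  # requires at least two baseline samples
    if stdev == 0:
        # A perfectly flat baseline: any deviation at all is treated as extreme.
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag observations more than `threshold` standard deviations from baseline."""
    return abs(z_score(baseline, observed)) > threshold
```

With a baseline of `[10, 12, 11, 9, 13, 10, 11, 12]` commands per minute, an observation of 40 is flagged while an observation of 12 is not.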

Chain of Custody

Forensic tracing < 15 mins

Can you trace a compromised secret back to the specific agent and prompt that leaked it?
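
One way to support this kind of tracing without storing secrets in the log is to record a keyed fingerprint of every secret detected in a prompt or response, then look up the fingerprint of the compromised value later. The record fields (`agent_id`, `prompt_id`, `timestamp`, `secret_fingerprints`) and the key below are assumptions made for this sketch.

```python
import hashlib
import hmac

FINGERPRINT_KEY = b"demo-fingerprint-key"  # hypothetical; keep in a secret manager


def fingerprint(secret):
    """Keyed hash so the log never stores the secret itself."""
    return hmac.new(FINGERPRINT_KEY, secret.encode(), hashlib.sha256).hexdigest()


def trace_secret(audit_log, compromised_secret):
    """Return every record that carried the secret, oldest first."""
    target = fingerprint(compromised_secret)
    hits = [r for r in audit_log if target in r["secret_fingerprints"]]
    return sorted(hits, key=lambda r: r["timestamp"])
```

The first record returned identifies the agent and prompt where the secret first appeared, which is the starting point for the forensic trace.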
