AI agents fail differently than regular software. A bug in a traditional API returns an error code. A compromised or misbehaving AI agent returns a plausible-looking response that executes unintended actions, leaks data, or loops indefinitely while burning tokens and API budget.
This playbook gives you three TypeScript patterns you can add to any Node.js or Next.js backend today: a prompt injection detector, a circuit breaker, and an audit logger. Each is standalone. You do not need to adopt all three at once.
Pattern 1: Prompt Injection Detector
Run this before every call to your AI provider. It checks for known injection signals and optionally calls a secondary model to evaluate suspicious inputs.
// lib/ai-security/injection-detector.ts
const INJECTION_SIGNALS = [
'ignore previous instructions',
'ignore all previous',
'disregard your',
'you are now',
'pretend you are',
'act as if',
'forget everything',
'new persona',
'system prompt',
'reveal your instructions',
'print your system prompt',
'dan mode',
'developer mode',
'jailbreak',
'bypass your',
'override your',
];
export interface InjectionCheckResult {
flagged: boolean;
signals: string[];
score: number; // 0–1
}
export function detectInjection(input: string): InjectionCheckResult {
const normalized = input.toLowerCase();
const matched = INJECTION_SIGNALS.filter(s => normalized.includes(s));
return {
flagged: matched.length > 0,
signals: matched,
score: Math.min(matched.length / 3, 1),
};
}
// Middleware wrapper for Express / Next.js route handlers
export function withInjectionGuard<T>(
handler: (input: string) => Promise<T>,
onFlag?: (result: InjectionCheckResult, input: string) => void,
) {
return async (input: string): Promise<T> => {
const check = detectInjection(input);
if (check.flagged) {
onFlag?.(check, input);
// Log and either block or continue with monitoring
auditLog({
event: 'injection_flagged',
input,
signals: check.signals,
score: check.score,
});
if (check.score >= 0.67) {
throw new Error('Input blocked: potential prompt injection detected');
}
}
return handler(input);
};
}
Where to use it: wrap your agent's main run(userInput) function. For medium-score inputs (score 0.33–0.66), let the call through but tag the session for review. For high-score inputs (score 0.67+), block and return an error.
What it does not catch: sophisticated injections embedded in retrieved documents (RAG attacks), multi-turn injections that build up across messages, and injections in non-English text. Those require a secondary model evaluation, which is the next step once baseline detection is in place.
Pattern 2: Circuit Breaker
A circuit breaker wraps your agent's execution loop and stops it when anomalous behavior accumulates past a threshold.
// lib/ai-security/circuit-breaker.ts
type CircuitState = 'closed' | 'open' | 'half-open';
interface CircuitBreakerConfig {
failureThreshold: number; // open circuit after this many failures
successThreshold: number; // close circuit after this many successes in half-open
timeout: number; // ms before moving from open to half-open
outputPatterns: RegExp[]; // patterns that count as failures in output
}
const DEFAULT_BLOCKED_PATTERNS: RegExp[] = [
/\b\d{3}-\d{2}-\d{4}\b/, // SSN pattern
/sk-[a-zA-Z0-9]{48}/, // OpenAI API key pattern
/anthropic_api_key/i,
/system prompt/i,
/my instructions are/i,
];
export class AICircuitBreaker {
private state: CircuitState = 'closed';
private failures = 0;
private successes = 0;
private lastFailureTime = 0;
private config: CircuitBreakerConfig;
constructor(config: Partial<CircuitBreakerConfig> = {}) {
this.config = {
failureThreshold: 3,
successThreshold: 2,
timeout: 60_000,
outputPatterns: DEFAULT_BLOCKED_PATTERNS,
...config,
};
}
async execute<T>(fn: () => Promise<{ output: string; result: T }>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailureTime > this.config.timeout) {
this.state = 'half-open';
} else {
throw new Error('Circuit open: AI agent temporarily disabled');
}
}
try {
const { output, result } = await fn();
this.checkOutput(output);
this.onSuccess();
return result;
} catch (err) {
this.onFailure(err as Error);
throw err;
}
}
private checkOutput(output: string): void {
for (const pattern of this.config.outputPatterns) {
if (pattern.test(output)) {
throw new Error(`Blocked output pattern detected: ${pattern.source}`);
}
}
}
private onSuccess(): void {
this.failures = 0;
if (this.state === 'half-open') {
this.successes++;
if (this.successes >= this.config.successThreshold) {
this.state = 'closed';
this.successes = 0;
}
}
}
private onFailure(err: Error): void {
this.failures++;
this.lastFailureTime = Date.now();
if (this.failures >= this.config.failureThreshold) {
this.state = 'open';
auditLog({ event: 'circuit_opened', reason: err.message, failures: this.failures });
}
}
get isOpen(): boolean {
return this.state === 'open';
}
}
Usage in your agent loop:
const breaker = new AICircuitBreaker({ failureThreshold: 3, timeout: 120_000 });
async function runAgent(userInput: string) {
return breaker.execute(async () => {
const output = await callModel(userInput);
return { output, result: output };
});
}
Tune the thresholds to your use case: a customer-facing agent that runs hundreds of times per hour needs a lower failure threshold than an internal tool used by five people. Start at failureThreshold: 5, timeout: 60_000 and adjust based on your false-positive rate.
Pattern 3: Structured Audit Logger
Every AI agent call should produce a structured log entry. This is what you query during incident investigation.
// lib/ai-security/audit-logger.ts
import { randomUUID } from 'crypto';
export interface AuditEntry {
requestId: string;
timestamp: string;
userId?: string;
sessionId?: string;
event: string;
model?: string;
input?: string;
output?: string;
toolCalls?: ToolCallRecord[];
inputTokens?: number;
outputTokens?: number;
latencyMs?: number;
signals?: string[];
score?: number;
failures?: number;
reason?: string;
error?: string;
}
interface ToolCallRecord {
tool: string;
args: Record<string, unknown>;
result: unknown;
latencyMs: number;
}
// Replace with your structured logging service (Datadog, Axiom, CloudWatch, etc.)
function writeLog(entry: AuditEntry): void {
// Write to stdout as JSON — your log aggregator picks it up
process.stdout.write(JSON.stringify(entry) + '\n');
}
export function auditLog(partial: Partial<AuditEntry> & { event: string }): void {
writeLog({
requestId: randomUUID(),
timestamp: new Date().toISOString(),
...partial,
});
}
// Wrap a complete agent call with timing and automatic logging
export async function auditedAgentCall<T>(params: {
userId?: string;
sessionId?: string;
model: string;
input: string;
fn: () => Promise<{ output: string; toolCalls?: ToolCallRecord[]; result: T }>;
}): Promise<T> {
const requestId = randomUUID();
const start = Date.now();
auditLog({
requestId,
event: 'agent_call_start',
userId: params.userId,
sessionId: params.sessionId,
model: params.model,
input: params.input,
});
try {
const { output, toolCalls, result } = await params.fn();
const latencyMs = Date.now() - start;
auditLog({
requestId,
event: 'agent_call_complete',
userId: params.userId,
sessionId: params.sessionId,
model: params.model,
output,
toolCalls,
latencyMs,
});
return result;
} catch (err) {
auditLog({
requestId,
event: 'agent_call_error',
userId: params.userId,
sessionId: params.sessionId,
model: params.model,
error: (err as Error).message,
latencyMs: Date.now() - start,
});
throw err;
}
}
Putting It Together
// agent.ts — complete integration
const breaker = new AICircuitBreaker();
export async function runSecureAgent(
userInput: string,
userId: string,
sessionId: string,
): Promise<string> {
// 1. Check for injection
const guardedRun = withInjectionGuard(async (input) => {
// 2. Circuit breaker wraps the model call
return breaker.execute(async () => {
// 3. Audited call logs everything
const result = await auditedAgentCall({
userId,
sessionId,
model: 'claude-sonnet-4-6',
input,
fn: async () => {
const output = await callAnthropicAPI(input);
return { output, result: output };
},
});
return { output: result, result };
});
});
return guardedRun(userInput);
}
Incident Response Steps When the Circuit Opens
When the circuit breaker fires, the agent is down. Here is the response sequence:
Immediate (0–5 minutes):
- Check the audit log for the last 10 entries before the circuit opened
- Identify which output pattern triggered the break or which error repeated
- If data exfiltration is suspected, rotate any API keys the agent had access to
Short-term (5–30 minutes): 4. Pull the full session log for the user session that triggered the incident 5. Determine whether the trigger was a legitimate edge case or an attack 6. If attack: block the user account or IP, preserve logs for investigation 7. If edge case: add the pattern to your test suite, fix, and manually reset the circuit
After the incident: 8. Update the injection signal list if a new pattern was used 9. Adjust circuit breaker thresholds if the false-positive rate is too high 10. Document the incident in your AI governance log (required under EU AI Act for high-risk systems)
Minimum Viable Security for a New AI Agent
If you are starting from zero:
- Audit logging first — you cannot investigate what you did not log
- Injection detection second — blocks the easiest attacks
- Circuit breaker third — prevents runaway failures from compounding
- Output pattern filtering — add as you discover what your agent should never output
Do not ship an agent to production without audit logging. Everything else is recoverable. An incident with no logs is not.
