What is an AI agent security incident?

An AI agent security incident is any event where the agent behaves outside its intended scope, potentially causing data exposure, unauthorized actions, or policy violations. Common types: prompt injection (user input manipulates agent instructions), tool call abuse (agent calls APIs it should not), output exfiltration (agent leaks system prompt or internal data), and runaway loops (agent repeats the same action indefinitely). Unlike traditional software bugs, AI agent incidents often look like normal behavior until you examine the full input-output chain.

How do you detect prompt injection in TypeScript?

Pattern matching on known injection strings is the baseline. Maintain a list of injection signals: 'ignore previous instructions', 'you are now', 'disregard your', 'system prompt', 'DAN mode'. Check user input against this list before sending to the model. More robust: use a separate classifier model call to evaluate whether the input is attempting to redirect the agent. Log all flagged inputs with the original text, timestamp, and user ID before deciding whether to block or allow with monitoring.

What should an AI agent audit log contain?

At minimum: request ID, timestamp, user ID, session ID, the full user input, the full model output, tool calls made (name, arguments, result), model name and version, token counts (input and output), latency, and any safety flags triggered. Store logs outside the application database so a compromised agent cannot tamper with them. Retain for 90 days minimum for incident investigation. Structure logs as JSON so they are queryable.

What is a circuit breaker in AI agent context?

A circuit breaker is a control that automatically halts the agent when anomalous behavior is detected, preventing further damage. Triggers include: consecutive tool call failures above a threshold, output containing blocked patterns (SSNs, internal URLs, system prompt fragments), response time exceeding limits, and token usage far above the session average. When the circuit opens, the agent returns a safe fallback response and logs the incident. The circuit resets after a cooling period or manual review.
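
One of the triggers listed above, token usage far above the session average, can be sketched as a small per-session tracker. The class name TokenUsageMonitor, the 3x multiplier, and the three-call warm-up are illustrative assumptions rather than values from this playbook; tune them against your own traffic.

```typescript
// Hypothetical helper: flags a call whose token usage is far above the
// running average of earlier calls in the same session.
export class TokenUsageMonitor {
  private total = 0;
  private count = 0;
  private readonly multiplier: number;
  private readonly minSamples: number;

  // multiplier: how many times the average counts as "far above" (assumed: 3)
  // minSamples: calls needed before a baseline is trusted (assumed: 3)
  constructor(multiplier = 3, minSamples = 3) {
    this.multiplier = multiplier;
    this.minSamples = minSamples;
  }

  /** Records one call and returns true if it is anomalous versus prior calls. */
  record(tokens: number): boolean {
    const hasBaseline = this.count >= this.minSamples;
    const average = this.count > 0 ? this.total / this.count : 0;
    const anomalous = hasBaseline && tokens > average * this.multiplier;

    // Only normal calls feed the baseline, so one spike does not raise
    // the average and hide the next spike.
    if (!anomalous) {
      this.total += tokens;
      this.count += 1;
    }
    return anomalous;
  }
}
```

Keep one monitor per session and treat a true result as a failure signal, for example by throwing so the circuit breaker counts it.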

TypeScript AI Agent Security: Incident Resp…

AI agents fail differently than regular software. A bug in a traditional API returns an error code. A compromised or misbehaving AI agent returns a plausible-looking response that executes unintended actions, leaks data, or loops indefinitely while burning tokens and API budget.

This playbook gives you three TypeScript patterns you can add to any Node.js or Next.js backend today: a prompt injection detector, a circuit breaker, and an audit logger. Each is standalone. You do not need to adopt all three at once.

Pattern 1: Prompt Injection Detector

Run this before every call to your AI provider. It checks the input for known injection signals and returns a score. A secondary model call for suspicious inputs can be layered on top once this baseline is in place.

// lib/ai-security/injection-detector.ts
import { auditLog } from './audit-logger';

const INJECTION_SIGNALS = [
  'ignore previous instructions',
  'ignore all previous',
  'disregard your',
  'you are now',
  'pretend you are',
  'act as if',
  'forget everything',
  'new persona',
  'system prompt',
  'reveal your instructions',
  'print your system prompt',
  'dan mode',
  'developer mode',
  'jailbreak',
  'bypass your',
  'override your',
];

export interface InjectionCheckResult {
  flagged: boolean;
  signals: string[];
  score: number; // 0–1
}

export function detectInjection(input: string): InjectionCheckResult {
  const normalized = input.toLowerCase();
  const matched = INJECTION_SIGNALS.filter(s => normalized.includes(s));

  return {
    flagged: matched.length > 0,
    signals: matched,
    score: Math.min(matched.length / 3, 1),
  };
}

// Higher-order wrapper: put it around the function your Express / Next.js route handler calls
export function withInjectionGuard<T>(
  handler: (input: string) => Promise<T>,
  onFlag?: (result: InjectionCheckResult, input: string) => void,
) {
  return async (input: string): Promise<T> => {
    const check = detectInjection(input);

    if (check.flagged) {
      onFlag?.(check, input);
      // Log and either block or continue with monitoring
      auditLog({
        event: 'injection_flagged',
        input,
        signals: check.signals,
        score: check.score,
      });

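      // Two or more matched signals give a score of at least 2/3.
      // Comparing against 0.67 would miss exactly two matches (0.666...).
      if (check.score >= 2 / 3) {
        throw new Error('Input blocked: potential prompt injection detected');
      }
    }

    return handler(input);
  };
}

Where to use it: wrap your agent's main run(userInput) function. A single matched signal (score 0.33) lets the call through but logs it, so you can tag the session for review. Two or more matched signals (score 0.67 and up) block the call and throw an error. Overlapping entries count separately: 'print your system prompt' also matches 'system prompt', so that phrase alone scores 0.67.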

What it does not catch: sophisticated injections embedded in retrieved documents (RAG attacks), multi-turn injections that build up across messages, and injections in non-English text. Those require a secondary model evaluation, which is the next step once baseline detection is in place.
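
A secondary model evaluation could be wired in along these lines. This is a sketch under stated assumptions: InjectionClassifier is a hypothetical function you supply (a wrapper around whichever model you use that resolves to a 0–1 probability), and the blockAt and reviewAt thresholds are placeholders, not recommended values.

```typescript
// Hypothetical signature: wrap your secondary model so it resolves to the
// probability (0 to 1) that the text is trying to redirect the agent.
export type InjectionClassifier = (text: string) => Promise<number>;

export interface EscalatedCheck {
  verdict: 'allow' | 'review' | 'block';
  baselineScore: number;
  classifierScore?: number; // absent when the classifier call failed
}

export async function escalateCheck(
  text: string,
  baselineScore: number, // e.g. the score from the keyword detector
  classify: InjectionClassifier,
  blockAt = 0.8, // assumed threshold
  reviewAt = 0.5, // assumed threshold
): Promise<EscalatedCheck> {
  let classifierScore: number;
  try {
    classifierScore = await classify(text);
  } catch {
    // Classifier unavailable: lean toward scrutiny instead of silently allowing.
    return { verdict: 'review', baselineScore };
  }

  let verdict: EscalatedCheck['verdict'] = 'allow';
  if (classifierScore >= blockAt) {
    verdict = 'block';
  } else if (classifierScore >= reviewAt || baselineScore > 0) {
    // Either signal alone is enough to tag the session for review.
    verdict = 'review';
  }

  return { verdict, baselineScore, classifierScore };
}
```

The text argument does not have to be the raw user message. Passing a retrieved document or the concatenated conversation history is one way to cover the RAG and multi-turn cases, at the cost of an extra model call per check.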

Pattern 2: Circuit Breaker

A circuit breaker wraps your agent's execution loop and stops it when anomalous behavior accumulates past a threshold.

// lib/ai-security/circuit-breaker.ts
import { auditLog } from './audit-logger';

type CircuitState = 'closed' | 'open' | 'half-open';

interface CircuitBreakerConfig {
  failureThreshold: number;    // open circuit after this many failures
  successThreshold: number;    // close circuit after this many successes in half-open
  timeout: number;             // ms before moving from open to half-open
  outputPatterns: RegExp[];    // patterns that count as failures in output
}

const DEFAULT_BLOCKED_PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/,          // SSN pattern
  /sk-[a-zA-Z0-9]{48}/,             // OpenAI-style key (legacy 48-char format; newer key formats will not match)
  /anthropic_api_key/i,
  /system prompt/i,
  /my instructions are/i,
];

export class AICircuitBreaker {
  private state: CircuitState = 'closed';
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;
  private config: CircuitBreakerConfig;

  constructor(config: Partial<CircuitBreakerConfig> = {}) {
    this.config = {
      failureThreshold: 3,
      successThreshold: 2,
      timeout: 60_000,
      outputPatterns: DEFAULT_BLOCKED_PATTERNS,
      ...config,
    };
  }

  async execute<T>(fn: () => Promise<{ output: string; result: T }>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailureTime > this.config.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit open: AI agent temporarily disabled');
      }
    }

    try {
      const { output, result } = await fn();
      this.checkOutput(output);
      this.onSuccess();
      return result;
    } catch (err) {
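      // Non-Error values can be thrown too; normalize so .message is always defined
      this.onFailure(err instanceof Error ? err : new Error(String(err)));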
      throw err;
    }
  }

  private checkOutput(output: string): void {
    for (const pattern of this.config.outputPatterns) {
      if (pattern.test(output)) {
        throw new Error(`Blocked output pattern detected: ${pattern.source}`);
      }
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    if (this.state === 'half-open') {
      this.successes++;
      if (this.successes >= this.config.successThreshold) {
        this.state = 'closed';
        this.successes = 0;
      }
    }
  }

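  private onFailure(err: Error): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    // A failure while half-open reopens the circuit immediately; otherwise
    // open once the consecutive-failure threshold is reached.
    if (this.state === 'half-open' || this.failures >= this.config.failureThreshold) {
      this.state = 'open';
      this.successes = 0; // a later half-open probe starts from zero
      auditLog({ event: 'circuit_opened', reason: err.message, failures: this.failures });
    }
  }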

  get isOpen(): boolean {
    return this.state === 'open';
  }
}

Usage in your agent loop:

const breaker = new AICircuitBreaker({ failureThreshold: 3, timeout: 120_000 });

async function runAgent(userInput: string) {
  return breaker.execute(async () => {
    const output = await callModel(userInput);
    return { output, result: output };
  });
}

Tune the thresholds to your use case: a customer-facing agent that runs hundreds of times per hour needs a lower failure threshold than an internal tool used by five people. Because the failure counter resets on every success, the threshold counts consecutive failures. Start at failureThreshold: 5, timeout: 60_000 (the class default is 3) and adjust based on your false-positive rate.
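
The breaker above reacts to thrown errors and blocked output patterns, but not to slow responses. One way to add a latency limit is to make a slow call reject, so the breaker records it as a failure. The helper name withTimeout and the limit you pass are assumptions for illustration.

```typescript
// Hypothetical helper: rejects if `fn` takes longer than `limitMs`.
// It does not cancel the underlying request; if your model client accepts
// an AbortSignal, wiring one in is the more complete fix.
export function withTimeout<T>(fn: () => Promise<T>, limitMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Agent call exceeded ${limitMs} ms`)),
      limitMs,
    );

    fn().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Usage inside the agent loop (callModel is the placeholder used above):
//   breaker.execute(() =>
//     withTimeout(async () => {
//       const output = await callModel(userInput);
//       return { output, result: output };
//     }, 30_000),
//   );
```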

Pattern 3: Structured Audit Logger

Every AI agent call should produce a structured log entry. This is what you query during incident investigation.

// lib/ai-security/audit-logger.ts
import { randomUUID } from 'crypto';

export interface AuditEntry {
  requestId: string;
  timestamp: string;
  userId?: string;
  sessionId?: string;
  event: string;
  model?: string;
  input?: string;
  output?: string;
  toolCalls?: ToolCallRecord[];
  inputTokens?: number;
  outputTokens?: number;
  latencyMs?: number;
  signals?: string[];
  score?: number;
  failures?: number;
  reason?: string;
  error?: string;
}

// Exported because it appears in the signature of the exported auditedAgentCall
export interface ToolCallRecord {
  tool: string;
  args: Record<string, unknown>;
  result: unknown;
  latencyMs: number;
}

// Replace with your structured logging service (Datadog, Axiom, CloudWatch, etc.)
function writeLog(entry: AuditEntry): void {
  // Write to stdout as JSON — your log aggregator picks it up
  process.stdout.write(JSON.stringify(entry) + '\n');
}

export function auditLog(partial: Partial<AuditEntry> & { event: string }): void {
  writeLog({
    requestId: randomUUID(),
    timestamp: new Date().toISOString(),
    ...partial,
  });
}

// Wrap a complete agent call with timing and automatic logging
export async function auditedAgentCall<T>(params: {
  userId?: string;
  sessionId?: string;
  model: string;
  input: string;
  fn: () => Promise<{ output: string; toolCalls?: ToolCallRecord[]; result: T }>;
}): Promise<T> {
  const requestId = randomUUID();
  const start = Date.now();

  auditLog({
    requestId,
    event: 'agent_call_start',
    userId: params.userId,
    sessionId: params.sessionId,
    model: params.model,
    input: params.input,
  });

  try {
    const { output, toolCalls, result } = await params.fn();
    const latencyMs = Date.now() - start;

    auditLog({
      requestId,
      event: 'agent_call_complete',
      userId: params.userId,
      sessionId: params.sessionId,
      model: params.model,
      output,
      toolCalls,
      latencyMs,
    });

    return result;
  } catch (err) {
    auditLog({
      requestId,
      event: 'agent_call_error',
      userId: params.userId,
      sessionId: params.sessionId,
      model: params.model,
      error: err instanceof Error ? err.message : String(err),
      latencyMs: Date.now() - start,
    });
    throw err;
  }
}

Putting It Together

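// agent.ts: complete integration
// Import paths assume agent.ts sits next to lib/; adjust to your layout.
import { withInjectionGuard } from './lib/ai-security/injection-detector';
import { AICircuitBreaker } from './lib/ai-security/circuit-breaker';
import { auditedAgentCall } from './lib/ai-security/audit-logger';

// Placeholder: your own wrapper around the provider SDK that resolves to the model's text output.
declare function callAnthropicAPI(input: string): Promise<string>;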

const breaker = new AICircuitBreaker();

export async function runSecureAgent(
  userInput: string,
  userId: string,
  sessionId: string,
): Promise<string> {

  // 1. Check for injection
  const guardedRun = withInjectionGuard(async (input) => {

    // 2. Circuit breaker wraps the model call
    return breaker.execute(async () => {

      // 3. Audited call logs everything
      const result = await auditedAgentCall({
        userId,
        sessionId,
        model: 'claude-sonnet-4-6',
        input,
        fn: async () => {
          const output = await callAnthropicAPI(input);
          return { output, result: output };
        },
      });

      return { output: result, result };
    });
  });

  return guardedRun(userInput);
}
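
runSecureAgent throws when the injection guard blocks an input or the circuit is open, so the caller still needs to turn those errors into a safe response. A minimal sketch, assuming a hypothetical runWithFallback helper and a placeholder fallback message:

```typescript
// Assumed wording; replace with whatever suits your product.
const FALLBACK_MESSAGE =
  'Sorry, I cannot help with that right now. Please try again later.';

export async function runWithFallback(
  run: () => Promise<string>, // e.g. () => runSecureAgent(input, userId, sessionId)
  onError?: (err: Error) => void,
): Promise<string> {
  try {
    return await run();
  } catch (err) {
    // Do not echo the internal error to the user: it can name the matched
    // signal or blocked pattern, which helps an attacker iterate.
    onError?.(err instanceof Error ? err : new Error(String(err)));
    return FALLBACK_MESSAGE;
  }
}
```

The onError hook is a natural place to alert on-call staff, since the audit log already records the underlying event.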

Incident Response Steps When the Circuit Opens

When the circuit breaker fires, the agent is down. Here is the response sequence:

Immediate (0–5 minutes):

1. Check the audit log for the last 10 entries before the circuit opened
2. Identify which output pattern triggered the break or which error repeated
3. If data exfiltration is suspected, rotate any API keys the agent had access to

Short-term (5–30 minutes):

4. Pull the full session log for the user session that triggered the incident
5. Determine whether the trigger was a legitimate edge case or an attack
6. If attack: block the user account or IP, preserve logs for investigation
7. If edge case: add the pattern to your test suite, fix, and manually reset the circuit

After the incident:

8. Update the injection signal list if a new pattern was used
9. Adjust circuit breaker thresholds if the false-positive rate is too high
10. Document the incident in your AI governance log (required under the EU AI Act for high-risk systems)
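
The first step in the sequence above, pulling the last 10 entries before the circuit opened, can be sketched against the newline-delimited JSON that the audit logger writes to stdout. The function name and the LogLine type are illustrative, and the sketch assumes the log text is already in chronological order. In practice you would more likely run the equivalent query in your log aggregator.

```typescript
// Minimal shape needed here; the playbook's AuditEntry carries more fields.
interface LogLine {
  event: string;
  [key: string]: unknown;
}

// Returns up to `n` entries that precede the most recent 'circuit_opened'
// event. Lines that are not valid JSON objects with an `event` are skipped.
export function entriesBeforeCircuitOpened(ndjson: string, n = 10): LogLine[] {
  const entries: LogLine[] = [];
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue;
    try {
      const parsed = JSON.parse(line) as LogLine;
      if (typeof parsed.event === 'string') entries.push(parsed);
    } catch {
      // Stray non-JSON output mixed into stdout: ignore it.
    }
  }

  // Walk backwards to find the latest time the circuit opened.
  let openedAt = -1;
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].event === 'circuit_opened') {
      openedAt = i;
      break;
    }
  }
  if (openedAt === -1) return [];

  return entries.slice(Math.max(0, openedAt - n), openedAt);
}
```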

Minimum Viable Security for a New AI Agent

If you are starting from zero:

Audit logging first — you cannot investigate what you did not log
Injection detection second — blocks the easiest attacks
Circuit breaker third — prevents runaway failures from compounding
Output pattern filtering — add as you discover what your agent should never output

Do not ship an agent to production without audit logging. Everything else is recoverable. An incident with no logs is not.