TypeScript AI Agent Logging and Audit Trail Patterns 2026 — 5 Code Templates
Every AI agent decision is a liability if you cannot reconstruct what happened and why. These five patterns give you the logging infrastructure to answer "what did the agent do, with what data, and what was the outcome" — which is what EU AI Act Article 12, SOC 2, and any serious incident investigation will ask for.
Each pattern is standalone — drop in and wire to your existing agent framework.
Pattern 1: OpenTelemetry Span Logging (GenAI Semantic Conventions)
Instrument every LLM call with standard OTel attributes. Backends (Jaeger, Tempo, Honeycomb, Datadog) all consume this without custom parsers.
import { trace, SpanStatusCode, type Span } from "@opentelemetry/api";

const tracer = trace.getTracer("ai-agent", "1.0.0");

interface LLMCallParams {
  model: string;
  messages: Array<{ role: string; content: string }>;
  maxTokens?: number;
  sessionId: string;
  userId: string;
}

async function tracedLLMCall(
  params: LLMCallParams,
  llmFn: () => Promise<{ content: string; usage: { prompt_tokens: number; completion_tokens: number } }>
) {
  return tracer.startActiveSpan("gen_ai.chat", async (span) => {
    // GenAI semantic conventions — standard attribute names
    span.setAttributes({
      "gen_ai.system": "openai",
      "gen_ai.operation.name": "chat",
      "gen_ai.request.model": params.model,
      "gen_ai.request.max_tokens": params.maxTokens ?? 4096,
      "session.id": params.sessionId,
      "user.id": params.userId,
    });
    // Prompts as span events (can be filtered at collector — not attributes)
    span.addEvent("gen_ai.content.prompt", {
      "gen_ai.prompt": JSON.stringify(params.messages),
    });
    try {
      const result = await llmFn();
      span.setAttributes({
        "gen_ai.response.model": params.model,
        "gen_ai.usage.input_tokens": result.usage.prompt_tokens,
        "gen_ai.usage.output_tokens": result.usage.completion_tokens,
      });
      span.addEvent("gen_ai.content.completion", {
        "gen_ai.completion": result.content,
      });
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}
Span attributes are indexed everywhere and often sit on your longest retention tier. Storing prompt and completion content as span attributes means it lands in every backend and retention tier you have — emit content as events instead, so it can be filtered or dropped at the OTel Collector without touching application code. Note: the OTel GenAI semantic conventions are still in Development status (as of semantic conventions v1.38). The older gen_ai.prompt / gen_ai.completion span attributes are deprecated — newer implementations use the Logs API or the gen_ai.input.messages / gen_ai.output.messages attributes. Verify against the current OTel GenAI spec before shipping.
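For the collector-side filtering mentioned above, a filter-processor rule along these lines can drop the content-bearing span events before export while keeping the spans themselves. This is a sketch using the contrib filterprocessor's OTTL spanevent context — verify the exact syntax against your collector version:

```yaml
# Drop gen_ai.content.* span events at the collector; spans and metrics pass through
processors:
  filter/genai-content:
    error_mode: ignore
    traces:
      spanevent:
        - 'IsMatch(name, "gen_ai\\.content\\..*")'
```

Add the processor to your traces pipeline; application code, sampling, and the non-content span attributes stay untouched.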
Pattern 2: PII-Safe Trace Storage
Redact before the span closes. Post-processing after export is too late — the backend has already received the raw data by then.
type RedactionRule = { pattern: RegExp; label: string };

const PII_RULES: RedactionRule[] = [
  { pattern: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, label: "EMAIL" },
  { pattern: /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, label: "PHONE" },
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, label: "SSN" },
  { pattern: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, label: "CARD" },
];

function redactForTrace(text: string): { safe: string; redactedTypes: string[] } {
  const redactedTypes: string[] = [];
  let safe = text;
  for (const rule of PII_RULES) {
    let hit = false;
    safe = safe.replace(rule.pattern, () => {
      hit = true;
      return `[${rule.label}]`;
    });
    if (hit) redactedTypes.push(rule.label);
  }
  return { safe, redactedTypes };
}

// Wrap Pattern 1's event emission:
function addSafePromptEvent(span: Span, messages: Array<{ role: string; content: string }>) {
  const redacted = messages.map((m) => {
    const { safe, redactedTypes } = redactForTrace(m.content);
    return { role: m.role, content: safe, _redacted: redactedTypes };
  });
  span.addEvent("gen_ai.content.prompt", {
    "gen_ai.prompt": JSON.stringify(redacted),
  });
}
Regex catches structured PII. Named entities — names, addresses — require a NER model. For GDPR Article 5(1)(f), treat all prompt content as personal data by default and route only to backends with a signed DPA.
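A quick standalone check makes the boundary concrete (rules trimmed to two for brevity — this mirrors redactForTrace above rather than importing it): structured identifiers are caught, a bare name passes straight through.

```typescript
// Trimmed restatement of the redaction pattern above, for a self-contained demo
type RedactionRule = { pattern: RegExp; label: string };
const PII_RULES: RedactionRule[] = [
  { pattern: /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi, label: "EMAIL" },
  { pattern: /\b\d{3}-\d{2}-\d{4}\b/g, label: "SSN" },
];

function redactForTrace(text: string): { safe: string; redactedTypes: string[] } {
  const redactedTypes: string[] = [];
  let safe = text;
  for (const rule of PII_RULES) {
    let hit = false;
    safe = safe.replace(rule.pattern, () => {
      hit = true;
      return `[${rule.label}]`;
    });
    if (hit) redactedTypes.push(rule.label);
  }
  return { safe, redactedTypes };
}

const { safe, redactedTypes } = redactForTrace(
  "Contact jane@example.com, SSN 123-45-6789, reviewer Jane Smith"
);
// safe → "Contact [EMAIL], SSN [SSN], reviewer Jane Smith"
// "Jane Smith" survives — exactly the gap a NER pass has to close
```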
Pattern 3: Compliance-Ready Audit Record
Immutable append-only log of every agent decision. Covers the core EU AI Act Article 12 requirements: what happened, when, on what input, with what outcome.
interface AuditRecord {
  id: string; // UUID — immutable primary key
  timestamp: string; // ISO 8601 UTC
  sessionId: string;
  userId: string;
  agentId: string;
  operationType: string; // "inference" | "tool_call" | "decision" | "escalation"
  modelVersion: string;
  inputHash: string; // SHA-256 of input (not the input itself)
  outputSummary: string; // Safe summary — not raw output
  confidence?: number;
  humanReviewRequired: boolean;
  humanReviewId?: string; // FK to review queue if escalated
  policyViolation?: string;
  durationMs: number;
  tokenUsage: { input: number; output: number };
}

class AuditLogger {
  private records: AuditRecord[] = []; // Replace with append-only DB write

  async log(record: Omit<AuditRecord, "id" | "timestamp">): Promise<string> {
    const id = crypto.randomUUID();
    const entry: AuditRecord = {
      ...record,
      id,
      timestamp: new Date().toISOString(),
    };
    // In production: INSERT INTO audit_log — never UPDATE or DELETE
    this.records.push(entry);
    return id;
  }

  // Export for regulatory review — returns immutable snapshot
  async exportRange(from: Date, to: Date): Promise<AuditRecord[]> {
    return this.records.filter((r) => {
      const ts = new Date(r.timestamp);
      return ts >= from && ts <= to;
    });
  }
}

// Usage — log every inference decision
const audit = new AuditLogger();
const startMs = Date.now();
const result = await agent.run(userInput);
await audit.log({
  sessionId,
  userId,
  agentId: "contract-analyzer-v2",
  operationType: "decision",
  modelVersion: "gpt-4o-2024-11-20",
  inputHash: await sha256(userInput),
  outputSummary: result.summary, // Safe field — not raw LLM output
  confidence: result.confidence,
  humanReviewRequired: result.confidence < 0.75,
  humanReviewId: result.reviewId,
  durationMs: Date.now() - startMs,
  tokenUsage: result.usage,
});
inputHash instead of raw input: keeps the audit log lean and avoids storing PII in the audit table. If you need to reconstruct the input for investigation, use the hash to look it up in an encrypted input store.
Pattern 4: Token Usage Tracker Per Session
Feed into cost circuit breaker (Pattern 6 in output validation patterns) and per-user billing.
interface SessionUsage {
  sessionId: string;
  userId: string;
  startedAt: Date;
  calls: Array<{
    ts: Date;
    model: string;
    inputTokens: number;
    outputTokens: number;
    costUsd: number;
  }>;
}

const MODEL_PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.0000025, output: 0.00001 }, // per token
  "gpt-4o-mini": { input: 0.00000015, output: 0.0000006 },
  "claude-sonnet-4-6": { input: 0.000003, output: 0.000015 },
};

class SessionTokenTracker {
  private sessions = new Map<string, SessionUsage>();

  record(
    sessionId: string,
    userId: string,
    model: string,
    inputTokens: number,
    outputTokens: number
  ): { sessionTotal: number; sessionCostUsd: number } {
    if (!this.sessions.has(sessionId)) {
      this.sessions.set(sessionId, {
        sessionId, userId, startedAt: new Date(), calls: [],
      });
    }
    const pricing = MODEL_PRICING[model] ?? { input: 0, output: 0 };
    const costUsd =
      inputTokens * pricing.input + outputTokens * pricing.output;
    const session = this.sessions.get(sessionId)!;
    session.calls.push({ ts: new Date(), model, inputTokens, outputTokens, costUsd });
    const sessionTotal = session.calls.reduce(
      (sum, c) => sum + c.inputTokens + c.outputTokens, 0
    );
    const sessionCostUsd = session.calls.reduce((sum, c) => sum + c.costUsd, 0);
    return { sessionTotal, sessionCostUsd };
  }

  getSession(sessionId: string): SessionUsage | undefined {
    return this.sessions.get(sessionId);
  }
}

// Usage — call after every LLM response
const tracker = new SessionTokenTracker();
const { sessionTotal, sessionCostUsd } = tracker.record(
  sessionId, userId, "gpt-4o",
  response.usage.prompt_tokens,
  response.usage.completion_tokens
);
if (sessionCostUsd > 1.0) { // $1 per session limit
  throw new Error(`Session cost limit exceeded: $${sessionCostUsd.toFixed(4)}`);
}
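As a sanity check on the per-token prices above (gpt-4o at $2.50 per million input tokens and $10 per million output tokens at time of writing — verify against your provider's current price sheet), a single call with 1,000 input and 500 output tokens costs:

```typescript
// gpt-4o per-token prices from the MODEL_PRICING table above
const pricing = { input: 0.0000025, output: 0.00001 };
const costUsd = 1_000 * pricing.input + 500 * pricing.output;
// → $0.0025 + $0.005 = $0.0075 per call, so ~133 such calls reach the $1 session limit
```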
Pattern 5: Human Review Decision Trail
Log every automated decision with confidence and rationale. When a human overrides, record the override. This record satisfies the "human oversight" requirements under EU AI Act high-risk provisions.
type DecisionOutcome = "automated" | "escalated" | "overridden" | "confirmed";

interface DecisionRecord {
  id: string;
  sessionId: string;
  userId: string;
  query: string;
  agentDraft: string;
  agentConfidence: number;
  agentRationale: string;
  outcome: DecisionOutcome;
  humanReviewerId?: string;
  humanDecision?: string;
  overrideReason?: string;
  resolvedAt: string;
}

class DecisionTrail {
  private decisions = new Map<string, DecisionRecord>();

  record(draft: {
    sessionId: string;
    userId: string;
    query: string;
    agentDraft: string;
    agentConfidence: number;
    agentRationale: string;
  }): string {
    const id = crypto.randomUUID();
    this.decisions.set(id, {
      ...draft,
      id,
      outcome: draft.agentConfidence >= 0.8 ? "automated" : "escalated",
      resolvedAt: new Date().toISOString(),
    });
    return id;
  }

  humanOverride(
    decisionId: string,
    reviewerId: string,
    decision: string,
    reason: string
  ): void {
    const record = this.decisions.get(decisionId);
    if (!record) throw new Error(`Decision ${decisionId} not found`);
    // Append-only update — preserve original draft
    this.decisions.set(decisionId, {
      ...record,
      outcome: "overridden",
      humanReviewerId: reviewerId,
      humanDecision: decision,
      overrideReason: reason,
      resolvedAt: new Date().toISOString(),
    });
  }

  humanConfirm(decisionId: string, reviewerId: string): void {
    const record = this.decisions.get(decisionId);
    if (!record) throw new Error(`Decision ${decisionId} not found`);
    this.decisions.set(decisionId, {
      ...record,
      outcome: "confirmed",
      humanReviewerId: reviewerId,
      resolvedAt: new Date().toISOString(),
    });
  }

  // Audit export — all decisions in a time range
  getDecisions(from: Date, to: Date): DecisionRecord[] {
    return Array.from(this.decisions.values()).filter((d) => {
      const ts = new Date(d.resolvedAt);
      return ts >= from && ts <= to;
    });
  }
}
Wiring it all together
const otelTracer = trace.getTracer("ai-agent");
const auditLogger = new AuditLogger();
const tokenTracker = new SessionTokenTracker();
const decisionTrail = new DecisionTrail();

async function instrumentedAgentCall(
  sessionId: string,
  userId: string,
  query: string
): Promise<string> {
  const startMs = Date.now();

  // 1. Traced LLM call (Pattern 1 + 2)
  const result = await tracedLLMCall(
    { model: "gpt-4o", messages: [{ role: "user", content: query }], sessionId, userId },
    () => llmClient.chat({ model: "gpt-4o", messages: [{ role: "user", content: query }] })
  );

  // 2. Token tracking (Pattern 4)
  const { sessionCostUsd } = tokenTracker.record(
    sessionId, userId, "gpt-4o",
    result.usage.prompt_tokens, result.usage.completion_tokens
  );

  // 3. Audit record (Pattern 3)
  await auditLogger.log({
    sessionId, userId,
    agentId: "main-agent",
    operationType: "inference",
    modelVersion: "gpt-4o",
    inputHash: await sha256(query),
    outputSummary: redactForTrace(result.content.slice(0, 200)).safe, // redact — raw output never hits the audit table
    humanReviewRequired: false,
    durationMs: Date.now() - startMs,
    tokenUsage: { input: result.usage.prompt_tokens, output: result.usage.completion_tokens },
  });

  // 4. Decision trail (Pattern 5)
  decisionTrail.record({
    sessionId, userId, query,
    agentDraft: result.content,
    agentConfidence: 0.9, // from confidence pattern
    agentRationale: "High-confidence structured output",
  });

  return result.content;
}
Related reading
- TypeScript AI agent output validation patterns — validate before logging; Pattern 6 (cost circuit breaker) pairs with token tracking above
- TypeScript AI agent authorization patterns — what the agent is allowed to call; logging proves it stayed within bounds
- GPAI enforcement August 2026 — EU AI Act logging obligations that become enforceable on August 2, 2026
References
- OpenTelemetry — AI Agent Observability standards and semantic conventions
- OpenTelemetry — GenAI semantic conventions
- Arize AI — Best AI Observability Tools for Autonomous Agents in 2026
- EU AI Act — Article 12: Logging obligations for high-risk AI systems
