Why do AI agents need observability beyond normal application logging?

Because an agent's behavior is non-deterministic and multi-step. A single user request can fan out into several model calls and tool invocations, each with its own inputs, outputs, latency, and cost. Normal request logging captures the HTTP layer but not the agent's internal decisions.

What is the difference between logging and tracing for an AI agent?

Logging records discrete events (the agent started, a tool failed). Tracing connects those events into a causal tree for a single run, so you can see that a user message triggered a model call, which triggered three tool calls, one of which timed out. Tracing carries a shared run ID and parent-child span relationships.

Does AI agent observability help with EU AI Act compliance?

It supports it. The EU AI Act expects providers and deployers of high-risk AI systems to keep automatic logs that allow events to be traced over the system's lifecycle, and to enable human oversight. Trace context, structured event logs, and tool-call records are the technical substrate for those obligations.

How do I track token cost per agent run in TypeScript?

Maintain a per-run accumulator keyed by the run ID, add the prompt and completion tokens from each model response, and multiply by the per-model rate to get cost. The token and cost meter module below does this with a simple in-memory map tied to the trace context, then flushes the totals when the run ends.

Can these observability modules export to Datadog or another backend?

Yes. The fifth module converts the internal spans into OpenTelemetry spans and exports them through the standard OTLP exporter, which Datadog, Honeycomb, Grafana Tempo, and most observability backends accept. Because the internal trace format is decoupled from the exporter, you can also write a custom exporter to any sink.

TypeScript AI Agent Observability and Tracing: 5 Paste-Ready Modules (2026)

When an AI agent does something you did not expect, the first question is always the same: what actually happened? If your answer is a scroll through unstructured console logs, you do not have observability. You have hope.

Agents make that question hard. A single user request fans out into model calls and tool invocations, each non-deterministic, each with its own cost and failure mode. The HTTP access log tells you a request came in and a response went out. It tells you nothing about the five decisions the agent made in between. To govern an agent, you have to be able to reconstruct those decisions, and that requires deliberate instrumentation.

This guide gives you five framework-agnostic TypeScript modules that add that instrumentation. They drop into Express or Next.js, they do not depend on a specific model provider, and each one maps to a governance need: audit trails, incident response, cost control, and the logging that high-risk AI obligations expect.

TL;DR: Five TypeScript modules for AI agent observability: a trace context that correlates every step under one run ID via AsyncLocalStorage, a token and cost meter that accounts spend per run and per model, a structured event logger that emits typed JSON, a tool-call tracer that wraps every tool with timing and errors, and an OpenTelemetry exporter that ships spans to a collector or Datadog. Without traces you cannot reconstruct what an agent did, which breaks audit, incident response, and cost control. Copy-paste, framework-agnostic.

Why observability is a governance control, not just an ops nicety

Observability for normal services is about uptime and latency. For agents it is also about accountability. Three governance functions depend on it directly.

Audit trails. You cannot prove what an agent did, or did not do, without a record that ties each model call and tool action to a single run. Regulators and customers increasingly ask deployers of consequential AI to demonstrate traceability. The EU AI Act expects high-risk systems to keep automatic logs that allow events to be traced across the lifecycle. That is a tracing requirement in everything but name.

Incident response. When an agent leaks data, calls the wrong tool, or burns a budget, your response time is bounded by how fast you can reconstruct the run. Good traces turn a multi-hour forensic exercise into a single query.

Cost control. Agents spend money per token, and an agent that loops or spawns sub-calls can run up a bill quietly. Per-run cost accounting is the only way to attribute spend, alert on anomalies, and enforce a budget.

The modules below are ordered so each builds on the previous one. Start with the trace context; everything else reads from it.

Module 1: trace context with AsyncLocalStorage

The foundation is a single run ID that every other module can read without you threading it through every function call. Node's AsyncLocalStorage carries that context across async boundaries.

// trace-context.ts
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

export interface TraceContext {
  runId: string;
  startedAt: number;
  spans: Span[];
}

export interface Span {
  id: string;
  parentId: string | null;
  name: string;
  startedAt: number;
  endedAt?: number;
  attributes: Record<string, unknown>;
}

const storage = new AsyncLocalStorage<TraceContext>();

export function runWithTrace<T>(fn: () => Promise<T>, runId = randomUUID()): Promise<T> {
  const ctx: TraceContext = { runId, startedAt: Date.now(), spans: [] };
  return storage.run(ctx, fn);
}

export function currentTrace(): TraceContext {
  const ctx = storage.getStore();
  if (!ctx) throw new Error("No active trace. Wrap the agent run in runWithTrace().");
  return ctx;
}

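export function startSpan(name: string, attributes: Record<string, unknown> = {}): Span {
  const ctx = currentTrace();
  // Parent is the most recently started span that is still open, so sequential
  // tool calls become siblings rather than a chain. Spans started concurrently
  // (for example via Promise.all) can still be attributed to a sibling; track
  // the active span per async scope if you need exact trees under concurrency.
  const parent = [...ctx.spans].reverse().find((s) => s.endedAt === undefined);
  const span: Span = {
    id: randomUUID(),
    parentId: parent?.id ?? null,
    name,
    startedAt: Date.now(),
    attributes,
  };
  ctx.spans.push(span);
  return span;
}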

export function endSpan(span: Span, attributes: Record<string, unknown> = {}): void {
  span.endedAt = Date.now();
  Object.assign(span.attributes, attributes);
}

Wrap each incoming agent request in runWithTrace, and every downstream module can call currentTrace() to attach to the same run. In Express that is one line of middleware: app.use((req, res, next) => runWithTrace(() => Promise.resolve(next()))).
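
To see why AsyncLocalStorage suits this job, the sketch below starts two overlapping runs and checks that each one reads only its own run ID after an await. It is a self-contained illustration: demoStorage, withRun, and whoAmI are hypothetical stand-ins for the module above, not part of it.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-in for the trace context above: only the run ID is stored.
const demoStorage = new AsyncLocalStorage<{ runId: string }>();

function withRun<T>(runId: string, fn: () => Promise<T>): Promise<T> {
  return demoStorage.run({ runId }, fn);
}

// Reads the run ID after an await, without receiving it as a parameter.
async function whoAmI(delayMs: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return demoStorage.getStore()?.runId ?? "no-run";
}

// Two overlapping runs; the slower one starts first and finishes last.
const isolationDemo: Promise<string[]> = Promise.all([
  withRun("run-a", () => whoAmI(20)),
  withRun("run-b", () => whoAmI(5)),
]);
```

Each call resolves to the ID of the run that started it, which is the property the other modules rely on when they call currentTrace() deep inside the agent.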

Module 2: token and cost meter

Cost is a first-class observability signal for agents. This meter accumulates tokens and dollars per run and per model, tied to the trace context.

// cost-meter.ts
import { currentTrace } from "./trace-context";

// USD per million tokens. Example values only: check them against your provider's
// current pricing, and keep them in config, not code, in production.
const RATES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "claude-sonnet-4-6": { inputPerM: 3, outputPerM: 15 },
  "gpt-5": { inputPerM: 2.5, outputPerM: 10 },
};

const ledger = new Map<string, { tokensIn: number; tokensOut: number; costUsd: number }>();

export function recordUsage(model: string, tokensIn: number, tokensOut: number): void {
  const { runId } = currentTrace();
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate configured for model ${model}`);
  const cost = (tokensIn / 1_000_000) * rate.inputPerM + (tokensOut / 1_000_000) * rate.outputPerM;
  const prev = ledger.get(runId) ?? { tokensIn: 0, tokensOut: 0, costUsd: 0 };
  ledger.set(runId, {
    tokensIn: prev.tokensIn + tokensIn,
    tokensOut: prev.tokensOut + tokensOut,
    costUsd: prev.costUsd + cost,
  });
}

export function runCost(): { tokensIn: number; tokensOut: number; costUsd: number } {
  const { runId } = currentTrace();
  return ledger.get(runId) ?? { tokensIn: 0, tokensOut: 0, costUsd: 0 };
}

export function flushCost(): void {
  const { runId } = currentTrace();
  ledger.delete(runId);
}

Call recordUsage after each model response using the provider's reported token counts. Read runCost() before each further model call, and again before responding, to enforce a per-run budget; checking only at the end reports an overrun without preventing it. Call flushCost() when the run ends, including on error paths, so the map does not grow unbounded. For the budget-enforcement side of this, see the AI spend governance and token budget controls guide.
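
A budget check on top of runCost() can be very small. The following is a minimal sketch, not part of the modules: BudgetExceededError, assertWithinBudget, and the limitUsd parameter are hypothetical names, and the function takes the cost object as an argument so it can be tried without an active trace.

```typescript
// Same shape that runCost() returns in the module above.
interface RunCost {
  tokensIn: number;
  tokensOut: number;
  costUsd: number;
}

// Hypothetical error type so callers can tell a budget stop from other failures.
class BudgetExceededError extends Error {
  readonly spentUsd: number;
  readonly limitUsd: number;

  constructor(spentUsd: number, limitUsd: number) {
    super(`Run budget exceeded: spent $${spentUsd.toFixed(4)} of $${limitUsd.toFixed(2)}`);
    this.name = "BudgetExceededError";
    this.spentUsd = spentUsd;
    this.limitUsd = limitUsd;
  }
}

// Throws once the accumulated cost reaches the limit. Intended to run before
// each further model call, for example assertWithinBudget(runCost(), 0.5).
function assertWithinBudget(cost: RunCost, limitUsd: number): void {
  if (cost.costUsd >= limitUsd) {
    throw new BudgetExceededError(cost.costUsd, limitUsd);
  }
}
```

As a worked figure using the example rates of 3 and 15 dollars per million tokens: a call with 12,000 input tokens and 2,000 output tokens costs 12,000 / 1,000,000 × 3 + 2,000 / 1,000,000 × 15 = 0.036 + 0.030 = 0.066 dollars, so a 0.50 dollar limit allows roughly seven such calls per run.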

Module 3: structured event logger

Unstructured logs are hard to query reliably. This logger emits one typed JSON line per event, stamped with the run ID, so your log backend can filter and aggregate.

// event-logger.ts
import { currentTrace } from "./trace-context";

type AgentEvent =
  | { type: "agent.start"; input: string }
  | { type: "llm.call"; model: string; tokensIn: number; tokensOut: number }
  | { type: "tool.call"; tool: string; ok: boolean; ms: number }
  | { type: "agent.end"; ok: boolean; costUsd: number }
  | { type: "agent.error"; message: string };

export function logEvent(event: AgentEvent): void {
  const { runId } = currentTrace();
  const line = JSON.stringify({
    runId,
    ts: new Date().toISOString(),
    ...event,
  });
  // Replace with your transport (pino, Winston, console for dev).
  console.log(line);
}

Because every line carries runId, a single query reconstructs a full run in your log tool. The typed AgentEvent union also forces you to log a consistent shape, which is what makes the logs aggregatable later. For a deeper treatment of tamper-evident records, the logging and audit trail patterns article covers signing and retention.
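
The "single query" idea can be illustrated without a log backend. The sketch below filters newline-delimited JSON by runId and orders the result by timestamp; eventsForRun, LoggedEvent, and the sample lines are hypothetical and use made-up values in the shape logEvent emits.

```typescript
// Minimal shape of a parsed log line: the fields logEvent adds plus the event payload.
interface LoggedEvent {
  runId: string;
  ts: string;
  type: string;
  [key: string]: unknown;
}

// Parses newline-delimited JSON and returns one run's events in timestamp order.
// ISO 8601 UTC timestamps sort correctly as plain strings.
function eventsForRun(ndjson: string, runId: string): LoggedEvent[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LoggedEvent)
    .filter((event) => event.runId === runId)
    .sort((a, b) => (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0));
}

// Sample lines with made-up values, interleaving two runs.
const sampleLog = [
  '{"runId":"r1","ts":"2026-01-01T00:00:00.000Z","type":"agent.start","input":"hi"}',
  '{"runId":"r2","ts":"2026-01-01T00:00:00.500Z","type":"agent.start","input":"yo"}',
  '{"runId":"r1","ts":"2026-01-01T00:00:01.000Z","type":"tool.call","tool":"searchDb","ok":true,"ms":120}',
  '{"runId":"r1","ts":"2026-01-01T00:00:02.000Z","type":"agent.end","ok":true,"costUsd":0.01}',
].join("\n");

// Event types for run r1 only, in order.
const r1Timeline = eventsForRun(sampleLog, "r1").map((event) => event.type);
```

In a real log tool the same filter is usually a one-line query on the runId field; the point is that the field has to be on every line for that to work.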

Module 4: tool-call tracer

Tool calls are where agents touch the real world, so they are the highest-value spans to capture. This wrapper records arguments, outcome, latency, and errors for any tool function without changing the tool itself.

// trace-tool.ts
import { startSpan, endSpan } from "./trace-context";
import { logEvent } from "./event-logger";

export function traceTool<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
) {
  return async (...args: A): Promise<R> => {
    const span = startSpan(`tool:${name}`, { args: redact(args) });
    const startedAt = Date.now();
    try {
      const result = await fn(...args);
      endSpan(span, { ok: true });
      logEvent({ type: "tool.call", tool: name, ok: true, ms: Date.now() - startedAt });
      return result;
    } catch (err) {
      endSpan(span, { ok: false, error: String(err) });
      logEvent({ type: "tool.call", tool: name, ok: false, ms: Date.now() - startedAt });
      throw err;
    }
  };
}

// Never trace raw secrets or full PII payloads. Redact before recording.
function redact(args: unknown[]): unknown[] {
  return args.map((a) =>
    typeof a === "string" && a.length > 200 ? `${a.slice(0, 200)}...[truncated]` : a,
  );
}

Wrap each tool once: const searchDb = traceTool("searchDb", rawSearchDb). Now every call produces a span and a structured event automatically. The redact step matters for governance: observability should never become a new data-leak surface. As written, redact only truncates long strings, so extend it to strip secrets and PII fields before they reach your logs.
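
One way to go beyond truncation is to mask values by key name. The following is a sketch under stated assumptions rather than a complete redaction policy: redactDeep, SENSITIVE_KEYS, and the key list are hypothetical, and the function assumes plain JSON-like arguments with no circular references.

```typescript
// Hypothetical deny-list of key names whose values should never reach a trace.
const SENSITIVE_KEYS = /^(password|token|secret|api[-_]?key|authorization)$/i;
const MAX_STRING = 200;

// Recursively copies a value, masking sensitive keys and truncating long strings.
function redactDeep(value: unknown): unknown {
  if (typeof value === "string") {
    return value.length > MAX_STRING ? `${value.slice(0, MAX_STRING)}...[truncated]` : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactDeep(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.test(key) ? "[redacted]" : redactDeep(inner);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

A key-name deny-list only catches secrets that live under predictable keys. Secrets embedded in free text, such as a token pasted into a prompt, need pattern-based scrubbing or an allow-list of fields that may be traced.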

Module 5: OpenTelemetry exporter

The first four modules are self-contained. When you are ready to ship traces to a real backend, this exporter converts the internal spans into OpenTelemetry spans, which Datadog, Honeycomb, and Grafana Tempo all accept over OTLP.

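// otel-export.ts
import { context, trace, SpanStatusCode } from "@opentelemetry/api";
import type { Span as OtelSpan } from "@opentelemetry/api";
import type { TraceContext } from "./trace-context";

const tracer = trace.getTracer("ai-agent");

export function exportTrace(ctx: TraceContext): void {
  // One root span per run, so every exported span shares a single OTel trace.
  const root = tracer.startSpan("agent.run", { startTime: ctx.startedAt });
  root.setAttribute("agent.run_id", ctx.runId);
  const exported = new Map<string, OtelSpan>();

  // Spans are stored in start order, so a parent is always exported before its children.
  for (const span of ctx.spans) {
    const parent = span.parentId ? exported.get(span.parentId) ?? root : root;
    const otelSpan = tracer.startSpan(
      span.name,
      { startTime: span.startedAt },
      trace.setSpan(context.active(), parent),
    );
    exported.set(span.id, otelSpan);
    otelSpan.setAttribute("agent.run_id", ctx.runId);
    for (const [key, value] of Object.entries(span.attributes)) {
      // OTel attribute values must be primitives; serialize anything else (such as args).
      otelSpan.setAttribute(
        key,
        typeof value === "string" || typeof value === "number" || typeof value === "boolean"
          ? value
          : JSON.stringify(value),
      );
    }
    if (span.attributes.ok === false) {
      otelSpan.setStatus({ code: SpanStatusCode.ERROR });
    }
    otelSpan.end(span.endedAt ?? Date.now());
  }
  root.end();
}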

Call exportTrace(currentTrace()) once at the end of a run, after endSpan has closed everything. Configure the standard OTLP exporter in your bootstrap and the spans flow to whatever backend you point it at. Because the internal Span type is decoupled from OpenTelemetry, you can swap in a custom exporter for any sink without touching modules one through four.
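
A custom exporter can be as small as a loop over the spans. The sketch below writes one JSON line per span to a caller-supplied sink; exportToLines and LineSink are hypothetical names, and the two interfaces are local copies of the Module 1 shapes so the example stands alone.

```typescript
// Local copies of the Module 1 shapes so this sketch stands alone.
interface Span {
  id: string;
  parentId: string | null;
  name: string;
  startedAt: number;
  endedAt?: number;
  attributes: Record<string, unknown>;
}
interface TraceContext {
  runId: string;
  startedAt: number;
  spans: Span[];
}

// Hypothetical sink: anything that can accept one serialized line.
type LineSink = (line: string) => void;

// Writes one JSON line per span, with the run ID and a computed duration.
function exportToLines(ctx: TraceContext, sink: LineSink): void {
  for (const span of ctx.spans) {
    sink(
      JSON.stringify({
        runId: ctx.runId,
        spanId: span.id,
        parentId: span.parentId,
        name: span.name,
        startedAt: span.startedAt,
        // Spans that were never ended export a null duration rather than a guess.
        durationMs: span.endedAt === undefined ? null : span.endedAt - span.startedAt,
        attributes: span.attributes,
      }),
    );
  }
}

// Example: collect into an array. A real sink might append to a file or a queue.
const exportedLines: string[] = [];
exportToLines(
  {
    runId: "r1",
    startedAt: 1000,
    spans: [
      { id: "s1", parentId: null, name: "tool:searchDb", startedAt: 1000, endedAt: 1120, attributes: { ok: true } },
    ],
  },
  (line) => exportedLines.push(line),
);
```

Keeping parentId in each record means the tree can be rebuilt downstream even when the sink itself has no notion of spans.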

How the five fit together

A single agent run uses all five in sequence:

1. The request handler calls runWithTrace (Module 1), creating the run ID.
2. agent.start is logged (Module 3).
3. Each model call records tokens through the cost meter (Module 2) and logs an llm.call event.
4. Each tool runs through traceTool (Module 4), producing spans and tool.call events.
5. Before responding, the handler reads runCost() to enforce a budget, logs agent.end, exports the trace (Module 5), and calls flushCost().

The result is that for any run, you can answer what the agent did, what it cost, how long each step took, and where it failed, from structured data rather than guesswork. That is the difference between an agent you operate and an agent you can govern.
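
The sequence above can be sketched end to end with compressed stand-ins for the modules. Everything here is illustrative: DemoRun, fakeModel, fakeTool, handleRequest, and the budgetUsd parameter are hypothetical, and the fixed cost of 0.002 is a made-up figure. In a real handler the marked lines would call recordUsage, traceTool, exportTrace, and flushCost.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Compressed stand-ins for Modules 1 to 4 so the sketch runs on its own.
interface DemoRun {
  runId: string;
  costUsd: number;
  events: string[];
  flushed: boolean;
}
const demoRuns = new AsyncLocalStorage<DemoRun>();

function activeRun(): DemoRun {
  const run = demoRuns.getStore();
  if (!run) throw new Error("No active run");
  return run;
}
function logType(type: string): void {
  activeRun().events.push(type);
}

// Hypothetical model and tool: fakes with fixed results and a fixed cost.
async function fakeModel(prompt: string): Promise<string> {
  activeRun().costUsd += 0.002; // stands in for recordUsage()
  logType("llm.call");
  return `plan for: ${prompt}`;
}
async function fakeTool(query: string): Promise<string[]> {
  logType("tool.call"); // stands in for a traceTool() wrapper
  return [`row for ${query}`];
}

// budgetUsd is an assumed per-run limit, not something the modules define.
async function handleRequest(input: string, budgetUsd: number): Promise<DemoRun> {
  const run: DemoRun = { runId: "demo-run", costUsd: 0, events: [], flushed: false };
  return demoRuns.run(run, async () => {
    logType("agent.start");
    try {
      const plan = await fakeModel(input);
      await fakeTool(plan);
      if (run.costUsd > budgetUsd) throw new Error("Run budget exceeded");
      logType("agent.end");
    } catch {
      logType("agent.error");
    } finally {
      // Export and flush in finally so failed runs are still recorded and cleaned up.
      run.flushed = true; // stands in for exportTrace() and flushCost()
    }
    return run;
  });
}

const demoResult = handleRequest("open invoices", 0.01);
```

Putting the export and flush in a finally block is a design choice worth keeping in the real handler: the runs that fail are the ones you most need a trace for.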

Where observability sits in your agent governance

Observability is one layer of a larger control set. It pairs with tool authorization, which decides what an agent is allowed to call, and with output validation, which checks what comes back. Together they form the technical core of an agent governance program.

Authorization decides what is permitted: see the tool authorization patterns.
Validation checks the outputs: see the output validation patterns.
Observability records what happened: this guide.
Incident response acts on what the records reveal: see the incident response playbooks.

For the policy layer that sits above all of this, the AI agent governance policy for small teams gives you the document, and the EU AI Act post-market monitoring guide covers the monitoring obligations these logs help satisfy.
