When an AI agent does something you did not expect, the first question is always the same: what actually happened? If your answer is a scroll through unstructured console logs, you do not have observability. You have hope.
Agents make that question hard. A single user request fans out into model calls and tool invocations, each non-deterministic, each with its own cost and failure mode. The HTTP access log tells you a request came in and a response went out. It tells you nothing about the five decisions the agent made in between. To govern an agent, you have to be able to reconstruct those decisions, and that requires deliberate instrumentation.
This guide gives you five framework-agnostic TypeScript modules that add that instrumentation. They drop into Express or Next.js, they do not depend on a specific model provider, and together they cover the governance needs that rest on observability: audit trails, incident response, cost control, and the automatic logging that high-risk AI obligations expect.
TL;DR: Five TypeScript modules for AI agent observability: a trace context that correlates every step under one run ID via AsyncLocalStorage, a token and cost meter that accounts spend per run at each model's rates, a structured event logger that emits typed JSON, a tool-call tracer that wraps every tool with timing and errors, and an OpenTelemetry exporter that ships spans to a collector or Datadog. Without traces you cannot reconstruct what an agent did, which breaks audit, incident response, and cost control. Copy-paste, framework-agnostic.
Why observability is a governance control, not just an ops nicety
Observability for normal services is about uptime and latency. For agents it is also about accountability. Three governance functions depend on it directly.
Audit trails. You cannot prove what an agent did, or did not do, without a record that ties each model call and tool action to a single run. Regulators and customers increasingly ask deployers of consequential AI to demonstrate traceability. Article 12 of the EU AI Act requires high-risk systems to technically allow the automatic recording of events (logs) over their lifetime, so that their functioning can be traced. That is a tracing requirement in everything but name.
Incident response. When an agent leaks data, calls the wrong tool, or burns a budget, your response time is bounded by how fast you can reconstruct the run. Good traces turn a multi-hour forensic exercise into a single query.
Cost control. Agents spend money per token, and an agent that loops or spawns sub-calls can run up a bill quietly. Per-run cost accounting is the only way to attribute spend, alert on anomalies, and enforce a budget.
The modules below are ordered so each builds on the previous one. Start with the trace context; everything else reads from it.
Module 1: trace context with AsyncLocalStorage
The foundation is a single run ID that every other module can read without you threading it through every function call. Node's AsyncLocalStorage carries that context across async boundaries.
```typescript
// trace-context.ts
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

export interface TraceContext {
  runId: string;
  startedAt: number;
  spans: Span[];
}

export interface Span {
  id: string;
  parentId: string | null;
  name: string;
  startedAt: number;
  endedAt?: number;
  attributes: Record<string, unknown>;
}

const storage = new AsyncLocalStorage<TraceContext>();

export function runWithTrace<T>(fn: () => Promise<T>, runId = randomUUID()): Promise<T> {
  const ctx: TraceContext = { runId, startedAt: Date.now(), spans: [] };
  return storage.run(ctx, fn);
}

export function currentTrace(): TraceContext {
  const ctx = storage.getStore();
  if (!ctx) throw new Error("No active trace. Wrap the agent run in runWithTrace().");
  return ctx;
}

export function startSpan(name: string, attributes: Record<string, unknown> = {}): Span {
  const ctx = currentTrace();
  // Parent is the most recent span that is still open, so sequential siblings
  // are not nested under each other. Concurrent spans may still attach to
  // whichever span happens to be open at the time.
  const parent = [...ctx.spans].reverse().find((s) => s.endedAt === undefined);
  const span: Span = {
    id: randomUUID(),
    parentId: parent?.id ?? null,
    name,
    startedAt: Date.now(),
    attributes: { ...attributes }, // copy so endSpan never mutates the caller's object
  };
  ctx.spans.push(span);
  return span;
}

export function endSpan(span: Span, attributes: Record<string, unknown> = {}): void {
  span.endedAt = Date.now();
  Object.assign(span.attributes, attributes);
}
```
Wrap each incoming agent request in runWithTrace, and every downstream module can call currentTrace() to attach to the same run. In Express that is one line of middleware, registered before your routes: app.use((req, res, next) => runWithTrace(() => Promise.resolve(next()))).
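The property Module 1 relies on is that each run keeps its own store even when runs interleave on the event loop. A minimal, self-contained sketch of that property: demoRun is a hypothetical helper, and the store holds only a run ID rather than a full TraceContext. It starts two runs concurrently and records the run ID each one sees before and after an await.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { setTimeout as sleep } from "node:timers/promises";

// Stand-in for the TraceContext store; only the run ID matters for this check.
const store = new AsyncLocalStorage<{ runId: string }>();

// Reads the run ID before and after an await, inside its own store.run() scope.
function demoRun(runId: string, delayMs: number): Promise<string[]> {
  return store.run({ runId }, async () => {
    const seen = [store.getStore()?.runId ?? "none"];
    await sleep(delayMs); // the other run executes while this one is suspended
    seen.push(store.getStore()?.runId ?? "none");
    return seen;
  });
}

// Both runs start before either finishes, so their async work interleaves.
const isolation = Promise.all([demoRun("run-a", 20), demoRun("run-b", 5)]);

isolation.then(([a, b]) => {
  // Each run only ever sees its own ID, even though run-b resumes first.
  console.log(a, b);
});
```

runWithTrace calls storage.run the same way, which is why concurrent requests through the middleware each get their own TraceContext.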
Module 2: token and cost meter
Cost is a first-class observability signal for agents. This meter accumulates tokens and dollars per run, pricing each call at its model's rate, and keys everything to the trace context's run ID.
```typescript
// cost-meter.ts
import { currentTrace } from "./trace-context";

// Per-million-token rates in USD. Illustrative list prices: check each provider's
// current pricing, and keep these in config, not code, in production.
const RATES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "claude-sonnet-4-6": { inputPerM: 3, outputPerM: 15 },
  "gpt-5": { inputPerM: 1.25, outputPerM: 10 },
};

type Usage = { tokensIn: number; tokensOut: number; costUsd: number };

// One entry per run ID; flushCost() removes it when the run ends.
const ledger = new Map<string, Usage>();

export function recordUsage(model: string, tokensIn: number, tokensOut: number): void {
  const { runId } = currentTrace();
  const rate = RATES[model];
  // Fail loudly rather than silently recording $0 for an unpriced model.
  if (!rate) throw new Error(`No rate configured for model ${model}`);
  const cost = (tokensIn / 1_000_000) * rate.inputPerM + (tokensOut / 1_000_000) * rate.outputPerM;
  const prev = ledger.get(runId) ?? { tokensIn: 0, tokensOut: 0, costUsd: 0 };
  ledger.set(runId, {
    tokensIn: prev.tokensIn + tokensIn,
    tokensOut: prev.tokensOut + tokensOut,
    costUsd: prev.costUsd + cost,
  });
}

export function runCost(): Usage {
  const { runId } = currentTrace();
  return ledger.get(runId) ?? { tokensIn: 0, tokensOut: 0, costUsd: 0 };
}

export function flushCost(): void {
  const { runId } = currentTrace();
  ledger.delete(runId);
}
```
Call recordUsage after each model response using the provider's reported token counts. Read runCost() before each model call, and again before responding, to enforce a per-run budget; checking only at the end catches an overrun after the tokens are already billed. Call flushCost() when the run ends so the map does not grow unbounded. For the budget-enforcement side of this, see the AI spend governance and token budget controls guide.
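A pre-call guard needs one input the meter does not track: an upper bound on what the next call can cost. A minimal sketch, assuming you cap output with the provider's max-output-tokens setting and can count or estimate prompt tokens before sending. The BudgetExceededError, costUsd, and assertWithinBudget names are hypothetical, and the block repeats one illustrative rate so it runs on its own.

```typescript
// Illustrative USD rates per million tokens (same shape as RATES in cost-meter.ts).
const RATES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "claude-sonnet-4-6": { inputPerM: 3, outputPerM: 15 },
};

// Hypothetical error type so callers can tell a budget stop from a provider failure.
export class BudgetExceededError extends Error {}

// Same formula as recordUsage: (tokens / 1M) * per-million rate, input plus output.
export function costUsd(model: string, tokensIn: number, tokensOut: number): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate configured for model ${model}`);
  return (tokensIn / 1_000_000) * rate.inputPerM + (tokensOut / 1_000_000) * rate.outputPerM;
}

// Refuse the next call if spend so far plus that call's worst case would exceed the budget.
// maxTokensOut is the output cap sent to the provider, so it bounds the call's output cost.
export function assertWithinBudget(
  spentUsd: number,
  budgetUsd: number,
  model: string,
  promptTokens: number,
  maxTokensOut: number,
): void {
  const worstCase = spentUsd + costUsd(model, promptTokens, maxTokensOut);
  if (worstCase > budgetUsd) {
    throw new BudgetExceededError(
      `Next call could bring the run to $${worstCase.toFixed(4)}, over its $${budgetUsd} budget`,
    );
  }
}

// Worked example: 10,000 input + 2,000 output tokens at $3 / $15 per million
// = 0.01 * 3 + 0.002 * 15 = 0.03 + 0.03 = $0.06
console.log(costUsd("claude-sonnet-4-6", 10_000, 2_000).toFixed(2)); // prints 0.06
```

With the meter, the call site would be assertWithinBudget(runCost().costUsd, budget, model, promptTokens, maxTokensOut) immediately before each provider request. Bounding by the output cap is deliberately pessimistic: it may refuse a call that would have fit, but it never lets one through that could overshoot.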
Module 3: structured event logger
Unstructured logs are not queryable. This logger emits one typed JSON line per event, stamped with the run ID, so your log backend can filter and aggregate.
```typescript
// event-logger.ts
import { currentTrace } from "./trace-context";

type AgentEvent =
  | { type: "agent.start"; input: string }
  | { type: "llm.call"; model: string; tokensIn: number; tokensOut: number }
  | { type: "tool.call"; tool: string; ok: boolean; ms: number }
  | { type: "agent.end"; ok: boolean; costUsd: number }
  | { type: "agent.error"; message: string };

export function logEvent(event: AgentEvent): void {
  const { runId } = currentTrace();
  const line = JSON.stringify({
    runId,
    ts: new Date().toISOString(),
    ...event,
  });
  // Replace with your transport (pino, Winston, console for dev).
  console.log(line);
}
```
Because every line carries runId, a single query reconstructs a full run in your log tool. The typed AgentEvent union also forces you to log a consistent shape, which is what makes the logs aggregatable later. For a deeper treatment of tamper-evident records, the logging and audit trail patterns article covers signing and retention.
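Reconstructing a run is a filter on runId followed by an aggregation, which is what a query in your log backend would do. A minimal sketch in plain TypeScript over the JSON lines logEvent emits; the summarizeRun name and the sample lines are hypothetical.

```typescript
// Minimal shape of the lines logEvent() emits; only the fields used here are typed.
interface LogLine {
  runId: string;
  ts: string;
  type: string;
  tool?: string;
  ok?: boolean;
  ms?: number;
  costUsd?: number;
}

// Pull one run out of mixed log output and summarize what it did.
export function summarizeRun(raw: string, runId: string) {
  const events = raw
    .split("\n")
    .filter((l) => l.trim() !== "")
    .map((l) => JSON.parse(l) as LogLine)
    .filter((e) => e.runId === runId)
    .sort((a, b) => a.ts.localeCompare(b.ts)); // toISOString() output sorts lexically

  const tools = events.filter((e) => e.type === "tool.call");
  return {
    steps: events.map((e) => e.type),
    toolMs: tools.reduce((sum, e) => sum + (e.ms ?? 0), 0),
    failedTools: tools.filter((e) => e.ok === false).map((e) => e.tool),
    costUsd: events.find((e) => e.type === "agent.end")?.costUsd,
  };
}

const sample = [
  '{"runId":"r1","ts":"2025-01-01T00:00:00.000Z","type":"agent.start","input":"hi"}',
  '{"runId":"r2","ts":"2025-01-01T00:00:00.500Z","type":"agent.start","input":"other run"}',
  '{"runId":"r1","ts":"2025-01-01T00:00:01.000Z","type":"tool.call","tool":"searchDb","ok":false,"ms":120}',
  '{"runId":"r1","ts":"2025-01-01T00:00:02.000Z","type":"agent.end","ok":false,"costUsd":0.004}',
].join("\n");

console.log(summarizeRun(sample, "r1"));
```

The r2 line is ignored, and the summary for r1 shows the step order, the failed searchDb call, and the run's cost, all from fields the AgentEvent union guarantees.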
Module 4: tool-call tracer
Tool calls are where agents touch the real world, so they are the highest-value spans to capture. This wrapper records redacted arguments, success or failure, latency, and error messages for any tool function without changing the tool itself.
```typescript
// trace-tool.ts
import { startSpan, endSpan } from "./trace-context";
import { logEvent } from "./event-logger";

// Wraps an async tool so every call gets a span and a tool.call event,
// without changing the tool's own signature or behavior.
export function traceTool<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const span = startSpan(`tool:${name}`, { args: redact(args) });
    const startedAt = Date.now();
    try {
      const result = await fn(...args);
      endSpan(span, { ok: true });
      logEvent({ type: "tool.call", tool: name, ok: true, ms: Date.now() - startedAt });
      return result;
    } catch (err) {
      endSpan(span, { ok: false, error: String(err) });
      logEvent({ type: "tool.call", tool: name, ok: false, ms: Date.now() - startedAt });
      throw err; // record the failure, then let the caller handle it as before
    }
  };
}

// Never trace raw secrets or full PII payloads. This minimal version only
// truncates long top-level strings; objects pass through unchanged, so
// extend it to mask secret fields before relying on it.
function redact(args: unknown[]): unknown[] {
  return args.map((a) =>
    typeof a === "string" && a.length > 200 ? `${a.slice(0, 200)}...[truncated]` : a,
  );
}
```
Wrap each tool once: const searchDb = traceTool("searchDb", rawSearchDb). Now every call produces a span and a structured event automatically. The redact step matters for governance; observability should never become a new data-leak surface, so truncate long strings and strip secrets before they reach your logs.
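The redact function in the tracer only truncates long top-level strings, so an object argument carrying an apiKey or password field would pass through untouched. A minimal sketch of a recursive alternative that also masks values under secret-looking keys. The key pattern, the 200-character cap, and the depth limit are assumptions to tune for your own tools; this is not a complete PII filter.

```typescript
// Keys whose values must never reach logs or spans. An assumed starting list:
// it errs toward over-redaction (a maxTokens field is masked too).
const SECRET_KEY = /pass(word)?|secret|token|api[-_]?key|authorization|cookie/i;
const MAX_STRING = 200;
const MAX_DEPTH = 5;

// Returns a redacted copy: truncates long strings, masks secret-named keys,
// and stops descending past MAX_DEPTH so deep or cyclic inputs stay bounded.
export function redactDeep(value: unknown, depth = 0): unknown {
  if (typeof value === "string") {
    return value.length > MAX_STRING ? `${value.slice(0, MAX_STRING)}...[truncated]` : value;
  }
  if (value === null || typeof value !== "object") return value;
  if (depth >= MAX_DEPTH) return "[max depth]";
  if (Array.isArray(value)) return value.map((v) => redactDeep(v, depth + 1));
  const out: Record<string, unknown> = {};
  for (const [key, v] of Object.entries(value)) {
    out[key] = SECRET_KEY.test(key) ? "[redacted]" : redactDeep(v, depth + 1);
  }
  return out;
}

const toolArgs = [{ query: "refund policy", apiKey: "sk-test", headers: { Authorization: "Bearer abc" } }];
console.log(JSON.stringify(redactDeep(toolArgs)));
// prints [{"query":"refund policy","apiKey":"[redacted]","headers":{"Authorization":"[redacted]"}}]
```

Because it returns a copy, the tool still receives the original arguments; in traceTool it would take the place of redact(args) in the startSpan call.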
Module 5: OpenTelemetry exporter
The first four modules are self-contained. When you are ready to ship traces to a real backend, this exporter converts the internal spans into OpenTelemetry spans, which Datadog, Honeycomb, and Grafana Tempo all accept over OTLP.
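```typescript
// otel-export.ts
import { context, trace, SpanStatusCode } from "@opentelemetry/api";
import type { AttributeValue, Span as OtelSpan } from "@opentelemetry/api";
import type { TraceContext } from "./trace-context";

const tracer = trace.getTracer("ai-agent");

// OTel attribute values must be primitives (or arrays of them); serialize the rest.
function toAttribute(value: unknown): AttributeValue {
  if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") return value;
  return JSON.stringify(value);
}

export function exportTrace(ctx: TraceContext): void {
  // One root span per run, so every step shares a single OTel trace ID.
  const root = tracer.startSpan("agent.run", { startTime: ctx.startedAt });
  root.setAttribute("agent.run_id", ctx.runId);
  const exported = new Map<string, OtelSpan>();
  // Spans are stored in start order, so each parent is exported before its children.
  for (const span of ctx.spans) {
    const parent = (span.parentId ? exported.get(span.parentId) : undefined) ?? root;
    const parentCtx = trace.setSpan(context.active(), parent);
    const otelSpan = tracer.startSpan(span.name, { startTime: span.startedAt }, parentCtx);
    otelSpan.setAttribute("agent.run_id", ctx.runId);
    for (const [key, value] of Object.entries(span.attributes)) {
      if (value !== undefined) otelSpan.setAttribute(key, toAttribute(value));
    }
    if (span.attributes.ok === false) {
      otelSpan.setStatus({ code: SpanStatusCode.ERROR, message: String(span.attributes.error ?? "") });
    }
    otelSpan.end(span.endedAt ?? Date.now());
    exported.set(span.id, otelSpan);
  }
  root.end(Date.now());
}
```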
Call exportTrace(currentTrace()) once at the end of a run, after endSpan has closed everything. Register the OpenTelemetry SDK with an OTLP trace exporter in your bootstrap and the spans flow to whatever backend you point it at; until an SDK is registered, @opentelemetry/api hands out a no-op tracer and nothing is exported. Because the internal Span type is decoupled from OpenTelemetry, you can swap in a custom exporter for any sink without touching modules one through four.
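Because exportTrace only reads the internal span list, another sink is just another function over the same data. A minimal sketch of a custom exporter that writes one JSON line per span, with duration and parent ID, to any writer. The exportAsJsonLines name and its output field names are hypothetical, and the types are copied from trace-context.ts so the block runs on its own.

```typescript
// Local copies of the types in trace-context.ts so this sketch compiles on its own.
interface Span {
  id: string;
  parentId: string | null;
  name: string;
  startedAt: number;
  endedAt?: number;
  attributes: Record<string, unknown>;
}
interface TraceContext {
  runId: string;
  startedAt: number;
  spans: Span[];
}

// Hypothetical JSON-lines exporter: one line per span, handed to any writer
// (file append, queue producer, stdout). Spans still open are measured to export time.
export function exportAsJsonLines(ctx: TraceContext, write: (line: string) => void): void {
  const now = Date.now();
  for (const span of ctx.spans) {
    write(
      JSON.stringify({
        runId: ctx.runId,
        spanId: span.id,
        parentId: span.parentId,
        name: span.name,
        start: new Date(span.startedAt).toISOString(),
        durationMs: (span.endedAt ?? now) - span.startedAt,
        open: span.endedAt === undefined,
        attributes: span.attributes, // nested so attribute keys cannot clobber the fields above
      }),
    );
  }
}

// Usage: collect lines in memory; swap the writer for your real sink.
const lines: string[] = [];
exportAsJsonLines(
  {
    runId: "run-1",
    startedAt: 1_000,
    spans: [
      { id: "s1", parentId: null, name: "tool:searchDb", startedAt: 1_000, endedAt: 1_120, attributes: { ok: true } },
    ],
  },
  (line) => lines.push(line),
);
console.log(lines[0]);
```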
How the five fit together
A single agent run uses all five in sequence:
- The request handler calls runWithTrace (Module 1), creating the run ID, and logs agent.start (Module 3).
- Each model call records tokens through the cost meter (Module 2) and logs an llm.call event.
- Each tool runs through traceTool (Module 4), producing spans and tool.call events.
- Before responding, the handler reads runCost() to enforce a budget, logs agent.end, exports the trace (Module 5), and calls flushCost().
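That sequence can be sketched as one handler, including the error path. This is a self-contained simulation, not the real modules: the store, ledger, and event array are cut-down stand-ins for Modules 1 to 3, callModel is a hypothetical stub charging a flat $0.02 per call, and the $0.05 budget is arbitrary. What it shows is structural: the budget is checked before every call, and agent.end plus the ledger flush sit in finally, so a run that fails mid-way still leaves a complete record and no stale ledger entry.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Cut-down, hypothetical stand-ins for Modules 1-3.
const store = new AsyncLocalStorage<{ runId: string }>();
const ledger = new Map<string, number>(); // runId -> USD spent so far
const events: Array<Record<string, unknown>> = [];
const runId = () => store.getStore()!.runId;
const spent = () => ledger.get(runId()) ?? 0;
const logEvent = (e: Record<string, unknown>) => events.push({ runId: runId(), ...e });

// Hypothetical model stub: every call costs a flat $0.02.
const COST_PER_CALL = 0.02;
async function callModel(prompt: string): Promise<string> {
  ledger.set(runId(), spent() + COST_PER_CALL);
  logEvent({ type: "llm.call", model: "stub" });
  return `answer to ${prompt}`;
}

const BUDGET_USD = 0.05;

function handleRequest(input: string, steps: number): Promise<string> {
  return store.run({ runId: randomUUID() }, async () => {
    logEvent({ type: "agent.start", input });
    let ok = false;
    try {
      let answer = "";
      for (let i = 0; i < steps; i++) {
        // Check before each call so a looping agent stops before overspending.
        if (spent() + COST_PER_CALL > BUDGET_USD) throw new Error("budget exceeded");
        answer = await callModel(input);
      }
      ok = true;
      return answer;
    } catch (err) {
      logEvent({ type: "agent.error", message: String(err) });
      throw err;
    } finally {
      // Runs on success and failure alike: record the outcome, export the
      // trace here (Module 5), then flush the ledger so it cannot grow.
      logEvent({ type: "agent.end", ok, costUsd: spent() });
      ledger.delete(runId());
    }
  });
}

// Usage: one run that fits the budget, one that would loop past it.
const done = Promise.allSettled([handleRequest("hi", 2), handleRequest("loop", 10)]);
done.then((results) => console.log(results.map((r) => r.status)));
```

In the real handler, the same finally block is where exportTrace(currentTrace()) belongs, so failed runs reach your tracing backend as well as successful ones.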
The result is that for any run, you can answer what the agent did, what it cost, how long each step took, and where it failed, from structured data rather than guesswork. That is the difference between an agent you operate and an agent you can govern.
Where observability sits in your agent governance
Observability is one layer of a larger control set. It pairs with tool authorization, which decides what an agent is allowed to call, and with output validation, which checks what comes back. Together they form the technical core of an agent governance program.
- Authorization decides what is permitted: see the tool authorization patterns.
- Validation checks the outputs: see the output validation patterns.
- Observability records what happened: this guide.
- Incident response acts on what the records reveal: see the incident response playbooks.
For the policy layer that sits above all of this, the AI agent governance policy for small teams gives you the document, and the EU AI Act post-market monitoring guide covers the monitoring obligations these logs help satisfy.
Related reading
- TypeScript AI agent logging and audit trail patterns
- TypeScript AI agent security incident response playbooks
- TypeScript AI agent tool authorization patterns
- TypeScript AI agent output validation patterns
- AI spend governance and token budget controls
- AI agent governance policy for small teams
- EU AI Act post-market monitoring (Article 72)
