Auto-instrumentation lets the SDK automatically capture LLM calls without modifying your agent logic. Use watch() to patch a single LLM client instance.

watch() — Instance-Level Patching

watch() monkey-patches a single LLM client instance. Only that specific object is instrumented:

import OpenAI from 'openai';
import { watch } from 'infinium-o2';

const openai = watch(new OpenAI());   // This instance is patched
const other = new OpenAI();            // This instance is NOT patched

watch() returns the same client object, so you can wrap and assign in a single expression:

const openai = watch(new OpenAI(), { captureContent: true });

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `client` | LLM client | required | An LLM provider client instance |
| `options.captureContent` | `boolean` | `false` | Capture input/output previews (truncated to 500 chars) |

Supported Providers

| Provider | Client | Patched Method | Detection |
| --- | --- | --- | --- |
| OpenAI | `openai` package | `chat.completions.create` | Has `chat.completions.create` |
| Anthropic | `@anthropic-ai/sdk` | `messages.create` | Has `messages.create` |
| Google Gemini | `@google/generative-ai` | `generateContent` / `generateContentStream` | Has `generateContent` |
| xAI (Grok) | `openai` with xAI `baseURL` | `chat.completions.create` | `baseURL` contains `x.ai`, `grok`, or `xai` |
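The detection column above amounts to duck-typing on the client's method surface. A minimal sketch of how such detection could work (hypothetical helper, not the SDK's actual implementation):

```typescript
// Hypothetical provider detection via duck-typing, mirroring the
// detection rules in the table above. Illustrative only.
function detectProvider(client: any): string | undefined {
  if (typeof client?.chat?.completions?.create === 'function') {
    // OpenAI-compatible surface; xAI is distinguished by its baseURL
    const base = String(client.baseURL ?? '');
    return /x\.ai|grok|xai/i.test(base) ? 'xai' : 'openai';
  }
  if (typeof client?.messages?.create === 'function') return 'anthropic';
  if (typeof client?.generateContent === 'function') return 'google';
  return undefined; // unrecognized client
}
```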

What Gets Captured

With captureContent: false (default):

| Field | Description |
| --- | --- |
| `provider` | `"openai"`, `"anthropic"`, `"google"`, `"xai"` |
| `model` | Model name from the API call |
| `promptTokens` | Input token count |
| `completionTokens` | Output token count |
| `latencyNs` | Call duration in nanoseconds |
| `error` | Error message (on failure) |

With captureContent: true, additionally:

| Field | Description |
| --- | --- |
| `inputPreview` | Last 2 messages, truncated to 500 chars |
| `outputPreview` | Response text, truncated to 500 chars |
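A sketch of how such previews might be built (the 500-char cap and last-2-messages rule come from the table above; the helper names and role-prefix formatting are assumptions, not the SDK's actual internals):

```typescript
// Hypothetical preview construction matching the capture rules above.
const MAX_PREVIEW = 500;

function truncate(s: string): string {
  return s.length > MAX_PREVIEW ? s.slice(0, MAX_PREVIEW) : s;
}

// inputPreview: keep only the last 2 messages, capped at 500 chars
function inputPreview(messages: { role: string; content: string }[]): string {
  return truncate(
    messages.slice(-2).map((m) => `${m.role}: ${m.content}`).join('\n'),
  );
}
```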

Privacy

Content capture is disabled by default (opt-in) to protect sensitive data. Enable it only when you need to inspect what was sent to and received from the LLM. Previews are always truncated to 500 characters.


Streaming Support

All providers handle streaming transparently. When you pass stream: true, the SDK wraps the response iterator to accumulate chunks and extract token counts:

const openai = watch(new OpenAI());

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
// Tokens and latency are captured when the stream completes

The stream wrapper:

  • Returns a Proxy-based wrapper that mimics the original async iterator
  • Accumulates chunks to extract total token counts from the final chunk
  • Records latency from start to end of stream consumption
  • Prevents double-recording via finalization guards
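The behavior above can be sketched as a plain async-iterable wrapper (simplified from the Proxy-based approach the SDK uses; the chunk shape and callback names are assumptions):

```typescript
// Illustrative sketch of the stream-wrapping behavior described above.
type Chunk = { content?: string; usage?: { totalTokens: number } };

function wrapStream<T extends Chunk>(
  stream: AsyncIterable<T>,
  onDone: (info: { tokens?: number; latencyNs: bigint }) => void,
): AsyncIterable<T> {
  const start = process.hrtime.bigint();
  let finalized = false; // finalization guard: record at most once

  return {
    async *[Symbol.asyncIterator]() {
      let tokens: number | undefined;
      try {
        for await (const chunk of stream) {
          // Providers attach total usage to the final chunk
          if (chunk.usage) tokens = chunk.usage.totalTokens;
          yield chunk;
        }
      } finally {
        // Runs on normal completion, error, or early break
        if (!finalized) {
          finalized = true;
          onDone({ tokens, latencyNs: process.hrtime.bigint() - start });
        }
      }
    },
  };
}
```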

Combining with Traces

Auto-captured calls are stored in a TraceContext (backed by Node.js AsyncLocalStorage). They’re automatically incorporated when used inside client.trace():

import OpenAI from 'openai';
import { InfiniumClient, watch } from 'infinium-o2';

const client = new InfiniumClient({ agentId: '...', agentSecret: '...' });
const openai = watch(new OpenAI());

const summarize = client.trace('Summarize Article')(
  async (article: string) => {
    // This LLM call is auto-captured into the active trace
    const resp = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: 'Summarize in 3 bullets.' },
        { role: 'user', content: article },
      ],
    });
    return resp.choices[0].message.content;
  }
);

// The trace includes the LLM call with model, tokens, and latency
const result = await summarize('The Federal Reserve announced...');

How It Works

  1. client.trace() creates a new TraceContext and runs the function inside runWithTraceContext()
  2. watch()-patched methods check for an active TraceContext via getCurrentTraceContext()
  3. If one exists, they record a CapturedLlmCall into it
  4. When the function returns, TraceBuilder.build(traceCtx) converts captured calls into ExecutionStep objects and aggregates LlmUsage
  5. The trace is auto-sent to the API

This is async-safe because AsyncLocalStorage provides per-async-context isolation.
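The isolation mechanism can be sketched with AsyncLocalStorage directly (the TraceContext and CapturedLlmCall shapes here are simplified assumptions, not the SDK's own types):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Minimal sketch of the TraceContext mechanism described above.
interface CapturedLlmCall { provider: string; model: string; promptTokens: number }
interface TraceContext { calls: CapturedLlmCall[] }

const storage = new AsyncLocalStorage<TraceContext>();

// Step 1: run a traced function inside a fresh context
function runWithTraceContext<T>(fn: () => Promise<T>): Promise<T> {
  return storage.run({ calls: [] }, fn);
}

// Step 2: patched methods look up the active context
// (returns undefined outside a trace, so untraced calls are skipped)
function getCurrentTraceContext(): TraceContext | undefined {
  return storage.getStore();
}
```

Because each `storage.run()` gets its own store, concurrent traces never see each other's captured calls.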


Provider-Specific Notes

OpenAI

import OpenAI from 'openai';
const openai = watch(new OpenAI());

const resp = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

Token counts are read from response.usage.prompt_tokens and response.usage.completion_tokens.

Anthropic

import Anthropic from '@anthropic-ai/sdk';
const anthropic = watch(new Anthropic());

const resp = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});

Token counts are read from response.usage.input_tokens and response.usage.output_tokens. For streams, the SDK parses message_start, message_delta, and content_block_delta events.
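A sketch of how token counts and text could be tallied from those streaming events (event shapes are simplified from the Anthropic SDK's; the helper is illustrative):

```typescript
// Token extraction from Anthropic streaming events, per the event
// names above. Simplified shapes, not the full SDK event types.
type AnthropicEvent =
  | { type: 'message_start'; message: { usage: { input_tokens: number } } }
  | { type: 'message_delta'; usage: { output_tokens: number } }
  | { type: 'content_block_delta'; delta: { text: string } };

function tallyUsage(events: AnthropicEvent[]) {
  let input = 0;
  let output = 0;
  let text = '';
  for (const e of events) {
    if (e.type === 'message_start') input = e.message.usage.input_tokens;
    else if (e.type === 'message_delta') output = e.usage.output_tokens; // cumulative; last wins
    else if (e.type === 'content_block_delta') text += e.delta.text;
  }
  return { input, output, text };
}
```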

Google Gemini

import { GoogleGenerativeAI } from '@google/generative-ai';
const ai = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = watch(ai.getGenerativeModel({ model: 'gemini-2.0-flash' }));

const resp = await model.generateContent('Explain quantum computing');

Token counts are read from response.usageMetadata. When streaming, Gemini reports cumulative token counts on each chunk (Gemini-specific behavior), which the SDK accounts for.
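Because the streamed counts are cumulative, the last chunk carrying usageMetadata holds the totals. A sketch of extracting them (field names follow @google/generative-ai's UsageMetadata; the chunk shape is simplified):

```typescript
// Gemini streaming reports cumulative usage per chunk, so the totals
// are simply the last usageMetadata seen. Illustrative helper only.
interface UsageMetadata { promptTokenCount: number; candidatesTokenCount: number }

function finalUsage(chunks: { usageMetadata?: UsageMetadata }[]): UsageMetadata | undefined {
  let last: UsageMetadata | undefined;
  for (const c of chunks) {
    if (c.usageMetadata) last = c.usageMetadata; // cumulative: keep the latest
  }
  return last;
}
```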

xAI (Grok)

import OpenAI from 'openai';
const xai = watch(new OpenAI({ baseURL: 'https://api.x.ai/v1', apiKey: 'xai-...' }));

const resp = await xai.chat.completions.create({
  model: 'grok-2',
  messages: [{ role: 'user', content: 'Hello' }],
});

xAI uses the OpenAI SDK with a custom baseURL. The SDK detects this automatically and records the provider as "xai".