Auto-instrumentation lets the SDK automatically capture LLM calls without modifying your agent logic. Use watch() to patch a single LLM client instance.

watch() — Instance-Level Patching

watch() monkey-patches a single LLM client instance. Only that specific object is instrumented:

import OpenAI from 'openai';
import { watch } from 'infinium-o2';

const openai = watch(new OpenAI());   // This instance is patched
const other = new OpenAI();            // This instance is NOT patched

watch() returns the same client object, so you can wrap and assign in a single expression:

const openai = watch(new OpenAI(), { captureContent: true });

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `client` | LLM client | required | An LLM provider client instance |
| `options.captureContent` | `boolean` | `false` | Capture input/output previews (truncated to 500 chars) |

Supported Providers

| Provider | Client | Patched Method | Detection |
| --- | --- | --- | --- |
| OpenAI | `openai` package | `chat.completions.create` | Has `chat.completions.create` |
| Anthropic | `@anthropic-ai/sdk` | `messages.create` | Has `messages.create` |
| Google Gemini | `@google/generative-ai` | `generateContent` / `generateContentStream` | Has `generateContent` |
| xAI (Grok) | `openai` with xAI `baseURL` | `chat.completions.create` | `baseURL` contains `x.ai`, `grok`, or `xai` |
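The detection column above amounts to duck-typing on the client's method surface. A minimal sketch of how such detection could work (hypothetical helper, not the SDK's actual implementation):

```typescript
// Hypothetical provider detection via duck-typing, mirroring the
// detection rules in the table above. Illustrative only.
function detectProvider(client: any): string | undefined {
  if (typeof client?.chat?.completions?.create === 'function') {
    // OpenAI-compatible surface; xAI is distinguished by its baseURL
    const base = String(client.baseURL ?? '');
    return /x\.ai|grok|xai/i.test(base) ? 'xai' : 'openai';
  }
  if (typeof client?.messages?.create === 'function') return 'anthropic';
  if (typeof client?.generateContent === 'function') return 'google';
  return undefined; // unrecognized client
}
```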

What Gets Captured

With captureContent: false (default):

| Field | Description |
| --- | --- |
| `provider` | `"openai"`, `"anthropic"`, `"google"`, `"xai"` |
| `model` | Model name from the API call |
| `promptTokens` | Input token count |
| `completionTokens` | Output token count |
| `latencyNs` | Call duration in nanoseconds |
| `error` | Error message (on failure) |

With captureContent: true, additionally:

| Field | Description |
| --- | --- |
| `inputPreview` | Last 2 messages, truncated to 500 chars |
| `outputPreview` | Response text, truncated to 500 chars |
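A sketch of how such previews might be built (the 500-char cap and last-2-messages rule come from the table above; the helper names and role-prefix formatting are assumptions, not the SDK's actual internals):

```typescript
// Hypothetical preview construction matching the capture rules above.
const MAX_PREVIEW = 500;

function truncate(s: string): string {
  return s.length > MAX_PREVIEW ? s.slice(0, MAX_PREVIEW) : s;
}

// inputPreview: keep only the last 2 messages, capped at 500 chars
function inputPreview(messages: { role: string; content: string }[]): string {
  return truncate(
    messages.slice(-2).map((m) => `${m.role}: ${m.content}`).join('\n'),
  );
}
```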

Privacy

Content capture is disabled by default (opt-in) to protect sensitive data. Enable it only when you need to inspect what was sent to and received from the LLM. Previews are always truncated to 500 characters.


Streaming Support

All providers handle streaming transparently. When you pass stream: true, the SDK wraps the response iterator to accumulate chunks and extract token counts:

const openai = watch(new OpenAI());

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
// Tokens and latency are captured when the stream completes

The stream wrapper:

  • Returns a Proxy-based wrapper that mimics the original async iterator
  • Accumulates chunks to extract total token counts from the final chunk
  • Records latency from start to end of stream consumption
  • Prevents double-recording via finalization guards
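The behavior above can be sketched as a plain async-iterable wrapper (simplified from the Proxy-based approach the SDK uses; the chunk shape and callback names are assumptions):

```typescript
// Illustrative sketch of the stream-wrapping behavior described above.
type Chunk = { content?: string; usage?: { totalTokens: number } };

function wrapStream<T extends Chunk>(
  stream: AsyncIterable<T>,
  onDone: (info: { tokens?: number; latencyNs: bigint }) => void,
): AsyncIterable<T> {
  const start = process.hrtime.bigint();
  let finalized = false; // finalization guard: record at most once

  return {
    async *[Symbol.asyncIterator]() {
      let tokens: number | undefined;
      try {
        for await (const chunk of stream) {
          // Providers attach total usage to the final chunk
          if (chunk.usage) tokens = chunk.usage.totalTokens;
          yield chunk;
        }
      } finally {
        // Runs on normal completion, error, or early break
        if (!finalized) {
          finalized = true;
          onDone({ tokens, latencyNs: process.hrtime.bigint() - start });
        }
      }
    },
  };
}
```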

Combining with Traces

Auto-captured calls are stored in a TraceContext (backed by Node.js AsyncLocalStorage). They’re automatically incorporated when used inside client.trace():

import OpenAI from 'openai';
import { InfiniumClient, watch } from 'infinium-o2';

const client = new InfiniumClient({ agentId: '...', agentSecret: '...' });
const openai = watch(new OpenAI());

const summarize = client.trace('Summarize Article')(
  async (article: string) => {
    // This LLM call is auto-captured into the active trace
    const resp = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: 'Summarize in 3 bullets.' },
        { role: 'user', content: article },
      ],
    });
    return resp.choices[0].message.content;
  }
);

// The trace includes the LLM call with model, tokens, and latency
const result = await summarize('The Federal Reserve announced...');

How It Works

  1. client.trace() creates a new TraceContext and runs the function inside runWithTraceContext()
  2. watch()-patched methods check for an active TraceContext via getCurrentTraceContext()
  3. If one exists, they record a CapturedLlmCall into it
  4. When the function returns, TraceBuilder.build(traceCtx) converts captured calls into ExecutionStep objects and aggregates LlmUsage
  5. The trace is auto-sent to the API

This is async-safe because AsyncLocalStorage provides per-async-context isolation.
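The isolation mechanism can be sketched with AsyncLocalStorage directly (the TraceContext and CapturedLlmCall shapes here are simplified assumptions, not the SDK's own types):

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Minimal sketch of the TraceContext mechanism described above.
interface CapturedLlmCall { provider: string; model: string; promptTokens: number }
interface TraceContext { calls: CapturedLlmCall[] }

const storage = new AsyncLocalStorage<TraceContext>();

// Step 1: run a traced function inside a fresh context
function runWithTraceContext<T>(fn: () => Promise<T>): Promise<T> {
  return storage.run({ calls: [] }, fn);
}

// Step 2: patched methods look up the active context
// (returns undefined outside a trace, so untraced calls are skipped)
function getCurrentTraceContext(): TraceContext | undefined {
  return storage.getStore();
}
```

Because each `storage.run()` gets its own store, concurrent traces never see each other's captured calls.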


Provider-Specific Notes

OpenAI

import OpenAI from 'openai';
const openai = watch(new OpenAI());

const resp = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

Token counts are read from response.usage.prompt_tokens and response.usage.completion_tokens.

Anthropic

import Anthropic from '@anthropic-ai/sdk';
const anthropic = watch(new Anthropic());

const resp = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});

Token counts are read from response.usage.input_tokens and response.usage.output_tokens. For streams, the SDK parses message_start, message_delta, and content_block_delta events.
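A sketch of how token counts and text could be tallied from those streaming events (event shapes are simplified from the Anthropic SDK's; the helper is illustrative):

```typescript
// Token extraction from Anthropic streaming events, per the event
// names above. Simplified shapes, not the full SDK event types.
type AnthropicEvent =
  | { type: 'message_start'; message: { usage: { input_tokens: number } } }
  | { type: 'message_delta'; usage: { output_tokens: number } }
  | { type: 'content_block_delta'; delta: { text: string } };

function tallyUsage(events: AnthropicEvent[]) {
  let input = 0;
  let output = 0;
  let text = '';
  for (const e of events) {
    if (e.type === 'message_start') input = e.message.usage.input_tokens;
    else if (e.type === 'message_delta') output = e.usage.output_tokens; // cumulative; last wins
    else if (e.type === 'content_block_delta') text += e.delta.text;
  }
  return { input, output, text };
}
```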

Google Gemini

import { GoogleGenerativeAI } from '@google/generative-ai';
const ai = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = watch(ai.getGenerativeModel({ model: 'gemini-2.0-flash' }));

const resp = await model.generateContent('Explain quantum computing');

Token counts are read from response.usageMetadata. When streaming, Gemini reports cumulative token counts on each chunk (Gemini-specific behavior), which the SDK accounts for.
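Because the streamed counts are cumulative, the last chunk carrying usageMetadata holds the totals. A sketch of extracting them (field names follow @google/generative-ai's UsageMetadata; the chunk shape is simplified):

```typescript
// Gemini streaming reports cumulative usage per chunk, so the totals
// are simply the last usageMetadata seen. Illustrative helper only.
interface UsageMetadata { promptTokenCount: number; candidatesTokenCount: number }

function finalUsage(chunks: { usageMetadata?: UsageMetadata }[]): UsageMetadata | undefined {
  let last: UsageMetadata | undefined;
  for (const c of chunks) {
    if (c.usageMetadata) last = c.usageMetadata; // cumulative: keep the latest
  }
  return last;
}
```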

xAI (Grok)

import OpenAI from 'openai';
const xai = watch(new OpenAI({ baseURL: 'https://api.x.ai/v1', apiKey: 'xai-...' }));

const resp = await xai.chat.completions.create({
  model: 'grok-2',
  messages: [{ role: 'user', content: 'Hello' }],
});

xAI uses the OpenAI SDK with a custom baseURL. The SDK detects this automatically and records the provider as "xai".