Auto-instrumentation lets the SDK capture LLM calls automatically, without modifying your agent logic. Use `watch()` to patch a single LLM client instance.
## `watch()` — Instance-Level Patching

`watch()` monkey-patches a single LLM client instance. Only that specific object is instrumented:

```typescript
import OpenAI from 'openai';
import { watch } from 'infinium-o2';

const openai = watch(new OpenAI()); // This instance is patched
const other = new OpenAI();         // This instance is NOT patched
```

`watch()` returns the same client object, so you can wrap construction inline:

```typescript
const openai = watch(new OpenAI(), { captureContent: true });
```
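Instance-level monkey-patching can be sketched as wrapping one method on one object. This is an illustrative reconstruction, not the SDK's actual source; `patchMethod` and its parameters are hypothetical names:

```typescript
// Sketch: patch a method on a single object instance. Only this object's
// method is replaced; other instances of the same class are untouched.
function patchMethod<T extends object>(
  target: T,
  path: string[], // e.g. ['chat', 'completions', 'create']
  onCall: (args: unknown[]) => void,
): T {
  // Walk to the object that owns the method.
  let owner: any = target;
  for (const key of path.slice(0, -1)) owner = owner[key];
  const name = path[path.length - 1];
  const original = owner[name].bind(owner);

  owner[name] = (...args: unknown[]) => {
    onCall(args);             // record the call
    return original(...args); // then delegate to the real method
  };
  return target;              // same object, now instrumented
}
```

Because the patch lives on the instance (not the prototype), a second `new OpenAI()` is unaffected, which matches the behavior shown above.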
### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `client` | LLM client | required | An LLM provider client instance |
| `options.captureContent` | `boolean` | `false` | Capture input/output previews (truncated to 500 chars) |
## Supported Providers
| Provider | Client | Patched Method | Detection |
|---|---|---|---|
| OpenAI | openai package | chat.completions.create | Has chat.completions.create |
| Anthropic | @anthropic-ai/sdk | messages.create | Has messages.create |
| Google Gemini | @google/generative-ai | generateContent / generateContentStream | Has generateContent |
| xAI (Grok) | openai with xAI baseURL | chat.completions.create | baseURL contains x.ai, grok, or xai |
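The detection column above amounts to duck-typing on the client object. The following is a hedged sketch of that logic (illustrative names; not the SDK's actual code):

```typescript
// Sketch: detect the provider from the shape of the client instance.
type Provider = 'openai' | 'anthropic' | 'google' | 'xai' | 'unknown';

function detectProvider(client: any): Provider {
  // xAI reuses the OpenAI SDK, so check the baseURL before the OpenAI shape.
  const baseURL = String(client?.baseURL ?? '');
  if (/x\.ai|grok|xai/i.test(baseURL)) return 'xai';
  if (typeof client?.chat?.completions?.create === 'function') return 'openai';
  if (typeof client?.messages?.create === 'function') return 'anthropic';
  if (typeof client?.generateContent === 'function') return 'google';
  return 'unknown';
}
```

Checking the baseURL first matters: an xAI client passes the OpenAI shape check too, since it is an `openai` package instance.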
## What Gets Captured

With `captureContent: false` (default):

| Field | Description |
|---|---|
| `provider` | `"openai"`, `"anthropic"`, `"google"`, or `"xai"` |
| `model` | Model name from the API call |
| `promptTokens` | Input token count |
| `completionTokens` | Output token count |
| `latencyNs` | Call duration in nanoseconds |
| `error` | Error message (on failure) |
With `captureContent: true`, additionally:

| Field | Description |
|---|---|
| `inputPreview` | Last 2 messages, truncated to 500 chars |
| `outputPreview` | Response text, truncated to 500 chars |
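The preview rules in this table (last two messages, 500-char cap) might be implemented along these lines. A minimal sketch under those assumptions; the exact formatting is internal to the SDK:

```typescript
// Sketch: build bounded previews from chat messages, per the table above.
interface ChatMessage {
  role: string;
  content: string;
}

const PREVIEW_LIMIT = 500;

function truncate(text: string): string {
  return text.length > PREVIEW_LIMIT ? text.slice(0, PREVIEW_LIMIT) : text;
}

function buildInputPreview(messages: ChatMessage[]): string {
  // Keep only the last two messages, then apply the global character cap.
  return truncate(
    messages.slice(-2).map((m) => `${m.role}: ${m.content}`).join('\n'),
  );
}
```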
## Privacy

Content capture is off by default (opt-in) to protect sensitive data. Enable it only when you need to inspect what was sent to and from the LLM. Previews are always truncated to 500 characters.
## Streaming Support

All providers handle streaming transparently. When you pass `stream: true`, the SDK wraps the response iterator to accumulate chunks and extract token counts:

```typescript
const openai = watch(new OpenAI());

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
// Tokens and latency are captured when the stream completes
```
The stream wrapper:
- Returns a Proxy-based wrapper that mimics the original async iterator
- Accumulates chunks to extract total token counts from the final chunk
- Records latency from start to end of stream consumption
- Prevents double-recording via finalization guards
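The steps above can be sketched as a simplified async-iterator wrapper. This is an illustration of the technique, not the SDK's source (the real wrapper is Proxy-based and provider-aware); `Chunk`, `Usage`, and `record` are assumed names:

```typescript
// Sketch: forward chunks to the caller, accumulate usage, record exactly once.
interface Usage {
  totalTokens: number;
}
interface Chunk {
  text: string;
  usage?: Usage; // providers typically attach totals to the final chunk
}

function wrapStream(
  stream: AsyncIterable<Chunk>,
  record: (usage: Usage | undefined, latencyNs: bigint) => void,
): AsyncIterable<Chunk> {
  const start = process.hrtime.bigint();
  let finalized = false; // finalization guard: prevents double-recording
  let lastUsage: Usage | undefined;

  const finalize = () => {
    if (finalized) return;
    finalized = true;
    record(lastUsage, process.hrtime.bigint() - start);
  };

  return {
    async *[Symbol.asyncIterator]() {
      try {
        for await (const chunk of stream) {
          if (chunk.usage) lastUsage = chunk.usage; // keep the latest totals
          yield chunk;
        }
      } finally {
        finalize(); // runs on normal completion, early break, or error
      }
    },
  };
}
```

The `finally` block is what makes the guard matter: it fires whether the consumer drains the stream, breaks out early, or the stream throws, so `record` runs once in every path.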
## Combining with Traces

Auto-captured calls are stored in a `TraceContext` (backed by Node.js `AsyncLocalStorage`). They're automatically incorporated when used inside `client.trace()`:

```typescript
import OpenAI from 'openai';
import { InfiniumClient, watch } from 'infinium-o2';

const client = new InfiniumClient({ agentId: '...', agentSecret: '...' });
const openai = watch(new OpenAI());

const summarize = client.trace('Summarize Article')(
  async (article: string) => {
    // This LLM call is auto-captured into the active trace
    const resp = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: 'Summarize in 3 bullets.' },
        { role: 'user', content: article },
      ],
    });
    return resp.choices[0].message.content;
  }
);

// The trace includes the LLM call with model, tokens, and latency
const result = await summarize('The Federal Reserve announced...');
```
## How It Works

- `client.trace()` creates a new `TraceContext` and runs the function inside `runWithTraceContext()`
- `watch()`-patched methods check for an active `TraceContext` via `getCurrentTraceContext()`
- If one exists, they record a `CapturedLlmCall` into it
- When the function returns, `TraceBuilder.build(traceCtx)` converts captured calls into `ExecutionStep` objects and aggregates `LlmUsage`
- The trace is auto-sent to the API

This is async-safe because `AsyncLocalStorage` provides per-async-context isolation.
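The isolation pattern can be sketched with `AsyncLocalStorage` directly. The type and function names below mirror those mentioned in the text but are illustrative reconstructions, not the SDK's source:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Sketch: each trace runs in its own async context, so concurrent traces
// never see each other's captured calls.
interface CapturedLlmCall {
  model: string;
}
interface TraceContext {
  name: string;
  calls: CapturedLlmCall[];
}

const storage = new AsyncLocalStorage<TraceContext>();

// Returns the TraceContext for the current async context, if any.
const getCurrentTraceContext = (): TraceContext | undefined => storage.getStore();

// Runs fn with a fresh TraceContext bound to its async context.
function runWithTraceContext<T>(name: string, fn: () => Promise<T>): Promise<T> {
  return storage.run({ name, calls: [] }, fn);
}
```

Even when two traces run concurrently and interleave their awaits, each `getCurrentTraceContext()` call resolves to the context of the trace it was started under.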
## Provider-Specific Notes

### OpenAI

```typescript
import OpenAI from 'openai';

const openai = watch(new OpenAI());
const resp = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

Token counts come from `response.usage.prompt_tokens` and `response.usage.completion_tokens`.
### Anthropic

```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = watch(new Anthropic());
const resp = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello' }],
});
```

Token counts come from `response.usage.input_tokens` and `response.usage.output_tokens`. Streaming parses `message_start`, `message_delta`, and `content_block_delta` events.
### Google Gemini

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const ai = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = watch(ai.getGenerativeModel({ model: 'gemini-2.0-flash' }));
const resp = await model.generateContent('Explain quantum computing');
```

Token counts come from `response.usageMetadata`. Streaming handles Gemini's cumulative token usage, where each chunk reports running totals rather than per-chunk deltas.
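One consequence of cumulative reporting: the recorder should keep the last chunk's counts rather than summing across chunks. A minimal sketch of that assumption, with field names following `@google/generative-ai` (`finalUsage` itself is a hypothetical helper):

```typescript
// Sketch: with cumulative usage, each streamed chunk carries running totals,
// so the last reported counts ARE the final usage. Summing would overcount.
interface UsageMetadata {
  promptTokenCount: number;
  candidatesTokenCount: number;
}

function finalUsage(
  chunks: { usageMetadata?: UsageMetadata }[],
): UsageMetadata | undefined {
  let latest: UsageMetadata | undefined;
  for (const c of chunks) {
    if (c.usageMetadata) latest = c.usageMetadata; // overwrite, don't sum
  }
  return latest;
}
```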
### xAI (Grok)

```typescript
import OpenAI from 'openai';

const xai = watch(new OpenAI({ baseURL: 'https://api.x.ai/v1', apiKey: 'xai-...' }));
const resp = await xai.chat.completions.create({
  model: 'grok-2',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

xAI uses the OpenAI SDK with a custom `baseURL`. The SDK detects this automatically and records the provider as `"xai"`.