How to Add Observability to a Node.js App with OpenTelemetry
Learn how to instrument a Node.js app with OpenTelemetry for traces, metrics, and logs, and build a practical observability setup for production debugging.
TL;DR
In this tutorial, you will instrument a Node.js service with OpenTelemetry so you can capture traces, metrics, and the request context needed to debug production issues. OpenTelemetry is most useful when you want end-to-end visibility without coupling your application to one monitoring vendor.
Prerequisites
Before you begin, you should have:
- ›a Node.js application running on Express, Fastify, NestJS, or a similar server
- ›a package manager such as npm or pnpm
- ›a destination for telemetry data, such as a local collector or hosted observability platform
- ›basic familiarity with HTTP middleware and environment variables
This tutorial focuses on the instrumentation model, not on a specific vendor backend.
Step 1: Understand What You Want to Observe
Observability work is easier when you define the signals first.
For a Node.js app, the most useful questions are usually:
- ›which requests are slow?
- ›which downstream dependency is causing latency?
- ›where are errors happening?
- ›which routes or services are under abnormal load?
Those questions map naturally to:
- ›traces for request flow
- ›metrics for throughput, latency, and failures
- ›logs for detailed event context
OpenTelemetry helps you collect those signals in a structured, correlated way.
Step 2: Install the Core Packages
For a typical Node.js service, start with the SDK and automatic instrumentation packages.
npm install @opentelemetry/api \
  @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions
Depending on your setup, you may also add metrics exporters or framework-specific packages later. The important part is to start with a clean baseline.
Step 3: Bootstrap OpenTelemetry Early
Instrumentation should initialize before your application loads the libraries it needs to patch, not just before it starts serving requests.
// instrumentation.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// If the env var is unset, the exporter falls back to its default OTLP/HTTP endpoint.
const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT,
});

export const sdk = new NodeSDK({
  traceExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});
Then start it before your server boots:
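// server.ts
import { sdk } from "./instrumentation";

async function bootstrap() {
  // Start the SDK first so instrumentations are active before the app loads.
  await sdk.start();
  // Import the app only after the SDK has started: static imports are hoisted
  // and would load the app's dependencies before sdk.start() runs.
  const { createServer } = await import("./app");
  const app = await createServer();
  app.listen(3000);
}

bootstrap().catch((error) => {
  console.error("Failed to start server", error);
  process.exit(1);
});
If OpenTelemetry starts too late, libraries loaded before it may not be instrumented, and you miss spans around application startup and early requests.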
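Starting early has a counterpart at the other end of the process lifetime. Spans are usually exported in batches, so a process that exits abruptly can drop the most recent ones. The following is a minimal sketch, assuming only that the SDK object exposes a `shutdown()` method that returns a promise (as `NodeSDK` does). The `Shutdownable` interface and the `shutdownOnSignal` helper are hypothetical names introduced for illustration.

```typescript
// Minimal shape of the object we need; NodeSDK satisfies it via shutdown().
interface Shutdownable {
  shutdown(): Promise<void>;
}

// Registers a one-time signal handler that flushes telemetry before exiting.
// `exit` is injectable so the helper can be exercised without ending the process.
function shutdownOnSignal(
  sdk: Shutdownable,
  signal: NodeJS.Signals = "SIGTERM",
  exit: (code: number) => void = (code) => process.exit(code),
): () => Promise<void> {
  const handler = async (): Promise<void> => {
    try {
      await sdk.shutdown(); // flush pending telemetry and stop exporters
      exit(0);
    } catch (error) {
      console.error("Telemetry shutdown failed", error);
      exit(1);
    }
  };
  process.once(signal, handler);
  return handler; // returned so callers can also invoke it directly
}
```

In `server.ts` this would be called once, right after the SDK starts, for example `shutdownOnSignal(sdk)`.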
Step 4: Add Service Identity
Telemetry becomes much more useful when the service identifies itself clearly.
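import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

// Note: newer package versions deprecate these names in favor of
// resourceFromAttributes() and ATTR_SERVICE_NAME-style constants.
// Check the versions you have installed.
const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: "billing-api",
  [SemanticResourceAttributes.SERVICE_VERSION]: "1.0.0",
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]:
    process.env.NODE_ENV ?? "development",
});
// Pass this object as the `resource` option of `new NodeSDK({ ... })`.
Without consistent service naming, traces from multiple services become much harder to navigate.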
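Hard-coded identity values tend to drift between environments, so it can help to derive them from configuration. The sketch below builds the same three attributes as a plain object. It assumes the standard `OTEL_SERVICE_NAME` environment variable plus a hypothetical `SERVICE_VERSION` variable that your CI pipeline would set. The helper name `buildServiceAttributes` is also an illustration, not part of any OpenTelemetry package.

```typescript
// Plain attribute map; the keys are the string values behind the
// SemanticResourceAttributes constants used above.
type ServiceAttributes = {
  "service.name": string;
  "service.version": string;
  "deployment.environment": string;
};

type Env = Record<string, string | undefined>;

// Derives service identity from the environment, with explicit fallbacks.
// SERVICE_VERSION is a hypothetical variable (for example, set by CI).
function buildServiceAttributes(env: Env, fallbackName: string): ServiceAttributes {
  return {
    "service.name": env.OTEL_SERVICE_NAME?.trim() || fallbackName,
    "service.version": env.SERVICE_VERSION?.trim() || "0.0.0-dev",
    "deployment.environment": env.NODE_ENV?.trim() || "development",
  };
}

const attributes = buildServiceAttributes(process.env, "billing-api");
console.log(attributes);
```

The resulting object can be passed to the resource constructor in place of the literal values, which keeps naming consistent across deployments.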
Step 5: Let Auto-Instrumentation Cover the Basics
Automatic instrumentation is a good starting point because it captures common framework and client behavior with little code:
- ›incoming HTTP requests
- ›outbound HTTP calls
- ›database drivers
- ›common framework middleware
That gives you immediate visibility into request chains. For many services, this already surfaces a lot of latency and failure patterns.
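Auto-instrumentation can also be noisy, for example by tracing every health check or file system call. `getNodeAutoInstrumentations` accepts a configuration map keyed by instrumentation package name, and the HTTP instrumentation supports an `ignoreIncomingRequestHook`. The sketch below builds such a map on its own so it runs without the OpenTelemetry packages. The ignored paths are assumptions about a typical service, and `shouldIgnoreRequest` is a hypothetical helper name.

```typescript
// Hypothetical list of low-value endpoints; adjust to your service.
const IGNORED_PATHS = new Set(["/healthz", "/readyz", "/metrics"]);

// Returns true for requests that should not produce a server span.
// The parameter mirrors the `url` field of Node's IncomingMessage.
function shouldIgnoreRequest(request: { url?: string }): boolean {
  const url = request.url ?? "";
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);
  return IGNORED_PATHS.has(path);
}

// A map from instrumentation package name to that instrumentation's config.
const autoInstrumentationConfig = {
  "@opentelemetry/instrumentation-fs": { enabled: false },
  "@opentelemetry/instrumentation-http": {
    ignoreIncomingRequestHook: shouldIgnoreRequest,
  },
};

// In instrumentation.ts this would be passed as:
//   instrumentations: [getNodeAutoInstrumentations(autoInstrumentationConfig)]
console.log(Object.keys(autoInstrumentationConfig));
```

Filtering at the instrumentation level is cheaper than dropping spans later, because the spans are never created.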
Step 6: Add Manual Spans Around Business-Critical Work
Auto-instrumentation is helpful, but it does not know your domain boundaries. Add manual spans where business logic matters.
import { SpanStatusCode, trace } from "@opentelemetry/api";

const tracer = trace.getTracer("orders-service");

// validateOrder, reserveInventory, and capturePayment are assumed to be
// defined elsewhere in your application.
export async function processOrder(orderId: string) {
  return tracer.startActiveSpan("processOrder", async (span) => {
    span.setAttribute("order.id", orderId);
    try {
      await validateOrder(orderId);
      await reserveInventory(orderId);
      await capturePayment(orderId);
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      // Always end the span, or it is never exported.
      span.end();
    }
  });
}
This is where traces become useful for product-relevant debugging, not just transport-level timing.
Step 7: Correlate Logs and Traces
Traces tell you where a request went. Logs tell you what happened in detail.
The most useful production setup links them through shared context such as:
- ›trace ID
- ›span ID
- ›request ID
- ›user or tenant identifiers where appropriate
When logs and traces are disconnected, debugging remains slower than it needs to be.
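One way to link the two is to stamp every log line with the identifiers of the active span. The sketch below is a plain formatter, not a specific logging library. It assumes the caller supplies the trace context. In a real service that would typically come from `trace.getActiveSpan()?.spanContext()` in `@opentelemetry/api`. The field names `trace_id` and `span_id` and the example values are illustrative.

```typescript
// Minimal view of a span context; in a real service this would come from
// trace.getActiveSpan()?.spanContext() in @opentelemetry/api.
interface TraceContext {
  traceId: string;
  spanId: string;
}

type LogFields = Record<string, unknown>;

// Builds one JSON log line, attaching trace identifiers when a span is active.
function formatLogLine(
  level: "info" | "warn" | "error",
  message: string,
  fields: LogFields = {},
  context?: TraceContext,
): string {
  return JSON.stringify({
    level,
    message,
    ...fields,
    ...(context ? { trace_id: context.traceId, span_id: context.spanId } : {}),
  });
}

// Example IDs are made up; their lengths (32 and 16 hex characters) follow
// the W3C Trace Context format.
const line = formatLogLine(
  "error",
  "payment capture failed",
  { request_id: "req-123", tenant_id: "acme" },
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7" },
);
console.log(line);
```

With the trace ID in every line, a log search result can be opened directly as a trace, and a slow span can be expanded into the logs written while it was active.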
Step 8: Decide What Metrics Matter
You do not need every possible metric. Start with a few that map directly to operational decisions:
- ›request latency
- ›error rate
- ›request volume
- ›dependency latency
- ›queue or background job timing
Good observability is not about data volume. It is about making failure modes easier to understand.
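To make the first three metrics concrete, the sketch below computes them from raw request samples in plain TypeScript. It is not the OpenTelemetry metrics API: in production, the SDK's counters and histograms would aggregate these values for you. The type names, the 5xx definition of an error, and the choice of the 95th percentile are assumptions for illustration.

```typescript
interface RequestSample {
  durationMs: number;
  statusCode: number;
}

interface RequestSummary {
  volume: number;
  errorRate: number; // fraction of responses with a 5xx status
  p95LatencyMs: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedValues: number[], p: number): number {
  if (sortedValues.length === 0) return 0;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  const index = Math.min(sortedValues.length - 1, Math.max(0, rank - 1));
  return sortedValues[index];
}

// Reduces a window of request samples to volume, error rate, and p95 latency.
function summarize(samples: RequestSample[]): RequestSummary {
  const durations = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const errors = samples.filter((s) => s.statusCode >= 500).length;
  return {
    volume: samples.length,
    errorRate: samples.length === 0 ? 0 : errors / samples.length,
    p95LatencyMs: percentile(durations, 95),
  };
}
```

Percentiles are usually more informative than averages for latency, because a small number of very slow requests can hide behind a healthy mean.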
Step 9: Be Careful with Sensitive Data
Telemetry pipelines can accidentally collect more than they should. Avoid leaking:
- ›access tokens
- ›raw personal data
- ›payment details
- ›secrets from request bodies or headers
Instrumentation should be designed with privacy and security boundaries in mind. Scrubbing and allow-listing are safer defaults than shipping everything.
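An allow-list can be as simple as a function that every attribute map passes through before it is attached to a span. The sketch below is one possible shape, under stated assumptions: the allowed keys, the sensitive key pattern, and the `scrubAttributes` name are all hypothetical and would need to match your own data.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Hypothetical allow-list: only these keys may leave the process unchanged.
const ALLOWED_KEYS = new Set(["http.method", "http.route", "http.status_code", "order.id"]);

// Key fragments that suggest a secret, checked before the allow-list so a
// mistaken allow-list entry cannot expose them.
const SENSITIVE_PATTERN = /(authorization|cookie|token|password|secret|card)/i;

function scrubAttributes(input: Attributes): Attributes {
  const output: Attributes = {};
  for (const [key, value] of Object.entries(input)) {
    if (SENSITIVE_PATTERN.test(key)) {
      output[key] = "[REDACTED]"; // keep the key so its presence is visible
    } else if (ALLOWED_KEYS.has(key)) {
      output[key] = value;
    }
    // Anything else is dropped: unknown keys are not exported by default.
  }
  return output;
}

console.log(
  scrubAttributes({
    "http.method": "POST",
    "http.request.header.authorization": "Bearer abc",
    "user.email": "person@example.com",
  }),
);
```

A typical call site would be `span.setAttributes(scrubAttributes(raw))`, so the rule is applied in one place rather than at every instrumentation point.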
Common Pitfalls
Treating OpenTelemetry as Just a Tracing Library
The real value comes from correlated signals and a consistent instrumentation model, not just one pretty trace view.
Adding Instrumentation but No Naming Discipline
If span names, service names, and attributes are inconsistent, the data becomes noisy and hard to use.
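A common source of noisy names is putting raw URLs into span names, which creates one distinct name per ID. The sketch below collapses variable path segments into a placeholder. It is a fallback heuristic under assumptions: when your framework exposes the matched route template, that is usually the better source. The `toSpanName` helper and the `:id` placeholder are illustrative choices.

```typescript
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT = /^\d+$/;

// Collapses variable path segments so span names stay low-cardinality.
function toSpanName(method: string, path: string): string {
  // Drop the query string and fragment before normalizing.
  const cut = path.search(/[?#]/);
  const pathOnly = cut === -1 ? path : path.slice(0, cut);
  const normalized = pathOnly
    .split("/")
    .map((segment) =>
      NUMERIC_SEGMENT.test(segment) || UUID_SEGMENT.test(segment) ? ":id" : segment,
    )
    .join("/");
  return `${method.toUpperCase()} ${normalized || "/"}`;
}

console.log(toSpanName("get", "/orders/12345/items?page=2"));
```

Low-cardinality names make it possible to group, compare, and alert on spans by operation instead of by individual request.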
Instrumenting Too Much Too Fast
A smaller, deliberate set of meaningful spans is usually better than a flood of low-signal telemetry.
Forgetting Runtime Cost
Instrumentation adds overhead. In most cases it is acceptable, but you should still measure it and tune exporter settings, batching behavior, and sampling choices accordingly.
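Sampling is usually the largest lever on that cost. The sketch below illustrates the idea behind ratio-based sampling: derive the decision from the trace ID so that every service seeing the same trace agrees. It is a simplified illustration and not the algorithm the SDK uses. In practice you would rely on the SDK's built-in samplers, which can be selected with the standard `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` environment variables.

```typescript
// Decides from the trace ID alone, so the same trace gets the same decision
// everywhere. Simplified: uses only the last 8 hex characters of the ID.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the last 8 hex characters as an unsigned 32-bit integer.
  const bucket = parseInt(traceId.slice(-8), 16);
  // A malformed ID yields NaN, and NaN < x is false, so it is not sampled.
  return bucket < ratio * 2 ** 32;
}

// Example trace IDs are made up for illustration.
console.log(shouldSample("4bf92f3577b34da6a3ce929d00000000", 0.1));
console.log(shouldSample("4bf92f3577b34da6a3ce929dffffffff", 0.5));
```

Because the decision is deterministic, lowering the ratio reduces export volume without producing traces that are half recorded in one service and missing in another.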
The Complete Shape of a Practical Setup
A useful production observability setup often looks like this:
- ›Node SDK bootstrapped before the app starts
- ›auto-instrumentation enabled for HTTP and common dependencies
- ›manual spans around business-critical flows
- ›stable service/resource naming
- ›telemetry exported through OTLP
- ›logs and traces correlated with shared request context
That is enough to make most real Node.js incidents easier to debug.
Next Steps
After the baseline is working, the next improvements usually are:
- ›sampling strategy for high-volume traffic
- ›custom metrics for business-critical events
- ›trace propagation across queues and async jobs
- ›service maps across multiple backend services
- ›alerts tied to latency and error budgets
That is when OpenTelemetry stops being just an instrumentation library and becomes part of how your team operates production systems.
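For propagation across queues and async jobs, the trace context has to travel inside the message itself. The W3C Trace Context format carries it in a `traceparent` value of the form `00-<trace id>-<span id>-<flags>`. The sketch below formats and parses that value by hand to show what is being passed along. In a real service, the `propagation.inject` and `propagation.extract` functions in `@opentelemetry/api` would normally handle this. The `TraceParent` type and helper names are hypothetical.

```typescript
interface TraceParent {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Version 00 of the W3C traceparent header.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceParent(ctx: TraceParent): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

function parseTraceParent(header: string): TraceParent | undefined {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return undefined;
  const [, traceId, spanId, flags] = match;
  // All-zero IDs are invalid in the W3C Trace Context format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  // The lowest bit of the flags byte is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Producer side: attach the value to hypothetical message metadata.
const message = {
  body: { orderId: "order-1" },
  headers: {
    traceparent: formatTraceParent({
      traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
      spanId: "00f067aa0ba902b7",
      sampled: true,
    }),
  },
};

// Consumer side: recover the context before starting the job's span.
console.log(parseTraceParent(message.headers.traceparent));
```

Without this step, each background job starts a new trace and the link back to the request that enqueued it is lost.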
FAQ
What is OpenTelemetry?
OpenTelemetry is an open standard and ecosystem for collecting telemetry such as traces, metrics, and logs from applications and infrastructure.
Why use OpenTelemetry in Node.js apps?
It helps teams trace requests across services, monitor latency and failures, and understand production behavior without locking themselves into one vendor.
Do I need all three: traces, metrics, and logs?
Not always on day one, but a production-grade setup usually becomes much easier to operate when traces, metrics, and logs are correlated.