January 05, 2026

Node.js Streams for Processing Large Files

Process large files efficiently in Node.js using readable, writable, and transform streams to avoid memory issues and handle data chunk by chunk.

Tags: Node.js, Streams, Performance, Backend

2 min read


TL;DR

Use pipeline() from stream/promises to chain readable, transform, and writable streams together for processing large files. Memory usage stays constant regardless of file size, and error handling is built in.

The Problem

You need to process a 5 GB CSV file — parse rows, transform data, write output. The naive approach crashes:

```typescript
import { readFileSync } from 'fs';

// This loads 5 GB into memory and crashes with heap overflow
const data = readFileSync('huge-file.csv', 'utf-8');
const rows = data.split('\n').map(parseRow);
```

Even fs.readFile (the async version) has the same problem — it buffers the entire file in memory before returning. For files larger than available RAM, this is a dead end.

The Solution

Streams process data in chunks. The file is read piece by piece, each chunk is transformed, and results are written incrementally. Peak memory usage stays at a few megabytes regardless of file size.

Use pipeline() for clean stream chaining:

```typescript
import { createReadStream, createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import { Transform } from 'stream';

// Custom transform stream that converts CSV rows to JSON lines.
// A chunk can end mid-row, so a leftover buffer carries the partial
// line over to the next chunk.
let leftover = '';
const csvTransform = new Transform({
  transform(chunk: Buffer, encoding, callback) {
    const lines = (leftover + chunk.toString()).split('\n');
    leftover = lines.pop() ?? ''; // last piece may be an incomplete row
    for (const line of lines) {
      if (line.trim()) {
        const [name, email, role] = line.split(',');
        this.push(JSON.stringify({ name, email, role: role?.trim() }) + '\n');
      }
    }
    callback();
  },
  flush(callback) {
    // Emit whatever remains after the final chunk
    if (leftover.trim()) {
      const [name, email, role] = leftover.split(',');
      this.push(JSON.stringify({ name, email, role: role?.trim() }) + '\n');
    }
    callback();
  },
});

// Pipeline handles errors and cleanup automatically
await pipeline(
  createReadStream('users.csv'),
  csvTransform,
  createWriteStream('users.jsonl')
);

console.log('Processing complete');
```

Backpressure is what keeps memory bounded. If the writable stream is slower than the readable stream (e.g., writing to a slow disk while reading from a fast SSD), the readable side is automatically paused until the writable side drains. Without backpressure handling, memory grows unbounded as unwritten chunks accumulate in buffers. Note that both `.pipe()` and `pipeline()` handle backpressure; the real advantage of `pipeline()` is error propagation and cleanup, covered below.
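To make backpressure concrete, here is a small illustrative sketch (not from the original example): a fast in-memory source feeding a deliberately slow sink. The tiny `highWaterMark` forces backpressure almost immediately, and `pipeline()` keeps the source paused while the sink drains, so only a handful of rows are ever buffered at once.

```typescript
import { Readable, Writable } from 'stream';
import { pipeline } from 'stream/promises';

// A fast generator-backed readable producing 200 small rows
const source = Readable.from(
  (function* () {
    for (let i = 0; i < 200; i++) yield `row-${i}\n`;
  })()
);

let chunksWritten = 0;
const slowSink = new Writable({
  highWaterMark: 16, // bytes — keeps at most a few rows buffered
  write(chunk, _encoding, callback) {
    chunksWritten++;
    setTimeout(callback, 1); // simulate a slow disk
  },
});

// pipeline() pauses `source` whenever `slowSink`'s buffer is full
await pipeline(source, slowSink);
console.log(`wrote ${chunksWritten} chunks`);
```

Every row still arrives at the sink; the pacing just adapts to the slowest stage.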

Composing multiple transforms:

```typescript
import { createGzip } from 'zlib';

// Read CSV -> transform to JSON -> compress -> write
await pipeline(
  createReadStream('data.csv'),
  csvTransform,
  createGzip(),
  createWriteStream('data.jsonl.gz')
);
```

Each stream in the pipeline processes its chunk and passes the result to the next stream. The data flows through the chain without any single step holding the entire file in memory.
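You can verify the composition end to end without touching disk. As a sketch, the same pipeline shape works with an in-memory source and an async-function consumer (which `stream/promises`'s `pipeline()` accepts as the final stage), round-tripping data through gzip:

```typescript
import { Readable } from 'stream';
import { pipeline } from 'stream/promises';
import { createGzip, createGunzip } from 'zlib';

// In-memory round trip: source -> gzip -> gunzip -> consumer.
// No stage ever holds more than one chunk of the data.
let decoded = '';
await pipeline(
  Readable.from(['hello ', 'streams']),
  createGzip(),
  createGunzip(),
  async (source) => {
    for await (const chunk of source) decoded += chunk.toString();
  }
);
console.log(decoded); // 'hello streams'
```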

Why This Works

Streams leverage Node.js's event loop to process data incrementally. pipeline() connects streams and manages the lifecycle: it propagates errors from any stream in the chain, destroys all streams on failure (preventing resource leaks), and handles backpressure automatically. Compared to manual .pipe() calls, pipeline() is both safer and more concise — .pipe() does not propagate errors or clean up on failure, which leads to subtle resource leaks in production.
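The error-propagation behavior is easy to demonstrate with a sketch: a transform that fails on its first chunk makes `pipeline()` reject with that error (after destroying every stream in the chain), so one `try/catch` covers the whole pipeline.

```typescript
import { Readable, Transform, Writable } from 'stream';
import { pipeline } from 'stream/promises';

// A transform that errors on the first chunk it sees
const failing = new Transform({
  transform(_chunk, _encoding, callback) {
    callback(new Error('bad row'));
  },
});

let message = '';
try {
  await pipeline(
    Readable.from(['a\n', 'b\n']),
    failing,
    new Writable({ write(_chunk, _encoding, cb) { cb(); } })
  );
} catch (err) {
  // All three streams have been destroyed by the time we get here
  message = (err as Error).message;
}
console.log(message); // 'bad row'
```

With manual `.pipe()` chains, you would instead need an `'error'` listener on every stream to get the same coverage.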

FAQ

Why should I use streams instead of readFile?

readFile loads the entire file into memory, which crashes on large files. Streams process data in small chunks, keeping memory usage constant regardless of file size.

What are the four types of Node.js streams?

Readable (data source), Writable (data destination), Transform (modify data in transit), and Duplex (both readable and writable, like network sockets).
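As a quick sketch of the duplex/transform category, the built-in `PassThrough` is the simplest possible Transform (itself a Duplex): data written to its writable side is emitted unchanged on its readable side, which makes it handy as a tap between pipeline stages.

```typescript
import { PassThrough } from 'stream';

// PassThrough: readable and writable at once, no modification
const tap = new PassThrough();
let seen = '';
tap.on('data', (chunk) => { seen += chunk.toString(); });
tap.write('readable + writable');
tap.end();

// 'data' events fire asynchronously, so wait for the stream to end
await new Promise((resolve) => tap.on('end', resolve));
console.log(seen); // 'readable + writable'
```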

How do I chain multiple stream operations?

Use the pipeline function from stream/promises to chain streams together with automatic error handling and cleanup when any stream in the chain fails.

Article Author: Sadam Hussain, Senior Full Stack Developer