July 05, 2025
Last updated: July 05, 2025

Vector Databases and Embeddings: A Practical Guide

Master vector databases and embeddings for AI applications. Compare pgvector, Pinecone, Qdrant, and Chroma with practical implementation examples.

Tags: AI, Vector Database, Embeddings, pgvector, Pinecone

5 min read


This is Part 4 of the AI Automation Engineer Roadmap series. This post builds on concepts from Part 3: Building RAG Pipelines.

TL;DR

Vector databases store and search high-dimensional embeddings at scale, forming the backbone of RAG systems, semantic search, and AI-powered recommendations. This post covers how embeddings represent meaning, the math behind similarity search, and practical comparisons of pgvector, Pinecone, Qdrant, and Chroma -- with production setup guides for each.

Why This Matters

In Part 3, we built a RAG pipeline using pgvector. But vector storage is a deep topic on its own. Choosing the wrong database, index type, or embedding model can mean the difference between sub-100ms queries and multi-second timeouts, between accurate retrieval and missing the most relevant document. As your dataset grows from thousands to millions of vectors, these decisions become critical. This post gives you the knowledge to make them confidently.

Core Concepts

How Embeddings Represent Meaning

An embedding is a list of numbers (a vector) that captures the semantic meaning of text. The key property: similar meanings produce similar vectors.

When you embed "How do I reset my password?" and "I forgot my login credentials," the resulting vectors will be close together in the embedding space, even though the sentences share almost no words. This is what makes semantic search fundamentally different from keyword search.

typescript
import OpenAI from "openai";
 
const openai = new OpenAI();
 
async function demonstrateEmbeddings() {
  const texts = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What is the weather today?",
  ];
 
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
 
  const embeddings = response.data.map((d) => d.embedding);
 
  // Similar meaning = high cosine similarity
  console.log(
    "Password vs Credentials:",
    cosineSimilarity(embeddings[0], embeddings[1]).toFixed(4)
  );
  // ~0.85 -- very similar
 
  console.log(
    "Password vs Weather:",
    cosineSimilarity(embeddings[0], embeddings[2]).toFixed(4)
  );
  // ~0.25 -- very different
}

Cosine Similarity: The Math (Simplified)

Cosine similarity measures the angle between two vectors, ignoring magnitude. It ranges from -1 (opposite) to 1 (identical).

typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must be same length");
 
  let dotProduct = 0;
  let normA = 0;
  let normB = 0;
 
  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
 
  return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}
 
// Dot product is faster when vectors are already normalized (unit length).
// OpenAI's text-embedding-3-* models return normalized vectors,
// so dot product = cosine similarity for those models.
function dotProduct(a: number[], b: number[]): number {
  let result = 0;
  for (let i = 0; i < a.length; i++) {
    result += a[i] * b[i];
  }
  return result;
}

When to use which metric:

  • Cosine similarity: Default choice. Works regardless of vector normalization.
  • Dot product: Faster. Use when vectors are normalized (OpenAI embeddings are).
  • Euclidean distance (L2): Less common for text. Better for spatial/geometric data.

Dimensionality and Its Impact

Embedding dimensionality is the length of the vector. Higher dimensions capture more nuance but cost more to store and search:

  • text-embedding-3-small: 1536 dimensions (good default)
  • text-embedding-3-large: 3072 dimensions (higher accuracy)
  • Cohere Embed v3: 1024 dimensions
  • BGE-small: 384 dimensions (open-source, self-hosted)

OpenAI's v3 models support Matryoshka embeddings -- you can truncate vectors to fewer dimensions with graceful quality degradation:

typescript
// Generate a lower-dimensional embedding to save storage
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Your text here",
  dimensions: 512, // Truncated from 1536 -- still good quality, 66% less storage
});

Hands-On Implementation

pgvector: When Postgres Is Enough

If you already run PostgreSQL, pgvector is the pragmatic choice. No new infrastructure, no new billing, no new ops burden.

typescript
import { Pool } from "pg";
 
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
 
async function setupPgvector() {
  await pool.query("CREATE EXTENSION IF NOT EXISTS vector");
 
  await pool.query(`
    CREATE TABLE IF NOT EXISTS embeddings (
      id BIGSERIAL PRIMARY KEY,
      content TEXT NOT NULL,
      embedding vector(1536) NOT NULL,
      metadata JSONB DEFAULT '{}',
      collection VARCHAR(255) NOT NULL,
      created_at TIMESTAMPTZ DEFAULT NOW()
    )
  `);
 
  // IVFFlat index: faster build, good for < 1M vectors
  // Starting point: lists ~ rows / 1000 (sqrt(rows) above ~1M rows)
  await pool.query(`
    CREATE INDEX IF NOT EXISTS embeddings_ivfflat_idx
    ON embeddings
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100)
  `);
 
  // HNSW index: slower build, better recall, good for any scale
  // m = max connections per node, ef_construction = build-time search width
  await pool.query(`
    CREATE INDEX IF NOT EXISTS embeddings_hnsw_idx
    ON embeddings
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
  `);
}
 
// Search with metadata filtering
async function searchPgvector(
  queryEmbedding: number[],
  collection: string,
  topK: number = 5,
  metadataFilter?: Record<string, unknown>
) {
  let whereClause = "WHERE collection = $2";
  const params: unknown[] = [`[${queryEmbedding.join(",")}]`, collection];
 
  if (metadataFilter) {
    whereClause += ` AND metadata @> $${params.length + 1}::jsonb`;
    params.push(JSON.stringify(metadataFilter));
  }
 
  const result = await pool.query(
    `SELECT content, metadata,
            1 - (embedding <=> $1::vector) AS similarity
     FROM embeddings
     ${whereClause}
     ORDER BY embedding <=> $1::vector
     LIMIT $${params.length + 1}`,
    [...params, topK]
  );
 
  return result.rows;
}

IVFFlat vs HNSW:

  • IVFFlat: Divides vectors into clusters (lists). Searches only nearby clusters. Faster to build, lower memory, but requires tuning lists and probes.
  • HNSW: Builds a hierarchical graph of vectors. Better recall out of the box, works well at any scale, but uses more memory and takes longer to build.

Use HNSW unless you have a specific reason not to.

Pinecone: Managed and Scalable

Pinecone is a fully managed vector database. You trade control for zero-ops:

typescript
import { Pinecone } from "@pinecone-database/pinecone";
 
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
 
async function setupPinecone() {
  // Create an index (one-time setup, usually done via dashboard)
  await pinecone.createIndex({
    name: "knowledge-base",
    dimension: 1536,
    metric: "cosine",
    spec: {
      serverless: {
        cloud: "aws",
        region: "us-east-1",
      },
    },
  });
}
 
async function upsertToPinecone(
  vectors: Array<{
    id: string;
    values: number[];
    metadata: Record<string, string | number | boolean>;
  }>
) {
  const index = pinecone.index("knowledge-base");
 
  // Pinecone recommends upsert batches of roughly 100 vectors per request
  const batchSize = 100;
  for (let i = 0; i < vectors.length; i += batchSize) {
    const batch = vectors.slice(i, i + batchSize);
    await index.upsert(batch);
  }
}
 
async function searchPinecone(
  queryEmbedding: number[],
  topK: number = 5,
  filter?: Record<string, unknown>
) {
  const index = pinecone.index("knowledge-base");
 
  const results = await index.query({
    vector: queryEmbedding,
    topK,
    filter,
    includeMetadata: true,
  });
 
  return results.matches?.map((match) => ({
    id: match.id,
    score: match.score,
    metadata: match.metadata,
  }));
}

Qdrant: Self-Hosted Power

Qdrant runs as a Docker container and gives you Pinecone-level features with full control:

typescript
import { QdrantClient } from "@qdrant/js-client-rest";
 
const qdrant = new QdrantClient({ url: "http://localhost:6333" });
 
async function setupQdrant() {
  await qdrant.createCollection("knowledge-base", {
    vectors: {
      size: 1536,
      distance: "Cosine",
    },
    optimizers_config: {
      default_segment_number: 2,
    },
    // Enable on-disk storage for large datasets
    on_disk_payload: true,
  });
 
  // Create payload index for metadata filtering
  await qdrant.createPayloadIndex("knowledge-base", {
    field_name: "category",
    field_schema: "keyword",
  });
}
 
async function searchQdrant(
  queryEmbedding: number[],
  topK: number = 5,
  category?: string
) {
  const results = await qdrant.search("knowledge-base", {
    vector: queryEmbedding,
    limit: topK,
    filter: category
      ? {
          must: [{ key: "category", match: { value: category } }],
        }
      : undefined,
    with_payload: true,
  });
 
  return results.map((result) => ({
    id: result.id,
    score: result.score,
    payload: result.payload,
  }));
}

Chroma: Lightweight and Local

Chroma is perfect for prototyping and small-scale applications:

typescript
import { ChromaClient, OpenAIEmbeddingFunction } from "chromadb";
 
const chroma = new ChromaClient();
 
const embedder = new OpenAIEmbeddingFunction({
  openai_api_key: process.env.OPENAI_API_KEY!,
  openai_model: "text-embedding-3-small",
});
 
async function setupChroma() {
  const collection = await chroma.getOrCreateCollection({
    name: "knowledge-base",
    embeddingFunction: embedder,
    metadata: { "hnsw:space": "cosine" },
  });
 
  return collection;
}
 
async function addToChroma(
  documents: string[],
  ids: string[],
  metadata: Array<Record<string, string>>
) {
  const collection = await setupChroma();
 
  // Chroma handles embedding generation automatically
  await collection.add({
    documents,
    ids,
    metadatas: metadata,
  });
}
 
async function searchChroma(query: string, topK: number = 5) {
  const collection = await setupChroma();
 
  // Chroma embeds the query for you
  const results = await collection.query({
    queryTexts: [query],
    nResults: topK,
  });
 
  return results;
}

Batch Operations for Production Ingestion

When ingesting thousands of documents, batch operations are critical:

typescript
async function batchIngest(
  documents: Array<{ content: string; metadata: Record<string, unknown> }>,
  batchSize: number = 50
) {
  const openai = new OpenAI();
  let totalProcessed = 0;
 
  for (let i = 0; i < documents.length; i += batchSize) {
    const batch = documents.slice(i, i + batchSize);
 
    // Generate embeddings for the batch
    const embeddingResponse = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: batch.map((d) => d.content),
    });
 
    // Build insert data
    const chunks = batch.map((doc, idx) => ({
      content: doc.content,
      embedding: embeddingResponse.data[idx].embedding,
      metadata: doc.metadata,
    }));
 
    // Insert into your vector store (using pgvector example)
    await insertChunks(chunks);
 
    totalProcessed += batch.length;
    console.log(`Processed ${totalProcessed}/${documents.length} documents`);
  }
}

Best Practices

  1. Start with pgvector -- If you run PostgreSQL, add the extension before evaluating dedicated vector databases. It handles millions of vectors with proper indexing.
  2. Use HNSW indexes by default -- Better recall than IVFFlat with less tuning. The memory overhead is worth it.
  3. Normalize your vectors -- If your embedding model does not return normalized vectors, normalize them before storage. This lets you use faster dot product distance.
  4. Index metadata fields -- If you filter by category, tenant, or date alongside vector search, create indexes on those metadata fields.
  5. Monitor recall, not just latency -- A fast query that returns irrelevant results is worse than a slightly slower one that returns the right documents.

Common Pitfalls

  • Mixing embedding models in one collection: If you embed some documents with text-embedding-3-small and others with Cohere, the vectors are incompatible. Similarity scores will be meaningless.
  • Not rebuilding indexes after large ingestions: IVFFlat indexes in pgvector need to be rebuilt after significant data changes. HNSW is more forgiving but still benefits from periodic optimization.
  • Over-indexing small datasets: If you have fewer than 10,000 vectors, exact search (no index) is fast enough. Adding an index adds complexity without meaningful speed improvement.
  • Ignoring dimensionality trade-offs: 3072-dimension embeddings use 2x the storage and are slower to search than 1536-dimension ones. The accuracy gain is often marginal.
  • Not setting up backups: Vector databases hold computed data that is expensive to regenerate. Back them up like any other database.

What's Next

You now understand how embeddings work, how to choose and configure a vector database, and how to build production-grade storage for your RAG pipeline. But retrieval is only half the story. In Part 5: Building AI Agents with Tool Calling, we will take everything you have learned and build autonomous agents that can reason, use tools, and complete multi-step tasks.

FAQ

Should I use pgvector or a dedicated vector database like Pinecone?

Use pgvector if you already run PostgreSQL and need fewer than 10 million vectors. Choose Pinecone or Qdrant for larger scale, managed infrastructure, or when you need advanced filtering and hybrid search out of the box. The key factor is operational overhead -- pgvector means zero new infrastructure, while dedicated databases offer better performance at extreme scale.

What embedding model should I use for my AI application?

OpenAI text-embedding-3-small offers the best cost-to-quality ratio for most use cases. For higher accuracy, use text-embedding-3-large. For open-source, consider Cohere Embed v3 or BGE models. Always test with your actual data -- domain-specific content may perform better with specialized models.

How do vector similarity searches work?

Vector searches convert queries into embeddings, then find the nearest vectors in the database using distance metrics like cosine similarity or dot product, returning the most semantically similar documents. Indexes like HNSW make this fast by building graph structures that allow approximate nearest neighbor search without comparing against every vector.


Article Author

Sadam Hussain

Senior Full Stack Developer

Senior Full Stack Developer with over 7 years of experience building React, Next.js, Node.js, TypeScript, and AI-powered web platforms.
