Vector Stores

  • Store vector embeddings and perform similarity search over them
  • Often also store metadata associated with embeddings
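
To make the two points above concrete, here is a minimal sketch of what a vector store does at its core: keep embeddings alongside metadata and rank them by similarity to a query. `TinyVectorStore` and its method names are hypothetical, the vectors are made up, and the search is brute force with no index, so this illustrates the idea rather than how any particular product works. It assumes non-zero vectors of equal length.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical toy store: brute-force search, no index."""

    def __init__(self):
        self._items = {}  # id -> (embedding, metadata)

    def add(self, item_id, embedding, metadata=None):
        self._items[item_id] = (list(embedding), metadata or {})

    def delete(self, item_id):
        self._items.pop(item_id, None)

    def search(self, query, k=3):
        """Return the k most similar items as (id, score, metadata)."""
        scored = [
            (item_id, cosine_similarity(query, emb), meta)
            for item_id, (emb, meta) in self._items.items()
        ]
        scored.sort(key=lambda row: row[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("doc-1", [1.0, 0.0], {"topic": "sports"})
store.add("doc-2", [0.0, 1.0], {"topic": "finance"})
store.add("doc-3", [0.9, 0.1], {"topic": "sports"})

results = store.search([1.0, 0.0], k=2)
top_ids = [item_id for item_id, _, _ in results]
print(top_ids)  # ['doc-1', 'doc-3']
```

Real vector stores replace the linear scan in `search` with an approximate index so that queries stay fast as the collection grows.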

Use cases

  • RAG
  • Recommendation systems (one of several methods used)
    • Song recommendation (Spotify)
    • Movie recommendation (Netflix)
  • Image Similarity Search
    • Reverse image search (Google)
    • Visually similar pins on Pinterest
  • Duplicate detection
    • Near duplicate content/document
    • Duplicate images
  • Anomaly Detection
    • Fraud detection
    • Suspicious user behavior
  • Clustering
    • Group news articles by topic
    • Group users
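
As one illustration of the use cases above, near-duplicate detection can be sketched as flagging every pair of items whose embeddings are almost parallel. The function name `find_near_duplicates`, the 0.95 threshold, and the three-dimensional vectors are all assumptions chosen for the example; real embeddings have hundreds of dimensions and the threshold has to be tuned per dataset.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """Return pairs of ids whose cosine similarity is >= threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Made-up embeddings: the first two point in nearly the same direction.
embeddings = {
    "article-a": [0.9, 0.1, 0.0],
    "article-b": [0.88, 0.12, 0.01],
    "article-c": [0.0, 0.2, 0.9],
}

duplicates = find_near_duplicates(embeddings)
print(duplicates)  # [('article-a', 'article-b')]
```

Comparing every pair is quadratic in the number of items, which is why large collections use a vector store's nearest-neighbor search instead.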

Examples

Vector Native

  • FAISS
    • In-memory library
  • ChromaDB
    • Lightweight local vector db
  • Pinecone
    • Fully managed, cloud native
    • Proprietary
  • Milvus
    • Written in Go/C++
    • Distributed; built for large-scale use cases
  • Weaviate
    • Written in Go
    • Supports hybrid search (vector + keyword search)
  • Qdrant
    • Written in Rust
    • Supports filtering and metadata search

Vector Extension

  • pgvector
    • PostgreSQL extension
  • Elasticsearch
    • Search engine with vector search support
    • Supports hybrid search
  • Redis
    • In-memory data store with vector search capability
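
Hybrid search, mentioned above for Weaviate and Elasticsearch, blends a keyword score with a vector score. The sketch below is a toy version under stated assumptions: `keyword_score` is a simple term-overlap stand-in (real engines typically use BM25), `alpha` is a hypothetical blending weight, and the embeddings are made up. It does not reproduce either product's actual ranking formula.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_score(query, text):
    """Fraction of query terms found in the text (assumes a non-empty query)."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def hybrid_search(query, query_vec, docs, alpha=0.5, k=2):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        vec_score = cosine_similarity(query_vec, doc["embedding"])
        kw_score = keyword_score(query, doc["text"])
        scored.append((alpha * vec_score + (1 - alpha) * kw_score, doc["id"]))
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored[:k]]


docs = [
    {"id": "a", "text": "rust vector database", "embedding": [0.9, 0.1]},
    {"id": "b", "text": "postgres extension for vectors", "embedding": [0.8, 0.2]},
    {"id": "c", "text": "cooking pasta at home", "embedding": [0.1, 0.9]},
]

ranking = hybrid_search("vector database", [1.0, 0.0], docs)
print(ranking)  # ['a', 'b']
```

Document "b" shares no exact term with the query ("vectors" is not "vector"), yet it still ranks second because its embedding is close to the query vector. That is the gap hybrid search is meant to cover.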

Indexing

Similarity Metrics

  • Used to compare embeddings
  • Types
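    • Cosine Similarity
      • Measures the angle between vectors, ignoring their magnitude
      • Range = [-1, 1]
    • L2 Squared
      • Based on Euclidean distance (L2 norm or L2 distance), the straight-line distance between two points
      • L2 squared distance is the Euclidean distance squared; it skips the square root and preserves the ranking
      • Range = [0, ∞)
    • Dot product (Inner Product or IP)
      • Measures how much one vector projects onto another, scaled by the vectors' magnitudes
      • Range = (-∞, ∞)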
  • https://docs.trychroma.com/docs/collections/configure#single-node
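# LangChain example
# Assumes documents, more_docs, ids, query, k, and embeddings_model are already defined.
from langchain_chroma import Chroma  # older releases: langchain_community.vectorstores

# Build a collection and choose the distance function used by the HNSW index
db = Chroma.from_documents(documents, embeddings_model,
                    collection_metadata={
                        "hnsw:space": "cosine", # 'l2', 'cosine', or 'ip'
                    }
                )

db.add_documents(more_docs)     # embed and store additional documents
db.delete(ids)                  # remove entries by id
db.similarity_search(query, k)  # returns the k most similar documents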
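
The three similarity metrics listed earlier can be computed by hand in a few lines, which helps when checking what a store's `'l2'`, `'cosine'`, or `'ip'` setting is measuring. This is a plain-Python sketch with made-up vectors; note that some stores report a distance derived from these quantities (for example, one minus the cosine similarity) rather than the raw value.

```python
import math


def dot_product(a, b):
    """Inner product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after normalizing each to unit length."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]     # same direction as a, twice the length
c = [-1.0, -2.0, -2.0]  # opposite direction to a

# Same direction: cosine is 1.0 even though the points are far apart.
print(cosine_similarity(a, b), l2_squared(a, b), dot_product(a, b))  # 1.0 9.0 18.0

# Opposite direction: cosine is -1.0 and the dot product turns negative.
print(cosine_similarity(a, c), l2_squared(a, c), dot_product(a, c))  # -1.0 36.0 -9.0
```

The first pair shows why the choice matters: cosine treats `a` and `b` as identical, while L2 squared and the dot product both react to the difference in length.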