Vector Stores
- Store vector embeddings and perform similarity search over them
- Often also store metadata associated with each embedding
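To make the two points above concrete, here is a minimal sketch of what a vector store does at its core. `TinyVectorStore` is a hypothetical class written only for illustration; it does a brute-force scan with cosine similarity, whereas real vector stores use an index to avoid comparing against every vector.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


@dataclass
class TinyVectorStore:
    """Hypothetical in-memory store: brute-force search, no index."""
    vectors: list = field(default_factory=list)
    metadata: list = field(default_factory=list)

    def add(self, vector, meta):
        # Keep each embedding and its metadata at the same position.
        self.vectors.append(vector)
        self.metadata.append(meta)

    def search(self, query, k=2):
        # Score every stored vector against the query, highest score first.
        scored = [
            (cosine_similarity(query, vec), meta)
            for vec, meta in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add([1.0, 0.0], {"title": "doc about cats"})
store.add([0.0, 1.0], {"title": "doc about cars"})
store.add([0.9, 0.1], {"title": "doc about kittens"})

results = store.search([1.0, 0.0], k=2)
top_titles = [meta["title"] for _, meta in results]
print(top_titles)
```

The toy 2-dimensional vectors stand in for real embeddings, which typically have hundreds or thousands of dimensions.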
Use cases
- RAG
- Recommendation systems (one of several methods)
- Song recommendation (Spotify)
- Movie recommendation (Netflix)
- Reverse image search (Google)
- Image Similarity Search
- Visually similar pins in Pinterest
- Duplicate detection
- Near duplicate content/document
- Duplicate images
- Anomaly Detection
- Fraud detection
- Suspicious user behavior
- Clustering
- Group news articles by topic
- Group users
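As one illustration of the use cases above, near-duplicate detection can be sketched as a distance threshold over pairs of embeddings. The function name `find_near_duplicates` and the `max_distance` value are assumptions chosen for this toy example; a real system would query a vector index rather than compare all pairs.

```python
import math
from itertools import combinations


def find_near_duplicates(embeddings, max_distance=0.1):
    """Return index pairs whose Euclidean distance is at most `max_distance`."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(embeddings), 2):
        if math.dist(a, b) <= max_distance:
            pairs.append((i, j))
    return pairs


embeddings = [
    [1.0, 0.0, 0.0],    # document 0
    [0.99, 0.05, 0.0],  # document 1: very close to document 0
    [0.0, 1.0, 0.0],    # document 2: unrelated
]
duplicates = find_near_duplicates(embeddings)
print(duplicates)
```

The same pattern, with the comparison inverted (flag items that are far from everything else), is one simple way to approach anomaly detection.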
Examples
Vector Native
- FAISS
- ChromaDB
- Lightweight local vector db
- Pinecone
- Fully managed, cloud native
- Proprietary
- Milvus
- Written in Go/C++
- Distributed; built for large-scale use cases
- Weaviate
- Written in Go
- Supports hybrid search (vector + keyword search)
- Qdrant
- Written in Rust
- Supports filtering and metadata search
Vector Extension
- PGVector
- Elasticsearch
- Search engine with vector search support
- Supports hybrid search
- Redis
- In-memory data store with vector search capability
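Hybrid search, mentioned for several of the stores above, combines a keyword score with a vector score. The sketch below is one simple way to do that, a weighted sum controlled by a hypothetical `alpha` parameter. The term-overlap `keyword_score` is a stand-in for a real ranking function such as BM25, and none of this is meant to mirror how any particular product implements it.

```python
def keyword_score(query, text):
    """Fraction of query terms that appear in the text (stand-in for BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hybrid_search(query_text, query_vec, docs, alpha=0.5, k=2):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        # Vectors are assumed to be unit length, so dot product equals cosine.
        vector_part = dot(query_vec, doc["vector"])
        keyword_part = keyword_score(query_text, doc["text"])
        score = alpha * vector_part + (1 - alpha) * keyword_part
        scored.append((score, doc["text"]))
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


docs = [
    {"text": "rust vector database", "vector": [1.0, 0.0]},
    {"text": "cooking with cast iron", "vector": [0.0, 1.0]},
    {"text": "similarity search engine", "vector": [0.8, 0.6]},
]
ranked = hybrid_search("vector database", [1.0, 0.0], docs, alpha=0.5, k=2)
print(ranked)
```

Here the third document shares no keywords with the query but still ranks second because its vector is close to the query vector, which is the kind of result keyword search alone would miss.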
Indexing
Similarity Metrics
- Used to compare embeddings
- Types
- Cosine Similarity
- Measures the angle between vectors
- $\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2} \, \sqrt{\sum_i B_i^2}}$
- Range = [−1,+1]
- L2 Squared
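- Measures straight-line distance between points
- Euclidean distance (L2 norm or L2 distance): $\sqrt{\sum_i (A_i - B_i)^2}$
- L2 squared distance: $\sum_i (A_i - B_i)^2$ (skips the square root; ranking is unchanged)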
- Range = [0,+∞)
- Dot product (Inner Product or IP)
- Measures how much one vector projects onto another
- $A \cdot B = \sum_i A_i B_i$
- Range = (−∞,+∞)
- Chroma collection configuration (distance metric options): https://docs.trychroma.com/docs/collections/configure#single-node
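The three metrics listed above can be sketched in plain Python to show how they behave on the same pair of vectors. The function names are illustrative only; libraries and vector stores ship optimized implementations.

```python
import math


def dot_product(a, b):
    """Inner product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def l2_squared(a, b):
    """Squared Euclidean distance: sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length

cos_ab = cosine_similarity(a, b)
l2sq_ab = l2_squared(a, b)
ip_ab = dot_product(a, b)
print(cos_ab, l2sq_ab, ip_ab)
```

Because `b` points in the same direction as `a`, cosine similarity is at its maximum even though the vectors differ in length, while L2 squared distance and the dot product both change with magnitude. When vectors are normalized to unit length, the three metrics produce the same ranking.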
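# LangChain example
from langchain_chroma import Chroma  # assumes the langchain-chroma package is installed

# `documents` is a list of LangChain Document objects; `embeddings_model` is any Embeddings instance
db = Chroma.from_documents(
    documents,
    embeddings_model,
    collection_metadata={
        "hnsw:space": "cosine",  # distance metric: 'l2', 'cosine', or 'ip'
    },
)
db.add_documents(more_docs)  # embed and store additional documents
db.delete(ids)  # remove stored entries by ID
db.similarity_search(query, k)  # returns the k documents most similar to the query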