from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(
    model="embeddinggemma:latest",
)

# used for query text
result = embeddings.embed_query("Hello world")
print(len(result))  # equal to embedding length of the model
# print(result)

texts = [
    "Hello Sonam",
    "How are you?",
    "Where are you now",
]

# used for list of documents
embedded_docs = embeddings.embed_documents(texts)
print(len(embedded_docs))  # 3
Semantic Relationship
Embeddings hold semantic relationships
In 2013, Google researchers showed that Word2Vec embeddings encode word relationships as vector offsets, for example:
King − Man + Woman ≈ Queen
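The analogy arithmetic can be sketched with a few hand-written toy vectors. Everything here is an assumption made for illustration: the three axes, the numbers, and the helper names `cosine_similarity` and `analogy`. Real Word2Vec vectors have hundreds of dimensions and the match is only approximate.

```python
import math

# Toy 3-dimensional "embeddings", hand-written for illustration only.
# Assumed axes: [royalty, maleness, femaleness]
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

print(analogy("king", "man", "woman"))  # queen
```

Excluding the input words from the candidates is a common convention for this kind of lookup, since the nearest vector to the result is otherwise often one of the inputs.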
Document Loaders
Provide a standard interface for reading data from different sources into LangChain’s Document format
Methods
load() – Loads all documents at once.
lazy_load() – Streams documents lazily, useful for large datasets.
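The difference between the two methods can be sketched in plain Python. This is not LangChain code: `Document` below is a stand-in dataclass with the same two fields (`page_content`, `metadata`), and `InMemoryLineLoader` is a hypothetical loader invented for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class Document:
    """Stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

class InMemoryLineLoader:
    """Hypothetical loader: one Document per non-empty line of a string."""

    def __init__(self, text: str, source: str = "memory"):
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[Document]:
        # Yield documents one at a time so the caller never holds them all.
        for line_number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                yield Document(
                    page_content=line.strip(),
                    metadata={"source": self.source, "line": line_number},
                )

    def load(self) -> list[Document]:
        # Eager variant: materialize the whole stream into a list.
        return list(self.lazy_load())

loader = InMemoryLineLoader("first line\n\nsecond line\n")

docs = loader.load()            # everything at once
print(len(docs))                # 2

for doc in loader.lazy_load():  # one document at a time
    print(doc.metadata["line"], doc.page_content)
```

With a large source, iterating over `lazy_load()` keeps memory use flat, while `load()` is convenient when the whole collection fits in memory.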
Examples
Unstructured
GoogleDriveLoader
YoutubeLoader
LangChain Document Format
Contains
metadata (dict)
page_content (str)
Text Splitting
aka Chunking
Text splitters break large documents into smaller chunks that can be retrieved individually and fit within the model's context window limit
Types
Text structure based
RecursiveCharacterTextSplitter
Length based
TokenTextSplitter
CharacterTextSplitter
Document structure based
Code Splitter
HTML Splitter
Markdown Splitter
JSON Splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,        # maximum chunk size (characters)
    chunk_overlap=200,      # overlap between consecutive chunks (characters)
    length_function=len,    # function that measures the length of a chunk
    add_start_index=True,   # track each chunk's start index in the original document
    strip_whitespace=True,  # strip whitespace from the start/end of each chunk
)
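To make `chunk_size`, `chunk_overlap`, and the start index concrete, here is a plain-Python sketch of simple length-based splitting. The function `split_by_length` is hypothetical and is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break on separators such as paragraphs, lines, and words, so its chunks are generally not exact fixed-size windows.

```python
def split_by_length(text, chunk_size, chunk_overlap):
    """Fixed-size character chunks; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        chunks.append({"start_index": start, "text": chunk})
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_by_length(sample, chunk_size=10, chunk_overlap=3):
    print(chunk["start_index"], chunk["text"])
# 0 abcdefghij
# 7 hijklmnopq
# 14 opqrstuvwx
# 21 vwxyz
```

Each chunk repeats the last three characters of the previous one, which is the purpose of the overlap: text that straddles a chunk boundary still appears intact in at least one chunk.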