TokenChunker
Splits text into fixed-size token chunks. Best for maintaining consistent
chunk sizes and working with token-based models.
SentenceChunker
Splits text at sentence boundaries. Perfect for maintaining semantic
completeness at the sentence level.
RecursiveChunker
Recursively splits documents into smaller and smaller chunks. Best for long
documents with well-defined structure (see the sketch after this list).
SemanticChunker
Groups content based on semantic similarity. Best for preserving context and
topical coherence.
LateChunker
Chunks text using the Late Chunking algorithm. Best for higher recall in your
RAG applications.
CodeChunker
Splits code based on its structure using ASTs. Ideal for chunking source
code files.
NeuralChunker
Uses a fine-tuned BERT model to split text based on semantic shifts. Great
for topic-coherent chunks.
SlumberChunker
Agentic chunking using generative models (LLMs) via the Genie interface for
S-tier chunk quality. 🦛🧞
TableChunker
Splits large markdown tables into smaller, manageable chunks by row,
preserving headers. Great for tabular data in RAG and LLM pipelines.
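As a rough illustration of matching a chunker to the content type, here is a minimal sketch; the constructor arguments shown (`chunk_size`, `language`) are assumptions and may differ between versions.

```python
from chonkie import CodeChunker, RecursiveChunker

# Structured prose: split recursively along the document's structure.
doc_chunker = RecursiveChunker(chunk_size=512)  # `chunk_size` is an assumed parameter name

# Source code: split along AST boundaries.
code_chunker = CodeChunker(language="python")  # `language` is an assumed parameter name

doc_chunks = doc_chunker.chunk("# Title\n\nA long, well-structured markdown document...")
code_chunks = code_chunker.chunk("def hello():\n    return 'world'\n")
```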
Availability
Different chunkers are available depending on your installation:

| Chunker | Default | embeddings | "chonkie[all]" | Chonkie JS | API Chunking |
|---|---|---|---|---|---|
| TokenChunker | | | | | |
| SentenceChunker | | | | | |
| RecursiveChunker | | | | | |
| TableChunker | | | | | |
| CodeChunker | | | | | |
| SemanticChunker | | | | | |
| LateChunker | | | | | |
| NeuralChunker | | | | | |
| SlumberChunker | | | | | |
Common Interface
All chunkers share a consistent interface:
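A minimal sketch of that interface, assuming the usual Chonkie methods (`chunk`, `chunk_batch`, direct calling) and chunk attributes (`text`, `token_count`); exact names may vary by version.

```python
from chonkie import TokenChunker  # any chunker from the list above works the same way

chunker = TokenChunker()

# Chunk a single text.
chunks = chunker.chunk("Some text to chunk.")

# Chunk a batch of texts.
batch_chunks = chunker.chunk_batch(["First text.", "Second text."])

# Chunkers are also callable.
chunks = chunker("Some text to chunk.")

# Each chunk carries its text and token count (assumed attribute names).
for chunk in chunks:
    print(chunk.text, chunk.token_count)
```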
F.A.Q.
Are all the chunkers thread-safe?
Yes, all the chunkers are thread-safe. However, performance may vary because
some chunkers use threading under the hood, so monitor your performance
accordingly.
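For illustration, a sketch of sharing one chunker instance across worker threads with the standard library's `ThreadPoolExecutor`; the chunker choice here is only an example.

```python
from concurrent.futures import ThreadPoolExecutor

from chonkie import TokenChunker

# One shared, thread-safe chunker instance.
chunker = TokenChunker()

documents = [
    "First document to chunk.",
    "Second document to chunk.",
    "Third document to chunk.",
]

# Fan the documents out across worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(chunker.chunk, documents))

for doc_chunks in results:
    print(len(doc_chunks))
```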