Chunkers Overview
Overview of the different chunkers available in Chonkie
Chonkie provides multiple chunking strategies to handle different text processing needs. Each chunker in Chonkie is designed to follow the same core principles outlined in the concepts page.
TokenChunker
Splits text into fixed-size token chunks. Best for maintaining consistent chunk sizes and working with token-based models.
WordChunker
Splits text while preserving word boundaries. Ideal when you need human-readable chunks without breaking words.
SentenceChunker
Splits text at sentence boundaries. Perfect for maintaining semantic completeness at the sentence level.
SemanticChunker
Groups content based on semantic similarity. Best for preserving context and topical coherence.
SDPMChunker
Chunks text using the Semantic Double-Pass Merging (SDPM) algorithm. Best for maintaining topical coherence when the text has frequent breaks.
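To make the difference between fixed-size and boundary-aware splitting concrete, here is a minimal pure-Python sketch. It does not use Chonkie: `token_chunks` and `sentence_chunks` are hypothetical helpers, whitespace-separated words stand in for real tokenizer output, and the sentence splitter is a simple regex, so treat it as an illustration of the ideas rather than how the library implements them.

```python
import re

def token_chunks(tokens, chunk_size):
    """Fixed-size windows over a token list.

    Assumption: whitespace-separated words stand in for real tokens.
    """
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def sentence_chunks(text, max_words):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Chonkie splits text. It offers several chunkers. Pick one per task."
fixed = token_chunks(text.split(), 4)   # may cut through a sentence
by_sentence = sentence_chunks(text, 7)  # never cuts through a sentence
```

With this input, the fixed-size split ends its first chunk partway into the second sentence, while the sentence-based split keeps every sentence intact at the cost of uneven chunk sizes.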
Availability
Different chunkers are available depending on your installation:
Chunker | Default | 'embeddings' | 'all' |
---|---|---|---|
TokenChunker | ✅ | ✅ | ✅ |
WordChunker | ✅ | ✅ | ✅ |
SentenceChunker | ✅ | ✅ | ✅ |
SemanticChunker | ❌ | ✅ | ✅ |
SDPMChunker | ❌ | ✅ | ✅ |
Common Interface
All chunkers share a consistent interface, so switching strategies does not change how you call them.
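As a rough illustration of what a shared interface can look like, the sketch below defines a hypothetical `BaseChunker` with a single-text method, a batch method, and direct calling. The class names, method names, and return types here are assumptions made for the example and are not taken from Chonkie's source; check each chunker's own page for the exact signatures.

```python
from abc import ABC, abstractmethod
from typing import List

class BaseChunker(ABC):
    """Hypothetical shared interface (illustrative only, not Chonkie's code)."""

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        """Split one text into chunks."""

    def chunk_batch(self, texts: List[str]) -> List[List[str]]:
        """Split several texts, returning one chunk list per input."""
        return [self.chunk(t) for t in texts]

    def __call__(self, text_or_texts):
        """Dispatch to chunk() for a string, chunk_batch() for a sequence."""
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return self.chunk_batch(list(text_or_texts))

class ToyWordChunker(BaseChunker):
    """Toy strategy: groups of chunk_size whitespace-separated words."""

    def __init__(self, chunk_size: int = 3):
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        words = text.split()
        return [
            " ".join(words[i:i + self.chunk_size])
            for i in range(0, len(words), self.chunk_size)
        ]

chunker = ToyWordChunker(chunk_size=2)
single = chunker.chunk("one two three")        # one text
batch = chunker.chunk_batch(["a b c", "d e"])  # many texts
called = chunker("one two three")              # direct call
```

Because every strategy in this sketch implements the same `chunk` method, calling code can swap `ToyWordChunker` for another subclass without any other change.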