SentenceChunker
Split text into chunks while preserving sentence boundaries
The SentenceChunker splits text into chunks while preserving complete sentences, ensuring that each chunk maintains proper sentence boundaries and context.
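To make the idea concrete, here is a minimal pure-Python sketch of sentence-preserving chunking. It is not Chonkie's implementation: the function names are hypothetical, the sentence splitter is a naive regex, and whitespace-separated words stand in for tokenizer tokens.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def split_sentences(text: str) -> list:
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_sentence(text: str, chunk_size: int) -> list:
    """Greedily pack whole sentences into chunks of at most chunk_size tokens.

    A single sentence longer than chunk_size becomes its own oversized chunk,
    because this sketch never cuts inside a sentence.
    """
    chunks, current, used = [], [], 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        # Close the current chunk if this sentence would exceed the budget.
        if current and used + n > chunk_size:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


for piece in chunk_by_sentence(
    "One two three. Four five. Six seven eight nine. Ten.", chunk_size=5
):
    print(piece)
```

Every chunk this sketch produces starts and ends on a sentence boundary, which is the property described above.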
Installation
SentenceChunker is included in the base installation of Chonkie. No additional dependencies are required.
Initialization
Parameters
Tokenizer to use. Can be a string identifier or a tokenizer instance
Maximum number of tokens per chunk
Number of overlapping tokens between chunks
Minimum tokens per chunk
Minimum number of sentences to include in each chunk
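The following sketch illustrates how a token budget, an overlap, and a minimum sentence count could interact. The names chunk_size, chunk_overlap, and min_sentences_per_chunk are labels chosen for this illustration, and the policy (carry trailing sentences forward, let the sentence minimum override the token budget) is an assumption rather than Chonkie's documented behaviour. The minimum-tokens parameter is not modelled here.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def pack_sentences(sentences, chunk_size, chunk_overlap=0, min_sentences_per_chunk=1):
    """Group sentences into chunks, repeating trailing sentences as overlap."""
    chunks, current = [], []
    for sentence in sentences:
        n = count_tokens(sentence)
        used = sum(count_tokens(s) for s in current)
        over_budget = used + n > chunk_size
        # Only close a chunk once it holds the minimum number of sentences.
        if current and over_budget and len(current) >= min_sentences_per_chunk:
            chunks.append(list(current))
            # Carry over trailing sentences that fit in the overlap budget
            # and still leave room for the incoming sentence.
            tail, tail_tokens = [], 0
            for prev in reversed(current):
                t = count_tokens(prev)
                if tail_tokens + t > chunk_overlap or tail_tokens + t + n > chunk_size:
                    break
                tail.insert(0, prev)
                tail_tokens += t
            current = tail
        current.append(sentence)
    if current:
        chunks.append(current)
    return chunks


sentences = ["a b c.", "d e.", "f g h.", "i j."]
for chunk in pack_sentences(sentences, chunk_size=5, chunk_overlap=2):
    print(chunk)
```

In this sketch the overlap is measured in tokens but applied in whole sentences, so a chunk only repeats a sentence from its predecessor when that sentence fits entirely within the overlap budget.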
Usage
Single Text Chunking
Batch Chunking
Using as a Callable
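The three usage patterns named above (single text, batch, and callable) can be illustrated with a small stand-in class. ToySentenceChunker and its method names are hypothetical and exist only for this sketch; they are not taken from Chonkie, and word counts stand in for tokens.

```python
import re
from typing import List


class ToySentenceChunker:
    """Stand-in used only to show the call patterns; not Chonkie's class."""

    def __init__(self, chunk_size: int = 8):
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        # Single text: pack whole sentences until the word budget is reached.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        chunks, current, used = [], [], 0
        for s in sentences:
            n = len(s.split())
            if current and used + n > self.chunk_size:
                chunks.append(" ".join(current))
                current, used = [], 0
            current.append(s)
            used += n
        if current:
            chunks.append(" ".join(current))
        return chunks

    def chunk_batch(self, texts: List[str]) -> List[List[str]]:
        # Batch: one list of chunks per input text, in input order.
        return [self.chunk(t) for t in texts]

    def __call__(self, text_or_texts):
        # Callable: dispatch on input type, a str or a list of str.
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return self.chunk_batch(text_or_texts)


chunker = ToySentenceChunker(chunk_size=4)
print(chunker.chunk("Hello there. How are you today? Fine."))
print(chunker.chunk_batch(["A b. C d.", "E f g h i."]))
print(chunker("Hello there."))
```

Dispatching on the argument type in `__call__` lets the same object handle one text or many without the caller choosing a method.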
Supported Tokenizers
SentenceChunker supports multiple tokenizer backends:
- TikToken (Recommended)
- AutoTikTokenizer
- Hugging Face Tokenizers
- Transformers
Return Type
SentenceChunker returns chunks as SentenceChunk objects with additional sentence metadata.
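As a rough picture of what a chunk carrying sentence metadata might look like, the sketch below models it with dataclasses. The field names (text, start_index, end_index, token_count, sentences) are assumptions made for illustration and should be checked against the library before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    text: str          # the sentence itself
    start_index: int   # character offset where the sentence starts
    end_index: int     # character offset just past the sentence
    token_count: int   # tokens in this sentence (assumed: word count here)


@dataclass
class SentenceChunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    # The "additional sentence metadata": the sentences that make up the chunk.
    sentences: List[Sentence] = field(default_factory=list)


first = Sentence("Hello there.", 0, 12, 2)
second = Sentence("How are you?", 13, 25, 3)
chunk = SentenceChunk(
    text="Hello there. How are you?",
    start_index=first.start_index,
    end_index=second.end_index,
    token_count=first.token_count + second.token_count,
    sentences=[first, second],
)

for s in chunk.sentences:
    # Offsets let each sentence be located again inside the chunk text.
    print(s.token_count, chunk.text[s.start_index:s.end_index])
```

Keeping per-sentence offsets alongside the chunk text makes it possible to map any sentence back to its position in the source.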