WordChunker
Split text into chunks while maintaining word boundaries
The WordChunker splits text into chunks while preserving word boundaries, ensuring that words stay intact and readable.
Installation
WordChunker is included in the base installation of Chonkie. No additional dependencies are required.
For installation instructions, see the Installation Guide.
Initialization
Parameters
- Tokenizer to use. Can be a string identifier or a tokenizer instance.
- Maximum number of tokens per chunk.
- Number of overlapping tokens between consecutive chunks.
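To make the interaction between these three settings concrete, here is a minimal pure-Python sketch. It is not Chonkie's implementation: the function name `chunk_by_words`, the `count_tokens` callable standing in for a tokenizer, and the parameter names `chunk_size` and `chunk_overlap` are assumptions made for illustration. It packs whole words greedily until the token budget is reached, then carries trailing words worth at most the overlap budget into the next chunk.

```python
from typing import Callable, List


def chunk_by_words(
    text: str,
    count_tokens: Callable[[str], int],
    chunk_size: int = 512,
    chunk_overlap: int = 0,
) -> List[str]:
    """Greedily pack whole words into chunks of at most `chunk_size` tokens.

    Hypothetical helper for illustration only. Whitespace between words is
    not counted toward the token budget.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    costs = [count_tokens(word) for word in words]
    chunks: List[str] = []
    start = 0
    while start < len(words):
        # Extend the chunk word by word until the budget would be exceeded.
        # A single word larger than the budget still gets its own chunk.
        end, used = start, 0
        while end < len(words) and (end == start or used + costs[end] <= chunk_size):
            used += costs[end]
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Step back over trailing words worth at most `chunk_overlap` tokens,
        # while always advancing by at least one word.
        next_start, carried = end, 0
        while next_start > start + 1 and carried + costs[next_start - 1] <= chunk_overlap:
            carried += costs[next_start - 1]
            next_start -= 1
        start = next_start
    return chunks


sample = "the quick brown fox jumps over the lazy dog"
# Treat every word as one token, purely for demonstration.
print(chunk_by_words(sample, count_tokens=lambda word: 1, chunk_size=4, chunk_overlap=1))
```

Because the overlap is measured in tokens but applied in whole words, the carried-over portion may be smaller than the requested overlap when the last word of a chunk costs more tokens than the overlap allows.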
Usage
Single Text Chunking
Batch Chunking
Using as a Callable
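The three usage patterns named above (single text, batch, and calling the chunker directly) can be illustrated with a small stand-in class. `SketchWordChunker` is hypothetical and is not Chonkie's `WordChunker`; the method names `chunk` and `chunk_batch` are assumptions about the real interface, and the stand-in counts each whitespace-separated word as one token.

```python
from typing import List, Sequence, Union


class SketchWordChunker:
    """Hypothetical stand-in used only to show the three calling styles."""

    def __init__(self, chunk_size: int = 512) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size

    def chunk(self, text: str) -> List[str]:
        """Chunk one text; each whitespace-separated word counts as one token."""
        words = text.split()
        return [
            " ".join(words[i : i + self.chunk_size])
            for i in range(0, len(words), self.chunk_size)
        ]

    def chunk_batch(self, texts: Sequence[str]) -> List[List[str]]:
        """Chunk several texts, returning one list of chunks per input text."""
        return [self.chunk(text) for text in texts]

    def __call__(self, text_or_texts: Union[str, Sequence[str]]):
        """Dispatch to chunk() for a string and chunk_batch() for a sequence."""
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return self.chunk_batch(text_or_texts)


chunker = SketchWordChunker(chunk_size=3)

# Single text chunking.
single = chunker.chunk("one two three four five")

# Batch chunking: one result list per input text.
batch = chunker.chunk_batch(["one two three four", "five six"])

# Using the chunker as a callable.
called = chunker("one two three four five")
```

Routing `__call__` through the same methods keeps the callable form a thin convenience layer, so all three styles return identical chunks for the same input.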
Supported Tokenizers
WordChunker supports multiple tokenizer backends:
- TikToken (Recommended)
- AutoTikTokenizer
- Hugging Face Tokenizers
- Transformers
Return Type
WordChunker returns chunks as Chunk objects with the following attributes:
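As a sketch, assuming the attributes are `text`, `start_index`, `end_index`, and `token_count` (names that this page does not confirm), a chunk can be modeled as a small dataclass whose indices slice back into the original string. `SketchChunk` and `chunk_with_offsets` are hypothetical names, and each whitespace-separated word is counted as one token.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SketchChunk:
    """Hypothetical model of a chunk; field names are assumptions."""

    text: str          # The chunk text
    start_index: int   # Offset of the first character in the source text
    end_index: int     # Offset one past the last character in the source text
    token_count: int   # Number of tokens (here: words) in the chunk


def chunk_with_offsets(text: str, chunk_size: int) -> List[SketchChunk]:
    """Group whole words into chunks and record where each chunk sits in `text`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Character span of every whitespace-delimited word.
    spans = [match.span() for match in re.finditer(r"\S+", text)]
    chunks: List[SketchChunk] = []
    for i in range(0, len(spans), chunk_size):
        group = spans[i : i + chunk_size]
        start, end = group[0][0], group[-1][1]
        chunks.append(SketchChunk(text[start:end], start, end, len(group)))
    return chunks


source = "alpha beta  gamma delta"
for chunk in chunk_with_offsets(source, chunk_size=2):
    # Slicing the source with the stored indices recovers the chunk text.
    print(chunk, source[chunk.start_index : chunk.end_index] == chunk.text)
```

Storing character offsets alongside the text makes it possible to map a chunk back to its position in the source document, for example to highlight a retrieved passage.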