TokenChunker
Split text into fixed-size token chunks with configurable overlap
The TokenChunker splits text into chunks based on token count, ensuring each chunk stays within specified token limits.
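As a rough illustration of the idea, fixed-size chunking with overlap can be sketched as a sliding window over a token list. This is a minimal sketch, not Chonkie's implementation: the helper name `chunk_tokens` and its parameter names are assumptions made for this example, and whitespace splitting stands in for a real tokenizer.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int, chunk_overlap: int = 0
) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens.

    Hypothetical helper for illustration only; not part of Chonkie.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    stride = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Whitespace splitting stands in for a real tokenizer here.
tokens = "the quick brown fox jumps over the lazy dog".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so consecutive chunks share exactly `chunk_overlap` tokens and no chunk exceeds `chunk_size`.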
Installation
TokenChunker is included in the base installation of Chonkie. No additional dependencies are required.
For installation instructions, see the Installation Guide.
Initialization
Parameters
Tokenizer to use. Can be a string identifier or a tokenizer instance.
Maximum number of tokens per chunk.
Overlap between consecutive chunks, given either as a number of tokens or as a percentage of the chunk size.
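One plausible way to handle an overlap that may be either a count or a percentage is to normalize it to an absolute token count up front. The sketch below assumes that an `int` means a token count and a `float` means a fraction of the chunk size; `resolve_overlap` and its parameter names are hypothetical, and the library's actual rules may differ.

```python
from typing import Union


def resolve_overlap(chunk_size: int, chunk_overlap: Union[int, float]) -> int:
    """Return the overlap as an absolute number of tokens.

    Hypothetical helper: assumes int = token count, float = fraction of chunk_size.
    """
    if isinstance(chunk_overlap, float):
        if not 0.0 <= chunk_overlap < 1.0:
            raise ValueError("fractional overlap must be in [0.0, 1.0)")
        return int(chunk_overlap * chunk_size)  # truncate to whole tokens
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    return chunk_overlap


overlap_from_count = resolve_overlap(512, 128)      # absolute token count
overlap_from_fraction = resolve_overlap(512, 0.25)  # 25% of the chunk size
```

Under these assumptions both calls describe the same overlap of 128 tokens for a 512-token chunk.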
Usage
Single Text Chunking
Batch Chunking
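Conceptually, batch chunking applies the same chunking step to every text in a list and returns one list of chunks per input. The following is a minimal sketch under stated assumptions: `chunk_text` and `chunk_many` are hypothetical helpers rather than Chonkie functions, and whitespace tokens replace a real tokenizer.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Chunk one text on whitespace tokens (stand-in for a real tokenizer)."""
    tokens = text.split()
    if not tokens:
        return []
    stride = chunk_size - chunk_overlap
    # Stop once the previous window has already covered the last token.
    last_start = max(len(tokens) - chunk_overlap, 1)
    return [
        " ".join(tokens[i:i + chunk_size]) for i in range(0, last_start, stride)
    ]


def chunk_many(
    texts: List[str], chunk_size: int, chunk_overlap: int = 0
) -> List[List[str]]:
    """Chunk each text independently, preserving input order."""
    return [chunk_text(t, chunk_size, chunk_overlap) for t in texts]


docs = ["one two three four five", "six seven", ""]
batches = chunk_many(docs, chunk_size=3, chunk_overlap=1)
for doc, doc_chunks in zip(docs, batches):
    print(repr(doc), "->", doc_chunks)
```

The result has the same length and order as the input list, so each inner list can be matched back to its source text by position.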
Using as a Callable
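In Python, an object becomes callable by defining `__call__`, which lets a chunker instance be used like a function. The sketch below shows the pattern with a hypothetical `SketchChunker` class; it is not Chonkie's class, it tokenizes on whitespace, and the dispatch on `str` versus list is an assumption made for this example.

```python
from typing import List, Union


class SketchChunker:
    """Hypothetical stand-in chunker used only to show the callable pattern."""

    def __init__(self, chunk_size: int, chunk_overlap: int = 0) -> None:
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def chunk(self, text: str) -> List[str]:
        tokens = text.split()  # whitespace tokens replace a real tokenizer
        if not tokens:
            return []
        stride = self.chunk_size - self.chunk_overlap
        last_start = max(len(tokens) - self.chunk_overlap, 1)
        return [
            " ".join(tokens[i:i + self.chunk_size])
            for i in range(0, last_start, stride)
        ]

    def __call__(
        self, text_or_texts: Union[str, List[str]]
    ) -> Union[List[str], List[List[str]]]:
        # A single string is chunked directly; a list is chunked item by item.
        if isinstance(text_or_texts, str):
            return self.chunk(text_or_texts)
        return [self.chunk(t) for t in text_or_texts]


chunker = SketchChunker(chunk_size=3)
single = chunker("a b c d")
many = chunker(["a b c d", "e"])
```

Calling the instance directly keeps call sites short and lets the chunker be passed anywhere a plain function is expected.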
Supported Tokenizers
TokenChunker supports multiple tokenizer backends:
- TikToken (Recommended)
- AutoTikTokenizer
- Hugging Face Tokenizers
- Transformers
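Accepting either a string identifier or a tokenizer instance usually comes down to a small resolution step in front of a common encode/decode interface. The sketch below illustrates that pattern only: the `Tokenizer` protocol, `CharTokenizer`, the `"char"` identifier, and `resolve_tokenizer` are all hypothetical names, and none of the backends listed above is actually loaded.

```python
from typing import Dict, List, Protocol, Type, Union


class Tokenizer(Protocol):
    """Minimal interface assumed for this sketch."""

    def encode(self, text: str) -> List[int]: ...
    def decode(self, ids: List[int]) -> str: ...


class CharTokenizer:
    """Toy tokenizer: one token per character, encoded as its code point."""

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


# Hypothetical registry mapping string identifiers to tokenizer classes.
_REGISTRY: Dict[str, Type[CharTokenizer]] = {"char": CharTokenizer}


def resolve_tokenizer(tokenizer: Union[str, Tokenizer]) -> Tokenizer:
    """Turn a string identifier into an instance; pass instances through."""
    if isinstance(tokenizer, str):
        if tokenizer not in _REGISTRY:
            raise ValueError(f"unknown tokenizer: {tokenizer!r}")
        return _REGISTRY[tokenizer]()
    return tokenizer


def chunk_with(
    tokenizer: Union[str, Tokenizer], text: str, chunk_size: int
) -> List[str]:
    """Encode, slice the ids into fixed-size windows, and decode each window."""
    tok = resolve_tokenizer(tokenizer)
    ids = tok.encode(text)
    return [tok.decode(ids[i:i + chunk_size]) for i in range(0, len(ids), chunk_size)]


by_name = chunk_with("char", "abcdefg", chunk_size=3)
by_instance = chunk_with(CharTokenizer(), "abcdefg", chunk_size=3)
```

Because the chunking step only relies on `encode` and `decode`, any backend that offers those two operations could be swapped in behind the same interface.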
Return Type
TokenChunker returns chunks as Chunk objects with the following attributes: