Chonkie provides multiple chunking strategies to handle different text processing needs. Each chunker in Chonkie is designed to follow the same core principles outlined in the concepts page.

TokenChunker

Splits text into fixed-size token chunks. Best for maintaining consistent chunk sizes and working with token-based models.

FastChunker

SIMD-accelerated byte-based chunking at 100+ GB/s. Best for high-throughput pipelines where byte size limits are acceptable.

SentenceChunker

Splits text at sentence boundaries. Perfect for maintaining semantic completeness at the sentence level.

RecursiveChunker

Recursively chunks documents into smaller chunks. Best for long documents with well-defined structure.

SemanticChunker

Groups content based on semantic similarity. Best for preserving context and topical coherence.

LateChunker

Chunks text using the Late Chunking algorithm. Best for higher recall in your RAG applications.

CodeChunker

Splits code based on its structure using ASTs. Ideal for chunking source code files.

TeraflopAIChunker

Segments text using the TeraflopAI Segmentation API. Ideal for domain-specific segmentation such as legal documents.

NeuralChunker

Uses a fine-tuned BERT model to split text based on semantic shifts. Great for topic-coherent chunks.

SlumberChunker

Agentic chunking using generative models (LLMs) via the Genie interface for S-tier chunk quality. 🦛🧞

TableChunker

Splits large markdown tables into smaller, manageable chunks by row, preserving headers. Great for tabular data in RAG and LLM pipelines.

Availability

Different chunkers are available depending on your installation tier — the default install, the embeddings extra, "chonkie[all]", Chonkie JS, and API Chunking. The chunkers covered are:

  • TokenChunker
  • FastChunker
  • SentenceChunker
  • RecursiveChunker
  • TableChunker
  • CodeChunker
  • SemanticChunker
  • LateChunker
  • NeuralChunker
  • SlumberChunker

Common Interface

All chunkers share a consistent interface:
from chonkie import TokenChunker  # any chunker works here

chunker = TokenChunker(chunk_size=512)

# Single text chunking
chunks = chunker.chunk(text)

# Batch processing
chunks = chunker.chunk_batch(texts)

# Direct calling
chunks = chunker(text)  # or chunker([text1, text2])

# Async variants (all chunkers support these)
chunks = await chunker.achunk(text)
chunks = await chunker.achunk_batch(texts)

Async Support

Every chunker supports async out of the box — no extra setup required.
Method                  Async Equivalent         Description
chunk(text)             achunk(text)             Chunk a single text
chunk_batch(texts)      achunk_batch(texts)      Chunk a list of texts
chunk_document(doc)     achunk_document(doc)     Chunk a Document object

Basic Usage

import asyncio
from chonkie import RecursiveChunker

async def main():
    chunker = RecursiveChunker(chunk_size=512)

    chunks = await chunker.achunk("Your document text here...")
    all_chunks = await chunker.achunk_batch([
        "First document...",
        "Second document...",
        "Third document...",
    ])

asyncio.run(main())

Concurrent Chunking

Use asyncio.gather to chunk multiple texts concurrently:
import asyncio
from chonkie import SemanticChunker

async def process_documents(texts: list[str]):
    chunker = SemanticChunker(chunk_size=512)
    results = await asyncio.gather(
        *[chunker.achunk(text) for text in texts]
    )
    return results
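
asyncio.gather launches every achunk call at once. For very large batches you may want to cap how many chunking calls run simultaneously, since each one occupies a worker thread. A minimal sketch using asyncio.Semaphore — the chunk_with_limit helper and max_concurrency parameter are illustrative names, not part of the Chonkie API:

```python
import asyncio
from typing import Awaitable, Callable

async def chunk_with_limit(
    texts: list[str],
    achunk: Callable[[str], Awaitable[list[str]]],
    max_concurrency: int = 8,
) -> list[list[str]]:
    # The semaphore caps how many achunk calls run at the same time.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> list[str]:
        async with sem:
            return await achunk(text)

    # gather preserves input order in its results.
    return await asyncio.gather(*[bounded(t) for t in texts])
```

Pass your chunker's achunk method as the achunk argument, e.g. chunk_with_limit(texts, chunker.achunk).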

How It Works

  • achunk and achunk_batch run the synchronous methods in a thread pool via asyncio.to_thread, so CPU-bound chunking does not block your event loop.
  • achunk_document goes further: if the document has pre-existing chunks, it dispatches a concurrent asyncio.gather over all of them.
Because achunk and achunk_batch use asyncio.to_thread, they are safe to use in async web frameworks (FastAPI, Starlette, aiohttp, etc.) without blocking the event loop.
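
The thread-pool pattern described above can be sketched as follows. SyncChunkerSketch is a hypothetical stand-in with a toy chunk() method, not Chonkie's actual BaseChunker:

```python
import asyncio

class SyncChunkerSketch:
    """Hypothetical chunker with a CPU-bound synchronous chunk()."""

    def chunk(self, text: str) -> list[str]:
        # Placeholder for real chunking: fixed 8-character slices.
        return [text[i:i + 8] for i in range(0, len(text), 8)]

    async def achunk(self, text: str) -> list[str]:
        # Run the blocking chunk() in the default thread pool so the
        # event loop stays free to serve other coroutines.
        return await asyncio.to_thread(self.chunk, text)
```

Because achunk awaits asyncio.to_thread rather than calling chunk() directly, many coroutines can chunk concurrently while the event loop keeps handling other work.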

F.A.Q.

Are chunkers thread-safe?
Yes, all chunkers are thread-safe. Performance may vary, though, since some chunkers use threading under the hood, so benchmark accordingly.

Do I need anything extra to enable async support?
No. Async support is built into every chunker via BaseChunker. Any chunker you import from chonkie already has achunk, achunk_batch, and achunk_document available.

Is async chunking faster?
Yes, especially when chunking many texts concurrently. achunk offloads work to a thread pool, so multiple coroutines can chunk in parallel without blocking the event loop. For single-text chunking the overhead is minimal.

Can I share one chunker instance across coroutines?
Yes. All chunkers are thread-safe, so sharing a single instance across concurrent asyncio.gather calls is fine and avoids redundant initialization costs.

Which async frameworks are supported?
Any framework that uses asyncio — FastAPI, Starlette, aiohttp, Sanic, Litestar, and others. The async methods use standard asyncio primitives with no framework-specific dependencies.