Installation
Installing Chonkie and its various components
Chonkie follows a modular approach to dependencies, keeping the base installation lightweight while allowing you to add extra features as needed.
Basic Installation
For basic token and sentence chunking capabilities, install the base package with `pip install chonkie`.
Installation Options
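Chonkie provides several installation options to match your specific needs. Each option is a pip extra, so you install it with, for example, `pip install "chonkie[semantic]"`.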
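Several extras can be combined in a single install. The sketch below is a hypothetical helper (not part of Chonkie) that builds such a pip command and rejects option names that do not appear in the dependencies table on this page. It relies only on pip's standard extras syntax, where multiple extras are comma-separated inside one pair of brackets.

```python
# Hypothetical helper, not a Chonkie API. The option names are taken from
# the dependencies table on this page.
KNOWN_EXTRAS = {
    "hub", "viz", "model2vec", "st", "openai", "cohere", "jina",
    "semantic", "code", "neural", "genie", "all",
}


def install_command(*extras: str) -> str:
    """Return a pip command that installs Chonkie with the given extras."""
    if not extras:
        # Default install: the base package only.
        return "pip install chonkie"
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"unknown Chonkie extras: {', '.join(unknown)}")
    # Several extras go in one comma-separated bracket. Quoting stops shells
    # such as zsh from treating the square brackets as a glob pattern.
    joined = ",".join(sorted(set(extras)))
    return f'pip install "chonkie[{joined}]"'


print(install_command())                     # pip install chonkie
print(install_command("semantic", "code"))   # pip install "chonkie[code,semantic]"
```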
Chunker Availability
The following table shows which chunkers are available with different installation options:
Chunker | Default | embeddings extras | 'all' |
---|---|---|---|
TokenChunker | ✅ | ✅ | ✅ |
RecursiveChunker | ✅ | ✅ | ✅ |
SentenceChunker | ✅ | ✅ | ✅ |
SemanticChunker | ❌ | ✅ | ✅ |
SDPMChunker | ❌ | ✅ | ✅ |
LateChunker | ❌ | ✅ | ✅ |
CodeChunker | ❌ | ❌ | ✅ |
NeuralChunker | ❌ | ✅ | ✅ |
SlumberChunker | ❌ | ✅ | ✅ |
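If you want to check availability programmatically, for example in a setup script, the table can be transcribed into a lookup. This is a hypothetical sketch; Chonkie does not ship these helpers. Because 'all' installs every dependency listed on this page, the sketch treats every chunker as available with it.

```python
# Hypothetical lookup transcribed from the chunker availability table above;
# Chonkie does not ship this helper.
DEFAULT = {"TokenChunker", "RecursiveChunker", "SentenceChunker"}

CHUNKERS_BY_OPTION = {
    "default": DEFAULT,
    # Any embeddings extra, per the embeddings column of the table.
    "embeddings": DEFAULT | {
        "SemanticChunker", "SDPMChunker", "LateChunker",
        "NeuralChunker", "SlumberChunker",
    },
    # 'all' installs every optional dependency, so every listed chunker works.
    "all": DEFAULT | {
        "SemanticChunker", "SDPMChunker", "LateChunker",
        "CodeChunker", "NeuralChunker", "SlumberChunker",
    },
}


def chunkers_for(option: str) -> list[str]:
    """Return the chunkers available with an installation option, sorted by name."""
    try:
        return sorted(CHUNKERS_BY_OPTION[option])
    except KeyError:
        raise ValueError(f"unknown option: {option!r}") from None


def is_available(chunker: str, option: str) -> bool:
    """Return True if the chunker is usable with the given installation option."""
    return chunker in CHUNKERS_BY_OPTION.get(option, set())


print(chunkers_for("default"))  # ['RecursiveChunker', 'SentenceChunker', 'TokenChunker']
print(is_available("CodeChunker", "embeddings"))  # False
```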
Embeddings Availability
Different embedding providers are available with different installation options:
Embeddings Provider | Default | 'model2vec' | 'st' | 'openai' | 'semantic' | 'all' |
---|---|---|---|---|---|---|
Model2VecEmbeddings | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ |
SentenceTransformerEmbeddings | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
OpenAIEmbeddings | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
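To see which of these providers your current environment can back, you can check whether each provider's backend module is importable. The sketch below is a hypothetical helper, not a Chonkie API, and the module names are assumptions based on the packages in the dependencies table (`model2vec`, `sentence_transformers`, `openai`). An importable module is only a rough proxy: it does not verify versions, API keys, or downloaded models.

```python
import importlib.util

# Hypothetical mapping (not a Chonkie API) from each provider in the table
# above to the top-level module installed by its backend package.
PROVIDER_MODULES = {
    "Model2VecEmbeddings": "model2vec",                        # 'model2vec', 'semantic', 'all'
    "SentenceTransformerEmbeddings": "sentence_transformers",  # 'st', 'all'
    "OpenAIEmbeddings": "openai",                              # 'openai', 'all'
}


def available_providers() -> list[str]:
    """Return the providers whose backend module can be found in this environment."""
    return [
        provider
        for provider, module in PROVIDER_MODULES.items()
        # find_spec returns None when a top-level module is not installed;
        # it locates the module without importing it.
        if importlib.util.find_spec(module) is not None
    ]


print(available_providers())
```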
Dependencies
Here’s what each installation option adds:
Installation Option | Additional Dependencies |
---|---|
Default | autotiktokenizer |
'hub' | + huggingface-hub, jsonschema |
'viz' | + rich |
'model2vec' | + model2vec, numpy |
'st' | + sentence-transformers, numpy, accelerate |
'openai' | + openai, tiktoken, numpy |
'cohere' | + cohere, numpy |
'jina' | + numpy |
'semantic' | + model2vec, numpy |
'code' | + tree-sitter, tree-sitter-language-pack, magika |
'neural' | + transformers, torch (or tensorflow/flax), sentencepiece |
'genie' | + pydantic, google-genai |
'all' | all above dependencies |
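The dependencies table can also be used to check an existing environment before choosing an option. The sketch below is a hypothetical helper, not part of Chonkie: it transcribes the table into a dictionary and asks `importlib.metadata` whether each distribution is installed. It assumes a recent Python version, where `importlib.metadata` normalizes names such as `sentence-transformers`; it lists only torch for 'neural' even though the table allows tensorflow/flax instead, and the 'genie' dependencies may vary as noted on this page. A package being present does not guarantee a compatible version.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical transcription of the dependencies table (not a Chonkie API).
BASE_DEPENDENCIES = ["autotiktokenizer"]
EXTRA_DEPENDENCIES = {
    "hub": ["huggingface-hub", "jsonschema"],
    "viz": ["rich"],
    "model2vec": ["model2vec", "numpy"],
    "st": ["sentence-transformers", "numpy", "accelerate"],
    "openai": ["openai", "tiktoken", "numpy"],
    "cohere": ["cohere", "numpy"],
    "jina": ["numpy"],
    "semantic": ["model2vec", "numpy"],
    "code": ["tree-sitter", "tree-sitter-language-pack", "magika"],
    "neural": ["transformers", "torch", "sentencepiece"],  # torch only; see lead-in
    "genie": ["pydantic", "google-genai"],
}


def required_packages(*extras: str) -> list[str]:
    """Return the distributions the base install plus the given extras pull in."""
    names = set(BASE_DEPENDENCIES)
    for extra in extras:
        if extra == "all":
            # 'all' is the union of every extra above.
            for deps in EXTRA_DEPENDENCIES.values():
                names.update(deps)
        elif extra in EXTRA_DEPENDENCIES:
            names.update(EXTRA_DEPENDENCIES[extra])
        else:
            raise ValueError(f"unknown extra: {extra!r}")
    return sorted(names)


def missing_packages(*extras: str) -> list[str]:
    """Return required distributions that are not installed in this environment."""
    missing = []
    for name in required_packages(*extras):
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


print(missing_packages("semantic"))
```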
(Note: Specific dependencies for 'genie' might vary slightly based on implementation details and chosen models/APIs.)
Important Notes
- We provide separate 'semantic' and 'all' installs that bundle dependencies already covered by other installation options. This redundancy is intentional: it gives users ready-made presets while leaving them free to choose exactly what to install.
- The 'semantic' and 'all' optional installs may change in future versions, so what they install today may differ from what they install in a later release.
- Installing either the 'semantic' or the 'openai' extra will enable SemanticChunker and SDPMChunker, since these chunkers work with any embeddings provider. The extras differ only in which embeddings providers become available to these chunkers.