Installation
Installing Chonkie and its various components
Chonkie follows a modular approach to dependencies, keeping the base installation lightweight while allowing you to add extra features as needed.
Basic Installation
For basic token and word chunking capabilities, install the base package (pip install chonkie).
Installation Options
Chonkie provides several installation options, each selected as a pip extra such as chonkie[semantic], to match your specific needs.
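Extras can be combined in a single requirement string, so a small helper can assemble the install command. The sketch below is illustrative only: pip_requirement and pip_command are hypothetical names, not part of Chonkie, and the extra names are the ones listed on this page. It assumes the standard pip extras syntax, package[extra1,extra2].

```python
import sys

# Extras named on this page; the set may change between Chonkie versions.
KNOWN_EXTRAS = {"model2vec", "st", "openai", "semantic", "all"}


def pip_requirement(extras=()):
    """Build a pip requirement string such as 'chonkie[openai,st]'."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    if not extras:
        return "chonkie"  # base install, no extras
    return "chonkie[" + ",".join(sorted(set(extras))) + "]"


def pip_command(extras=()):
    """Return the argument list that installs into the current interpreter."""
    return [sys.executable, "-m", "pip", "install", pip_requirement(extras)]


if __name__ == "__main__":
    # Show the command for the 'semantic' extra without running it.
    print(" ".join(pip_command(["semantic"])))
```

Passing the returned list to subprocess.run would perform the install; the sketch only builds the command so that it can be inspected first.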
Chunker Availability
The following table shows which chunkers are available with different installation options:
Chunker | Default | Any embeddings extra | 'all' |
---|---|---|---|
TokenChunker | ✅ | ✅ | ✅ |
WordChunker | ✅ | ✅ | ✅ |
SentenceChunker | ✅ | ✅ | ✅ |
SemanticChunker | ❌ | ✅ | ✅ |
SDPMChunker | ❌ | ✅ | ✅ |
Installing any of the embeddings extras enables the SemanticChunker and SDPMChunker. You can use the chonkie[semantic] install for quick access to these chunkers.
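The availability rule can be written as a short lookup. This is a minimal sketch that only encodes the table on this page; available_chunkers is a hypothetical helper, not a Chonkie function, and it does not inspect the actual environment.

```python
# Chunkers available with the base install, per the table on this page.
BASE_CHUNKERS = ["TokenChunker", "WordChunker", "SentenceChunker"]
# Chunkers that need an embeddings provider.
SEMANTIC_CHUNKERS = ["SemanticChunker", "SDPMChunker"]

# Extras that bring in at least one embeddings provider, per this page.
EMBEDDINGS_EXTRAS = {"model2vec", "st", "openai", "semantic", "all"}


def available_chunkers(installed_extras=()):
    """Return the chunker names expected for the given set of extras."""
    chunkers = list(BASE_CHUNKERS)
    if EMBEDDINGS_EXTRAS & set(installed_extras):
        chunkers += SEMANTIC_CHUNKERS
    return chunkers
```

For example, available_chunkers(["openai"]) includes SemanticChunker and SDPMChunker, while available_chunkers() lists only the three base chunkers.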
Embeddings Availability
Different embedding providers are available with different installation options:
Embeddings Provider | Default | 'model2vec' | 'st' | 'openai' | 'semantic' | 'all' |
---|---|---|---|---|---|---|
Model2VecEmbeddings | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ |
SentenceTransformerEmbeddings | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
OpenAIEmbeddings | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
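To find which install brings in a particular provider, the table can be read in reverse. The sketch below copies the table into a dictionary; extras_providing is a hypothetical helper, and the mapping reflects this page only, so it may not match other Chonkie versions.

```python
# Embeddings providers enabled by each extra, copied from the table above.
PROVIDERS_BY_EXTRA = {
    "model2vec": {"Model2VecEmbeddings"},
    "st": {"SentenceTransformerEmbeddings"},
    "openai": {"OpenAIEmbeddings"},
    "semantic": {"Model2VecEmbeddings"},
    "all": {
        "Model2VecEmbeddings",
        "SentenceTransformerEmbeddings",
        "OpenAIEmbeddings",
    },
}


def extras_providing(provider):
    """Return the extras (sorted by name) that enable the given provider."""
    return sorted(
        extra
        for extra, providers in PROVIDERS_BY_EXTRA.items()
        if provider in providers
    )
```

The base install has no entry because the table marks every provider as unavailable by default.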
Dependencies
Here’s what each installation option adds:
Installation Option | Additional Dependencies |
---|---|
Default | autotiktokenizer |
'model2vec' | + model2vec, numpy |
'st' | + sentence-transformers, numpy |
'openai' | + openai, tiktoken |
'semantic' | + model2vec, numpy |
'all' | all above dependencies |
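One way to see which options an environment already satisfies is to test whether the listed dependencies can be imported. This is a sketch under stated assumptions: the import names below are guesses derived from the package names in the table (a pip name and an import name can differ, as with sentence-transformers and sentence_transformers), and satisfied_extras is a hypothetical helper rather than a Chonkie API.

```python
from importlib.util import find_spec

# Assumed import names for the dependencies in the table above.
MODULES_BY_EXTRA = {
    "default": ["autotiktokenizer"],
    "model2vec": ["model2vec", "numpy"],
    "st": ["sentence_transformers", "numpy"],
    "openai": ["openai", "tiktoken"],
    "semantic": ["model2vec", "numpy"],
}


def is_importable(module_name):
    """Return True if the top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def satisfied_extras(check=is_importable):
    """List the options whose dependencies all pass the given check."""
    return [
        extra
        for extra, modules in MODULES_BY_EXTRA.items()
        if all(check(name) for name in modules)
    ]


if __name__ == "__main__":
    print(satisfied_extras())
```

The check parameter is injectable so the logic can be exercised without installing anything. A passing check only shows that the modules are present, not that their versions match what Chonkie requires.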
Important Notes
- We provide separate, pre-packaged semantic and all installs that may overlap with other installation options, which creates some redundancy. This redundancy is intentional: it gives users the freedom to choose the install that suits them best.
- The semantic and all optional installs may change in future versions, so what you install today may not be the same as what you get tomorrow.
- Installing either the 'semantic' or the 'openai' extra enables SemanticChunker and SDPMChunker, because these chunkers work with any embeddings provider. The extras differ only in which embeddings providers they make available to these chunkers.