v1.3.0
v1.3.0 Release Highlights ✨
Breaking Changes
- Unified Chunk Type: All chunkers now return the base Chunk type instead of specialized types. The specialized chunk types (SentenceChunk, RecursiveChunk, SemanticChunk, CodeChunk, and LateChunk) have been removed entirely. This simplifies the API and improves interoperability between different chunkers and refineries.
- Unified Sentence Type: The SemanticSentence type has been removed. The base Sentence type now includes an optional embedding attribute, providing the same functionality with a simpler API.
- New embedding Attribute: Both the base Chunk and Sentence types now include an optional embedding attribute that can store embedding vectors (as lists or numpy arrays). This is automatically populated by EmbeddingsRefinery and certain chunkers like LateChunker.
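To make the optional embedding attribute concrete, here is a minimal sketch built on a stand-in dataclass. The names MiniChunk, attach_embeddings, and toy_embed are hypothetical and exist only for illustration; they are not Chonkie's classes or the EmbeddingsRefinery API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class MiniChunk:
    """Hypothetical stand-in for a base chunk type (not Chonkie's class)."""

    text: str
    # Optional: stays None until some step populates it.
    embedding: Optional[Sequence[float]] = None


def attach_embeddings(
    chunks: List[MiniChunk], embed: Callable[[str], Sequence[float]]
) -> List[MiniChunk]:
    """Fill in the optional embedding on each chunk, as a refinery-style step might."""
    for chunk in chunks:
        chunk.embedding = embed(chunk.text)
    return chunks


def toy_embed(text: str) -> List[float]:
    """Toy embedding for illustration only: [character count, word count]."""
    return [float(len(text)), float(len(text.split()))]


chunks = [MiniChunk("Hello world"), MiniChunk("Chonk")]
before = [chunk.embedding for chunk in chunks]  # both None at this point
attach_embeddings(chunks, toy_embed)
```

Because the attribute is optional, code that consumes chunks should check for None before using the vector.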
Migration Guide
If you were relying on specialized chunk attributes:
- SentenceChunk.sentences → No longer available in base Chunk
- SemanticChunk.sentences → No longer available in base Chunk
- CodeChunk.nodes → No longer available in base Chunk
- RecursiveChunk.level → No longer available in base Chunk
- LateChunk → Use base Chunk (embedding is now part of base type)
- SemanticSentence → Use base Sentence (embedding is now part of base type)
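If downstream code still reads one of the removed attributes, one defensive option is to look it up with getattr and fall back when it is absent. The sketch below uses a hypothetical stand-in class (BaseChunkLike) instead of importing Chonkie, and the describe helper is illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence


@dataclass
class BaseChunkLike:
    """Hypothetical stand-in for a unified chunk: text plus optional embedding."""

    text: str
    embedding: Optional[Sequence[float]] = None


def describe(chunk: Any) -> Dict[str, Any]:
    """Summarize a chunk without assuming the removed specialized attributes exist."""
    return {
        "text": chunk.text,
        # Attributes named in the migration list; None when the chunk lacks them.
        "level": getattr(chunk, "level", None),
        "sentences": getattr(chunk, "sentences", None),
        "nodes": getattr(chunk, "nodes", None),
        "has_embedding": getattr(chunk, "embedding", None) is not None,
    }


info = describe(BaseChunkLike("def f(): pass", embedding=[0.1, 0.2]))
```

This keeps older call sites from raising AttributeError while they are being updated to rely only on the base type.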
Chunk objects with:
Import Changes
When importing the Chunk type, use:
v1.0.6
v1.0.6 Release Highlights ✨
- New SlumberChunker: Welcome Chonkie’s very own agentic chunker! Requires the genie optional install and a GEMINI_API_KEY. It leverages Genie, Chonkie’s interface for generative models.
- New NeuralChunker: Introducing a fully neural approach to chunking! Requires the neural optional install. This uses a fine-tuned BERT-like model for fast, high-quality chunking.
- auto Language Detection for CodeChunker: CodeChunker can now automatically detect the programming language. Specify the language manually if performance is critical.
- Introducing Genies: Added Genie to power SlumberChunker and future generative features. Genies are Chonkie’s way to handle multiple generative APIs and model interfaces. The first is GeminiGenie, requiring the genie optional install.
v1.0.5
v1.0.5 Release Highlights ✨
This is a quick patch release to include CodeChunker in the __init__.py for chonkie so it can be properly accessed via from chonkie import CodeChunker.
Full Changelog: https://github.com/chonkie-inc/chonkie/compare/v1.0.4…v1.0.5
v1.0.4
v1.0.4 Release Highlights ✨
- New CodeChunker: Introducing the CodeChunker, specialized for handling code files across 100+ programming languages. It understands code structure to provide more meaningful chunks.
- JinaAI Embeddings Support: Added JinaEmbeddings, enabling their use with SemanticChunker and SDPMChunker. Just install the jina optional install to use it!
- OverlapRefinery: Enhance your chunks by adding overlapping context using the new OverlapRefinery. It’s included in the default install and works seamlessly with any chunker.
- EmbeddingsRefinery: Compute and attach embeddings directly to your chunks using the EmbeddingsRefinery. Streamline the process of loading chunks into vector databases.
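The idea of adding overlapping context can be sketched in a few lines of plain Python. This is not OverlapRefinery's implementation or API: the add_prefix_overlap function, its context_chars parameter, and the choice of character-based overlap taken from the previous chunk are all assumptions made for illustration.

```python
from typing import List


def add_prefix_overlap(chunks: List[str], context_chars: int = 10) -> List[str]:
    """Prepend the last `context_chars` characters of the previous chunk to each chunk."""
    if context_chars < 0:
        raise ValueError("context_chars must be non-negative")
    refined: List[str] = []
    for i, chunk in enumerate(chunks):
        if i == 0 or context_chars == 0:
            # The first chunk has no predecessor; zero overlap leaves chunks unchanged.
            refined.append(chunk)
        else:
            refined.append(chunks[i - 1][-context_chars:] + chunk)
    return refined


chunks = ["The quick brown fox ", "jumps over ", "the lazy dog."]
refined = add_prefix_overlap(chunks, context_chars=4)
# refined[1] now starts with "fox ", the tail of the first chunk.
```

Each refined chunk carries a little of what came before it, which can help retrieval when a relevant passage straddles a chunk boundary.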
v1.0.3
v1.0.3 Release Highlights ✨
- Chonkie Visualizer: Visualize and debug chunks easily via terminal printouts or HTML saves. Understand chunk quality and debug your chunker with visual feedback~ Use the print method to print rich text on your terminal or use the save method to save a highlighted html on your device! It’s very simple to use, just pass in your chunks~
- Recipes: Chonkie now adds support for Recipes which allow you to use multilingual chunking out-of-the-box, as well as document specific chunking methods. Initial support starts with: en, hi, zh, jp and ko, while document type markdown is supported too. Use it via the from_recipe class method with any chunker that takes delimiters or RecursiveRules.
- Performance enhancements in RecursiveChunker, SentenceChunker, and WordTokenizer.
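To illustrate why language-specific recipes matter, the sketch below splits text using a per-language table of sentence delimiters. The SENTENCE_DELIMITERS table and the split_sentences function are hypothetical; they do not reproduce Chonkie's recipes or the from_recipe method, only the general idea that delimiters differ by language.

```python
from typing import Dict, List

# Hypothetical delimiter table for illustration, keyed by language code.
SENTENCE_DELIMITERS: Dict[str, List[str]] = {
    "en": [".", "!", "?"],
    "hi": ["।", "!", "?"],
    "zh": ["。", "！", "？"],
}


def split_sentences(text: str, lang: str) -> List[str]:
    """Split text after each delimiter defined for `lang`, keeping the delimiter."""
    delimiters = SENTENCE_DELIMITERS[lang]
    sentences: List[str] = []
    current = ""
    for ch in text:
        current += ch
        if ch in delimiters:
            if current.strip():
                sentences.append(current.strip())
            current = ""
    if current.strip():
        # Keep any trailing text that does not end with a delimiter.
        sentences.append(current.strip())
    return sentences


english = split_sentences("Hello there. How are you?", "en")
chinese = split_sentences("你好。今天天气很好！", "zh")
```

An English-only delimiter list would leave the Chinese example as a single unsplit string, which is the kind of gap a per-language recipe is meant to close.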
