v1.3.0
v1.3.0 Release Highlights ✨
Breaking Changes
- Unified Chunk Type: All chunkers now return the base Chunk type instead of specialized types. The specialized chunk types (SentenceChunk, RecursiveChunk, SemanticChunk, CodeChunk, and LateChunk) have been removed entirely. This simplifies the API and improves interoperability between different chunkers and refineries.
- Unified Sentence Type: The SemanticSentence type has been removed. The base Sentence type now includes an optional embedding attribute, providing the same functionality with a simpler API.
- New embedding Attribute: Both the base Chunk and Sentence types now include an optional embedding attribute that can store embedding vectors (as lists or numpy arrays). This is automatically populated by EmbeddingsRefinery and certain chunkers like LateChunker.
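As a rough illustration of what an optional embedding attribute means for downstream code, the sketch below uses a stand-in dataclass rather than the real chonkie Chunk. The class name DemoChunk, its fields other than embedding, and the helper split_by_embedding are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DemoChunk:
    """Hypothetical stand-in for a base chunk type; not chonkie's actual class."""
    text: str
    start_index: int
    end_index: int
    token_count: int
    # Optional: stays None until something (e.g. a refinery) fills it in.
    embedding: Optional[Sequence[float]] = None


def split_by_embedding(
    chunks: List[DemoChunk],
) -> Tuple[List[DemoChunk], List[DemoChunk]]:
    """Separate chunks that already carry an embedding from those that do not."""
    embedded = [c for c in chunks if c.embedding is not None]
    missing = [c for c in chunks if c.embedding is None]
    return embedded, missing


chunks = [
    DemoChunk("Hello world.", 0, 12, 3, embedding=[0.1, 0.2, 0.3]),
    DemoChunk(" Second chunk.", 12, 26, 3),
]
embedded, missing = split_by_embedding(chunks)
print(len(embedded), len(missing))
```

The check uses `is not None` rather than plain truthiness because an embedding stored as a numpy array with more than one element raises an error when evaluated as a boolean.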
Migration Guide
If you were relying on specialized chunk attributes:
- SentenceChunk.sentences → No longer available in base Chunk
- SemanticChunk.sentences → No longer available in base Chunk
- CodeChunk.nodes → No longer available in base Chunk
- RecursiveChunk.level → No longer available in base Chunk
- LateChunk → Use base Chunk (embedding is now part of base type)
- SemanticSentence → Use base Sentence (embedding is now part of base type)
Chunk objects with:
Import Changes
When importing the Chunk type, use:
v1.0.6
v1.0.6 Release Highlights ✨
- New SlumberChunker: Welcome Chonkie’s very own agentic chunker! Requires the genie optional install and a GEMINI_API_KEY. It leverages Genie, Chonkie’s interface for generative models.
- New NeuralChunker: Introducing a fully neural approach to chunking! Requires the neural optional install. This uses a fine-tuned BERT-like model for fast, high-quality chunking.
- auto Language Detection for CodeChunker: CodeChunker can now automatically detect the programming language. Specify the language manually if performance is critical.
- Introducing Genies: Added Genie to power SlumberChunker and future generative features. Genies are Chonkie’s way to handle multiple generative APIs and model interfaces. The first is GeminiGenie, requiring the genie optional install.
v1.0.5
v1.0.5 Release Highlights ✨
This is a quick patch release to include CodeChunker in the __init__.py for chonkie so it can be properly accessed via from chonkie import CodeChunker.
Full Changelog: https://github.com/chonkie-inc/chonkie/compare/v1.0.4…v1.0.5
v1.0.4
v1.0.4 Release Highlights ✨
- New CodeChunker: Introducing the CodeChunker, specialized for handling code files across 100+ programming languages. It understands code structure to provide more meaningful chunks.
- JinaAI Embeddings Support: Added JinaEmbeddings, enabling their use with SemanticChunker and SDPMChunker. Just add the jina optional install to use it!
- OverlapRefinery: Enhance your chunks by adding overlapping context using the new OverlapRefinery. It’s included in the default install and works seamlessly with any chunker.
- EmbeddingsRefinery: Compute and attach embeddings directly to your chunks using the EmbeddingsRefinery. This streamlines the process of loading chunks into vector databases.
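To make the idea of overlapping context concrete, here is a minimal sketch in plain Python. It is not OverlapRefinery's implementation or API; the function add_prefix_overlap and its character-based context_size parameter are assumptions chosen purely for illustration.

```python
from typing import List


def add_prefix_overlap(chunks: List[str], context_size: int) -> List[str]:
    """Prepend the last `context_size` characters of the previous chunk to each chunk."""
    if context_size < 0:
        raise ValueError("context_size must be non-negative")
    result = []
    for i, chunk in enumerate(chunks):
        if i == 0 or context_size == 0:
            # The first chunk has no predecessor; a size of 0 means no overlap.
            result.append(chunk)
        else:
            tail = chunks[i - 1][-context_size:]
            result.append(tail + chunk)
    return result


chunks = ["The quick brown fox ", "jumps over ", "the lazy dog."]
overlapped = add_prefix_overlap(chunks, context_size=4)
for piece in overlapped:
    print(repr(piece))
```

The zero case is handled explicitly because a slice such as `text[-0:]` returns the whole string rather than an empty one.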
v1.0.3
v1.0.3 Release Highlights ✨
- Chonkie Visualizer: Visualize and debug chunks easily via terminal printouts or HTML saves. Understand chunk quality and debug your chunker with visual feedback~ Use the print method to print rich text on your terminal or use the save method to save a highlighted html on your device! It’s very simple to use, just pass in your chunks~
- Recipes: Chonkie now adds support for Recipes, which allow you to use multilingual chunking out-of-the-box, as well as document-specific chunking methods. Initial support starts with: en, hi, zh, jp and ko, while document type markdown is supported too. Use it via the from_recipe class method with any chunker that takes delimiters or RecursiveRules.
- Performance enhancements in RecursiveChunker, SentenceChunker, and WordTokenizer.
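The recipe idea mentioned in these notes (a named, per-language bundle of delimiters selected through a class method) can be sketched with a toy chunker. Everything below is hypothetical: DemoDelimiterChunker, the DEMO_RECIPES table and its delimiter sets are illustrative stand-ins, and the real from_recipe signature in Chonkie may differ.

```python
from typing import Dict, List

# Hypothetical recipe table: language code -> single-character sentence delimiters.
# These delimiter sets are illustrative, not Chonkie's shipped recipes.
DEMO_RECIPES: Dict[str, List[str]] = {
    "en": [".", "!", "?"],
    "hi": ["।", "!", "?"],
    "zh": ["。", "！", "？"],
}


class DemoDelimiterChunker:
    """Toy chunker that ends a chunk after any of its delimiters."""

    def __init__(self, delimiters: List[str]) -> None:
        if not delimiters:
            raise ValueError("at least one delimiter is required")
        self.delimiters = delimiters

    @classmethod
    def from_recipe(cls, lang: str) -> "DemoDelimiterChunker":
        """Build a chunker from the recipe registered for `lang`."""
        try:
            return cls(DEMO_RECIPES[lang])
        except KeyError:
            raise ValueError(f"no recipe for language {lang!r}") from None

    def chunk(self, text: str) -> List[str]:
        """Split text after each delimiter, keeping the delimiter in its chunk."""
        pieces: List[str] = []
        current = ""
        for ch in text:
            current += ch
            if ch in self.delimiters:
                pieces.append(current)
                current = ""
        if current:
            # Trailing text without a closing delimiter becomes the last chunk.
            pieces.append(current)
        return pieces


en_chunks = DemoDelimiterChunker.from_recipe("en").chunk("Hi there. How are you?")
zh_chunks = DemoDelimiterChunker.from_recipe("zh").chunk("你好。今天天气很好！")
print(en_chunks)
print(zh_chunks)
```

Keeping the language-specific delimiters in a lookup table means the chunking logic stays identical across languages and only the recipe changes.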