SDPMChunker
Split text using Semantic Double-Pass Merging for improved context preservation
The SDPMChunker extends semantic chunking by using a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, allowing it to connect related content that may not be consecutive in the text. This technique is particularly useful for documents with recurring themes or concepts spread apart.
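To make the two passes concrete, here is a minimal, self-contained sketch of the idea. It is not Chonkie's implementation: the helper names (`cosine`, `mean_vector`, `first_pass`, `second_pass`), the toy two-dimensional embeddings, and the choice to compare mean group vectors are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def first_pass(embeddings, threshold):
    """Pass 1: group consecutive sentences whose neighbours are similar."""
    groups = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def second_pass(groups, embeddings, threshold, skip_window=1):
    """Pass 2: merge a group with a similar group `skip_window` groups past its neighbour."""
    merged = []
    i = 0
    while i < len(groups):
        target = i + skip_window + 1
        if target < len(groups):
            here = mean_vector([embeddings[k] for k in groups[i]])
            there = mean_vector([embeddings[k] for k in groups[target]])
            if cosine(here, there) >= threshold:
                # Merge everything from group i through the target group,
                # including the skipped groups in between.
                merged.append([k for g in groups[i:target + 1] for k in g])
                i = target + 1
                continue
        merged.append(list(groups[i]))
        i += 1
    return merged


# Toy sentence embeddings: topic A, A, B, A, A, C, C (hypothetical data).
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
groups = first_pass(embeddings, threshold=0.8)
chunks = second_pass(groups, embeddings, threshold=0.8, skip_window=1)
```

In this toy run the first pass yields four groups, and the second pass joins the two topic-A groups across the single topic-B sentence that separates them, which is the "non-consecutive" connection described above.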
Installation
SDPMChunker requires additional dependencies for semantic capabilities. You can install it with:
Initialization
Parameters
Model identifier or embedding model instance
Mode for grouping sentences, either “cumulative” or “window”
When in the range [0,1], denotes the similarity threshold to consider sentences similar. When in the range (1,100], interprets the given value as a percentile threshold. When set to “auto”, the threshold is automatically calculated.
Maximum tokens per chunk
Number of sentences to consider for similarity threshold calculation
Minimum number of sentences per chunk
Minimum tokens per chunk
Minimum number of characters per sentence
Step size for threshold calculation
Delimiters to split sentences on
Number of chunks to skip when looking for similarities
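The three ways a threshold value can be read (similarity, percentile, or automatic) can be sketched as a small dispatch function. This is an illustrative assumption about how such a value might be interpreted, not the library's code: `interpret_threshold` is a hypothetical helper, the nearest-rank percentile is one of several possible percentile definitions, and the automatic calculation is left out because the text above does not describe it.

```python
from math import ceil


def interpret_threshold(threshold, similarities):
    """Return (mode, value) for a threshold setting.

    `similarities` is a list of pairwise sentence similarities, used only
    when the threshold is given as a percentile.
    """
    if threshold == "auto":
        # The automatic calculation is not specified here, so no value is returned.
        return ("auto", None)
    if isinstance(threshold, str):
        raise ValueError(f"unknown threshold setting: {threshold!r}")
    if 0 <= threshold <= 1:
        # Values in [0, 1] are used directly as a similarity cutoff.
        return ("similarity", float(threshold))
    if 1 < threshold <= 100:
        # Values in (1, 100] are read as a percentile of observed similarities
        # (nearest-rank method, chosen here for simplicity).
        ordered = sorted(similarities)
        rank = max(1, ceil(threshold / 100 * len(ordered)))
        return ("percentile", ordered[rank - 1])
    raise ValueError("threshold must be 'auto', in [0, 1], or in (1, 100]")


observed = [0.2, 0.4, 0.6, 0.8]  # hypothetical similarity scores
direct = interpret_threshold(0.7, observed)
by_percentile = interpret_threshold(50, observed)
automatic = interpret_threshold("auto", observed)
```

Note that a value of exactly 1 falls in the first range and is treated as a similarity, not as the 1st percentile.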
Usage
Single Text Chunking
Batch Chunking
Supported Embeddings
SDPMChunker supports multiple embedding providers through Chonkie’s embedding system. See the Embeddings Overview for more information.
Return Type
SDPMChunker returns SemanticChunk objects, which use slots for optimized storage.
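For readers unfamiliar with slots, the sketch below shows what slot-based storage means in Python. `ExampleChunk` and its field names are hypothetical stand-ins chosen for illustration; they are not the actual SemanticChunk definition.

```python
class ExampleChunk:
    """Hypothetical chunk type; the fields are illustrative assumptions."""

    # Declaring __slots__ removes the per-instance __dict__, which lowers
    # memory use when many chunk objects are created.
    __slots__ = ("text", "start_index", "end_index", "token_count")

    def __init__(self, text, start_index, end_index, token_count):
        self.text = text
        self.start_index = start_index
        self.end_index = end_index
        self.token_count = token_count


chunk = ExampleChunk("Some chunk text.", 0, 16, 4)
has_instance_dict = hasattr(chunk, "__dict__")
```

A side effect of slots is that attributes outside the declared set cannot be added to an instance, so any extra metadata has to live in a separate structure.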