SDPMChunker
Split text using Semantic Double-Pass Merging for improved context preservation
The SDPMChunker extends semantic chunking by using a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, allowing it to connect related content that may not be consecutive in the text. This technique is particularly useful for documents with recurring themes or concepts spread apart.
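The two passes can be illustrated with a toy, dependency-free sketch. It is not Chonkie's implementation: the helper names (embed, cosine, first_pass, second_pass) are hypothetical, a bag-of-words counter stands in for a real embedding model, and absorbing the skipped groups into the merged chunk is a design assumption made here to keep chunks contiguous.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_pass(sentences, threshold):
    """Pass 1: group consecutive sentences that stay similar to the running group."""
    groups = []
    for sentence in sentences:
        if groups and cosine(embed(" ".join(groups[-1])), embed(sentence)) >= threshold:
            groups[-1].append(sentence)
        else:
            groups.append([sentence])
    return groups


def second_pass(groups, threshold, skip_window=1):
    """Pass 2: merge a group with a similar group up to skip_window groups further on."""
    merged = []
    i = 0
    while i < len(groups):
        current = list(groups[i])
        j = i + 1
        while j < len(groups):
            match = None
            # Check the adjacent group first, then up to skip_window groups beyond it.
            for k in range(j, min(j + skip_window + 1, len(groups))):
                if cosine(embed(" ".join(current)), embed(" ".join(groups[k]))) >= threshold:
                    match = k
                    break
            if match is None:
                break
            # Assumption of this sketch: skipped groups are absorbed to preserve text order.
            for k in range(j, match + 1):
                current.extend(groups[k])
            j = match + 1
        merged.append(current)
        i = j
    return merged


sentences = [
    "Cats purr when they are happy.",
    "Happy cats purr loudly.",
    "Stock prices fell sharply today.",
    "Cats also purr when they are nervous.",
]
groups = first_pass(sentences, threshold=0.3)
no_skip = second_pass(groups, threshold=0.3, skip_window=0)
with_skip = second_pass(groups, threshold=0.3, skip_window=1)
```

With a skip window of 0 the second pass can only compare neighbouring groups, so the unrelated sentence keeps the two cat passages apart. With a skip window of 1 the later cat sentence is reached across it and everything is merged into one chunk.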
Installation
SDPMChunker requires additional dependencies for semantic capabilities. You can install it with:
Initialization
Parameters
Model identifier or embedding model instance
Minimum similarity score (0-1) to consider sentences similar
Percentile-based threshold (0-1) for similarity
Maximum tokens per chunk
Minimum tokens per chunk
Number of sentences to start each chunk with
Number of chunks to skip when looking for similarities
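The difference between a fixed similarity score and a percentile-based threshold can be sketched as follows. How the library itself turns a percentile into a cutoff is not shown on this page, so this is only one plausible reading: the function name percentile_threshold and the index-based percentile (no interpolation) are assumptions for illustration.

```python
def percentile_threshold(similarities, percentile):
    """Pick the similarity value sitting at `percentile` (0-1) of the sorted scores."""
    if not similarities:
        raise ValueError("need at least one similarity score")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0 and 1")
    ordered = sorted(similarities)
    index = round(percentile * (len(ordered) - 1))
    return ordered[index]


# Hypothetical similarities between consecutive sentence pairs in one document.
pair_similarities = [0.91, 0.15, 0.78, 0.40, 0.66]

cutoff = percentile_threshold(pair_similarities, 0.5)

# Pairs scoring below the cutoff are candidate chunk boundaries.
boundaries = [i for i, score in enumerate(pair_similarities) if score < cutoff]
```

A percentile adapts to each document's own similarity distribution, whereas a fixed score applies the same cutoff to every document regardless of how similar its sentences are overall.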
Usage
Single Text Chunking
Batch Chunking
Supported Embeddings
SDPMChunker supports multiple embedding providers through Chonkie’s embedding system. See the Embeddings Overview for more information.
Return Type
SDPMChunker returns SemanticChunk objects, which use slots for optimized storage.
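To show what slots-based storage means in Python, here is a minimal sketch of a chunk-like class. It is a hypothetical stand-in, not the SemanticChunk definition: the class name ChunkSketch and its field names are assumptions chosen for illustration.

```python
class ChunkSketch:
    """Hypothetical slots-based chunk record; not Chonkie's SemanticChunk."""

    # Declaring __slots__ removes the per-instance __dict__, which saves memory
    # when many chunk objects are created.
    __slots__ = ("text", "start_index", "end_index", "token_count")

    def __init__(self, text, start_index, end_index, token_count):
        self.text = text
        self.start_index = start_index
        self.end_index = end_index
        self.token_count = token_count


chunk = ChunkSketch("Hello world.", start_index=0, end_index=12, token_count=3)

# Attributes outside __slots__ cannot be attached to an instance.
try:
    chunk.note = "extra"
    accepted_extra = True
except AttributeError:
    accepted_extra = False
```

The trade-off is flexibility: a slots-based object has a fixed set of attributes, which suits a return type whose fields are known in advance.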