Deprecated as of v1.2.0
The SDPM (Semantic Double-Pass Merging) functionality has been integrated into the main SemanticChunker. The new SemanticChunker provides all SDPM capabilities plus additional improvements, such as Savitzky-Golay filtering for better boundary detection.
Recommended Migration: use SemanticChunker instead of SDPMChunker.
SDPMChunker extends semantic chunking with a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, which lets it connect related content that may not be consecutive in the text.
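To make the two passes concrete, here is a minimal pure-Python sketch that operates on precomputed sentence embeddings (toy 2-D vectors). It is not the SDPMChunker implementation: the helper names `first_pass`, `second_pass`, and `centroid`, the use of mean embeddings to compare groups, and the rule of comparing the leading group only with the group `skip_window + 1` positions ahead are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(group, embeddings):
    """Element-wise mean of the embeddings of the sentences in a group."""
    vectors = [embeddings[i] for i in group]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def first_pass(embeddings, threshold):
    """Pass 1: group consecutive sentences while adjacent similarity >= threshold."""
    if not embeddings:
        return []
    groups = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def second_pass(groups, embeddings, threshold, skip_window):
    """Pass 2 (hypothetical rule): compare the leading group with the group
    skip_window + 1 positions ahead; if similar, merge them together with every
    group in between so each result stays a contiguous run of sentences."""
    pending = [list(g) for g in groups]
    merged = []
    while pending:
        if len(pending) == 1:
            merged.append(pending.pop())
            break
        target = min(skip_window + 1, len(pending) - 1)
        similarity = cosine(centroid(pending[0], embeddings),
                            centroid(pending[target], embeddings))
        if similarity >= threshold:
            combined = [i for g in pending[: target + 1] for i in g]
            pending = [combined] + pending[target + 1:]
        else:
            merged.append(pending.pop(0))
    return merged


# Toy 2-D "embeddings": sentences 0, 1, 3 are about topic A; 2 and 4 about topic B.
embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
groups = first_pass(embeddings, threshold=0.9)
print(groups)  # [[0, 1], [2], [3], [4]]
print(second_pass(groups, embeddings, threshold=0.9, skip_window=1))  # [[0, 1, 2, 3], [4]]
```

With `skip_window=0` the same input stays as four groups, because no two adjacent groups are similar; the skip window is what lets the topic-A groups (sentences 0-1 and sentence 3) join across the topic-B sentence between them. In this sketch the in-between sentence is absorbed rather than reordered, so every chunk remains a contiguous span of the original text.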
Why Use the New SemanticChunker Instead?
The new SemanticChunker includes all SDPM functionality, plus:
- Better performance: Optimized C extensions for faster processing
- Smoother boundaries: Savitzky-Golay filtering for noise reduction
- Cleaner API: Simplified parameter names and improved defaults
- Active development: Ongoing improvements and bug fixes
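Savitzky-Golay filtering smooths a series by fitting a low-order polynomial over a sliding window. As a rough illustration of why this helps boundary detection, the sketch below applies the standard 5-point, quadratic Savitzky-Golay coefficients to a hypothetical series of adjacent-sentence similarities. The window size, polynomial order, and edge handling (edges copied unchanged) are assumptions; this is not the SemanticChunker implementation.

```python
# Standard Savitzky-Golay convolution coefficients for a 5-point window
# and a quadratic (order 2) polynomial fit; they sum to the normalizer 35.
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The first two and last two points are copied unchanged (a simplification;
    fuller implementations also fit the edges). Sequences shorter than five
    points are returned unchanged."""
    values = list(values)
    smoothed = values[:]
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


# Similarities between adjacent sentences: a one-sentence dip (likely noise)
# versus a sustained drop (a more plausible topic boundary).
noisy_dip = [0.9, 0.9, 0.2, 0.9, 0.9]
real_drop = [0.9, 0.9, 0.2, 0.2, 0.2, 0.9, 0.9]
print(savgol5(noisy_dip)[2])  # the isolated dip is pulled up to about 0.56
print(savgol5(real_drop)[3])  # the center of the sustained drop stays low
```

Because the filter exactly reproduces polynomials up to degree 2 at interior points, a steady trend passes through unchanged while single-point spikes are damped. If SciPy is available, `scipy.signal.savgol_filter` provides a general version with configurable window length and polynomial order.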
Legacy Installation
If you need to use the legacy version for compatibility:
Legacy Usage
This documentation is preserved for users who need to maintain existing code using SDPMChunker. For new projects, please use the main SemanticChunker.
Basic Initialization
Legacy Parameters
The legacy SDPMChunker accepts these parameters (some have been renamed or removed in the new SemanticChunker):
- embedding_model: Model identifier or embedding instance
- mode: "cumulative" or "window" (removed in the new version)
- threshold: Similarity threshold (0-1) or "auto"
- chunk_size: Maximum tokens per chunk
- similarity_window: Number of sentences used for threshold calculation
- min_sentences: Minimum sentences per chunk (now min_sentences_per_chunk)
- min_chunk_size: Minimum tokens per chunk (removed in the new version)
- min_characters_per_sentence: Minimum characters per sentence
- threshold_step: Step size for threshold calculation (removed in the new version)
- skip_window: Number of chunks to skip when merging
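When porting many call sites, it can help to apply the renames and removals listed above in one place. The sketch below is a hypothetical helper (`migrate_sdpm_kwargs` is not part of chonkie and does not import it). It assumes every parameter not marked as renamed or removed keeps its name in SemanticChunker; confirm that against the current API reference before passing the result to SemanticChunker.

```python
# Hypothetical migration helper, derived only from the parameter list above:
# one rename and three removals. It does not import or call chonkie.
RENAMED = {"min_sentences": "min_sentences_per_chunk"}
REMOVED = {"mode", "min_chunk_size", "threshold_step"}


def migrate_sdpm_kwargs(legacy_kwargs):
    """Translate legacy SDPMChunker keyword arguments.

    Returns (new_kwargs, dropped): renamed keys are mapped, removed keys are
    collected in `dropped`, and all other keys are passed through unchanged."""
    new_kwargs = {}
    dropped = []
    for name, value in legacy_kwargs.items():
        if name in REMOVED:
            dropped.append(name)
        else:
            new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs, dropped


legacy = {
    "embedding_model": "your-embedding-model",  # placeholder, not a real model id
    "mode": "window",
    "threshold": 0.5,
    "chunk_size": 512,
    "min_sentences": 1,
    "skip_window": 1,
}
new_kwargs, dropped = migrate_sdpm_kwargs(legacy)
print(new_kwargs)
print(dropped)  # ['mode']
```

Returning the dropped names instead of silently discarding them lets you log or review settings such as mode or threshold_step that no longer have a direct equivalent.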
Example Migration
Old Code (Legacy)
New Code (Recommended)
Return Type Changes
Legacy Return Type
The legacy SDPMChunker returns SemanticChunk objects with sentence details:
New Return Type
The new SemanticChunker returns simpler Chunk objects:
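Code that read sentence-level details from legacy SemanticChunk objects needs another source for them once it receives simpler Chunk objects. One workaround is to re-split each chunk's text yourself. The sketch below uses a hypothetical `SimpleChunk` dataclass holding text and character offsets (check the actual Chunk attributes in the API reference) and a deliberately naive regex splitter. It recovers sentence text and offsets only, and it is not chonkie's own sentence splitting.

```python
import re
from dataclasses import dataclass


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for a chunk that carries only text and offsets."""
    text: str
    start_index: int  # character offset of the chunk in the source document
    end_index: int


# Naive sentence pattern: starts at a non-space, non-terminator character and
# runs through any trailing . ! or ? (abbreviations like "e.g." will split).
_SENTENCE_RE = re.compile(r"[^.!?\s][^.!?]*[.!?]*")


def sentences_of(chunk):
    """Return (sentence_text, start, end) tuples with offsets in the source document."""
    return [
        (m.group(), chunk.start_index + m.start(), chunk.start_index + m.end())
        for m in _SENTENCE_RE.finditer(chunk.text)
    ]


document = "Intro. Cats purr. Dogs bark!"
chunk = SimpleChunk(text=document[7:], start_index=7, end_index=len(document))
for text, start, end in sentences_of(chunk):
    print(start, end, text)
# 7 17 Cats purr.
# 18 28 Dogs bark!
```

Mapping offsets back to the source document, rather than to the chunk, keeps the recovered sentences usable for highlighting or citation in the original text.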
Full Legacy Documentation
For users who must use the legacy version, the complete original functionality remains available:
Support
While the legacy SDPMChunker remains available for backward compatibility, it is no longer actively developed. Please consider migrating to the new SemanticChunker for:
- Better performance
- Active bug fixes
- New features
- Ongoing support