# SDPM Chunker (Legacy)

> Semantic Double-Pass Merging chunker - now integrated into SemanticChunker

<Warning>
  **Deprecated as of v1.2.0**

  The SDPM (Semantic Double-Pass Merging) functionality has been integrated into the main `SemanticChunker`.

  **Recommended Migration:**

  ```python theme={"system"}
  # Old way (deprecated)
  from chonkie.legacy import SDPMChunker
  chunker = SDPMChunker(skip_window=1)

  # New way (recommended)
  from chonkie import SemanticChunker
  chunker = SemanticChunker(skip_window=1)
  ```

  The new `SemanticChunker` provides all SDPM capabilities, plus improvements such as Savitzky-Golay filtering for better boundary detection.
</Warning>

The `SDPMChunker` extends semantic chunking by using a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, allowing it to connect related content that may not be consecutive in the text.
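To make the two passes concrete, here is a minimal sketch in plain Python. It is not Chonkie's implementation: the function names (`first_pass`, `second_pass`), the toy 3-dimensional embeddings, the use of group centroids, and the choice to absorb skipped groups into the merged group are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def first_pass(embeddings, threshold):
    """Pass 1: group consecutive sentences whose neighbours are similar."""
    if not embeddings:
        return []
    groups = [[0]]
    for i in range(1, len(embeddings)):
        if cosine(embeddings[i - 1], embeddings[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def second_pass(groups, embeddings, threshold, skip_window=1):
    """Pass 2: merge a group with a similar group up to skip_window groups
    beyond its immediate neighbour, absorbing the groups in between."""
    merged = []
    i = 0
    while i < len(groups):
        current = list(groups[i])
        centroid = mean_vector([embeddings[s] for s in current])
        last = min(i + 1 + skip_window, len(groups) - 1)
        match = None
        # Check the farthest candidate first so skipped groups are absorbed.
        for k in range(last, i, -1):
            other = mean_vector([embeddings[s] for s in groups[k]])
            if cosine(centroid, other) >= threshold:
                match = k
                break
        if match is None:
            merged.append(current)
            i += 1
        else:
            for k in range(i + 1, match + 1):
                current.extend(groups[k])
            merged.append(current)
            i = match + 1
    return merged


# Toy embeddings: sentences 0, 1, and 3 share a topic; 2 and 4 do not.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.05, 0.0],
    [0.0, 0.0, 1.0],
]
groups = first_pass(embeddings, threshold=0.8)
merged = second_pass(groups, embeddings, threshold=0.8, skip_window=1)
print(groups)  # [[0, 1], [2], [3], [4]]
print(merged)  # [[0, 1, 2, 3], [4]]
```

With `skip_window=0` in this sketch, only the immediately following group is considered, so the second pass leaves the four groups unchanged.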

## Why Use the New SemanticChunker Instead?

The new `SemanticChunker` includes all SDPM functionality plus:

* **Better performance**: Optimized C extensions for faster processing
* **Smoother boundaries**: Savitzky-Golay filtering for noise reduction
* **Cleaner API**: Simplified parameter names and improved defaults
* **Active development**: Ongoing improvements and bug fixes

## Legacy Installation

If you need to use the legacy version for compatibility:

```bash theme={"system"}
pip install "chonkie[semantic]"
```

Then import from the legacy module:

```python theme={"system"}
from chonkie.legacy import SDPMChunker
```

## Legacy Usage

<Note>
  This documentation is preserved for users who need to maintain existing code using SDPMChunker. For new projects, please use the main [SemanticChunker](./semantic-chunker).
</Note>

### Basic Initialization

```python theme={"system"}
from chonkie.legacy import SDPMChunker

# Legacy initialization
chunker = SDPMChunker(
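    embedding_model="minishlab/potion-base-32M",  # Model identifier or embedding instance
    threshold=0.5,                                # Similarity threshold (0-1) or "auto"
    chunk_size=2048,                              # Maximum tokens per chunk
    min_sentences=1,                              # Minimum sentences per chunk
    skip_window=1,                                # Number of chunks to skip when merging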
)
```

### Legacy Parameters

The legacy `SDPMChunker` accepts the following parameters. Several were renamed or removed in the new `SemanticChunker`, as noted:

* `embedding_model`: Model identifier or embedding instance
* `mode`: "cumulative" or "window" (removed in new version)
* `threshold`: Similarity threshold (0-1) or "auto"
* `chunk_size`: Maximum tokens per chunk
* `similarity_window`: Sentences for threshold calculation
* `min_sentences`: Minimum sentences per chunk (now `min_sentences_per_chunk`)
* `min_chunk_size`: Minimum tokens per chunk (removed in new version)
* `min_characters_per_sentence`: Minimum characters per sentence
* `threshold_step`: Step size for threshold calculation (removed in new version)
* `skip_window`: Number of groups the second pass may skip over when merging similar, non-consecutive groups

### Example Migration

#### Old Code (Legacy)

```python theme={"system"}
from chonkie.legacy import SDPMChunker

chunker = SDPMChunker(
    embedding_model="minishlab/potion-base-32M",
    mode="window",
    threshold="auto",
    chunk_size=512,
    min_sentences=1,
    min_chunk_size=2,
    skip_window=1
)

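# Placeholder input; substitute your own document
text = "Chonkie splits long documents into chunks. Each chunk groups related sentences."

chunks = chunker.chunk(text)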
for chunk in chunks:
    print(f"Sentences: {len(chunk.sentences)}")
```

#### New Code (Recommended)

```python theme={"system"}
from chonkie import SemanticChunker

chunker = SemanticChunker(
    embedding_model="minishlab/potion-base-32M",
    threshold=0.7,  # Explicit threshold instead of "auto"
    chunk_size=512,
    min_sentences_per_chunk=1,  # Renamed parameter
    skip_window=1  # Same functionality
)

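# Placeholder input; substitute your own document
text = "Chonkie splits long documents into chunks. Each chunk groups related sentences."

chunks = chunker.chunk(text)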
for chunk in chunks:
    print(f"Token count: {chunk.token_count}")
```
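If you have many call sites to migrate, the parameter changes can be applied mechanically. The helper below is a hypothetical sketch, not part of Chonkie: `migrate_sdpm_kwargs`, `RENAMED`, and `REMOVED` are names invented here, and the mapping covers only the renames and removals listed on this page. It assumes that unlisted parameters pass through unchanged, and it follows the example in replacing `"auto"` with an explicit threshold.

```python
# Hypothetical helper: maps legacy SDPMChunker keyword arguments to
# SemanticChunker keyword arguments using only the changes listed on this page.
RENAMED = {"min_sentences": "min_sentences_per_chunk"}
REMOVED = {"mode", "min_chunk_size", "threshold_step"}


def migrate_sdpm_kwargs(legacy_kwargs, explicit_threshold=None):
    """Return (new_kwargs, dropped_names) for a dict of legacy kwargs."""
    new_kwargs = {}
    dropped = []
    for name, value in legacy_kwargs.items():
        if name in REMOVED:
            dropped.append(name)  # no counterpart in the new chunker
            continue
        if name == "threshold" and value == "auto":
            # The migration example swaps "auto" for an explicit number.
            if explicit_threshold is None:
                raise ValueError('threshold="auto" needs an explicit replacement')
            value = explicit_threshold
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs, dropped


legacy = {
    "embedding_model": "minishlab/potion-base-32M",
    "mode": "window",
    "threshold": "auto",
    "chunk_size": 512,
    "min_sentences": 1,
    "min_chunk_size": 2,
    "skip_window": 1,
}
new_kwargs, dropped = migrate_sdpm_kwargs(legacy, explicit_threshold=0.7)
print(new_kwargs)
print(dropped)  # ['mode', 'min_chunk_size']
```

The resulting dictionary could then be unpacked as `SemanticChunker(**new_kwargs)`. Review the dropped names before relying on the result, since removing `mode` or `min_chunk_size` may change chunking behavior.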

## Return Type Changes

### Legacy Return Type

The legacy SDPMChunker returns `SemanticChunk` objects with sentence details:

```python theme={"system"}
@dataclass
class SemanticChunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    sentences: list[SemanticSentence]  # Detailed sentence information
```

### New Return Type

The new SemanticChunker returns simpler `Chunk` objects:

```python theme={"system"}
@dataclass
class Chunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    # No sentence details - cleaner and more efficient
```
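Code that reads `chunk.sentences` will break after migration, because `Chunk` has no such field. One way to bridge the transition is to treat the attribute as optional. The sketch below uses stand-in dataclasses that mirror the fields shown above rather than importing Chonkie's own classes; `describe` is a hypothetical helper name, and the stand-in `sentences` field holds plain strings instead of `SemanticSentence` objects.

```python
from dataclasses import dataclass, field


# Stand-ins mirroring the fields documented above (not Chonkie's classes).
@dataclass
class Chunk:
    text: str
    start_index: int
    end_index: int
    token_count: int


@dataclass
class SemanticChunk(Chunk):
    sentences: list = field(default_factory=list)


def describe(chunk):
    """Summarize a chunk, using sentence details only when they exist."""
    sentences = getattr(chunk, "sentences", None)
    if sentences is not None:
        return f"{chunk.token_count} tokens, {len(sentences)} sentences"
    return f"{chunk.token_count} tokens"


legacy_chunk = SemanticChunk("Hello. World.", 0, 13, 4, sentences=["Hello.", "World."])
new_chunk = Chunk("Hello. World.", 0, 13, 4)
print(describe(legacy_chunk))  # 4 tokens, 2 sentences
print(describe(new_chunk))     # 4 tokens
```

Once all call sites use the new chunker, the `getattr` fallback can be removed and the code can rely on the common fields alone.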

## Full Legacy Documentation

For users who must use the legacy version, the complete original functionality remains available:

```python theme={"system"}
from chonkie.legacy import SDPMChunker

# All original parameters still work
chunker = SDPMChunker(
    embedding_model="minishlab/potion-base-32M",
    mode="window",
    threshold="auto",
    chunk_size=2048,
    similarity_window=1,
    min_sentences=1,
    min_chunk_size=2,
    min_characters_per_sentence=12,
    threshold_step=0.01,
    delim=[". ", "! ", "? ", "\n"],
    include_delim="prev",
    skip_window=1
)

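# Placeholder inputs; substitute your own documents
text = "Chonkie splits long documents into chunks. Each chunk groups related sentences."
texts = [text, "A second document. It is chunked independently of the first."]

# Original methods preserved
chunks = chunker.chunk(text)
batch_chunks = chunker.chunk_batch(texts)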
```

## Support

While the legacy SDPMChunker remains available for backward compatibility, it is no longer actively developed. Please consider migrating to the new SemanticChunker for:

* Better performance
* Active bug fixes
* New features
* Ongoing support

For migration assistance, see the [SemanticChunker documentation](./semantic-chunker) or open an issue on our [GitHub repository](https://github.com/chonkie-ai/chonkie).