Chonkie Documentation

Semantic Chunker

curl --request POST \
  --url https://api.example.com/v1/chunk/semantic

[
  {
    "text": "<string>",
    "start_index": 123,
    "end_index": 123,
    "token_count": 123
  }
]
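The same request can be assembled from Python with only the standard library. This is a sketch that relies on assumptions this page does not state: the body is JSON when no file is uploaded, and authentication uses an `Authorization: Bearer` header. The helper name `build_semantic_chunk_request` and the key placeholder are hypothetical, and the URL is the placeholder from the curl example.

```python
import json
import urllib.request

# Placeholder endpoint from the curl example.
API_URL = "https://api.example.com/v1/chunk/semantic"


def build_semantic_chunk_request(text, api_key, **params):
    """Build (without sending) a POST request to the semantic chunk endpoint.

    `text` may be a single string or a list of strings (batch processing).
    Assumptions: a JSON body and a Bearer-token Authorization header.
    """
    body = {"text": text, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_semantic_chunk_request("Your text here...", "YOUR_API_KEY", chunk_size=512)
# Sending it (requires a real endpoint and key):
# with urllib.request.urlopen(req) as resp:
#     chunks = json.load(resp)
```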


The Semantic Chunker uses embeddings to identify natural break points in text based on semantic meaning, creating chunks where topic transitions occur.
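To make the idea concrete, here is a toy sketch of embedding-based chunking. It is not Chonkie's implementation: a bag-of-words count vector stands in for the embedding model, sentences are split with a naive regex, and a new chunk starts wherever the cosine similarity between neighbouring sentences falls below `threshold`. All function names are hypothetical. The toy vectors give low similarities, so the example uses a lower threshold than the API default.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.8):
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Start a new chunk where adjacent sentences are not similar enough.
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks


text = "Cats purr. Cats nap. Stocks fell today. Stocks rose later."
print(semantic_chunks(text, threshold=0.3))
# ['Cats purr. Cats nap.', 'Stocks fell today. Stocks rose later.']
```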

Examples

Text Input

from chonkie.cloud import SemanticChunker

chunker = SemanticChunker(
    embedding_model="minishlab/potion-base-8M",
    chunk_size=512,
)

text = "Your text here..."
chunks = chunker.chunk(text)

File Input

from chonkie.cloud import SemanticChunker

chunker = SemanticChunker(
    embedding_model="minishlab/potion-base-8M",
    chunk_size=512,
)

# Chunk from file
with open("document.txt", "rb") as f:
    chunks = chunker.chunk(file=f)

Request

Parameters

text

string | string[]

The text to chunk. Can be a single string or an array of strings for batch processing. Either text or file is required.

file

The file to chunk, sent with multipart/form-data encoding. Either text or file is required.
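For file uploads the request body must be multipart/form-data. The sketch below encodes one file plus optional form fields by hand with the standard library. The `file` field name comes from the parameter above; sending the other parameters (such as `chunk_size`) as form fields alongside it is an assumption, and `encode_multipart` is a hypothetical helper.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data (RFC 7578).

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    # The file part carries a filename and a generic binary content type.
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


body, content_type = encode_multipart(
    {"chunk_size": 512}, "file", "document.txt", b"Your text here..."
)
```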

embedding_model

string

default: "minishlab/potion-base-8M"

The embedding model used to measure semantic similarity between sentences.

tokenizer

string

default: "gpt2"

Tokenizer to use for counting tokens.

chunk_size

integer

default: 512

Target number of tokens per chunk (soft limit).
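One way to read "soft limit" is greedy packing in which a sentence is never cut: sentences are added to the current chunk until the next one would exceed `chunk_size`, and a single sentence longer than the limit becomes its own oversized chunk. This is an illustrative assumption, not documented behaviour; whitespace splitting stands in for the real tokenizer, and `pack_sentences` is hypothetical.

```python
def pack_sentences(sentences, chunk_size=512, count_tokens=lambda s: len(s.split())):
    """Greedily group sentences so each chunk stays within chunk_size tokens.

    The limit is soft: a sentence longer than chunk_size is kept whole as its
    own chunk. count_tokens defaults to whitespace splitting (a stand-in).
    """
    chunks, current, current_tokens = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_tokens + n > chunk_size:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(sentence)
        current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks


print(pack_sentences(["a b c", "d e", "f g h i j k"], chunk_size=5))
# ['a b c d e', 'f g h i j k']
```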

threshold

float

default: 0.8

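Similarity threshold between 0 and 1. A chunk boundary is placed where the similarity between adjacent sentences falls below this value, so higher values create more chunks and lower values create fewer, larger chunks.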
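Assuming the usual rule for semantic chunking, where a boundary is placed wherever the similarity between adjacent sentences falls below `threshold`, raising the threshold yields more boundaries and therefore more, smaller chunks. The similarity scores below are made up for illustration, and `boundaries` is a hypothetical helper.

```python
def boundaries(similarities, threshold):
    """Return the sentence indices where a new chunk starts.

    similarities[i] is the similarity between sentence i and sentence i + 1,
    so a drop below the threshold starts a new chunk at sentence i + 1.
    """
    return [i + 1 for i, s in enumerate(similarities) if s < threshold]


# Made-up similarities between six consecutive sentences.
sims = [0.91, 0.42, 0.77, 0.85, 0.30]
for t in (0.5, 0.8, 0.9):
    print(t, len(boundaries(sims, t)) + 1, "chunks")
# 0.5 3 chunks
# 0.8 4 chunks
# 0.9 5 chunks
```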

min_sentences_per_chunk

integer

default: 1

Minimum number of sentences per chunk.
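A sketch of how such a minimum could be enforced, assuming sentences have already been grouped by similarity: a group that is too short is carried forward and merged with what follows, and a short remainder at the end is folded into the last chunk. The merge policy and the helper name `enforce_min_sentences` are assumptions, not the service's documented behaviour.

```python
def enforce_min_sentences(groups, min_sentences_per_chunk=1):
    """Merge sentence groups so that each chunk has at least the minimum.

    A group that is too short is carried forward and merged with the groups
    after it; a short remainder at the end is appended to the last chunk.
    """
    merged, pending = [], []
    for group in groups:
        pending = pending + group
        if len(pending) >= min_sentences_per_chunk:
            merged.append(pending)
            pending = []
    if pending:
        if merged:
            merged[-1] = merged[-1] + pending
        else:
            merged.append(pending)  # too few sentences overall; keep them together
    return merged


groups = [["a"], ["b", "c"], ["d", "e"], ["f"]]
print(enforce_min_sentences(groups, min_sentences_per_chunk=2))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```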

Response

Returns

An array of Chunk objects, each containing a semantically coherent segment of the input text.

text

string

The chunk text content.

start_index

integer

Starting character position in the original text.

end_index

integer

Ending character position in the original text (exclusive).

token_count

integer

Number of tokens in the chunk.
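Because each chunk carries character offsets, a client can check them against the text it sent. The sketch below assumes the response has been parsed from JSON into dicts and that `end_index` is exclusive, like a Python slice; the sample chunks, including their token counts, are hypothetical, as is the helper `check_offsets`.

```python
def check_offsets(original, chunks):
    """Return True if every chunk's text equals original[start_index:end_index].

    Assumes end_index is exclusive, like a Python slice.
    """
    return all(original[c["start_index"]:c["end_index"]] == c["text"] for c in chunks)


original = "Cats purr. Stocks fell."
# Hypothetical response shaped like the Chunk fields above (token counts illustrative).
chunks = [
    {"text": "Cats purr. ", "start_index": 0, "end_index": 11, "token_count": 4},
    {"text": "Stocks fell.", "start_index": 11, "end_index": 23, "token_count": 3},
]
print(check_offsets(original, chunks))
# True
```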
