{
  "text": "<string>",
  "start_index": 123,
  "end_index": 123,
  "token_count": 123
}
The Overlap Refinery adds overlapping context between adjacent chunks for better continuity.

Request

Parameters

chunks
array
required
Array of chunk objects to add overlap to. Must be in sequential order from the same document.
tokenizer
string
default:"gpt2"
Tokenizer to use for measuring overlap.
context_size / contextSize
number
default:"0.25"
Amount of overlapping context to add. A float between 0.0 and 1.0 is treated as a fraction of the chunk size; an integer is treated as an exact token count.
mode
string
default:"token"
Mode for overlap. Options: “token” or “recursive”.
method
string
default:"suffix"
Method for adding context. Options: “suffix” (add context after chunk) or “prefix” (add context before chunk).
recipe
string
default:"default"
Recipe name. Used only when mode is set to “recursive”.
lang
string
default:"en"
Language for recipe in recursive mode.
merge
boolean
default:"true"
Whether to merge overlapping chunks.
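
The dual meaning of context_size (fraction or token count) can be sketched as follows. This is only an illustration: the helper name `resolve_context_size` is hypothetical, and it assumes that "chunk size" means the token count of the chunk being refined, which the reference above does not state. The service's actual rounding behavior may differ.

```python
def resolve_context_size(context_size, chunk_token_count):
    """Turn a context_size value into a number of overlap tokens.

    Hypothetical helper, not part of the Chonkie API.
    Assumption: a float is a fraction of the chunk's own token count.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(context_size, bool):
        raise TypeError("context_size must be a float or an int")
    if isinstance(context_size, float):
        if not 0.0 <= context_size <= 1.0:
            raise ValueError("fractional context_size must be in [0.0, 1.0]")
        # Assumption: round down to a whole number of tokens
        return int(context_size * chunk_token_count)
    if isinstance(context_size, int):
        if context_size < 0:
            raise ValueError("context_size must not be negative")
        return context_size
    raise TypeError("context_size must be a float or an int")


# With 100-token chunks, the default 0.25 and an explicit 25 ask for the same overlap
fraction_tokens = resolve_context_size(0.25, 100)
exact_tokens = resolve_context_size(25, 100)
```

Under these assumptions, passing 1 (an integer) requests a single token of overlap, while passing 1.0 (a float) requests the whole chunk, so the type of the value matters.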

Response

Returns

Array of chunks with added overlapping context.
text
string
The chunk text with added overlap.
start_index
integer
Updated starting position reflecting overlap.
end_index
integer
Updated ending position reflecting overlap.
token_count
integer
Updated token count including overlap.
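
A minimal sketch of reading the returned array into typed objects is shown below. It assumes the response body is a JSON array of objects with the four fields listed above; the `RefinedChunk` class, the `parse_chunks` helper, and the sample payload are all made up for illustration and are not part of the Chonkie client.

```python
import json
from dataclasses import dataclass


@dataclass
class RefinedChunk:
    """Illustrative container mirroring the documented response fields."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def parse_chunks(payload: str) -> list[RefinedChunk]:
    """Parse a JSON array of chunk objects (hypothetical helper)."""
    items = json.loads(payload)
    return [
        RefinedChunk(
            text=item["text"],
            start_index=item["start_index"],
            end_index=item["end_index"],
            token_count=item["token_count"],
        )
        for item in items
    ]


# Made-up payload in the documented shape, not real API output
sample = '[{"text": "Hello world", "start_index": 0, "end_index": 11, "token_count": 2}]'
chunks = parse_chunks(sample)
```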

Examples

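from chonkie.cloud import TokenChunker, OverlapRefinery

# Split the text into 100-token chunks with no built-in overlap
chunker = TokenChunker(chunk_size=100, chunk_overlap=0)
chunks = chunker.chunk("Your text here...")

# Add 25 tokens of context to each chunk (an integer means an exact token count)
refinery = OverlapRefinery(context_size=25, tokenizer="gpt2")
refined_chunks = refinery.refine(chunks)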
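
To make the effect of the method parameter concrete, here is a local sketch that does not call the API. It rests on assumptions not confirmed by this page: tokens are approximated by whitespace-separated words, "suffix" is taken to mean appending the first tokens of the next chunk, and "prefix" is taken to mean prepending the last tokens of the previous chunk. The function `add_overlap` is hypothetical.

```python
def add_overlap(chunks, context_size, method="suffix"):
    """Add overlap between adjacent text chunks (illustrative only).

    Assumptions: whitespace words stand in for tokens; "suffix" borrows
    from the next chunk, "prefix" borrows from the previous chunk.
    """
    tokens = [chunk.split() for chunk in chunks]
    refined = []
    for i, current in enumerate(tokens):
        if method == "suffix":
            # The last chunk has no successor, so it stays unchanged
            context = tokens[i + 1][:context_size] if i + 1 < len(tokens) else []
            refined.append(" ".join(current + context))
        elif method == "prefix":
            # The first chunk has no predecessor; the size check avoids
            # the slice [-0:], which would return the whole list
            has_context = i > 0 and context_size > 0
            context = tokens[i - 1][-context_size:] if has_context else []
            refined.append(" ".join(context + current))
        else:
            raise ValueError('method must be "suffix" or "prefix"')
    return refined


pieces = ["a b c", "d e f"]
with_suffix = add_overlap(pieces, 1, method="suffix")
with_prefix = add_overlap(pieces, 1, method="prefix")
```

In this sketch every chunk except the last (for suffix) or the first (for prefix) grows by context_size tokens, which is why the response reports updated indices and token counts.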