OverlapRefinery enhances chunks by incorporating context from neighboring chunks. This is useful for tasks where maintaining contextual continuity between chunks is important, such as question answering or summarization over long documents. It can add context as a prefix (from the preceding chunk) or a suffix (from the next chunk).
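To make the prefix/suffix distinction concrete, here is a minimal pure-Python sketch of the idea. It is not Chonkie's implementation: the `add_overlap` helper is hypothetical, it works on plain strings, it counts characters rather than tokens, and it assumes suffix context is taken from the start of the next chunk and prefix context from the end of the previous one.

```python
# Illustrative only: a toy, character-level version of overlap refinement.
# `add_overlap` is a hypothetical helper, not part of Chonkie.
def add_overlap(chunks, context_size, method="suffix"):
    """Return new chunk texts with context borrowed from a neighboring chunk."""
    if context_size <= 0:
        return list(chunks)
    refined = []
    for i, text in enumerate(chunks):
        if method == "suffix" and i + 1 < len(chunks):
            # Borrow the first `context_size` characters of the next chunk.
            refined.append(text + chunks[i + 1][:context_size])
        elif method == "prefix" and i > 0:
            # Borrow the last `context_size` characters of the previous chunk.
            refined.append(chunks[i - 1][-context_size:] + text)
        else:
            # The first chunk (prefix) or last chunk (suffix) has no neighbor.
            refined.append(text)
    return refined


chunks = ["The cat sat. ", "It purred loudly. ", "Then it slept."]
suffix_result = add_overlap(chunks, context_size=3, method="suffix")
prefix_result = add_overlap(chunks, context_size=3, method="prefix")
```

With suffix overlap the last chunk is left unchanged, and with prefix overlap the first chunk is, since neither has a neighbor on the relevant side.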
API Reference
To use the OverlapRefinery via the API, check out the API reference documentation.
Initialization
To use the OverlapRefinery, initialize it with the desired parameters. You can specify a tokenizer, context size, overlap mode, method, and other options.
Usage
Use the OverlapRefinery object as a callable, or call its refine method, to add overlapping context to your chunks.
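The two call styles can be pictured with a small stand-in class. This is only a sketch of the pattern: `ToyRefinery` is hypothetical, operates on plain strings, and assumes that calling the object simply forwards to `refine`; consult the API reference for the real class's behavior.

```python
class ToyRefinery:
    """Hypothetical stand-in used only to illustrate the two call styles."""

    def __init__(self, context_size: int = 3):
        self.context_size = context_size

    def refine(self, chunks):
        # Suffix-style: append the first `context_size` characters of the next chunk.
        refined = []
        for i, text in enumerate(chunks):
            nxt = chunks[i + 1][: self.context_size] if i + 1 < len(chunks) else ""
            refined.append(text + nxt)
        return refined

    def __call__(self, chunks):
        # Calling the object forwards to refine(), so both styles agree.
        return self.refine(chunks)


refinery = ToyRefinery(context_size=2)
texts = ["alpha ", "beta ", "gamma"]
via_call = refinery(texts)           # object used as a callable
via_method = refinery.refine(texts)  # explicit method call
```

Both calls produce the same list, so the choice between them is a matter of style.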
Parameters
The tokenizer to use for calculating overlap size. Can be a string identifier (e.g., "character", "word", "gpt2"), a callable, or a chonkie.Tokenizer instance. Defaults to "character".
The size of the overlap context. If an int, it's the absolute number of tokens. If a float (between 0 and 1), it's the fraction of the maximum chunk token count.
The mode for calculating overlap. "token" uses the tokenizer directly. "recursive" uses hierarchical splitting based on rules.
The method for adding context. "suffix" adds context from the next chunk to the end of the current chunk. "prefix" adds context from the previous chunk to the beginning of the current chunk.
The rules used for splitting text when mode is "recursive". Defines delimiters and behavior at different hierarchical levels. See chonkie.types.RecursiveRules.
Whether to merge the context into the chunk text. If True, the calculated context is directly prepended (for prefix) or appended (for suffix) to chunk.text. If False, the context is stored in the chunk.context attribute without modifying chunk.text.
Whether to modify the chunks in place. If True, modifies the input list of chunks directly. If False, returns a new list of modified chunks.
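The sketch below illustrates how the context size, merge, and in-place options described above might interact. It is a toy model under stated assumptions, not Chonkie's code: `ToyChunk`, `resolve_context_size`, and `refine_suffix` are hypothetical names, sizes are counted in characters, a float size is rounded down, and only the suffix method is shown.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Union


@dataclass
class ToyChunk:
    """Hypothetical chunk with just the two attributes discussed above."""
    text: str
    context: Optional[str] = None


def resolve_context_size(context_size: Union[int, float], chunks: List[ToyChunk]) -> int:
    # A float is treated as a fraction of the longest chunk (assumption: rounded down).
    if isinstance(context_size, float):
        longest = max(len(c.text) for c in chunks)
        return int(context_size * longest)
    return context_size


def refine_suffix(chunks, context_size=0.25, merge=True, inplace=True):
    if not chunks:
        return chunks
    size = resolve_context_size(context_size, chunks)
    # inplace=False works on copies so the caller's chunks stay untouched.
    out = chunks if inplace else [replace(c) for c in chunks]
    # Compute every context from the original texts before modifying anything.
    contexts = [out[i + 1].text[:size] for i in range(len(out) - 1)]
    for chunk, ctx in zip(out, contexts):
        if merge:
            chunk.text += ctx    # context becomes part of the text
        else:
            chunk.context = ctx  # text is left as-is
    return out


original = [ToyChunk("abcdefgh"), ToyChunk("ijkl")]
copied = refine_suffix(original, context_size=0.25, merge=False, inplace=False)
same = refine_suffix(original, context_size=2, merge=True, inplace=True)
```

In the first call the originals are left alone and the context lands in the `context` attribute of the copies; in the second, the input list itself is returned with the context merged into `text`.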