Chonkie Documentation

The EmbeddingsRefinery enriches your chunks by attaching embeddings to them. This is useful for downstream tasks such as semantic search, clustering, or insertion into a vector database.

API Reference

To use the EmbeddingsRefinery via the API, check out the API reference documentation.

Initialization

To use the EmbeddingsRefinery, you need to initialize it with an embedding model.

from chonkie import EmbeddingsRefinery

# Initialize with a model identifier string
# or an embedding model instance
em_refinery = EmbeddingsRefinery(
    embedding_model="minishlab/potion-base-8M",  # Required
)

Usage

Call the EmbeddingsRefinery object directly, or call its refine method, to add embeddings to your chunks.

from chonkie import TokenChunker, EmbeddingsRefinery

test_string = "This is a test string. It will be chunked and embedded."
chunker = TokenChunker()
chunks = chunker(test_string)

# Add embeddings to the chunks
em_refinery = EmbeddingsRefinery(
    embedding_model="minishlab/potion-base-8M",  # Model string or BaseEmbeddings instance
)

chunks_with_embeddings = em_refinery(chunks)
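
The example above uses the callable form. The sketch below illustrates, with self-contained stand-ins, why calling the object and calling refine are expected to give the same result. It is not Chonkie's implementation: ToyChunk, ToyRefinery, and toy_embed are hypothetical names invented for illustration, and the embedding attribute name on the chunk is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a chunk; not Chonkie's chunk class."""
    text: str
    embedding: Optional[List[float]] = None  # assumed attribute name


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding: character count and space count."""
    return [float(len(text)), float(text.count(" "))]


class ToyRefinery:
    """Hypothetical refinery that attaches one embedding to each chunk."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self.embed = embed

    def refine(self, chunks: List[ToyChunk]) -> List[ToyChunk]:
        # Mutate each chunk in place and return the same list.
        for chunk in chunks:
            chunk.embedding = self.embed(chunk.text)
        return chunks

    def __call__(self, chunks: List[ToyChunk]) -> List[ToyChunk]:
        # Calling the object simply delegates to refine().
        return self.refine(chunks)


refinery = ToyRefinery(toy_embed)
via_call = refinery([ToyChunk("This is a test string.")])
via_refine = refinery.refine([ToyChunk("This is a test string.")])
```

With the real library, the equivalent would presumably be em_refinery.refine(chunks) in place of em_refinery(chunks).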

Parameters

embedding_model (Union[str, BaseEmbeddings], required): Model identifier string or embedding model instance.
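
Once chunks carry embeddings, a downstream task such as semantic search reduces to comparing vectors. The following is a minimal sketch in plain Python, assuming each chunk's text and embedding have already been extracted into (text, embedding) pairs. The helper names cosine_similarity and rank_chunks are hypothetical, and the vectors are hand-made placeholders rather than real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_embedding: Sequence[float],
    embedded_chunks: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [
        (text, cosine_similarity(query_embedding, embedding))
        for text, embedding in embedded_chunks
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for embeddings added by a refinery.
embedded_chunks = [
    ("Chunk about cats", [1.0, 0.0, 0.0]),
    ("Chunk about dogs", [0.0, 1.0, 0.0]),
    ("Chunk about cats and dogs", [1.0, 1.0, 0.0]),
]
ranked = rank_chunks([1.0, 0.1, 0.0], embedded_chunks)
```

In practice the query would be embedded with the same model used by the refinery, so that query and chunk vectors live in the same space.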
