The EmbeddingsRefinery enriches your chunks by attaching an embedding to each one. This is useful for downstream tasks such as semantic search, clustering, or vector database insertion.

API Reference

To use the EmbeddingsRefinery via the API, check out the API reference documentation.

Initialization

To use the EmbeddingsRefinery, you need to initialize it with an embedding model.

from chonkie import EmbeddingsRefinery

# Initialize with string model identifier
# or an embedding model instance
em_refinery = EmbeddingsRefinery(
    embedding_model="minishlab/potion-base-8M",  # Required
)

Usage

Use the EmbeddingsRefinery object as a callable or the refine method to add embeddings to your chunks.

from chonkie import TokenChunker, EmbeddingsRefinery

test_string = "This is a test string. It will be chunked and embedded."
chunker = TokenChunker()
chunks = chunker(test_string)

# Add embeddings to the chunks
em_refinery = EmbeddingsRefinery(
    embedding_model="minishlab/potion-base-8M",  # Model string or BaseEmbeddings instance
)

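# Call the refinery directly...
chunks_with_embeddings = em_refinery(chunks)

# ...or use the refine method, as described above
chunks_with_embeddings = em_refinery.refine(chunks)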
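
To illustrate the kind of downstream task mentioned earlier, here is a minimal sketch of semantic search over embedded chunks. It does not use Chonkie itself: StandInChunk is a hypothetical stand-in class, and the attribute name embedding is an assumption about how a refined chunk exposes its vector, so check the API reference before relying on it. The three-dimensional vectors are made-up values for illustration, not real model output.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StandInChunk:
    """Hypothetical stand-in for a refined chunk: text plus its vector."""
    text: str
    embedding: Sequence[float]  # assumed attribute name, not confirmed by the docs


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_embedding: Sequence[float],
    chunks: List[StandInChunk],
    top_k: int = 3,
) -> List[Tuple[float, StandInChunk]]:
    """Score every chunk against the query and return the top_k best matches."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for real embeddings
demo_chunks = [
    StandInChunk("This is a test string.", [1.0, 0.0, 0.0]),
    StandInChunk("It will be chunked and embedded.", [0.0, 1.0, 0.0]),
]
demo_query = [0.9, 0.1, 0.0]

results = rank_chunks(demo_query, demo_chunks, top_k=1)
for score, chunk in results:
    print(f"{score:.3f}  {chunk.text}")
```

In practice the query vector would come from the same embedding model used by the refinery, so that query and chunk vectors live in the same space. For large collections, a vector database is usually a better fit than this linear scan.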

Parameters

embedding_model
Union[str, BaseEmbeddings]

Required. Model identifier string or embedding model instance used to embed each chunk.