Chonkie Documentation

The ElasticHandshake class connects Chonkie’s chunking system to Elasticsearch, so you can embed your Chonkie chunks, store them in an Elasticsearch index, and query them with vector search without leaving the Chonkie SDK. The handshake automatically handles index creation and the necessary vector field mapping.

Installation

Before using the Elasticsearch handshake, make sure to install the required dependencies:

pip install "chonkie[elastic]"

Basic Usage

Initialization

from chonkie import ElasticHandshake

# Connects to http://localhost:9200 by default
handshake = ElasticHandshake()
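
To make the "index creation and vector field mapping" step more concrete, here is a minimal sketch of the kind of index body an Elasticsearch vector index typically uses. It is not taken from Chonkie's source: the helper name build_index_body and the field names "text" and "embedding" are hypothetical, and the dims value is a placeholder that would have to match the output size of your embedding model. The dense_vector type and its dims, index, and similarity options are standard Elasticsearch mapping parameters.

```python
from typing import Any


def build_index_body(dims: int, similarity: str = "cosine") -> dict[str, Any]:
    """Build an index body with a text field and an indexed vector field.

    Hypothetical helper: the field names "text" and "embedding" are
    illustrative and may differ from what the handshake actually creates.
    """
    return {
        "mappings": {
            "properties": {
                # Raw chunk text, stored so results are human-readable.
                "text": {"type": "text"},
                # Vector field that kNN search runs against.
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                },
            }
        }
    }


# Placeholder dimension; use the dimension of your embedding model.
body = build_index_body(dims=8)
print(body["mappings"]["properties"]["embedding"])
```

Because the handshake creates the mapping for you, a body like this is only needed if you manage the index yourself.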

Parameters

client (Optional[Elasticsearch], default: None)
An existing elasticsearch.Elasticsearch client instance. If not provided, a new client will be created based on other parameters.

index_name (Union[str, Literal['random']], default: "random")
Name of the Elasticsearch index to use. If “random”, a unique name will be generated.

embedding_model (Union[str, BaseEmbeddings], default: "minishlab/potion-retrieval-32M")
The embedding model to use for creating vectors. Can be a model name from Hugging Face or a BaseEmbeddings instance.

hosts (Optional[Union[str, list[str]]], default: None)
The URL(s) of the Elasticsearch instance(s) to connect to.

cloud_id (Optional[str], default: None)
The Cloud ID for connecting to an Elastic Cloud deployment.

api_key (Optional[str], default: None)
The API key for authenticating with Elasticsearch, commonly used for Elastic Cloud.
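
The connection parameters above interact: hosts targets a self-managed cluster, cloud_id targets Elastic Cloud, and api_key can accompany either. The sketch below shows one plausible way to turn them into keyword arguments for a client. The helper resolve_connection_kwargs is hypothetical and is not part of Chonkie. Treating hosts and cloud_id as mutually exclusive is an assumption of this sketch, and the localhost fallback follows the default noted in the Initialization example.

```python
from typing import Any, Optional, Union

DEFAULT_HOST = "http://localhost:9200"  # default noted in the Initialization example


def resolve_connection_kwargs(
    hosts: Optional[Union[str, list[str]]] = None,
    cloud_id: Optional[str] = None,
    api_key: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: choose client keyword arguments from the parameters."""
    if hosts is not None and cloud_id is not None:
        # Assumption of this sketch: only one connection target may be given.
        raise ValueError("Pass either hosts or cloud_id, not both.")

    kwargs: dict[str, Any] = {}
    if cloud_id is not None:
        kwargs["cloud_id"] = cloud_id
    else:
        kwargs["hosts"] = hosts if hosts is not None else DEFAULT_HOST
    if api_key is not None:
        kwargs["api_key"] = api_key
    return kwargs


print(resolve_connection_kwargs())
print(resolve_connection_kwargs(cloud_id="YOUR_CLOUD_ID", api_key="YOUR_API_KEY"))
```

If you already have a configured elasticsearch.Elasticsearch instance, passing it through the client parameter avoids this resolution step entirely.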

Writing Chunks to Elasticsearch

from chonkie import ElasticHandshake, SentenceChunker

# Initialize the handshake for your deployment
handshake = ElasticHandshake(
    cloud_id="YOUR_CLOUD_ID",
    api_key="YOUR_API_KEY",
    index_name="my_documents",
)

# Create some chunks
chunker = SentenceChunker()
chunks = chunker.chunk("Chonkie uses the bulk API for efficient indexing. It's fast and reliable!")

# Write chunks to Elasticsearch
handshake.write(chunks)
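
The sample text above mentions the bulk API. As a rough illustration of what bulk indexing involves, the sketch below builds one action dictionary per chunk in the shape accepted by the elasticsearch.helpers.bulk function of the Python client. It is an assumption that write works this way internally. The helper to_bulk_actions, the "chunk-N" ID scheme, and the "text" and "embedding" field names are all hypothetical, and the vectors are made-up stand-ins for real embeddings.

```python
from typing import Any, Iterator, Sequence


def to_bulk_actions(
    index_name: str,
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield one bulk index action per (text, vector) pair."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        yield {
            "_index": index_name,
            "_id": f"chunk-{i}",  # illustrative ID scheme
            "_source": {"text": text, "embedding": list(vector)},
        }


texts = ["Chonkie uses the bulk API for efficient indexing.", "It's fast and reliable!"]
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # stand-in embeddings
actions = list(to_bulk_actions("my_documents", texts, vectors))
print(len(actions), actions[0]["_id"])
```

Sending many documents in one bulk request avoids a network round trip per chunk, which is the usual reason to prefer it over indexing documents one at a time.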

Searching Chunks in Elasticsearch

You can retrieve the most similar chunks from your Elasticsearch index using the search method, which performs a k-Nearest Neighbor (kNN) vector search.

from chonkie import ElasticHandshake

# Initialize the handshake to connect to your index
handshake = ElasticHandshake(
    cloud_id="YOUR_CLOUD_ID",
    api_key="YOUR_API_KEY",
    index_name="my_documents",
)

# Retrieve the 2 chunks most similar to the query
results = handshake.search(query="fast and efficient indexing", limit=2)
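
For intuition about what a kNN vector search computes, here is a brute-force sketch in plain Python: embed the query, score every stored vector by cosine similarity, and keep the top results. This is a conceptual illustration only, not how the handshake or Elasticsearch implements search. Elasticsearch typically answers kNN queries with an approximate index rather than a full scan, and the vectors and names below are invented for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(
    query: Sequence[float],
    stored: dict[str, Sequence[float]],
    limit: int,
) -> list[tuple[float, str]]:
    """Return the `limit` stored texts most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in stored.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return scored[:limit]


# Toy 3-dimensional "embeddings"; real models produce far larger vectors.
stored = {
    "fast bulk indexing": [0.9, 0.1, 0.0],
    "reliable storage": [0.1, 0.9, 0.0],
    "unrelated note": [0.0, 0.0, 1.0],
}
query_vector = [0.8, 0.2, 0.0]

for score, text in knn(query_vector, stored, limit=2):
    print(f"{score:.3f}  {text}")
```

The limit argument plays the same role as limit in the search call: it caps how many nearest neighbors are returned.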
