The LanceDBHandshake class connects Chonkie's chunking pipeline to LanceDB, a serverless vector database built on Apache Arrow. It embeds your Chonkie chunks and stores them in LanceDB, either locally or in the cloud, without leaving the Chonkie SDK.

Installation

Before using the LanceDB handshake, install the required dependencies (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "chonkie[lancedb]"

Basic Usage

Initialization

from chonkie import LanceDBHandshake

# Default: in-memory LanceDB, auto-generated table name
handshake = LanceDBHandshake()

Writing Chunks to LanceDB

from chonkie import LanceDBHandshake, SemanticChunker

# Initialize the handshake
handshake = LanceDBHandshake(uri="./my_lancedb", table_name="my_documents")

# Create some chunks
chunker = SemanticChunker()
chunks = chunker("Chonkie loves to chonk your texts!")

# Write chunks to LanceDB
handshake.write(chunks)
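
When indexing more than one document, the same chunk-then-write pattern can be wrapped in a small helper. The sketch below is illustrative only: `index_documents` is a hypothetical name, not part of Chonkie, and it assumes just what the example above shows, namely that the chunker can be called on a string and that the handshake exposes `write(chunks)`.

```python
from typing import Any, Callable, Iterable, List


def index_documents(
    texts: Iterable[str],
    chunker: Callable[[str], List[Any]],
    handshake: Any,
) -> int:
    """Chunk each text and write its chunks; return the total chunk count.

    Hypothetical helper, not part of Chonkie. It relies only on two
    behaviors: `chunker(text)` returns a list of chunks, and
    `handshake.write(chunks)` stores them.
    """
    total = 0
    for text in texts:
        if not text.strip():
            continue  # skip blank documents instead of writing nothing
        chunks = chunker(text)
        if chunks:
            handshake.write(chunks)
            total += len(chunks)
    return total
```

With the objects from the example above, this would be called as `index_documents(["first text", "second text"], chunker, handshake)`.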

Searching Chunks in LanceDB

You can retrieve the most similar chunks from your LanceDB table using the search method:
from chonkie import LanceDBHandshake

handshake = LanceDBHandshake(uri="./my_lancedb", table_name="my_documents")

results = handshake.search(query="chonk your texts", limit=5)
for result in results:
    print(result["score"], result["text"])
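
Conceptually, a vector search embeds the query and ranks the stored chunks by how similar their embeddings are to it. The sketch below shows only that ranking step, in plain Python, using cosine similarity over toy two-dimensional vectors. It is an illustration rather than LanceDB's implementation: `cosine` and `top_k` are made-up names, and the metric that LanceDB or the handshake actually uses is not stated here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    rows: Dict[str, Sequence[float]],
    limit: int = 5,
) -> List[Tuple[float, str]]:
    """Return up to `limit` (score, text) pairs, most similar first."""
    scored = [(cosine(query, vec), text) for text, vec in rows.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy "table": text mapped to a hand-written embedding (assumed values).
rows = {
    "chonk your texts": [1.0, 0.0],
    "unrelated sentence": [0.0, 1.0],
    "mostly about chonking": [0.8, 0.6],
}
ranked = top_k([1.0, 0.0], rows, limit=2)
for score, text in ranked:
    print(round(score, 3), text)
```

In a real table the vectors come from the embedding model, and the database typically uses an index rather than scoring every row.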

Parameters

connection
Type: Optional[lancedb.DBConnection]
Default: None
An existing LanceDB connection. If not provided, a new connection is created using uri.

uri
Type: Union[str, os.PathLike]
Default: "memory://"
URI of the LanceDB database. Use "memory://" for an ephemeral in-memory database, a local directory path for persistent storage, or a db:// URI for LanceDB Cloud.

table_name
Type: Union[str, Literal['random']]
Default: "random"
Name of the table to write chunks to. If "random", a unique name is auto-generated.

embedding_model
Type: Union[str, BaseEmbeddings]
Default: "minishlab/potion-retrieval-32M"
Embedding model to use. Can be a model name string or a BaseEmbeddings instance.
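
The three kinds of `uri` value described above can be told apart by their prefix. The helper below is a hypothetical sketch (`describe_uri` is not a Chonkie or LanceDB function) that maps a URI to the storage mode the parameter description implies. It only inspects the string and never opens a connection.

```python
import os
from typing import Union


def describe_uri(uri: Union[str, "os.PathLike[str]"]) -> str:
    """Classify a LanceDB URI as 'in-memory', 'cloud', or 'local'.

    Illustrative only: the prefixes come from the parameter description,
    and anything that is neither memory:// nor db:// is assumed to be a
    local directory path.
    """
    text = os.fspath(uri)  # accepts both str and path-like objects
    if text.startswith("memory://"):
        return "in-memory"
    if text.startswith("db://"):
        return "cloud"
    return "local"


for candidate in ("memory://", "./my_lancedb", "db://my-database"):
    print(candidate, "->", describe_uri(candidate))
```

A check like this can be handy for logging which backend a script is about to write to before constructing the handshake.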