Skip to main content
The MongoDBHandshake class provides integration between Chonkie’s chunking system and MongoDB, a popular NoSQL database. Embed and store your Chonkie chunks in MongoDB directly from the Chonkie SDK.

Installation

Before using the MongoDB handshake, make sure to install the required dependencies:
pip install chonkie[mongodb]

Initialization

from chonkie import MongoDBHandshake

# Initialize with default settings (auto-generated database/collection)
handshake = MongoDBHandshake(uri="mongodb://localhost:27017")

Parameters

client
Optional[pymongo.MongoClient]
default:"None"
MongoDB client instance. If not provided, a new client will be created based on other parameters.
uri
Optional[str]
default:"None"
MongoDB connection URI.
username
Optional[str]
default:"None"
MongoDB username for authentication.
password
Optional[str]
default:"None"
MongoDB password for authentication.
hostname
Optional[str]
default:"None"
MongoDB host address.
port
Optional[Union[int, str]]
default:"None"
MongoDB port number.
db_name
Union[str, Literal['random']]
default:"random"
Name of the database to use. If “random”, a unique name will be generated.
collection_name
Union[str, Literal['random']]
default:"random"
Name of the collection to use. If “random”, a unique name will be generated.
embedding_model
Union[str, BaseEmbeddings]
default:"minishlab/potion-retrieval-32M"
Embedding model to use. Can be a model name or a BaseEmbeddings instance.
**kwargs
Dict[str, Any]
default:"{}"
Additional keyword arguments to pass to the MongoDB client or collection creation.

Writing Chunks to MongoDB

from chonkie import MongoDBHandshake, SemanticChunker    

# Initialize the handshake
handshake = MongoDBHandshake(
    uri="mongodb://localhost:27017",
    db_name="my_documents",
    collection_name="my_collection"
)

# Create some chunks
chunker = SemanticChunker()
chunks = chunker.chunk("Chonkie loves to chonk your texts!")

# Write chunks to MongoDB
handshake.write(chunks)

Searching Chunks in MongoDB

You can retrieve the most similar chunks from your MongoDB collection using the search method:
from chonkie import MongoDBHandshake

# Initialize the handshake
handshake = MongoDBHandshake(
    uri="mongodb://localhost:27017",
    db_name="my_documents",
    collection_name="my_collection"
)
results = handshake.search(query="chonk your texts", limit=2)
for result in results:
    print(result["score"], result["text"])
I