Chonkie Documentation: Pgvector Handshake

The PgvectorHandshake class connects Chonkie’s chunking system to PostgreSQL with pgvector. It builds on Supabase’s vecs client library and adds a higher-level API with automatic indexing, metadata filtering, and simplified connection management.

Store your Chonkie chunks in PostgreSQL with vector embeddings and perform semantic search without ever leaving the Chonkie SDK.

Installation

Before using the Pgvector handshake, make sure to install the required dependencies:

# Quote the extra so shells such as zsh do not expand the brackets
pip install "chonkie[pgvector]"

You’ll also need PostgreSQL with the pgvector extension installed:

-- Connect to your database and enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

Initialization

from chonkie import PgvectorHandshake

# Initialize with individual connection parameters
handshake = PgvectorHandshake(
    host="localhost",
    port=5432,
    database="your_database",
    user="your_user",
    password="your_password",
    collection_name="chonkie_chunks"
)

# Or use a connection string
handshake = PgvectorHandshake(
    connection_string="postgresql://user:password@localhost:5432/database"
)

# Or use an existing vecs client
import vecs
client = vecs.create_client("postgresql://user:password@localhost:5432/database")
handshake = PgvectorHandshake(client=client, collection_name="my_collection")
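The individual parameters and the connection string describe the same connection. As a minimal sketch of how they relate, and not Chonkie's actual implementation, the hypothetical helper `build_connection_string` below assembles the URL format used above from the individual parameters. Percent-encoding the credentials is an added assumption, so that characters such as `@` or `/` in a password do not break the URL.

```python
from urllib.parse import quote


def build_connection_string(
    host: str = "localhost",
    port: int = 5432,
    database: str = "postgres",
    user: str = "postgres",
    password: str = "postgres",
) -> str:
    """Hypothetical helper: render individual parameters as a PostgreSQL URL."""
    # safe="" makes quote() encode "/" and "@" as well as other reserved characters.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


url = build_connection_string(
    database="your_database",
    user="your_user",
    password="p@ss/word",  # made-up password containing reserved characters
)
print(url)
```

The resulting string can be passed as `connection_string`, which is handy when the credentials come from a secrets manager as separate fields.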

Usage

Writing Chunks

Store your chunked text in PostgreSQL with vector embeddings:

from chonkie import PgvectorHandshake, RecursiveChunker

# Initialize the handshake
handshake = PgvectorHandshake(
    host="localhost",
    database="my_database",
    user="my_user",
    password="my_password"
)

# Create some chunks
chunker = RecursiveChunker(chunk_size=512)
chunks = chunker.chunk("Chonkie makes PostgreSQL vector search easy!")

# Write chunks to PostgreSQL
handshake.write(chunks)
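To make the write step less opaque, the sketch below shows one way chunks could be turned into `(id, vector, metadata)` tuples, which is the record shape vecs accepts for upserts. Everything in it is illustrative: `ToyChunk`, `toy_embed`, the metadata fields, and the UUID-based ID scheme are assumptions, not what `handshake.write` is confirmed to do.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyChunk:
    """Stand-in for a Chonkie chunk, with only the fields used below."""
    text: str
    start_index: int
    end_index: int
    token_count: int


Record = Tuple[str, List[float], Dict[str, object]]


def to_records(
    chunks: List[ToyChunk],
    embed: Callable[[str], List[float]],
) -> List[Record]:
    """Build (id, vector, metadata) tuples from chunks (illustrative only)."""
    records: List[Record] = []
    for chunk in chunks:
        # Deterministic ID: the same text always maps to the same record ID.
        chunk_id = str(uuid.uuid5(uuid.NAMESPACE_OID, chunk.text))
        metadata = {
            "text": chunk.text,
            "start_index": chunk.start_index,
            "end_index": chunk.end_index,
            "token_count": chunk.token_count,
        }
        records.append((chunk_id, embed(chunk.text), metadata))
    return records


def toy_embed(text: str) -> List[float]:
    """Toy 2-dimensional embedding: character count and space count."""
    return [float(len(text)), float(text.count(" "))]


text = "Chonkie makes PostgreSQL vector search easy!"
chunks = [ToyChunk(text, 0, len(text), token_count=8)]  # token_count is made up
records = to_records(chunks, toy_embed)
print(records[0][1])
```

Deterministic IDs mean that writing the same chunk twice would overwrite the existing row rather than duplicate it, which is one reason a scheme like this is common for upserts.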

Searching Chunks

Find similar chunks using vector similarity search:

# Search for similar chunks
results = handshake.search(
    query="PostgreSQL vector search",
    limit=5
)

for result in results:
    print(f"Text: {result['text']}")
    print(f"Similarity: {result['similarity']:.3f}")
    print("---")
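The `similarity` value comes from comparing the query embedding with each stored embedding. As a rough illustration, and assuming cosine similarity as the measure (this page does not state which one the handshake uses), the sketch below ranks toy vectors against a query in pure Python. The texts and vectors are made up; in practice the embedding model produces the vectors and PostgreSQL does the ranking.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query: Sequence[float],
    stored: Dict[str, Sequence[float]],
    limit: int = 5,
) -> List[Dict[str, object]]:
    """Return the `limit` most similar texts, best match first."""
    scored = [
        {"text": text, "similarity": cosine_similarity(query, vector)}
        for text, vector in stored.items()
    ]
    scored.sort(key=lambda item: item["similarity"], reverse=True)
    return scored[:limit]


# Toy 2-dimensional "embeddings" (invented for illustration).
stored = {
    "PostgreSQL vector search": [1.0, 0.0],
    "Chunking long documents": [0.0, 1.0],
    "Vector search in Postgres": [0.8, 0.6],
}
results = rank([1.0, 0.0], stored, limit=2)
for result in results:
    print(f"{result['text']}: {result['similarity']:.3f}")
```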

Creating Indexes

Optimize search performance with vector indexes.
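This section does not show the handshake's own index call, so the sketch below stays at the pgvector SQL level. The hypothetical helper `hnsw_index_sql` renders a `CREATE INDEX` statement for an HNSW index, using pgvector's operator classes and its `m` and `ef_construction` options. The table name `vecs.chonkie_chunks` and the column name `vec` are assumptions about how the collection is stored; check your schema before running the statement.

```python
# pgvector operator classes for each distance measure.
OPERATOR_CLASSES = {
    "cosine": "vector_cosine_ops",
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
}


def hnsw_index_sql(
    table: str,
    column: str = "vec",
    measure: str = "cosine",
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Hypothetical helper: render a pgvector HNSW CREATE INDEX statement.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    if measure not in OPERATOR_CLASSES:
        raise ValueError(f"unknown measure: {measure!r}")
    ops = OPERATOR_CLASSES[measure]
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {ops}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )


# Assumed table and column names; adjust to your own schema.
sql = hnsw_index_sql("vecs.chonkie_chunks")
print(sql)
```

The operator class should match the distance measure used at query time; otherwise PostgreSQL cannot use the index for those queries.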

Parameters

client (Optional[vecs.Client], default: None)

An existing vecs.Client instance. If provided, other connection parameters are ignored.

host (str, default: "localhost")

PostgreSQL host address.

port (int, default: 5432)

PostgreSQL port number.

database (str, default: "postgres")

PostgreSQL database name.

user (str, default: "postgres")

PostgreSQL username.

password (str, default: "postgres")

PostgreSQL password.

connection_string (Optional[str], default: None)

Full PostgreSQL connection string. If provided, individual connection parameters are ignored.

collection_name (str, default: "chonkie_chunks")

Name of the collection to store chunks in.

embedding_model (Union[str, BaseEmbeddings], default: "minishlab/potion-retrieval-32M")

Embedding model to use. Can be a model name or a BaseEmbeddings instance.

vector_dimensions (Optional[int], default: None)

Number of dimensions for the vector embeddings. If not provided, it is inferred from the embedding model.
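The descriptions of client and connection_string imply an order of precedence among the connection inputs. The sketch below spells out one reading of that order; `pick_connection_source` is a hypothetical helper, and ranking client above connection_string is an inference from the wording "other connection parameters are ignored", not confirmed behavior.

```python
from typing import Any, Optional


def pick_connection_source(
    client: Optional[Any] = None,
    connection_string: Optional[str] = None,
) -> str:
    """Hypothetical helper: report which input would be used to connect."""
    if client is not None:
        # An existing client wins over everything else (assumed ordering).
        return "client"
    if connection_string is not None:
        # A full connection string overrides host, port, database, user, password.
        return "connection_string"
    return "individual parameters"


source = pick_connection_source(
    connection_string="postgresql://user:password@localhost:5432/database"
)
print(source)
```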
