Ever found yourself making a RAG bot yet again (your 2,342,148th one), only to realize you’re stuck having to write chunking with bloated software library X or the painfully featureless library Y? WHY CAN’T THIS JUST BE SIMPLE, UGH?

What if all you had to do was install, import and run chunking?

Well, look no further than Chonkie! (chonkie boi is a gud boi)

Feature-rich

All the CHONKs you’d ever need for your RAG applications

Easy to use

Install, Import, CHONK - it’s that simple!

Lightning Fast

CHONK at the speed of light! zooooom

Wide Support

Supports all your favorite tokenizer, model and API CHONKs

Lightweight

No bloat, just CHONK - only 9.7MB base installation

Cute Mascot

psst it’s a pygmy hippo btw! Moto Moto approved

Quick Start

Get started with Chonkie in three simple steps: Install, Import and CHONK!

Installation

pip install chonkie

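Want more features? Install Chonkie with all of its optional extras (the quotes keep shells like zsh from treating the square brackets as a glob pattern):

pip install "chonkie[all]"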

Chonkie follows a special approach to dependencies, keeping the base installation lightweight while allowing you to add extra features as and when needed. Please check the Installation page for more details.
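
Curious which optional pieces are actually importable in your environment after a base install? Here's a minimal sketch that uses only the Python standard library. The helper name `is_available` and the module names in the loop are illustrative assumptions, not part of Chonkie and not a list of its extras; swap in whatever packages you care about.

```python
from importlib.util import find_spec


def is_available(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment.

    Hypothetical helper for illustration; not part of Chonkie.
    """
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Dotted names raise if a parent package is missing.
        return False


if __name__ == "__main__":
    # Example module names only; adjust to the extras you installed.
    for name in ("chonkie", "tokenizers", "numpy"):
        status = "installed" if is_available(name) else "missing"
        print(f"{name}: {status}")
```

A check like this lets a script fall back gracefully when an optional dependency is missing instead of crashing on import.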

Usage

Here’s a basic example to get you started:

# First import the chunker you want from Chonkie 
from chonkie import TokenChunker

# Initialize the chunker
chunker = TokenChunker()  # defaults to the GPT-2 tokenizer

# Here's some text to chunk
text = """
Woah! Chonkie, the chunking library is so cool! I love the tiny hippo hehe.
"""

# Chunk some text
chunks = chunker(text)

# Access chunks
for chunk in chunks:
    print(f"Chunk: {chunk.text}")
    print(f"Tokens: {chunk.token_count}")
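
To build some intuition for what a token chunker does, here is a dependency-free sketch of fixed-size chunking with overlap. It is not Chonkie's implementation: it splits on whitespace instead of using a real tokenizer, and the names `SketchChunk`, `sketch_token_chunks`, `chunk_size` and `chunk_overlap` are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchChunk:
    """Hypothetical stand-in for a chunk; not Chonkie's own class."""
    text: str
    token_count: int


def sketch_token_chunks(
    text: str, chunk_size: int = 8, chunk_overlap: int = 2
) -> List[SketchChunk]:
    """Split `text` into windows of `chunk_size` whitespace tokens.

    Consecutive windows share `chunk_overlap` tokens. Whitespace splitting
    is a simplification; a real tokenizer would produce different tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window slides each time
    chunks: List[SketchChunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(SketchChunk(" ".join(window), len(window)))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "Woah! Chonkie, the chunking library is so cool! I love the tiny hippo hehe."
    for chunk in sketch_token_chunks(sample):
        print(f"Chunk: {chunk.text}")
        print(f"Tokens: {chunk.token_count}")
```

The overlap means the tail of one chunk reappears at the head of the next, which helps keep context from being cut off at chunk boundaries when the chunks are later retrieved.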

See more usage examples in the Cookbook!

Documentation

Ready to learn more about Chonkie?