The TeraflopAIChunker uses the TeraflopAI Segmentation API to split text into semantically meaningful segments. It is especially useful for segmenting domain-specific text such as legal documents.

Installation

TeraflopAI Chunker requires the teraflopai Python package:
pip install "chonkie[teraflopai]"
For general installation instructions, see the Installation Guide.

Initialization

from chonkie import TeraflopAIChunker

# Using an API key (or set the TERAFLOPAI_API_KEY environment variable)
chunker = TeraflopAIChunker(api_key="your_api_key_here")

Parameters

client
Optional[TeraflopAI]
default:"None"
An existing TeraflopAI client instance. If provided, url and api_key are ignored.
url
str
The URL for the TeraflopAI segmentation API endpoint.
api_key
Optional[str]
default:"None"
The API key for authentication. If not provided, it will be read from the TERAFLOPAI_API_KEY environment variable.
tokenizer
Union[str, TokenizerProtocol]
default:"character"
The tokenizer used to compute token counts for returned chunks.
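The precedence among client, api_key, and the TERAFLOPAI_API_KEY environment variable, as described above, can be summarized in a small sketch. This is not Chonkie's actual implementation: resolve_credentials is a hypothetical helper, url is left out for brevity, and raising ValueError when no key is available is an assumption about failure behavior.

```python
import os
from typing import Any, Mapping, Optional


def resolve_credentials(
    client: Optional[Any] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper mirroring the documented parameter precedence."""
    # Use the real process environment unless a mapping is injected (handy for tests).
    env = os.environ if env is None else env
    # 1. An existing client wins; url and api_key are ignored.
    if client is not None:
        return {"source": "client", "client": client}
    # 2. An explicitly passed api_key comes next.
    if api_key:
        return {"source": "argument", "api_key": api_key}
    # 3. Fall back to the TERAFLOPAI_API_KEY environment variable.
    env_key = env.get("TERAFLOPAI_API_KEY")
    if env_key:
        return {"source": "environment", "api_key": env_key}
    # Assumption: without any credentials the chunker cannot call the API.
    raise ValueError("No client or API key given; set TERAFLOPAI_API_KEY.")


print(resolve_credentials(env={"TERAFLOPAI_API_KEY": "example"})["source"])
```

Injecting env keeps the sketch testable without touching the real environment.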

Usage

Single Text Chunking

text = """UNITED STATES of America, Appellee, v. Daniel Dee VEON, Appellant.
No. 72-1889.
United States Court of Appeals, Ninth Circuit.
Feb. 12, 1973."""

chunks = chunker.chunk(text)

for chunk in chunks:
    print(f"Chunk text: {chunk.text}")
    print(f"Token count: {chunk.token_count}")
    print(f"Start index: {chunk.start_index}")
    print(f"End index: {chunk.end_index}")
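Because each chunk carries start_index and end_index, you can check that every chunk maps back to the source text. The sketch below assumes the offsets follow Python slice semantics (end_index exclusive), which this page does not state; check_offsets and the SimpleChunk stand-in are hypothetical and exist only so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SimpleChunk:
    """Hypothetical stand-in exposing the same fields printed above."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def check_offsets(text: str, chunks: Iterable) -> List[int]:
    """Return the positions of chunks whose offsets do not reproduce chunk.text."""
    mismatches = []
    for position, chunk in enumerate(chunks):
        # Assumes end_index is exclusive, as in Python slicing.
        if text[chunk.start_index:chunk.end_index] != chunk.text:
            mismatches.append(position)
    return mismatches


source = "No. 72-1889.\nFeb. 12, 1973."
demo = [
    SimpleChunk("No. 72-1889.", 0, 12, 12),
    SimpleChunk("Feb. 12, 1973.", 13, 27, 14),
]
print(check_offsets(source, demo))  # prints []
```

Since check_offsets only reads attributes, it can be called on real chunks as check_offsets(text, chunks); an empty list means every offset pair slices out its chunk text.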

Batch Chunking

texts = [
    "First document to segment.",
    "Second document with more content to segment.",
]

batch_results = chunker(texts)

for i, chunks in enumerate(batch_results):
    print(f"Document {i}: {len(chunks)} chunks")
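As the loop above implies, the batch call returns one list of chunks per input document. If you need a single flat list, for example to load chunks into a search index, a small helper can tag each chunk with its document and position. flatten_batch is hypothetical, and plain strings stand in for chunk objects so the sketch runs offline.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def flatten_batch(batch_results: Sequence[Sequence[T]]) -> List[Tuple[int, int, T]]:
    """Flatten per-document chunk lists into (doc_index, chunk_index, chunk) tuples."""
    flat = []
    for doc_index, chunks in enumerate(batch_results):
        for chunk_index, chunk in enumerate(chunks):
            flat.append((doc_index, chunk_index, chunk))
    return flat


# Strings stand in for Chunk objects so this runs without calling the API.
fake_batch = [
    ["First document to segment."],
    ["Second document", "with more content to segment."],
]
for doc_index, chunk_index, chunk in flatten_batch(fake_batch):
    print(doc_index, chunk_index, chunk)
```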

Using an Environment Variable

Set the key in your shell:

export TERAFLOPAI_API_KEY="your_api_key_here"

Then create the chunker in Python:

from chonkie import TeraflopAIChunker

# No api_key argument needed; it is read from TERAFLOPAI_API_KEY
chunker = TeraflopAIChunker()
chunks = chunker.chunk("Your text here.")

How It Works

  1. The text is sent to the TeraflopAI Segmentation API endpoint.
  2. The API returns a list of text segments.
  3. Each segment is converted into a Chonkie Chunk object with its start_index, end_index, and token_count fields populated; token_count is computed with the configured tokenizer.
Segmentation runs on TeraflopAI's servers, so this chunker requires an active internet connection and a valid API key.
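Step 3 of the process above can be sketched as follows. This page does not say how offsets are derived, so the sketch assumes each returned segment is a verbatim substring of the input and that segments arrive in document order; it locates each one with str.find, starting after the previous match. segments_to_chunks and ChunkRecord are hypothetical names, and token_count is a plain character count, which matches the default "character" tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkRecord:
    """Hypothetical stand-in for a Chonkie Chunk."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def segments_to_chunks(text: str, segments: List[str]) -> List[ChunkRecord]:
    """Map segments back to character offsets in the original text.

    Assumes every segment is a verbatim substring of `text` and that
    segments are given in document order.
    """
    chunks = []
    cursor = 0
    for segment in segments:
        # Search from the end of the previous match so repeated phrases map correctly.
        start = text.find(segment, cursor)
        if start == -1:
            raise ValueError(f"Segment not found in source text: {segment!r}")
        end = start + len(segment)
        # Character count, as with the default "character" tokenizer.
        chunks.append(ChunkRecord(segment, start, end, len(segment)))
        cursor = end
    return chunks


text = "United States Court of Appeals, Ninth Circuit.\nFeb. 12, 1973."
segments = ["United States Court of Appeals, Ninth Circuit.", "Feb. 12, 1973."]
for chunk in segments_to_chunks(text, segments):
    print(chunk.start_index, chunk.end_index, chunk.token_count, repr(chunk.text))
```

If a service normalizes whitespace or otherwise rewrites segment text, exact matching like this would fail, and a real implementation would need a more tolerant alignment step.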