TableChunker
splits large markdown tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.
API Reference
Use the `recursive` endpoint to access table chunking functionality.
On the API, the table chunker operates as part of the recursive chunker,
allowing you to process documents containing inline tables while ensuring
that table structures remain intact across chunk boundaries.
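A request to that endpoint might look like the following sketch. The URL, payload field names, and authentication header here are illustrative assumptions, not the documented API schema; consult the API reference for the real request shape.

```python
import json
import urllib.request

# Hypothetical call to the recursive chunking endpoint. The endpoint path,
# payload fields, and auth header below are assumptions for illustration.
payload = {
    "text": "# Report\n\n| id | name |\n|----|------|\n| 1  | Ada  |",
    "chunk_size": 512,
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "https://api.chonkie.ai/v1/chunk/recursive",  # assumed URL
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
    },
)
# response = urllib.request.urlopen(req)  # send once the details are confirmed
```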
Installation
TableChunker is included in the base installation of Chonkie; no additional dependencies are required. For installation instructions, see the Installation Guide.
Initialization
Parameters
- Tokenizer to use. Can be a string identifier (“character”, “word”, “gpt2”, etc.) or a tokenizer instance.
- Maximum number of tokens or characters per chunk.
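To illustrate how these two parameters interact (a plain-Python sketch, not Chonkie's code): with the “character” identifier every character counts as one token, while “word” counts whitespace-separated words, so the same chunk size admits very different amounts of table per chunk.

```python
row = "| 1 | Alice | 90 |"

def character_count(text: str) -> int:
    # "character" counting: one token per character
    return len(text)

def word_count(text: str) -> int:
    # "word" counting: one token per whitespace-separated word
    return len(text.split())

print(character_count(row))  # 18
print(word_count(row))       # 7
```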
Usage
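The behavior can be sketched in plain Python; the following is an illustration of the header-preserving split under a character-count budget, not the library's implementation.

```python
def chunk_table(table: str, chunk_size: int, count=len) -> list[str]:
    """Split a markdown table into chunks that each repeat the header block.

    `count` measures a chunk's size; `len` mimics character-based counting.
    """
    lines = table.strip().splitlines()
    if len(lines) < 3:
        raise ValueError("need a header, a separator, and at least one data row")
    header = lines[:2]  # header row + separator row travel with every chunk
    chunks: list[list[str]] = []
    current: list[str] = []
    for row in lines[2:]:
        candidate = "\n".join(header + current + [row])
        if current and count(candidate) > chunk_size:
            chunks.append(current)
            current = [row]
        else:
            current.append(row)
    chunks.append(current)
    return ["\n".join(header + rows) for rows in chunks]

table = """\
| id | name  |
|----|-------|
| 1  | Ada   |
| 2  | Grace |
| 3  | Alan  |"""

# With a 60-character budget, the rows split into two header-bearing chunks.
for chunk in chunk_table(table, chunk_size=60):
    print(chunk)
    print("---")
```

A row is never split internally; when adding the next row would exceed the budget, the current chunk is closed and a new one starts with a fresh copy of the header.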
Example Output
Each chunk is a valid markdown table segment that always includes the header. For the example above and `chunk_size=16`, you might get:
Methods
- `chunk(table: str) -> List[Chunk]`: Chunk a markdown table string.
- `chunk_document(document: Document) -> Document`: Chunk all tables in a `MarkdownDocument`.
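Conceptually, the document-level method locates each contiguous table block and chunks it in place. A rough sketch of the detection step (not Chonkie's implementation; real markdown parsing would also validate the separator row):

```python
def find_table_blocks(markdown: str) -> list[tuple[int, int]]:
    """Return (start, end) line-index pairs of contiguous pipe-table blocks."""
    lines = markdown.splitlines()
    blocks: list[tuple[int, int]] = []
    start = None
    for i, line in enumerate(lines):
        if line.lstrip().startswith("|"):
            if start is None:
                start = i  # first line of a new table block
        elif start is not None:
            blocks.append((start, i))  # block ended on the previous line
            start = None
    if start is not None:
        blocks.append((start, len(lines)))  # document ended inside a table
    return blocks

doc = "intro text\n| a | b |\n|---|---|\n| 1 | 2 |\nclosing text"
print(find_table_blocks(doc))  # [(1, 4)]
```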
Notes
- Requires at least a header, separator, and one data row.
- If the table fits within the chunk size, it is returned as a single chunk.
- For advanced use, pass a custom tokenizer for token-based chunking.
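As a sketch of the last point: a token counter can be as simple as a callable from text to an integer. The heuristic below, and the idea of passing a bare callable in place of a tokenizer, are illustrative assumptions; check the library's tokenizer documentation for what it actually accepts.

```python
def approx_token_count(text: str) -> int:
    # Crude heuristic: roughly four characters per token in English text.
    return max(1, len(text) // 4)

# Hypothetical usage, parameter name assumed:
# TableChunker(tokenizer_or_token_counter=approx_token_count, chunk_size=256)
print(approx_token_count("| id | name | score |"))  # 5
```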
See also: Chunkers Overview