The TableChunker splits large markdown tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.
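
The row-wise strategy described above can be illustrated with a minimal, standalone sketch. It is not Chonkie's implementation: the function name `split_table_by_rows` and its greedy packing rule are assumptions made for illustration, and size is measured in plain characters.

```python
from typing import List


def split_table_by_rows(table: str, chunk_size: int) -> List[str]:
    """Greedy row-wise split that repeats the header in every chunk (illustrative only)."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header, rows = lines[:2], lines[2:]  # header row + separator row, then data rows

    chunks: List[str] = []
    current: List[str] = []
    for row in rows:
        candidate = "\n".join(header + current + [row])
        # Close the current chunk when adding this row would exceed the budget,
        # but always keep at least one data row per chunk.
        if current and len(candidate) > chunk_size:
            chunks.append("\n".join(header + current))
            current = []
        current.append(row)
    if current:
        chunks.append("\n".join(header + current))
    return chunks


table = """
| Name  | Age |
|-------|-----|
| Alice | 30  |
| Bob   | 25  |
| Carol | 28  |
"""

parts = split_table_by_rows(table, chunk_size=64)
for part in parts:
    print(part)
    print()
```

Every piece returned by the sketch starts with the same two header lines, so each one remains a readable table on its own.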

API Reference

Table chunking is available through the recursive endpoint. In the API, the table chunker runs as part of the recursive chunker, so documents containing inline tables can be processed while table structures remain intact across chunk boundaries.

Installation

TableChunker is included in the base installation of Chonkie. No additional dependencies are required.
For installation instructions, see the Installation Guide.

Initialization

from chonkie import TableChunker

# Basic initialization with default parameters
chunker = TableChunker(
    tokenizer="character",  # Default tokenizer (or use "gpt2", etc.)
    chunk_size=2048         # Maximum tokens or characters per chunk
)

Parameters

tokenizer (Union[str, Callable, Any], default: "character")
Tokenizer to use. Can be a string identifier (“character”, “word”, “gpt2”, etc.), a callable, or a tokenizer instance.

chunk_size (int, default: 2048)
Maximum number of tokens or characters per chunk.
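
How much fits in a chunk depends on the unit the tokenizer counts. The sketch below uses two hypothetical stand-in counters, one per character and one per whitespace-separated piece, to show that the same row has a very different size under each. These helpers only approximate the idea and are not Chonkie's tokenizers.

```python
def count_characters(text: str) -> int:
    """Stand-in for a character-level counter: one token per character."""
    return len(text)


def count_words(text: str) -> int:
    """Stand-in for a word-level counter: one token per whitespace-separated piece."""
    return len(text.split())


row = "| Alice  | 30  | New York |"
char_size = count_characters(row)
word_size = count_words(row)
print(f"characters: {char_size}, words: {word_size}")
```

A chunk_size tuned for one counting unit therefore does not carry over to another, so it is worth revisiting the value whenever the tokenizer changes.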

Usage

from chonkie import TableChunker

table = """
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
"""

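# A small chunk_size forces the table to be split into several chunks
chunker = TableChunker(chunk_size=16)
chunks = chunker.chunk(table)

# Print each chunk: the header followed by a subset of the data rows
for chunk in chunks:
    print(chunk.text)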

Example Output

Each chunk is a valid markdown table segment, always including the header. For the example above and chunk_size=16, you might get:
| Name  | Age | City     |
| ----- | --- | -------- |
| Alice | 30  | New York |
| Bob   | 25  | London   |

| Name  | Age | City   |
| ----- | --- | ------ |
| Carol | 28  | Paris  |
| Dave  | 35  | Berlin |
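
Because every chunk repeats the header, each one can be parsed independently of the others. As a rough sketch of that property, the hypothetical helper `table_chunk_to_records` below (not part of Chonkie) turns the text of one chunk into a list of dictionaries keyed by column name. It assumes simple cells with no escaped pipe characters.

```python
from typing import Dict, List


def table_chunk_to_records(chunk_text: str) -> List[Dict[str, str]]:
    """Parse one markdown table chunk into row dictionaries (illustrative only)."""
    lines = [line.strip() for line in chunk_text.strip().splitlines() if line.strip()]

    def cells(line: str) -> List[str]:
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in line.strip("|").split("|")]

    columns = cells(lines[0])
    # lines[1] is the separator row, so data starts at index 2.
    return [dict(zip(columns, cells(line))) for line in lines[2:]]


second_chunk = """
| Name  | Age | City   |
| ----- | --- | ------ |
| Carol | 28  | Paris  |
| Dave  | 35  | Berlin |
"""

records = table_chunk_to_records(second_chunk)
print(records)
```

Records in this shape are convenient to attach as metadata or to serialize before embedding, without needing to look up the header in an earlier chunk.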

Methods

  • chunk(table: str) -> List[Chunk]: Chunk a markdown table string.
  • chunk_document(document: Document) -> Document: Chunk all tables in a MarkdownDocument.

Notes

  • The input table must contain at least a header row, a separator row, and one data row.
  • If the table fits within the chunk size, it is returned as a single chunk.
  • For advanced use, pass a custom tokenizer for token-based chunking.
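
The first two notes suggest two cheap checks that can be run on a table before chunking it. The sketch below is an illustration under stated assumptions, not Chonkie's own validation: `looks_like_table` and `fits_in_one_chunk` are hypothetical helpers, and size is counted in characters.

```python
import re

# A separator row contains only pipes, dashes, colons, and spaces, with at least one dash.
_SEPARATOR = re.compile(r"^\s*\|?[\s:|-]*-[\s:|-]*\|?\s*$")


def looks_like_table(table: str) -> bool:
    """True if the text has a header row, a separator row, and at least one data row."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    return len(lines) >= 3 and bool(_SEPARATOR.match(lines[1]))


def fits_in_one_chunk(table: str, chunk_size: int) -> bool:
    """True if the whole table is within the size budget, counted in characters."""
    return len(table.strip()) <= chunk_size


small_table = "| A | B |\n|---|---|\n| 1 | 2 |"
print(looks_like_table(small_table), fits_in_one_chunk(small_table, chunk_size=2048))
```

Running such checks up front makes it easier to tell apart input that is too short to be a table from a table that is simply small enough to stay whole.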

See also: Chunkers Overview