TableChunker splits large markdown tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.
API Reference
Use the `recursive` endpoint to access table chunking functionality.
On the API, the table chunker operates as part of the recursive chunker,
allowing you to process documents containing inline tables while ensuring
that table structures remain intact across chunk boundaries.
Installation
TableChunker is included in the base installation of Chonkie; no additional dependencies are required. For installation instructions, see the Installation Guide.
Initialization
Parameters
- Tokenizer to use. Default is `"row"`. Can be a string identifier (`"row"`, `"character"`, `"word"`, `"gpt2"`, `"byte"`, etc.) or a tokenizer instance.
- Maximum number of rows (if the tokenizer is `"row"`) or tokens/characters per chunk.
Usage
Methods
- `chunk(table: str) -> list[Chunk]`: Chunk a markdown table string.
- `chunk_document(document: Document) -> Document`: Chunk all tables in a `MarkdownDocument`.
Notes
- Requires at least a header, separator, and one data row.
- If the table fits within the chunk size, it is returned as a single chunk.
- For advanced use, pass a custom tokenizer for token-based chunking.
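The last note — token-based rather than row-based chunking — can also be sketched in plain Python (again illustrative, not Chonkie's implementation), using character counts as a stand-in token counter and charging the header and separator to every chunk's budget:

```python
# Illustrative only: greedily pack data rows so that each chunk, with the
# header and separator included, stays under a character budget.

def chunk_by_chars(table: str, chunk_size: int) -> list[str]:
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header, separator, rows = lines[0], lines[1], lines[2:]
    base = len(header) + len(separator) + 2  # +2 for the joining newlines
    chunks, current, current_len = [], [], base
    for row in rows:
        row_len = len(row) + 1  # +1 for the newline before the row
        # Flush the current chunk if adding this row would exceed the budget
        # (but never emit a chunk with zero data rows).
        if current and current_len + row_len > chunk_size:
            chunks.append("\n".join([header, separator, *current]))
            current, current_len = [], base
        current.append(row)
        current_len += row_len
    if current:
        chunks.append("\n".join([header, separator, *current]))
    return chunks
```

A custom token counter (e.g. a real tokenizer's `len(encode(...))`) can replace `len` to turn this character budget into a token budget.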
See also: Chunkers Overview
