TableChunker
splits large markdown tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.
API Reference
Use the `recursive` endpoint to access table chunking functionality.
On the API, the table chunker operates as part of the recursive chunker,
allowing you to process documents containing inline tables while ensuring
that table structures remain intact across chunk boundaries.
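A request to that endpoint might look like the following sketch. The URL, payload field names, and authentication header here are illustrative assumptions, not the documented API schema; consult the API reference for the real request shape.

```python
import json
import urllib.request

# Hypothetical call to the recursive chunking endpoint. The endpoint path,
# payload fields, and auth header below are assumptions for illustration.
payload = {
    "text": "# Report\n\n| id | name |\n|----|------|\n| 1  | Ada  |",
    "chunk_size": 512,
}
body = json.dumps(payload).encode()
req = urllib.request.Request(
    "https://api.chonkie.ai/v1/chunk/recursive",  # assumed URL
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
    },
)
# response = urllib.request.urlopen(req)  # send once the details are confirmed
```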
Installation
TableChunker is included in the base installation of Chonkie; no additional dependencies are required. For installation instructions, see the Installation Guide.
Initialization
Parameters
- Tokenizer to use. Can be a string identifier (“character”, “word”, “gpt2”, etc.) or a tokenizer instance.
- Maximum number of tokens or characters per chunk.
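To illustrate how these two parameters interact (a plain-Python sketch, not Chonkie's code): with the “character” identifier every character counts as one token, while “word” counts whitespace-separated words, so the same chunk size admits very different amounts of table per chunk.

```python
row = "| 1 | Alice | 90 |"

def character_count(text: str) -> int:
    # "character" counting: one token per character
    return len(text)

def word_count(text: str) -> int:
    # "word" counting: one token per whitespace-separated word
    return len(text.split())

print(character_count(row))  # 18
print(word_count(row))       # 7
```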
Usage
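The behavior can be sketched in plain Python; the following is an illustration of the header-preserving split under a character-count budget, not the library's implementation.

```python
def chunk_table(table: str, chunk_size: int, count=len) -> list[str]:
    """Split a markdown table into chunks that each repeat the header block.

    `count` measures a chunk's size; `len` mimics character-based counting.
    """
    lines = table.strip().splitlines()
    if len(lines) < 3:
        raise ValueError("need a header, a separator, and at least one data row")
    header = lines[:2]  # header row + separator row travel with every chunk
    chunks: list[list[str]] = []
    current: list[str] = []
    for row in lines[2:]:
        candidate = "\n".join(header + current + [row])
        if current and count(candidate) > chunk_size:
            chunks.append(current)
            current = [row]
        else:
            current.append(row)
    chunks.append(current)
    return ["\n".join(header + rows) for rows in chunks]

table = """\
| id | name  |
|----|-------|
| 1  | Ada   |
| 2  | Grace |
| 3  | Alan  |"""

# With a 60-character budget, the rows split into two header-bearing chunks.
for chunk in chunk_table(table, chunk_size=60):
    print(chunk)
    print("---")
```

A row is never split internally; when adding the next row would exceed the budget, the current chunk is closed and a new one starts with a fresh copy of the header.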
Example Output
Each chunk is a valid markdown table segment that always includes the header. For the example above and `chunk_size=16`, you might get:
Methods
- `chunk(table: str) -> List[Chunk]`: Chunk a markdown table string.
- `chunk_document(document: Document) -> Document`: Chunk all tables in a `MarkdownDocument`.
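Conceptually, the document-level method locates each contiguous table block and chunks it in place. A rough sketch of the detection step (not Chonkie's implementation; real markdown parsing would also validate the separator row):

```python
def find_table_blocks(markdown: str) -> list[tuple[int, int]]:
    """Return (start, end) line-index pairs of contiguous pipe-table blocks."""
    lines = markdown.splitlines()
    blocks: list[tuple[int, int]] = []
    start = None
    for i, line in enumerate(lines):
        if line.lstrip().startswith("|"):
            if start is None:
                start = i  # first line of a new table block
        elif start is not None:
            blocks.append((start, i))  # block ended on the previous line
            start = None
    if start is not None:
        blocks.append((start, len(lines)))  # document ended inside a table
    return blocks

doc = "intro text\n| a | b |\n|---|---|\n| 1 | 2 |\nclosing text"
print(find_table_blocks(doc))  # [(1, 4)]
```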
Notes
- Requires at least a header, separator, and one data row.
- If the table fits within the chunk size, it is returned as a single chunk.
- For advanced use, pass a custom tokenizer for token-based chunking.
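As a sketch of the last point: a token counter can be as simple as a callable from text to an integer. The heuristic below, and the idea of passing a bare callable in place of a tokenizer, are illustrative assumptions; check the library's tokenizer documentation for what it actually accepts.

```python
def approx_token_count(text: str) -> int:
    # Crude heuristic: roughly four characters per token in English text.
    return max(1, len(text) // 4)

# Hypothetical usage, parameter name assumed:
# TableChunker(tokenizer_or_token_counter=approx_token_count, chunk_size=256)
print(approx_token_count("| id | name | score |"))  # 5
```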
See also: Chunkers Overview