Split markdown or HTML tables into manageable chunks by row, preserving headers.
The TableChunker splits large markdown or HTML tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.
Use the recursive endpoint to access table chunking functionality. On the API, the table chunker operates as part of the recursive chunker,
allowing you to process documents containing inline tables while ensuring
that table structures remain intact across chunk boundaries.
```python
from chonkie import TableChunker

# Basic initialization with custom parameters
chunker = TableChunker(
    tokenizer="row",  # Chunk by rows, valid only for TableChunker
    chunk_size=3      # Maximum number of rows per chunk (not including header)
)
```
```javascript
import { TableChunker } from "@chonkiejs/core";

// Basic initialization with custom parameters
const chunker = await TableChunker.create({
  tokenizer: "row", // Chunk by rows, valid only for TableChunker
  chunkSize: 3      // Maximum number of rows per chunk (not including header)
});
```
```python
from chonkie import TableChunker

table = """| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |"""

chunker = TableChunker(tokenizer="row", chunk_size=3)
chunks = chunker.chunk(table)

for chunk in chunks:
    print(chunk.text)

# Each chunk is a valid markdown table segment, always including the header.
# For the example above and `chunk_size=3`, you might get:
# >>>
# | Name   | Age | City     |
# |--------|-----|----------|
# | Alice  | 30  | New York |
# | Bob    | 25  | London   |
# | Carol  | 28  | Paris    |
# | Name   | Age | City     |
# |--------|-----|----------|
# | Dave   | 35  | Berlin   |
```
```javascript
import { TableChunker } from "@chonkiejs/core";

const table = `| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |`;

const chunker = await TableChunker.create({ tokenizer: "row", chunkSize: 3 });
const chunks = await chunker.chunk(table);

for (const chunk of chunks) {
  console.log(chunk.text);
}
```
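To make the row-based behaviour concrete, here is a minimal plain-Python sketch of the idea: group the data rows and repeat the header and separator in front of each group. The function `split_markdown_table` and its `rows_per_chunk` parameter are hypothetical names used only for illustration. This is not Chonkie's implementation and it handles Markdown pipe tables only.

```python
from typing import List


def split_markdown_table(table: str, rows_per_chunk: int = 3) -> List[str]:
    """Split a Markdown pipe table into chunks that each repeat the header.

    Hypothetical helper for illustration only; not part of chonkie.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")

    # Drop blank lines so leading/trailing newlines do not count as rows.
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a header, a separator, and at least one data row")

    header, separator, rows = lines[0], lines[1], lines[2:]

    chunks = []
    # Walk the data rows in steps of rows_per_chunk; the header is not counted.
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        chunks.append("\n".join([header, separator, *group]))
    return chunks


table = """
| Name  | Age | City     |
|-------|-----|----------|
| Alice | 30  | New York |
| Bob   | 25  | London   |
| Carol | 28  | Paris    |
| Dave  | 35  | Berlin   |
"""

for piece in split_markdown_table(table, rows_per_chunk=3):
    print(piece)
    print()  # blank line between chunks for readability
```

With four data rows and three rows per chunk, the sketch yields two chunks: the first holds Alice, Bob, and Carol, the second holds Dave, and both start with the same header and separator lines.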
Supports both standard Markdown pipe tables and HTML <table> elements.
Markdown tables require at least a header row, a separator row, and one data row. HTML tables require at least one <tr> data row; <thead> and <tbody> are optional.
If the table fits within the chunk size, it is returned as a single chunk.
For advanced use, pass a custom tokenizer for token-based chunking.
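The minimum-structure requirement for Markdown tables mentioned in the notes above (header, separator, one data row) can be checked before chunking. The sketch below is one possible pre-check in plain Python. The name `is_minimal_markdown_table` is hypothetical, and the rules it applies are an assumption about what a well-formed pipe table looks like, not the checks Chonkie itself performs.

```python
import re

# A separator cell looks like ---, :---, ---:, or :---: (alignment colons optional).
_SEPARATOR_CELL = re.compile(r"^:?-+:?$")


def is_minimal_markdown_table(text: str) -> bool:
    """Return True if text has a header, a separator row, and >= 1 data row.

    Hypothetical pre-check for illustration only; not part of chonkie.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        return False  # header + separator + at least one data row
    if "|" not in lines[0]:
        return False  # header must be a pipe-delimited row

    # Every cell in the second line must be made of dashes (with optional colons).
    cells = [cell.strip() for cell in lines[1].strip("|").split("|")]
    return all(_SEPARATOR_CELL.match(cell) for cell in cells)


print(is_minimal_markdown_table("| A | B |\n|---|---|\n| 1 | 2 |"))
print(is_minimal_markdown_table("| A | B |\n|---|---|"))
```

The first call passes because all three parts are present; the second fails because the table has no data row.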