# Table Chunker

> Split markdown or HTML tables into manageable chunks by row, preserving headers.

The `TableChunker` splits large markdown or HTML tables into smaller, manageable chunks by row, always preserving the header. This is especially useful for processing, indexing, or embedding tabular data in LLM and RAG pipelines.

## API Reference

Use the `recursive` endpoint to access table chunking functionality. On the API, the table chunker operates as part of the recursive chunker, allowing you to process documents containing inline tables while ensuring that table structures remain intact across chunk boundaries.

## Installation

TableChunker is included in the base installation of Chonkie. No additional dependencies are required. For installation instructions, see the [Installation Guide](/oss/installation).

## Initialization

```python row chunker theme={"system"}
from chonkie import TableChunker

# Basic initialization with custom parameters
chunker = TableChunker(
    tokenizer="row",  # Chunk by rows, valid only for TableChunker
    chunk_size=3      # Maximum number of rows per chunk (not including header)
)
```

```python token chunker theme={"system"}
from chonkie import TableChunker

# Basic initialization
chunker = TableChunker(
    tokenizer="character",  # using Character chunker (or you can use "gpt2", ...)
    chunk_size=16           # Maximum number of tokens/characters per chunk
)
```

```javascript row chunker theme={"system"}
import { TableChunker } from "@chonkiejs/core";

// Basic initialization with custom parameters
const chunker = await TableChunker.create({
  tokenizer: "row", // Chunk by rows, valid only for TableChunker
  chunkSize: 3      // Maximum number of rows per chunk (not including header)
});
```

```javascript token chunker theme={"system"}
import { TableChunker } from "@chonkiejs/core";

// Basic initialization
const chunker = await TableChunker.create({
  tokenizer: "character", // using Character chunker
  chunkSize: 16           // Maximum number of tokens/characters per chunk
});
```

## Parameters

* `tokenizer`: Tokenizer to use. Default is "row". Can be a string identifier ("row", "character", "word", "gpt2", "byte", etc.) or a tokenizer instance.
* `chunk_size` (`chunkSize` in JavaScript): Maximum number of rows (if tokenizer="row") or tokens/characters per chunk.

## Usage

```python Markdown (Row-Based) theme={"system"}
from chonkie import TableChunker

table = """
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
"""

chunker = TableChunker(tokenizer="row", chunk_size=3)
chunks = chunker.chunk(table)

for chunk in chunks:
    print(chunk.text)

# Each chunk is a valid markdown table segment, always including the header.
# For the example above and `chunk_size=3`, you might get:
# >>>
# | Name   | Age | City     |
# |--------|-----|----------|
# | Alice  | 30  | New York |
# | Bob    | 25  | London   |
# | Carol  | 28  | Paris    |

# | Name   | Age | City     |
# |--------|-----|----------|
# | Dave   | 35  | Berlin   |
```

```python Markdown (Token-Based) theme={"system"}
from chonkie import TableChunker

table = """
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
"""

chunker = TableChunker(tokenizer="character", chunk_size=16)
chunks = chunker.chunk(table)

for chunk in chunks:
    print(chunk.text)

# Each chunk is a valid markdown table segment, always including the header.
# For the example above and `chunk_size=16`, you might get:
# >>>
# | Name  | Age | City     |
# | ----- | --- | -------- |
# | Alice | 30  | New York |
# | Bob   | 25  | London   |

# | Name  | Age | City   |
# | ----- | --- | ------ |
# | Carol | 28  | Paris  |
# | Dave  | 35  | Berlin |
```

```python HTML Tables theme={"system"}
from chonkie import TableChunker

html_table = """
<table>
  <thead>
    <tr><th>ID</th><th>Status</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Active</td></tr>
    <tr><td>2</td><td>Pending</td></tr>
    <tr><td>3</td><td>Inactive</td></tr>
    <tr><td>4</td><td>Active</td></tr>
  </tbody>
</table>
"""

# HTML tables are chunked while preserving <table>, <thead>, and <tbody> tags
chunker = TableChunker(tokenizer="row", chunk_size=2)
chunks = chunker.chunk(html_table)

for chunk in chunks:
    print(f"--- HTML Chunk ---\n{chunk.text}\n")
```

```javascript Markdown (Row-Based) theme={"system"}
import { TableChunker } from "@chonkiejs/core";

const table = `
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
`;

const chunker = await TableChunker.create({ tokenizer: "row", chunkSize: 3 });
const chunks = await chunker.chunk(table);

for (const chunk of chunks) {
  console.log(chunk.text);
}
```

```javascript Markdown (Token-Based) theme={"system"}
import { TableChunker } from "@chonkiejs/core";

const table = `
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
`;

const chunker = await TableChunker.create({ tokenizer: "character", chunkSize: 16 });
const chunks = await chunker.chunk(table);

for (const chunk of chunks) {
  console.log(chunk.text);
}
```

```javascript HTML Tables theme={"system"}
import { TableChunker } from "@chonkiejs/core";

const htmlTable = `
<table>
  <thead>
    <tr><th>ID</th><th>Status</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Active</td></tr>
    <tr><td>2</td><td>Pending</td></tr>
    <tr><td>3</td><td>Inactive</td></tr>
    <tr><td>4</td><td>Active</td></tr>
  </tbody>
</table>
`;

// HTML tables are chunked while preserving <table>, <thead>, and <tbody> tags
const chunker = await TableChunker.create({ tokenizer: "row", chunkSize: 2 });
const chunks = await chunker.chunk(htmlTable);

for (const chunk of chunks) {
  console.log(`--- HTML Chunk ---\n${chunk.text}\n`);
}
```

## Methods

* `chunk(table: str) -> list[Chunk]`: Chunk a markdown or HTML table string.
* `chunk_document(document: Document) -> Document`: Chunk all tables in a `MarkdownDocument`.

## Notes

* Supports both standard Markdown pipe tables and HTML `<table>` elements.
* Requires at least a header, separator, and one data row (for Markdown) or at least one `<tr>` data row for HTML tables (with optional `<thead>` and `<tbody>` structure).
* If the table fits within the chunk size, it is returned as a single chunk.
* For advanced use, pass a custom tokenizer for token-based chunking.

***

See also: [Chunkers Overview](/oss/chunkers/overview)
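
To make the row-based behavior on this page concrete, here is a minimal pure-Python sketch of splitting a Markdown pipe table by rows while repeating the header in every chunk. It is an illustration only and not Chonkie's implementation: the helper name `chunk_markdown_table_by_rows` is hypothetical, it handles Markdown tables only, and it assumes the first two non-empty lines are the header and the separator.

```python
def chunk_markdown_table_by_rows(table: str, chunk_size: int) -> list[str]:
    """Split a Markdown pipe table into chunks of at most `chunk_size` data rows.

    Hypothetical helper for illustration; it is not part of Chonkie.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a header, a separator, and at least one data row")

    # Assumption: line 0 is the header, line 1 the separator, the rest are data rows.
    header, separator, rows = lines[0], lines[1], lines[2:]

    chunks = []
    for start in range(0, len(rows), chunk_size):
        body = rows[start:start + chunk_size]
        # Repeat the header and separator so each chunk is a valid table on its own.
        chunks.append("\n".join([header, separator, *body]))
    return chunks


table = """
| Name  | Age | City     |
|-------|-----|----------|
| Alice | 30  | New York |
| Bob   | 25  | London   |
| Carol | 28  | Paris    |
| Dave  | 35  | Berlin   |
"""

chunks = chunk_markdown_table_by_rows(table, chunk_size=3)
for chunk in chunks:
    print(chunk)
    print()
```

With four data rows and `chunk_size=3`, the sketch yields two chunks: the first holds three rows and the second holds the remaining one, each starting with the same header and separator. A table with three or fewer rows comes back as a single chunk, which matches the note that a table fitting within the chunk size is returned whole.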