# Table Chunker

> Split markdown or HTML tables into manageable chunks by row, preserving headers.

The `TableChunker` splits large Markdown or HTML tables into smaller chunks by row and repeats the header in every chunk. This is useful when processing, indexing, or embedding tabular data in LLM and RAG pipelines.
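
To make the row-based idea concrete, here is a minimal pure-Python sketch that splits a Markdown pipe table into groups of rows and repeats the header in each group. It is an illustration only, not Chonkie's implementation, and the helper name `split_table_by_rows` is hypothetical.

```python
def split_table_by_rows(table: str, rows_per_chunk: int) -> list[str]:
    """Split a Markdown pipe table into chunks of at most `rows_per_chunk` data rows.

    Illustrative only: the header and separator lines are repeated in every chunk.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")

    # Drop blank lines and surrounding whitespace so indentation does not matter.
    lines = [line.strip() for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected a header, a separator, and at least one data row")

    header, separator, rows = lines[0], lines[1], lines[2:]
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        chunks.append("\n".join([header, separator, *group]))
    return chunks


table = """
| Name   | Age | City     |
|--------|-----|----------|
| Alice  | 30  | New York |
| Bob    | 25  | London   |
| Carol  | 28  | Paris    |
| Dave   | 35  | Berlin   |
"""

for chunk in split_table_by_rows(table, rows_per_chunk=3):
    print(chunk)
    print()
```

With four data rows and three rows per chunk, the sketch yields two segments: the first holds three rows and the second holds the remaining one, each starting with the same header and separator lines.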

## API Reference

Use the `recursive` endpoint to access table chunking functionality.

On the API, the table chunker operates as part of the recursive chunker,
allowing you to process documents containing inline tables while ensuring
that table structures remain intact across chunk boundaries.

## Installation

TableChunker is included in the base installation of Chonkie. No additional dependencies are required.

<Info>
  For installation instructions, see the [Installation
  Guide](/oss/installation).
</Info>

## Initialization

<Tabs>
  <Tab title="Python">
    <CodeGroup>
      ```python row chunker theme={"system"}
      from chonkie import TableChunker

      # Basic initialization with custom parameters
      chunker = TableChunker(
          tokenizer="row",  # Chunk by rows; valid only for TableChunker
          chunk_size=3,  # Maximum number of rows per chunk (not including header)
      )
      ```

      ```python token chunker theme={"system"}
      from chonkie import TableChunker

      # Basic initialization
      chunker = TableChunker(
        tokenizer="character",  # Count characters (or pass a tokenizer name such as "gpt2")
        chunk_size=16 # Maximum number of tokens/characters per chunk
      )
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript">
    <CodeGroup>
      ```javascript row chunker theme={"system"}
      import { TableChunker } from "@chonkiejs/core";

      // Basic initialization with custom parameters
      const chunker = await TableChunker.create({
        tokenizer: "row", // Chunk by rows, valid only for TableChunker
        chunkSize: 3      // Maximum number of rows per chunk (not including header)
      });
      ```

      ```javascript token chunker theme={"system"}
      import { TableChunker } from "@chonkiejs/core";

      // Basic initialization
      const chunker = await TableChunker.create({
        tokenizer: "character",  // Count characters
        chunkSize: 16            // Maximum number of tokens/characters per chunk
      });
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Parameters

<ParamField path="tokenizer" type="Union[ Literal[&#x22;row&#x22;, &#x22;character&#x22;], str, Callable[[str], int], Any]" default="row">
  Tokenizer to use. Defaults to "row". Can be a string identifier ("row", "character", "word", "gpt2", "byte",
  etc.), a callable that takes a string and returns a token count, or a tokenizer instance.
</ParamField>

<ParamField path="chunk_size" type="int" default="3">
  Maximum number of data rows per chunk, not counting the header (if tokenizer="row"), or the maximum number of tokens/characters per chunk otherwise.
</ParamField>
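
The size-based mode can be pictured as greedy packing: rows are appended to the current chunk until adding another would exceed the budget. The sketch below is not Chonkie's code. It assumes the budget applies to data rows only and that an oversized row gets a chunk of its own; the library's actual accounting (for example, whether the header counts toward `chunk_size`) may differ. The names `pack_rows` and `count` are hypothetical.

```python
from typing import Callable


def pack_rows(
    rows: list[str],
    budget: int,
    count: Callable[[str], int] = len,
) -> list[list[str]]:
    """Greedily group rows so each group's total count stays within `budget`.

    A row that alone exceeds the budget is placed in a group of its own.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    used = 0
    for row in rows:
        size = count(row)
        # Start a new group when this row would push the current one over budget.
        if current and used + size > budget:
            groups.append(current)
            current, used = [], 0
        current.append(row)
        used += size
    if current:
        groups.append(current)
    return groups


rows = ["| Alice | 30 |", "| Bob | 25 |", "| Carol | 28 |", "| Dave | 35 |"]

# Character budget: len() is the default counter.
by_chars = pack_rows(rows, budget=30)

# Word budget: any callable mapping a string to an int can act as the counter.
by_words = pack_rows(rows, budget=5, count=lambda s: len(s.split()))

print(len(by_chars), len(by_words))
```

Swapping the counter changes the unit of the budget without changing the packing logic, which is the same separation the `tokenizer` and `chunk_size` parameters express.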

## Usage

<Tabs>
  <Tab title="Python">
    <CodeGroup>
      ```python Markdown (Row-Based) theme={"system"}
      from chonkie import TableChunker

      table = """
      | Name   | Age | City     |
      |--------|-----|----------|
      | Alice  | 30  | New York |
      | Bob    | 25  | London   |
      | Carol  | 28  | Paris    |
      | Dave   | 35  | Berlin   |
      """

      chunker = TableChunker(tokenizer="row", chunk_size=3)
      chunks = chunker.chunk(table)
      for chunk in chunks:
          print(chunk.text)

      # Each chunk is a valid markdown table segment, always including the header. For the example above and `chunk_size=3`, you might get:
      # >>>
      # | Name   | Age | City     |
      # |--------|-----|----------|
      # | Alice  | 30  | New York |
      # | Bob    | 25  | London   |
      # | Carol  | 28  | Paris    |

      # | Name   | Age | City     |
      # |--------|-----|----------|
      # | Dave   | 35  | Berlin   |
      ```

      ```python Markdown (Token-Based) theme={"system"}
      from chonkie import TableChunker

      table = """
      | Name   | Age | City     |
      |--------|-----|----------|
      | Alice  | 30  | New York |
      | Bob    | 25  | London   |
      | Carol  | 28  | Paris    |
      | Dave   | 35  | Berlin   |
      """

      chunker = TableChunker(tokenizer="character", chunk_size=16)
      chunks = chunker.chunk(table)
      for chunk in chunks:
          print(chunk.text)

      # Each chunk is a valid markdown table segment, always including the header. For the example above and `chunk_size=16`, you might get:
      # >>>
      # | Name  | Age | City     |
      # | ----- | --- | -------- |
      # | Alice | 30  | New York |
      # | Bob   | 25  | London   |

      # | Name  | Age | City   |
      # | ----- | --- | ------ |
      # | Carol | 28  | Paris  |
      # | Dave  | 35  | Berlin |
      ```

      ```python HTML Tables theme={"system"}
      from chonkie import TableChunker

      html_table = """
      <table>
        <thead>
          <tr><th>ID</th><th>Status</th></tr>
        </thead>
        <tbody>
          <tr><td>1</td><td>Active</td></tr>
          <tr><td>2</td><td>Pending</td></tr>
          <tr><td>3</td><td>Inactive</td></tr>
          <tr><td>4</td><td>Active</td></tr>
        </tbody>
      </table>
      """

      # HTML tables are chunked while preserving <table>, <thead>, and <tbody> tags
      chunker = TableChunker(tokenizer="row", chunk_size=2)
      chunks = chunker.chunk(html_table)

      for chunk in chunks:
          print(f"--- HTML Chunk ---\n{chunk.text}\n")
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript">
    <CodeGroup>
      ```javascript Markdown (Row-Based) theme={"system"}
      import { TableChunker } from "@chonkiejs/core";

      const table = `
      | Name   | Age | City     |
      |--------|-----|----------|
      | Alice  | 30  | New York |
      | Bob    | 25  | London   |
      | Carol  | 28  | Paris    |
      | Dave   | 35  | Berlin   |
      `;

      const chunker = await TableChunker.create({ tokenizer: "row", chunkSize: 3 });
      const chunks = await chunker.chunk(table);
      for (const chunk of chunks) {
        console.log(chunk.text);
      }
      ```

      ```javascript Markdown (Token-Based) theme={"system"}
      import { TableChunker } from "@chonkiejs/core";

      const table = `
      | Name   | Age | City     |
      |--------|-----|----------|
      | Alice  | 30  | New York |
      | Bob    | 25  | London   |
      | Carol  | 28  | Paris    |
      | Dave   | 35  | Berlin   |
      `;

      const chunker = await TableChunker.create({ tokenizer: "character", chunkSize: 16 });
      const chunks = await chunker.chunk(table);
      for (const chunk of chunks) {
        console.log(chunk.text);
      }
      ```

      ```javascript HTML Tables theme={"system"}
      import { TableChunker } from "@chonkiejs/core";

      const htmlTable = `
      <table>
        <thead>
          <tr><th>ID</th><th>Status</th></tr>
        </thead>
        <tbody>
          <tr><td>1</td><td>Active</td></tr>
          <tr><td>2</td><td>Pending</td></tr>
          <tr><td>3</td><td>Inactive</td></tr>
          <tr><td>4</td><td>Active</td></tr>
        </tbody>
      </table>
      `;

      // HTML tables are chunked while preserving <table>, <thead>, and <tbody> tags
      const chunker = await TableChunker.create({ tokenizer: "row", chunkSize: 2 });
      const chunks = await chunker.chunk(htmlTable);

      for (const chunk of chunks) {
        console.log(`--- HTML Chunk ---\n${chunk.text}\n`);
      }
      ```
    </CodeGroup>
  </Tab>
</Tabs>
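
The HTML examples above do not show their output, so the following pure-Python sketch illustrates one way an HTML table could be split while keeping the `<table>`, `<thead>`, and `<tbody>` wrappers in every chunk. It is a regex-based illustration under stated assumptions (explicit `<thead>` and `<tbody>` sections, plain `<tr>` tags without attributes), not Chonkie's implementation, and the exact text that `TableChunker` emits may differ. The helper name `split_html_table` is hypothetical.

```python
import re


def split_html_table(html: str, rows_per_chunk: int) -> list[str]:
    """Split a simple HTML table into smaller tables that each repeat the header.

    Hypothetical helper: regex-based, so it only suits small, well-formed tables
    with explicit <thead> and <tbody> sections and plain <tr> tags.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    thead = re.search(r"<thead>.*?</thead>", html, flags=re.DOTALL)
    tbody = re.search(r"<tbody>(.*?)</tbody>", html, flags=re.DOTALL)
    if thead is None or tbody is None:
        raise ValueError("this sketch expects explicit <thead> and <tbody> sections")

    # Only rows inside <tbody> are data rows; the <thead> row is kept as the header.
    rows = re.findall(r"<tr>.*?</tr>", tbody.group(1), flags=re.DOTALL)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        body = "\n".join(rows[start:start + rows_per_chunk])
        chunks.append(f"<table>\n{thead.group(0)}\n<tbody>\n{body}\n</tbody>\n</table>")
    return chunks


html_table = """
<table>
  <thead>
    <tr><th>ID</th><th>Status</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Active</td></tr>
    <tr><td>2</td><td>Pending</td></tr>
    <tr><td>3</td><td>Inactive</td></tr>
    <tr><td>4</td><td>Active</td></tr>
  </tbody>
</table>
"""

html_chunks = split_html_table(html_table, rows_per_chunk=2)
for chunk in html_chunks:
    print(f"--- HTML Chunk ---\n{chunk}\n")
```

For real-world HTML, a proper parser is a safer choice than regular expressions; the regex approach is used here only to keep the sketch short.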

## Methods

* `chunk(table: str) -> list[Chunk]`: Chunk a Markdown or HTML table string.
* `chunk_document(document: Document) -> Document`: Chunk all tables in a `MarkdownDocument`.

## Notes

* Supports both standard Markdown pipe tables and HTML `<table>` elements.
* Markdown tables require a header, a separator, and at least one data row. HTML tables require at least one `<tr>` data row; the `<thead>` and `<tbody>` structure is optional.
* If the table fits within the chunk size, it is returned as a single chunk.
* For token-based chunking, pass a tokenizer name, a tokenizer instance, or a custom callable as `tokenizer`.
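
The minimum Markdown structure described in the notes above (header, separator, at least one data row) can be checked before chunking. The following is a rough pre-check sketch, not the validation Chonkie performs internally; `has_minimum_table_structure` and `SEPARATOR` are hypothetical names, and the separator pattern is a simplification of the pipe-table syntax.

```python
import re

# Matches separator lines such as |-----|:---:|  or  | --- | --- |
SEPARATOR = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def has_minimum_table_structure(table: str) -> bool:
    """Return True if the text has a header, a separator, and one or more data rows."""
    lines = [line.strip() for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        return False
    # The first line must look like a pipe row and the second like a separator.
    return "|" in lines[0] and SEPARATOR.match(lines[1]) is not None


complete = """
| Name  | Age |
|-------|-----|
| Alice | 30  |
"""

header_only = """
| Name  | Age |
|-------|-----|
"""

print(has_minimum_table_structure(complete))
print(has_minimum_table_structure(header_only))
```

A check like this lets a pipeline skip or log malformed tables instead of passing them to the chunker.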

***

See also: [Chunkers Overview](/oss/chunkers/overview)
