{
  "text": "<string>",
  "start_index": 123,
  "end_index": 123,
  "token_count": 123
}
The Code Chunker splits source code at logical boundaries (functions, classes, methods) while preserving code structure and syntax.

Examples

Text Input

from chonkie.cloud import CodeChunker

chunker = CodeChunker(
    language="python",
    chunk_size=512
)

code = "def example():\n    return True"
chunks = chunker.chunk(code)

File Input

from chonkie.cloud import CodeChunker

chunker = CodeChunker(
    language="python",
    chunk_size=512
)

# Chunk from file
with open("script.py", "rb") as f:
    chunks = chunker.chunk(file=f)

Request

Parameters

text
string | string[]
The code to chunk. Can be a single string or an array of strings for batch processing. Either text or file is required.
file
file
Code file to chunk. Use multipart/form-data encoding. Either text or file is required.
language
string
default:"python"
Programming language of the code. Supported values include python, javascript, typescript, java, and cpp.
tokenizer
string
default:"gpt2"
Tokenizer to use for counting tokens.
chunk_size
integer
default:512
Maximum number of tokens per chunk.
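
The parameter rules above can be sketched as a small client-side check run before a request is sent. This is a minimal illustration, not part of the Chonkie API: `build_chunk_params` and its `has_file` flag are hypothetical names, and the only facts taken from this page are the documented defaults and the rule that either text or file is required.

```python
def build_chunk_params(text=None, has_file=False, language="python",
                       tokenizer="gpt2", chunk_size=512):
    """Assemble and validate Code Chunker parameters (hypothetical helper)."""
    # The reference states that either text or file is required.
    if text is None and not has_file:
        raise ValueError("Either text or file is required.")

    # text may be a single string or a list of strings (batch processing).
    if text is not None:
        is_batch = isinstance(text, list) and all(isinstance(t, str) for t in text)
        if not (isinstance(text, str) or is_batch):
            raise TypeError("text must be a string or a list of strings.")

    # chunk_size is the maximum number of tokens per chunk.
    if not isinstance(chunk_size, int) or chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer.")

    params = {"language": language, "tokenizer": tokenizer, "chunk_size": chunk_size}
    if text is not None:
        params["text"] = text
    return params


# Only text is supplied, so the documented defaults fill in the rest.
params = build_chunk_params(text="def example():\n    return True")
```

Failing early on a missing input or an invalid chunk_size avoids a round trip that the service would presumably reject anyway.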

Response

Returns

Array of Chunk objects, each containing:
text
string
The chunk text content.
start_index
integer
Starting character position in the original text.
end_index
integer
Ending character position in the original text.
token_count
integer
Number of tokens in the chunk.
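
As a rough sketch of how these fields might be consumed, the snippet below parses a response-shaped list into typed objects and checks each chunk against the original text. The `Chunk` dataclass, both helper functions, and the sample response are illustrative assumptions. It also assumes `end_index` is exclusive (Python slice semantics), which this page does not state.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Local mirror of the documented response fields (hypothetical class)."""
    text: str
    start_index: int
    end_index: int
    token_count: int


def parse_chunks(payload):
    """Convert a list of response dictionaries into Chunk objects."""
    return [Chunk(**item) for item in payload]


def offsets_match(source, chunks):
    """Check that each chunk's text equals source[start_index:end_index].

    Assumes end_index is exclusive; adjust if the service reports it inclusively.
    """
    return all(source[c.start_index:c.end_index] == c.text for c in chunks)


source = "def a():\n    return 1\n\ndef b():\n    return 2\n"

# Hand-written sample shaped like the response; token counts are placeholders.
sample_response = [
    {"text": source[0:23], "start_index": 0, "end_index": 23, "token_count": 9},
    {"text": source[23:45], "start_index": 23, "end_index": 45, "token_count": 8},
]

chunks = parse_chunks(sample_response)
valid = offsets_match(source, chunks)
```

A check like this is useful when chunk offsets are later used to map results back to positions in the source file.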