TextChef reads plain text files and returns structured Document objects that downstream components, such as chunkers, can work with.

Installation

TextChef is included in the base installation of Chonkie. No additional dependencies are required.
For installation instructions, see the Installation Guide.

Initialization

from chonkie import TextChef

# Simple initialization - no parameters required
chef = TextChef()

Methods

process()

Process a text file and return a Document object.

Parameters

path
Union[str, Path]
required
Path to the text file (string or Path object)

Returns

Document object containing the file content

process_batch()

Process multiple text files at once.

Parameters

paths
List[Union[str, Path]]
required
List of file paths to process

Returns

List[Document] where each Document contains a file’s contents.
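
As a rough illustration of how a path list for process_batch() might be assembled, the sketch below gathers the text files in a directory and hands them to a chef in one call. The helper names collect_text_files and process_directory, and the "*.txt" pattern, are assumptions made for this example and are not part of Chonkie. The only call assumed on the chef is process_batch(paths), as documented above.

```python
from pathlib import Path
from typing import Any, List


def collect_text_files(directory, pattern: str = "*.txt") -> List[Path]:
    """Return the files in `directory` matching `pattern`, sorted by path.

    Hypothetical helper for this example; not part of Chonkie.
    """
    return sorted(Path(directory).glob(pattern))


def process_directory(chef: Any, directory, pattern: str = "*.txt") -> list:
    """Pass every matching file to the chef's process_batch() in one call.

    `chef` is expected to be a TextChef, or any object that exposes
    process_batch(paths) with the signature documented above.
    """
    paths = collect_text_files(directory, pattern)
    return chef.process_batch(paths)
```

With Chonkie installed, a call such as process_directory(TextChef(), "corpus") would then return one Document per matching file, where "corpus" is a placeholder directory name.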

Usage

from chonkie import TextChef

# Initialize the chef
chef = TextChef()

# Process a text file
doc = chef.process("example.txt")

# Access the content
print(doc.content)
print(f"Document ID: {doc.id}")

Integration with Chunkers

The content of a Document returned by TextChef can be passed directly to any of Chonkie’s chunkers:

from chonkie import TextChef, TokenChunker

# Step 1: Load text file
chef = TextChef()
doc = chef.process("article.txt")

# Step 2: Chunk the content
chunker = TokenChunker(chunk_size=512, chunk_overlap=50)
chunks = chunker.chunk(doc.content)

# Step 3: Store chunks back in the document
doc.chunks = chunks

# Now your document has both content and chunks
print(f"Document {doc.id}:")
print(f"  Content: {len(doc.content)} characters")
print(f"  Chunks: {len(doc.chunks)}")

Encoding

TextChef reads files with UTF-8 encoding by default, ensuring proper handling of:
  • Unicode characters
  • International text
  • Special symbols
  • Emoji and other non-ASCII characters
All text is read as strings and preserved exactly as it appears in the source file.
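
To make the round-trip claim concrete, here is a minimal sketch using only the Python standard library. It is not TextChef's implementation. The helper read_utf8 is a hypothetical stand-in that reads a file as UTF-8 text, which is the behavior this section describes, and the sample string is invented for the example.

```python
import tempfile
from pathlib import Path


def read_utf8(path) -> str:
    """Read a file as UTF-8 text (hypothetical stand-in, not Chonkie code)."""
    return Path(path).read_text(encoding="utf-8")


# Sample covering accented Latin, CJK, a currency symbol, and an emoji.
sample = "café, 日本語, € and 🍳"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(sample, encoding="utf-8")  # write the file as UTF-8
    restored = read_utf8(path)                 # read it back as a str

# The decoded string matches what was written, character for character.
print(restored == sample)
```

A file saved in a different encoding would not decode this way with plain Python, so converting such files to UTF-8 before processing is a reasonable precaution.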