Skip to main content
Fetchers connect different data sources to Chonkie’s pipeline system, enabling seamless data ingestion from various sources.

What are Fetchers?

Fetchers are the first step in the CHOMP pipeline (CHef -> CHunker -> Refinery -> Porter/Handshake). They retrieve data from different sources and pass it to the next pipeline stage for processing. Fetchers make it easy to:
  • Load files from local storage
  • Fetch documents from cloud storage (coming soon)
  • Retrieve data from databases (coming soon)
  • Connect to APIs and web sources (coming soon)

Installation

Fetchers are included with the base Chonkie installation:
pip install chonkie

Using Fetchers in Pipelines

Fetchers integrate seamlessly with the Pipeline API:
from chonkie.pipeline import Pipeline

# Single file
doc = (Pipeline()
    .fetch_from("file", path="document.txt")
    .process_with("text")
    .chunk_with("recursive", chunk_size=512)
    .run())

# Directory with multiple files
docs = (Pipeline()
    .fetch_from("file", dir="./docs", ext=[".txt", ".md"])
    .process_with("text")
    .chunk_with("recursive", chunk_size=512)
    .run())

Available Fetchers

More fetchers are coming soon! We’re working on cloud storage, database, and API fetchers.
I