Chonkie Documentation

Fetchers connect data sources to Chonkie’s pipeline system and handle data ingestion for it.

What are Fetchers?

Fetchers are the first step in the CHOMP pipeline, feeding the stages that follow (CHef -> CHunker -> Refinery -> Porter/Handshake). They retrieve data from a source and pass it to the next pipeline stage for processing. Fetchers make it easy to:

Load files from local storage
Fetch documents from cloud storage (coming soon)
Retrieve data from databases (coming soon)
Connect to APIs and web sources (coming soon)
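To make the "first step" role concrete, here is a minimal sketch of a fetch stage handing its result to later stages. It uses only the Python standard library and is not Chonkie's implementation: the functions fetch, read_text, split_fixed, and run_toy_pipeline are hypothetical stand-ins for a fetcher, a chef, a chunker, and the pipeline runner.

```python
from pathlib import Path
from typing import List


def fetch(path: str) -> Path:
    """Stage 1 (hypothetical fetcher): locate the source and hand back its path."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    return file_path


def read_text(file_path: Path) -> str:
    """Stage 2 (stand-in for a chef): turn the fetched file into plain text."""
    return file_path.read_text(encoding="utf-8")


def split_fixed(text: str, size: int) -> List[str]:
    """Stage 3 (stand-in for a chunker): cut the text into fixed-size pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def run_toy_pipeline(path: str, size: int = 512) -> List[str]:
    """Chain the stages: each one consumes the previous stage's output."""
    return split_fixed(read_text(fetch(path)), size)
```

The fetch stage knows only where the data lives. It returns a path and leaves parsing and chunking to the stages after it, which is the separation the list above describes.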

Installation

Fetchers are included with the base Chonkie installation:

pip install chonkie

Using Fetchers in Pipelines

Fetchers plug into the Pipeline API through fetch_from, the first call in the chain:

from chonkie.pipeline import Pipeline

# Single file
doc = (Pipeline()
    .fetch_from("file", path="document.txt")
    .process_with("text")
    .chunk_with("recursive", chunk_size=512)
    .run())

# Directory with multiple files
docs = (Pipeline()
    .fetch_from("file", dir="./docs", ext=[".txt", ".md"])
    .process_with("text")
    .chunk_with("recursive", chunk_size=512)
    .run())

Available Fetchers

FileFetcher

Fetch files from the local filesystem: single files or entire directories.
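As a rough illustration of the two modes (one file, or a directory filtered by extension), the sketch below mirrors the path, dir, and ext parameters seen in the pipeline example. It is written against the standard library only. The function fetch_local is hypothetical and does not show how FileFetcher is implemented, and details such as non-recursive listing and sorted output are assumptions.

```python
from pathlib import Path
from typing import List, Optional, Union


def fetch_local(
    path: Optional[str] = None,
    dir: Optional[str] = None,  # shadows the builtin; named to mirror the pipeline example
    ext: Optional[List[str]] = None,
) -> Union[Path, List[Path]]:
    """Hypothetical helper: return one file path, or the matching files in a directory."""
    # Exactly one of `path` or `dir` must be given.
    if (path is None) == (dir is None):
        raise ValueError("Pass exactly one of `path` or `dir`.")

    # Single-file mode: validate and return the path.
    if path is not None:
        file_path = Path(path)
        if not file_path.is_file():
            raise FileNotFoundError(f"No such file: {file_path}")
        return file_path

    # Directory mode: list files (non-recursive), optionally filtered by extension.
    root = Path(dir)
    if not root.is_dir():
        raise NotADirectoryError(f"No such directory: {root}")
    wanted = {e.lower() for e in ext} if ext else None
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and (wanted is None or p.suffix.lower() in wanted)
    )
```

Calling fetch_local(dir="./docs", ext=[".txt", ".md"]) would return the .txt and .md files directly inside ./docs, while fetch_local(path="document.txt") returns a single path.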

More fetchers are coming soon! We’re working on cloud storage, database, and API fetchers.
