Chonkie Documentation

What Are Pipelines?

A pipeline is a named, reusable configuration that describes a sequence of chunking and refinement steps. Instead of passing the same configuration on every request, you define it once and reference it by ID. A pipeline step is either:

chunk — runs a chunker (e.g. "semantic", "token", "recursive")
refine — runs a refinery (e.g. "embeddings", "overlap")
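
The two step kinds differ only in which key names the component. A minimal sketch of a builder for step objects follows; `make_step` is a hypothetical helper for illustration, not part of Chonkie.

```python
def make_step(step_type: str, name: str, config: dict = None) -> dict:
    """Build one pipeline step dict (hypothetical helper, not a Chonkie API)."""
    if step_type == "chunk":
        key = "chunker"       # chunk steps name a chunker
    elif step_type == "refine":
        key = "refinery"      # refine steps name a refinery
    else:
        raise ValueError(f"unknown step type: {step_type!r}")
    return {"type": step_type, key: name, "config": config or {}}


steps = [
    make_step("chunk", "semantic", {"chunk_size": 512, "threshold": 0.5}),
    make_step("refine", "embeddings", {"embedding_model": "text-embedding-3-small"}),
]
```

The resulting `steps` list has the same shape as the `steps` array in the create request.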

Create a Pipeline

POST /v1/pipelines

curl -X POST http://localhost:8000/v1/pipelines \
  -H "Content-Type: application/json" \
  -d '{
    "name": "rag-chunker",
    "description": "Semantic chunking with embeddings for RAG",
    "steps": [
      {
        "type": "chunk",
        "chunker": "semantic",
        "config": {"chunk_size": 512, "threshold": 0.5}
      },
      {
        "type": "refine",
        "refinery": "embeddings",
        "config": {"embedding_model": "text-embedding-3-small"}
      }
    ]
  }'

Response (201 Created):

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "rag-chunker",
  "description": "Semantic chunking with embeddings for RAG",
  "config": {
    "steps": [
      {"type": "chunk", "chunker": "semantic", "refinery": null, "config": {"chunk_size": 512, "threshold": 0.5}},
      {"type": "refine", "chunker": null, "refinery": "embeddings", "config": {"embedding_model": "text-embedding-3-small"}}
    ]
  },
  "created_at": "2026-02-20T10:00:00.000000",
  "updated_at": "2026-02-20T10:00:00.000000"
}
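
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The function names here are illustrative assumptions. `build_create_request` only constructs the request; `create_pipeline` sends it and needs a server running at the address used in the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host as the curl examples


def build_create_request(name, steps, description=None):
    """Return an unsent POST /v1/pipelines request (illustrative helper)."""
    payload = {"name": name, "steps": steps}
    if description is not None:
        payload["description"] = description
    return urllib.request.Request(
        f"{BASE_URL}/v1/pipelines",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_pipeline(name, steps, description=None):
    """Send the request and return the new pipeline's id from the response."""
    request = build_create_request(name, steps, description)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["id"]
```

Keeping construction separate from sending makes the payload easy to inspect or log before anything reaches the server.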

Request Parameters

name (string, required)
Unique pipeline name. Used as a human-readable identifier.

description (string)
Optional description of what this pipeline does.

steps (PipelineStep[], required)
Ordered list of steps to execute. Each step has:

type: "chunk" or "refine"
chunker: chunker name (for chunk steps, e.g. "semantic", "token")
refinery: refinery name (for refine steps, e.g. "embeddings", "overlap")
config: step-specific parameters (same fields as the individual endpoints)
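
The field rules above can be checked on the client before a request is sent. The sketch below is an assumption about what a reasonable check looks like, not the server's actual validation; in particular, treating an empty name as invalid is a guess.

```python
def validate_pipeline(payload: dict) -> list:
    """Return a list of problems in a create-pipeline payload (empty if none)."""
    problems = []
    name = payload.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name is required and must be a non-empty string")
    steps = payload.get("steps")
    if not isinstance(steps, list):
        problems.append("steps is required and must be a list")
        return problems
    for i, step in enumerate(steps):
        kind = step.get("type")
        if kind == "chunk":
            if not step.get("chunker"):
                problems.append(f"step {i}: chunk step needs a chunker")
        elif kind == "refine":
            if not step.get("refinery"):
                problems.append(f"step {i}: refine step needs a refinery")
        else:
            problems.append(f"step {i}: type must be 'chunk' or 'refine'")
    return problems
```

An empty list means the payload matches the documented shape; it does not guarantee that the chunker or refinery name is one the server recognizes.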

List Pipelines

GET /v1/pipelines

curl http://localhost:8000/v1/pipelines

Returns all pipelines ordered by creation date (newest first).
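
Because the API addresses pipelines by ID, a common follow-up is looking one up by name in the listing. The sketch below assumes the list response is a JSON array of objects with the same fields as the create response; the sample data is made up for illustration.

```python
from datetime import datetime


def find_by_name(pipelines, name):
    """Return the first pipeline whose name matches, or None."""
    return next((p for p in pipelines if p["name"] == name), None)


def is_newest_first(pipelines):
    """Check that created_at timestamps are in descending order."""
    stamps = [datetime.fromisoformat(p["created_at"]) for p in pipelines]
    return all(a >= b for a, b in zip(stamps, stamps[1:]))


# Hypothetical list response: two pipelines, newest first.
listing = [
    {"id": "b", "name": "token-basic", "created_at": "2026-02-21T09:30:00.000000"},
    {"id": "a", "name": "rag-chunker", "created_at": "2026-02-20T10:00:00.000000"},
]
```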

Get a Pipeline

GET /v1/pipelines/{pipeline_id}

curl http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000

Update a Pipeline

PUT /v1/pipelines/{pipeline_id}

You can update name, description, or steps independently:

curl -X PUT http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000 \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated description",
    "steps": [
      {
        "type": "chunk",
        "chunker": "recursive",
        "config": {"chunk_size": 1024, "recipe": "markdown"}
      }
    ]
  }'
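
Since the three fields can be updated independently, a client can build the PUT body from only the values it wants to change. This is a minimal sketch with a hypothetical helper name; it assumes that fields left out of the body are not modified by the server.

```python
def build_update_body(name=None, description=None, steps=None) -> dict:
    """Return a PUT body containing only the fields that were provided."""
    fields = {"name": name, "description": description, "steps": steps}
    body = {key: value for key, value in fields.items() if value is not None}
    if not body:
        raise ValueError("provide at least one of name, description, steps")
    return body


body = build_update_body(description="Updated description")
```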

Delete a Pipeline

DELETE /v1/pipelines/{pipeline_id}

curl -X DELETE http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000

Returns 204 No Content on success.

Pipeline Examples

Basic Token Chunking

{
  "name": "token-basic",
  "steps": [
    {"type": "chunk", "chunker": "token", "config": {"chunk_size": 512}}
  ]
}

Markdown Documents with Overlap

{
  "name": "markdown-with-overlap",
  "description": "Recursive markdown chunking with overlap context",
  "steps": [
    {
      "type": "chunk",
      "chunker": "recursive",
      "config": {"chunk_size": 512, "recipe": "markdown"}
    },
    {
      "type": "refine",
      "refinery": "overlap",
      "config": {"context_size": 0.2, "method": "suffix"}
    }
  ]
}

Full RAG Pipeline

{
  "name": "full-rag",
  "description": "Semantic chunking + overlap + embeddings",
  "steps": [
    {
      "type": "chunk",
      "chunker": "semantic",
      "config": {"chunk_size": 512, "threshold": 0.5}
    },
    {
      "type": "refine",
      "refinery": "overlap",
      "config": {"context_size": 0.1}
    },
    {
      "type": "refine",
      "refinery": "embeddings",
      "config": {"embedding_model": "voyage-large-2"}
    }
  ]
}

Storage

Pipelines are stored in a local SQLite database (data/chonkie.db). The database is created automatically on first startup. When using Docker, mount ./data:/app/data to persist the database across container restarts.
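
To confirm that the database file exists and was initialized, it can be opened read-only with Python's built-in sqlite3 module. The schema is not documented here, so this sketch only lists table names and makes no assumption about what they are.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="data/chonkie.db"):
    """Return the table names in a SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Read-only mode means the call fails if the file is missing instead of creating an empty database in its place.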

Execute a Pipeline

POST /v1/pipelines/{pipeline_id}/execute

Runs the pipeline steps sequentially on the provided text. Each chunk step produces chunks; each refine step enriches them. Returns the final list of chunks.

curl -X POST http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000/execute \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text goes here. It will be chunked and refined."}'

Response:

[
  {
    "id": "chnk_abc123",
    "text": "Your document text goes here.",
    "start_index": 0,
    "end_index": 29,
    "token_count": 29,
    "context": null,
    "embedding": null
  },
  {
    "id": "chnk_def456",
    "text": "It will be chunked and refined.",
    "start_index": 30,
    "end_index": 61,
    "token_count": 31,
    "context": null,
    "embedding": null
  }
]
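
In the sample response, start_index and end_index behave like a Python slice over the input text, with end_index exclusive. That reading is inferred from the sample alone. A small sketch to check it against any response:

```python
def check_offsets(text, chunks):
    """Return ids of chunks whose text differs from text[start_index:end_index]."""
    return [
        chunk["id"]
        for chunk in chunks
        if text[chunk["start_index"]:chunk["end_index"]] != chunk["text"]
    ]


text = "Your document text goes here. It will be chunked and refined."
chunks = [
    {"id": "chnk_abc123", "text": "Your document text goes here.",
     "start_index": 0, "end_index": 29},
    {"id": "chnk_def456", "text": "It will be chunked and refined.",
     "start_index": 30, "end_index": 61},
]
mismatched = check_offsets(text, chunks)  # empty for the sample response
```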

Batch Execution

Submit a list of strings to process multiple documents in one request. The response is a list of lists — one inner list per input document.

curl -X POST http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000/execute \
  -H "Content-Type: application/json" \
  -d '{"text": ["First document.", "Second document.", "Third document."]}'
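
A minimal sketch of handling both input forms on the client follows. The helper names are hypothetical, and pairing by position assumes the inner lists come back in the same order as the inputs.

```python
def build_execute_body(text):
    """Accept one string or a list of strings and return the request body."""
    if isinstance(text, str):
        return {"text": text}
    if isinstance(text, list) and all(isinstance(t, str) for t in text):
        return {"text": text}
    raise TypeError("text must be a string or a list of strings")


def pair_with_inputs(texts, results):
    """Pair each input document with its inner list of chunks."""
    if len(texts) != len(results):
        raise ValueError("expected one inner list per input document")
    return list(zip(texts, results))
```

The length check catches a truncated or malformed batch response before the chunks are attributed to the wrong document.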

Request Parameters

text (string | string[], required)
Text or list of texts to process through the pipeline.

Error Responses

Status	Cause
`404`	Pipeline ID not found
`400`	Pipeline has no steps, a `refine` step appears before any `chunk` step, or a step is missing required fields
`500`	A step failed at runtime (e.g. missing extra, model error)
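
Two of the 400 conditions in the table, an empty step list and a refine step ahead of any chunk step, can be caught locally before calling execute. The sketch below mirrors the table; it is an illustrative pre-check, not the server's own validation code.

```python
from typing import Optional

# Status hints restating the table above.
STATUS_HINTS = {
    404: "pipeline ID not found",
    400: "no steps, refine before chunk, or a step is missing required fields",
    500: "a step failed at runtime (e.g. missing extra, model error)",
}


def precheck_steps(steps: list) -> Optional[str]:
    """Return a reason the steps would likely be rejected with 400, or None."""
    if not steps:
        return "pipeline has no steps"
    seen_chunk = False
    for i, step in enumerate(steps):
        if step.get("type") == "chunk":
            seen_chunk = True
        elif step.get("type") == "refine" and not seen_chunk:
            return f"step {i}: refine step appears before any chunk step"
    return None
```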
