What Are Pipelines?

A pipeline is a named, reusable configuration that describes a sequence of chunking and refinement steps. Instead of passing the same configuration on every request, you define it once and reference it by ID. A pipeline step is either:
  • chunk — runs a chunker (e.g. "semantic", "token", "recursive")
  • refine — runs a refinery (e.g. "embeddings", "overlap")

Create a Pipeline

POST /v1/pipelines
curl -X POST http://localhost:8000/v1/pipelines \
  -H "Content-Type: application/json" \
  -d '{
    "name": "rag-chunker",
    "description": "Semantic chunking with embeddings for RAG",
    "steps": [
      {
        "type": "chunk",
        "chunker": "semantic",
        "config": {"chunk_size": 512, "threshold": 0.5}
      },
      {
        "type": "refine",
        "refinery": "embeddings",
        "config": {"embedding_model": "text-embedding-3-small"}
      }
    ]
  }'
Response (201 Created):
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "rag-chunker",
  "description": "Semantic chunking with embeddings for RAG",
  "config": {
    "steps": [
      {"type": "chunk", "chunker": "semantic", "refinery": null, "config": {"chunk_size": 512, "threshold": 0.5}},
      {"type": "refine", "chunker": null, "refinery": "embeddings", "config": {"embedding_model": "text-embedding-3-small"}}
    ]
  },
  "created_at": "2026-02-20T10:00:00.000000",
  "updated_at": "2026-02-20T10:00:00.000000"
}
name (string, required): Unique pipeline name, used as a human-readable identifier.
description (string): Optional description of what this pipeline does.
steps (PipelineStep[], required): Ordered list of steps to execute. Each step has:
  • type: "chunk" or "refine"
  • chunker: chunker name (for chunk steps, e.g. "semantic", "token")
  • refinery: refinery name (for refine steps, e.g. "embeddings", "overlap")
  • config: step-specific parameters (same fields as the individual endpoints)
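
The step rules above, together with the 400 conditions listed under Error Responses, can be checked on the client before a pipeline is submitted. The following is a minimal sketch of such a check, not the server's actual validation logic; `validate_steps` and `VALID_TYPES` are hypothetical names introduced for illustration.

```python
from typing import Any

# Maps each step type to the field that must name its implementation.
VALID_TYPES = {"chunk": "chunker", "refine": "refinery"}


def validate_steps(steps: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a pipeline's steps (empty if none)."""
    problems: list[str] = []
    if not steps:
        problems.append("pipeline has no steps")
    seen_chunk = False
    for i, step in enumerate(steps):
        step_type = step.get("type")
        if step_type not in VALID_TYPES:
            problems.append(f"step {i}: type must be 'chunk' or 'refine'")
            continue
        # A chunk step names a chunker; a refine step names a refinery.
        required = VALID_TYPES[step_type]
        if not step.get(required):
            problems.append(f"step {i}: missing '{required}'")
        if step_type == "chunk":
            seen_chunk = True
        elif not seen_chunk:
            problems.append(f"step {i}: refine step appears before any chunk step")
    return problems


steps = [
    {"type": "chunk", "chunker": "semantic",
     "config": {"chunk_size": 512, "threshold": 0.5}},
    {"type": "refine", "refinery": "embeddings",
     "config": {"embedding_model": "text-embedding-3-small"}},
]
print(validate_steps(steps))  # []
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.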

List Pipelines

GET /v1/pipelines
curl http://localhost:8000/v1/pipelines
Returns all pipelines ordered by creation date (newest first).

Get a Pipeline

GET /v1/pipelines/{pipeline_id}
curl http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000

Update a Pipeline

PUT /v1/pipelines/{pipeline_id}
You can update name, description, or steps independently:
curl -X PUT http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000 \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated description",
    "steps": [
      {
        "type": "chunk",
        "chunker": "recursive",
        "config": {"chunk_size": 1024, "recipe": "markdown"}
      }
    ]
  }'
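
Because name, description, and steps can be updated independently, a client only needs to send the fields it wants to change, as the curl example does by leaving out name. A small sketch of building such a body is shown here; `build_update_body` is a hypothetical helper, and rejecting an empty update on the client side is an assumption rather than documented server behavior.

```python
import json
from typing import Any, Optional


def build_update_body(
    name: Optional[str] = None,
    description: Optional[str] = None,
    steps: Optional[list[dict[str, Any]]] = None,
) -> str:
    """Serialize only the supplied fields into a JSON body for PUT."""
    fields = {"name": name, "description": description, "steps": steps}
    body = {key: value for key, value in fields.items() if value is not None}
    if not body:
        raise ValueError("nothing to update")
    return json.dumps(body)


body = build_update_body(description="Updated description")
print(body)  # {"description": "Updated description"}
```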

Delete a Pipeline

DELETE /v1/pipelines/{pipeline_id}
curl -X DELETE http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000
Returns 204 No Content on success.
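
The create, get, update, and delete routes above can be called from any HTTP client. As a rough sketch using only Python's standard library, the helper below builds (but does not send) a request for each route. `pipeline_request` and `BASE_URL` are illustrative names, and the base URL is simply the host used in the curl examples.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000"  # same host as the curl examples


def pipeline_request(
    method: str,
    pipeline_id: Optional[str] = None,
    body: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build a request for /v1/pipelines or /v1/pipelines/{pipeline_id}."""
    url = f"{BASE_URL}/v1/pipelines"
    if pipeline_id is not None:
        url += f"/{pipeline_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


pid = "550e8400-e29b-41d4-a716-446655440000"
delete_req = pipeline_request("DELETE", pid)
print(delete_req.get_method(), delete_req.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(delete_req) as resp:
#     assert resp.status == 204  # No Content on success
```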

Pipeline Examples

Basic Token Chunking

{
  "name": "token-basic",
  "steps": [
    {"type": "chunk", "chunker": "token", "config": {"chunk_size": 512}}
  ]
}

Markdown Documents with Overlap

{
  "name": "markdown-with-overlap",
  "description": "Recursive markdown chunking with overlap context",
  "steps": [
    {
      "type": "chunk",
      "chunker": "recursive",
      "config": {"chunk_size": 512, "recipe": "markdown"}
    },
    {
      "type": "refine",
      "refinery": "overlap",
      "config": {"context_size": 0.2, "method": "suffix"}
    }
  ]
}

Full RAG Pipeline

{
  "name": "full-rag",
  "description": "Semantic chunking + overlap + embeddings",
  "steps": [
    {
      "type": "chunk",
      "chunker": "semantic",
      "config": {"chunk_size": 512, "threshold": 0.5}
    },
    {
      "type": "refine",
      "refinery": "overlap",
      "config": {"context_size": 0.1}
    },
    {
      "type": "refine",
      "refinery": "embeddings",
      "config": {"embedding_model": "voyage-large-2"}
    }
  ]
}

Storage

Pipelines are stored in a local SQLite database (data/chonkie.db). The database is created automatically on first startup. When using Docker, mount ./data:/app/data to persist the database across container restarts.

Execute a Pipeline

POST /v1/pipelines/{pipeline_id}/execute
Runs the pipeline steps sequentially on the provided text. Each chunk step produces chunks; each refine step enriches them. Returns the final list of chunks.
curl -X POST http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000/execute \
  -H "Content-Type: application/json" \
  -d '{"text": "Your document text goes here. It will be chunked and refined."}'
Response:
[
  {
    "id": "chnk_abc123",
    "text": "Your document text goes here.",
    "start_index": 0,
    "end_index": 29,
    "token_count": 29,
    "context": null,
    "embedding": null
  },
  {
    "id": "chnk_def456",
    "text": "It will be chunked and refined.",
    "start_index": 30,
    "end_index": 61,
    "token_count": 31,
    "context": null,
    "embedding": null
  }
]
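
In the sample response above, start_index and end_index behave like character offsets into the input text, with end_index exclusive. The sketch below checks that reading against the sample. The `Chunk` dataclass is a hypothetical client-side type rather than part of the API, the `list[float]` type for embedding is a guess, and the half-open offset interpretation is inferred from this one example only.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    id: str
    text: str
    start_index: int
    end_index: int
    token_count: int
    context: Optional[str] = None
    embedding: Optional[list[float]] = None  # assumed shape when populated


def parse_chunks(raw: str) -> list[Chunk]:
    """Turn the JSON array returned by /execute into Chunk objects."""
    return [Chunk(**item) for item in json.loads(raw)]


document = "Your document text goes here. It will be chunked and refined."
raw = """[
  {"id": "chnk_abc123", "text": "Your document text goes here.",
   "start_index": 0, "end_index": 29, "token_count": 29,
   "context": null, "embedding": null},
  {"id": "chnk_def456", "text": "It will be chunked and refined.",
   "start_index": 30, "end_index": 61, "token_count": 31,
   "context": null, "embedding": null}
]"""

for chunk in parse_chunks(raw):
    # Each chunk's text should match the slice of the input it points at.
    assert document[chunk.start_index:chunk.end_index] == chunk.text
```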

Batch Execution

Submit a list of strings to process multiple documents in one request. The response is a list of lists — one inner list per input document.
curl -X POST http://localhost:8000/v1/pipelines/550e8400-e29b-41d4-a716-446655440000/execute \
  -H "Content-Type: application/json" \
  -d '{"text": ["First document.", "Second document.", "Third document."]}'
text (string | string[], required): Text or list of texts to process through the pipeline.
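
Since text may be a single string or a list, the shape of the response depends on the request. One way a client might normalize both cases into per-document chunk lists is sketched below. `normalize_result` is a hypothetical helper, and it assumes the inner lists come back in the same order as the inputs, which this page does not state explicitly.

```python
from typing import Any, Union

ChunkDict = dict[str, Any]


def normalize_result(
    text: Union[str, list[str]],
    result: list[Any],
) -> list[tuple[str, list[ChunkDict]]]:
    """Pair each input document with its chunks, for single or batch input."""
    if isinstance(text, str):
        # Single document: the response is a flat list of chunks.
        return [(text, result)]
    if len(result) != len(text):
        raise ValueError("expected one inner list per input document")
    # Batch: the response is a list of lists, one per document.
    return list(zip(text, result))


docs = ["First document.", "Second document."]
batch = [[{"text": "First document."}], [{"text": "Second document."}]]
for doc, chunks in normalize_result(docs, batch):
    print(doc, "->", len(chunks), "chunk(s)")
```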

Error Responses

Status | Cause
404 | Pipeline ID not found
400 | Pipeline has no steps, a refine step appears before any chunk step, or a step is missing required fields
500 | A step failed at runtime (e.g. missing extra, model error)
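
A client can branch on these status codes to tell a bad request apart from a runtime failure. The mapping below is only a sketch: the exception class names and `raise_for_status` are hypothetical, and they do not correspond to any documented client library.

```python
class PipelineError(Exception):
    """Base class for errors returned by the execute endpoint."""


class PipelineNotFound(PipelineError):
    """404: the pipeline ID does not exist."""


class InvalidPipeline(PipelineError):
    """400: no steps, refine before chunk, or a step is missing fields."""


class StepFailed(PipelineError):
    """500: a step failed at runtime (e.g. missing extra, model error)."""


ERRORS_BY_STATUS = {404: PipelineNotFound, 400: InvalidPipeline, 500: StepFailed}


def raise_for_status(status: int, detail: str = "") -> None:
    """Raise a typed error for the documented failure codes; pass otherwise."""
    error_type = ERRORS_BY_STATUS.get(status)
    if error_type is not None:
        raise error_type(f"{status}: {detail}")


raise_for_status(200)  # no exception for a successful response
try:
    raise_for_status(404, "Pipeline ID not found")
except PipelineNotFound as exc:
    print(exc)  # 404: Pipeline ID not found
```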