Pipelines let you transform, validate, and enrich data between the ingestion API and the registry. A pipeline executes its processors sequentially, each operating on the output of the previous steps, so your collections stay consistent while gaining derived insights.

Pipeline overview

  • Single pipeline per collection — Each collection can declare one pipeline. You can sequence as many processors as needed inside it.
  • Versioned definitions — Updating a pipeline creates a new version (for example clinics.2). In-flight ingestions finish on the version that was active when they started.
  • Deterministic execution — Steps run in order. A failure stops the pipeline and the ingestion fails.
Pipeline creation APIs are rolling out. Contact Clinia support if you need early access. You can already monitor executions.

Dataflow through the pipeline

  1. Ingestion request hits a source (bulk, bundle, or single writes).
  2. Pipeline dispatch loads the latest pipeline version configured on the target collection.
  3. Processor chain runs sequentially. Each step can:
    • Enrich the payload (segmenters, vectorizers)
    • Mutate properties (address augmentation, Clinia functions, OCR)
    • Validate intermediate results before continuing
  4. Default schema validation executes after all processors to ensure the resource still complies with its profile.
  5. Persistence writes the transformed data into the registry and emits receipts for observability.
Design your processor order to minimize expensive work and to fail fast when validation issues occur.
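For example, running the built-in validator as an explicit early step keeps a malformed payload from ever reaching costlier enrichment. A minimal sketch, using only the step types documented on this page:
{
  "steps": [
    {
      "type": "SCHEMA_VALIDATOR",
      "schemaValidator": {}
    },
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "modelId": "clinia-chunk.1",
        "propertyKey": "passages"
      }
    }
  ]
}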

Creating a pipeline

curl -X PUT "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/collections/my-collection/pipeline" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "steps": [
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "modelId": "clinia-chunk.1",
        "propertyKey": "passages"
      }
    }
  ]
}'
Each step follows the same structure:
{
  "type": "<PROCESSOR_TYPE>",
  "<processor-type-in-lowercase>": {
    // Processor configuration
  },
  "trigger": {
    // Optional trigger configuration
  }
}
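Putting the pieces together, here is a sketch of the segmenter step from the creation example with an explicit trigger; the trigger shown is the default ALWAYS_TRIGGER, spelled out for illustration, and both trigger types are described in the next section:
{
  "type": "SEGMENTER",
  "segmenter": {
    "inputProperty": "abstract",
    "modelId": "clinia-chunk.1",
    "propertyKey": "passages"
  },
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": false
  }
}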

Triggers

Triggers control whether a processor runs for a specific payload.

Operator trigger

{
  "trigger": {
    "type": "OPERATOR_TRIGGER",
    "operator": {
      // Uses the same DSL as search queries
    }
  }
}
Use operators to inspect the current payload or results from previous steps before executing costly processors.
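As an illustrative sketch only, a trigger that gates a step on the presence of a property might look like the following. The operator body here is hypothetical; consult the search query DSL for the actual syntax:
{
  "trigger": {
    "type": "OPERATOR_TRIGGER",
    "operator": {
      // Hypothetical operator; replace with your search DSL's syntax
      "exists": { "path": "abstract" }
    }
  }
}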

Always trigger

{
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": false
  }
}
This is the implicit default. Set onlyOnTriggeredPipeline to true when a processor should run only if a prior step has already executed (for example, conditional validation).
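For example, a conditional validator that runs only when a prior step fired (a sketch combining the structures shown on this page):
{
  "type": "SCHEMA_VALIDATOR",
  "schemaValidator": {},
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": true
  }
}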

Schema validation

  • Pipelines include an automatic validation pass at the end of the chain.
  • Add explicit Schema Validator steps earlier to fail fast before expensive processing or to validate post-mutation states.
{
  "steps": [
    {
      "type": "SCHEMA_VALIDATOR",
      "schemaValidator": {}
    }
  ]
}
Validation uses the rules defined in your collection profile (required fields, cardinality, vocabulary bindings, etc.).

Monitoring pipelines

Use the pipeline execution APIs to audit and debug ingestion flows:
curl -X GET "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/pipelines/executions/{pipelineId}" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN"
Every endpoint accepts withOperationBody=true if you need to inspect the payload that triggered the execution.
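For example, to fetch executions together with the payload that triggered them:
curl -X GET "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/pipelines/executions/{pipelineId}?withOperationBody=true" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN"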