Pipelines let you transform, validate, and enrich data between the ingestion API and the registry. A pipeline executes its processors sequentially, each operating on the output of the previous steps, so your collections stay consistent while gaining derived insights.

Pipeline overview

  • Single pipeline per collection — Each collection can declare one pipeline. You can sequence as many processors as needed inside it.
  • Versioned definitions — Updating a pipeline creates a new version (for example clinics.2). In-flight ingestions finish on the version that was active when they started.
  • Deterministic execution — Steps run in order. A failure stops the pipeline and the ingestion fails.
Pipeline creation APIs are rolling out. Contact Clinia support if you need early access. You can already monitor executions.

Dataflow through the pipeline

  1. Ingestion request hits a source (bulk, bundle, or single writes).
  2. Pipeline dispatch loads the latest pipeline version configured on the target collection.
  3. Processor chain runs sequentially. Each step can:
    • Enrich the payload (segmenters, vectorizers)
    • Mutate properties (address augmentation, Clinia functions, OCR)
    • Validate intermediate results before continuing
  4. Default schema validation executes after all processors to ensure the resource still complies with its profile.
  5. Persistence writes the transformed data into the registry and emits receipts for observability.
Design your processor order to minimize expensive work and to fail fast when validation issues occur.
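For example, running the built-in validator as an explicit early step keeps a malformed payload from ever reaching costlier enrichment. A minimal sketch, using only the step types documented on this page:
{
  "steps": [
    {
      "type": "SCHEMA_VALIDATOR",
      "schemaValidator": {}
    },
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "modelId": "clinia-chunk.1",
        "propertyKey": "passages"
      }
    }
  ]
}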

Creating a pipeline

curl -X PUT "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/collections/my-collection/pipeline" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "steps": [
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "modelId": "clinia-chunk.1",
        "propertyKey": "passages"
      }
    }
  ]
}'
Each step follows the same structure:
{
  "type": "<PROCESSOR_TYPE>",
  "<processor-type-in-lowercase>": {
    // Processor configuration
  },
  "trigger": {
    // Optional trigger configuration
  }
}
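Putting the pieces together, here is a sketch of the segmenter step from the creation example with an explicit trigger; the trigger shown is the default ALWAYS_TRIGGER, spelled out for illustration, and both trigger types are described in the next section:
{
  "type": "SEGMENTER",
  "segmenter": {
    "inputProperty": "abstract",
    "modelId": "clinia-chunk.1",
    "propertyKey": "passages"
  },
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": false
  }
}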

Triggers

Triggers control whether a processor runs for a specific payload.

Operator trigger

{
  "trigger": {
    "type": "OPERATOR_TRIGGER",
    "operator": {
      // Uses the same DSL as search queries
    }
  }
}
Use operators to inspect the current payload or results from previous steps before executing costly processors.
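As an illustrative sketch only, a trigger that gates a step on the presence of a property might look like the following. The operator body here is hypothetical; consult the search query DSL for the actual syntax:
{
  "trigger": {
    "type": "OPERATOR_TRIGGER",
    "operator": {
      // Hypothetical operator; replace with your search DSL's syntax
      "exists": { "path": "abstract" }
    }
  }
}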

Always trigger

{
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": false
  }
}
This is the implicit default. Set onlyOnTriggeredPipeline to true when a processor should run only if a prior step has already executed (for example, conditional validation).
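For example, a conditional validator that runs only when a prior step fired (a sketch combining the structures shown on this page):
{
  "type": "SCHEMA_VALIDATOR",
  "schemaValidator": {},
  "trigger": {
    "type": "ALWAYS_TRIGGER",
    "onlyOnTriggeredPipeline": true
  }
}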

Schema validation

  • Pipelines include an automatic validation pass at the end of the chain.
  • Add explicit Schema Validator steps earlier to fail fast before expensive processing or to validate post-mutation states.
{
  "steps": [
    {
      "type": "SCHEMA_VALIDATOR",
      "schemaValidator": {}
    }
  ]
}
Validation uses the rules defined in your collection profile (required fields, cardinality, vocabulary bindings, etc.).

Monitoring pipelines

Use the pipeline execution APIs to audit and debug ingestion flows:
curl -X GET "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/pipelines/executions/{pipelineId}" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN"
Every endpoint accepts withOperationBody=true if you need to inspect the payload that triggered the execution.
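For example, to fetch executions together with the payload that triggered them:
curl -X GET "https://$CLINIA_WORKSPACE/sources/my-external-system/v1/pipelines/executions/{pipelineId}?withOperationBody=true" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN"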