Vectorizer
Transforms text into numerical embeddings so you can run hybrid search. Configuration:
- inputProperty: Path to the text field or the output of a previous processor.
- modelId: Embedding model to use (for example mte-base.1 or mte-base-knowledge.1).
- propertyKey: Destination sub-property that stores the resulting vector.
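As a rough illustration, the three settings could be gathered and sanity-checked as shown here. Only the names inputProperty, modelId, and propertyKey and the model mte-base.1 come from the text; the dictionary layout, the example values description and descriptionVector, and the missing_settings helper are assumptions made for this sketch, not Clinia's actual configuration format.

```python
# Hypothetical sketch: the three Vectorizer settings as a plain dictionary.
# The key names come from the documentation; the surrounding shape and the
# example values are assumptions.
REQUIRED_SETTINGS = ("inputProperty", "modelId", "propertyKey")

vectorizer_config = {
    "inputProperty": "description",      # assumed path to a text field
    "modelId": "mte-base.1",             # model named in the documentation
    "propertyKey": "descriptionVector",  # assumed destination sub-property
}


def missing_settings(config):
    """Return the required settings that are absent or empty."""
    return [key for key in REQUIRED_SETTINGS if not config.get(key)]


print(missing_settings(vectorizer_config))
```

Checking for missing settings before submitting a configuration is a convenience of this sketch, not a documented requirement.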
Segmenter
Splits long text into passages before vectorization so semantic queries stay targeted.
Optical Character Recognition
Currently available on object collections. The processor reads the uploaded file attached to the object and stores the extracted text in a markdown field alongside the original binary.
Address Augmenter
Work in progress
The Address Augmenter relies on Clinia’s geocoding service and is rolling out in stages.
Actionable
Work in progress
Lets you pause the pipeline and route the payload to human reviewers. Additional documentation will follow.
Schema Validator
Adds an explicit validation checkpoint mid-pipeline. This reuses the rules defined in your profiles and complements the implicit validation that occurs at the end. Use it to:
- Stop the pipeline before an expensive processor when data is incomplete.
- Re-validate after a mutation step to ensure enriched data stays compliant.
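The two uses above can be sketched in a few lines. This is a minimal illustration, not Clinia's API: the ValidationError class, the validate, expensive_step, and run_pipeline functions, and the "required fields" rule format are all hypothetical stand-ins for rules that would normally come from your profiles.

```python
# Hypothetical sketch of a mid-pipeline validation checkpoint.
# Nothing here is Clinia's API: the rule format, function names, and
# record shape are assumptions made for illustration.


class ValidationError(Exception):
    """Raised when a record does not satisfy the validation rules."""


def validate(record, required_fields):
    """Raise if any required field is missing or empty; otherwise return the record."""
    missing = [field for field in required_fields if not record.get(field)]
    if missing:
        raise ValidationError(f"missing fields: {missing}")
    return record


def expensive_step(record):
    """Stand-in for a costly processor such as OCR or vectorization."""
    return {**record, "processed": True}


def run_pipeline(record, required_fields):
    validate(record, required_fields)         # explicit checkpoint: fail fast
    record = expensive_step(record)           # only reached with complete data
    return validate(record, required_fields)  # re-validate after the step
```

With this layout, an incomplete record raises before expensive_step runs, so the costly work is never started on data that would fail validation anyway.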
Mutating vs. enriching processors
- Mutating processors (Address Augmenter, Clinia Function) replace the input property with enriched data. Update your schema first so the new shape passes validation.
- Enriching processors (Segmenter, Vectorizer, OCR) add derived properties under enrichedProperties. They keep the original field intact while making extra data available to partitions.
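The difference can be shown on a plain dictionary. In this sketch only the enrichedProperties name comes from the text; the record shape, the apply_mutating and apply_enriching helpers, the descriptionPassages key, and the placeholder coordinates are assumptions for illustration and do not reflect real processor output.

```python
# Hypothetical sketch contrasting the two processor kinds on a plain dict.
# Only the enrichedProperties name comes from the documentation; the record
# shape, function names, and sample values are assumptions.
import copy


def apply_mutating(record, prop, new_value):
    """Mutating: the input property itself is replaced by the enriched value."""
    result = copy.deepcopy(record)
    result[prop] = new_value
    return result


def apply_enriching(record, key, derived):
    """Enriching: the original field stays; derived data goes under enrichedProperties."""
    result = copy.deepcopy(record)
    result.setdefault("enrichedProperties", {})[key] = derived
    return result


record = {"address": "1 Main St", "description": "Walk-in clinic. Open daily."}

# Placeholder structure and coordinates, not real geocoding output.
mutated = apply_mutating(
    record, "address", {"street": "1 Main St", "lat": 0.0, "lon": 0.0}
)
enriched = apply_enriching(
    record, "descriptionPassages", ["Walk-in clinic.", "Open daily."]
)
```

Here mutated["address"] changes from a string to a structured value, which is why the schema has to accept the new shape first, while enriched["description"] is unchanged and the passages sit beside it.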