Highlighting

It is important to show users why a given result is relevant to their search. Highlighting can fill this role in a few different ways.

Vector

Highlights in a vector search context only make sense if the search is done on array content (usually segmented symbol properties). The highlight returns the most relevant array items along with their scores, while the resource property returns the entire property content. This allows you to show the user which part of the array matched the query. Highlights are returned for the three most relevant items of every matched resource. Matched resources are found through HNSW (Hierarchical Navigable Small World) search.

Consider the threshold an implementation detail: it may change from model to model, and Clinia takes care of fine-tuning it to return relevant results. The score varies between 0 and 1. It is the cosine similarity between the passage and the knn operator value.

// Example highlighting property of a query hit where path.to.property is an array property
{
  "path.to.property": [
    {
      "data": "passage matched by the query",
      "path": "path.to.property.0", // where 0 is the index of the passage when highlighting a segmented property
      "score": 0.9, // score range can vary from model to model, use this in relation to other highlights
      "type": "vector" // discriminator against textual highlights
    }
  ]
}
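
The following is a minimal Python sketch of how a client might consume these entries, assuming the highlighting object of a hit has already been parsed into a dict shaped like the example above. The helper name summarize_vector_highlights and the second sample entry are illustrative and not part of the API. Because the useful meaning of a score is its relation to the other highlights, the sketch expresses each score relative to the best highlight of the same property.

```python
def summarize_vector_highlights(highlighting, prop):
    """Return (passage_index, relative_score, text) tuples for the vector highlights of `prop`."""
    entries = [h for h in highlighting.get(prop, []) if h.get("type") == "vector"]
    if not entries:
        return []
    best = max(h["score"] for h in entries)
    summary = []
    for h in sorted(entries, key=lambda h: h["score"], reverse=True):
        # The last path segment is the index of the passage in the array.
        index = int(h["path"].rsplit(".", 1)[1])
        relative = h["score"] / best if best else 0.0
        summary.append((index, round(relative, 3), h["data"]))
    return summary


# Sample data: the first entry mirrors the example above, the second is made up.
example = {
    "path.to.property": [
        {"data": "passage matched by the query", "path": "path.to.property.0", "score": 0.9, "type": "vector"},
        {"data": "another passage", "path": "path.to.property.3", "score": 0.45, "type": "vector"},
    ]
}

for index, relative, text in summarize_vector_highlights(example, "path.to.property"):
    print(index, relative, text)
```

Filtering on the type discriminator keeps the helper safe to call on properties that also carry textual highlights.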

Textual

Highlights for match operators directly return the matched text with <em> tags around the matched terms.

{
  "path.to.property": [
    {
      "highlight": "There is a matched <em>word</em> in this passage",
      "type": "textual"
    }
  ]
}
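
When a textual highlight is rendered in a web page, the rest of the passage may contain characters that are meaningful in HTML. The sketch below, in Python, escapes everything except the <em> markers and swaps them for another element. It assumes the highlight string contains no markup other than the <em> tags shown above; the function name and the default mark element are illustrative choices.

```python
import html
import re

# Capturing group so that re.split keeps the <em> and </em> markers in the result.
EM_TAG = re.compile(r"(</?em>)")


def highlight_to_safe_html(highlight, tag="mark"):
    """Escape the passage text and replace the <em> markers with <tag> elements."""
    parts = []
    for piece in EM_TAG.split(highlight):
        if piece == "<em>":
            parts.append(f"<{tag}>")
        elif piece == "</em>":
            parts.append(f"</{tag}>")
        else:
            parts.append(html.escape(piece))
    return "".join(parts)


print(highlight_to_safe_html("There is a matched <em>word</em> in this passage"))
```

Passing tag="em" keeps the original markers while still escaping the surrounding text.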

Examples

The following setup is required to run the subsequent code snippets, but you can skip it and still follow along with the rest of the tutorial if you prefer! In short, it sets up and ingests data for a profile with three properties:

title: a symbol property
abstract: a symbol property that is segmented into passages and vectorized
content: an array of objects, each object having two symbol properties: sectionTitle and text. The text property is vectorized.

Required setup

To correctly showcase highlights, we need to set up our environment for a hybrid search. For the sake of this tutorial, we will be using the prestigious-journal source, the articles profile, and the abstracts partition.

Create a source

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/sources/prestigious-journal" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "registry"
}'

Create a profile

curl -X PUT "https://$CLINIA_WORKSPACE/sources/prestigious-journal/v1/profiles/articles" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "ROOT",
  "properties": {
    "title": {
      "type": "symbol"
    },
    "abstract": {
      "type": "symbol"
    },
    "content": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "text": {
            "type": "symbol"
          },
          "sectionTitle": {
            "type": "symbol"
          }
        }
      }
    }
  }
}'

Create an embedding pipeline

curl -X PUT "https://$CLINIA_WORKSPACE/sources/prestigious-journal/v1/collections/articles/pipeline" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "steps": [
    {
      "segmenter": {
        "inputProperty": "abstract",
        "modelId": "clinia-segment-v1",
        "propertyKey": "passages"
      },
      "type": "SEGMENTER"
    },
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "abstract.passages",
        "modelId": "mte-base-v1",
        "propertyKey": "vector"
      }
    },
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "content.text",
        "modelId": "mte-base-v1",
        "propertyKey": "vector"
      }
    }
  ]
}'

Create a partition

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/partitions/abstracts" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "modules": {
    "search": "HEALTH_GRADE_SEARCH"
  },
  "source": {
    "type": "DATA_SOURCE",
    "key": "prestigious-journal",
    "collections": [
      {
        "key": "articles"
      }
    ]
  }
}'

Add sample data

curl -X POST "https://$CLINIA_WORKSPACE/sources/prestigious-journal/v1/resources/bulk" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "operations": [
    {
      "action": "CREATE",
      "create": {
        "type": "articles",
        "data": {
          "title": "Metabolic Resilience Index: Continuous Multi-Sensor Signatures for Early Detection of Dysmetabolic Risk",
          "abstract": "Researchers proposing an integrated “Metabolic Resilience Index” argue that subtle shifts in glucose variability precede overt fasting hyperglycemia. In their conceptual framework, circadian misalignment amplifies low‑grade inflammation through maladaptive cortisol rhythms. They describe how wearable sensor data—heart rate variability, peripheral temperature, and sleep fragmentation—can triangulate emerging autonomic imbalance. The model asserts that postprandial spikes combined with elevated nocturnal glucose plateaus predict mitochondrial oxidative stress. Mitochondrial efficiency is inferred indirectly via delayed recovery of resting heart rate after mild exertion. The authors layer in gut microbiome diversity metrics, noting decreased short‑chain fatty acid proxy scores in tandem with rising inflammatory cytokine panels. They suggest that composite biomarker clustering outperforms any single lab value for early metabolic syndrome detection. Beta cell “whisper distress” is depicted as a phase where insulin pulsatility dampens before fasting labs appear abnormal. Subjective fatigue ratings correlate with sleep efficiency dips on days of higher glycemic excursions. A proposed dashboard flags when rolling 7‑day variability in glucose exceeds a personalized threshold. Cortisol awakening response flattening is labeled a sentinel of hypothalamic–pituitary axis strain. The narrative links microglial priming to systemic inflammatory tone during chronic circadian disruption. A feedback loop is illustrated where inflammation impairs mitochondrial turnover, worsening energetic flexibility. They emphasize that early intervention windows are missed when clinicians focus solely on annual fasting labs. Proposed interventions include light timing hygiene and meal distribution realignment rather than immediate pharmacology. The concept paper also mentions that residual post-lunch glycemic tails predict evening cravings. A pilot simulation shows that reducing late eating narrows nocturnal glucose variability bands. The framework highlights that continuous metrics enable detection of inflection points, not just static abnormalities. An uncertainty layer is added to prevent overconfidence in noisy wearable signals. Finally, the authors call for federated learning to refine the composite inflammation and glucose variability signatures across diverse populations.",
          "content": [
            {
              "sectionTitle": "Methods",
              "text": "We conducted a longitudinal study involving 500 participants monitored over 12 months using wearable sensors that tracked heart rate variability, glucose levels, sleep patterns, and physical activity."
            },
            {
              "sectionTitle": "Methods",
              "text": "Data were collected in real-time and analyzed using machine learning algorithms to identify patterns indicative of metabolic resilience or vulnerability."
            },
            {
              "sectionTitle": "Results",
              "text": "Preliminary findings indicate that individuals with lower MRI scores exhibited higher variability in glucose levels and reduced heart rate variability, correlating with increased inflammatory markers."
            },
            {
              "sectionTitle": "Conclusion",
              "text": "Notably, these changes were detectable weeks before traditional clinical indicators of metabolic syndrome appeared."
            }
          ]
        }
      }
    }
  ]
}'

Wait for ingestion to complete

Look at the Task API guide to better understand how to poll for task status. This is not done here since it cannot be expressed as a single curl command.
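
For readers scripting the setup, the polling loop itself can be sketched in Python without depending on the Task API's details. The check argument below is a caller-supplied function (hypothetical here) that would query the task status as described in the Task API guide and return True once ingestion is complete; the function name, timeout, and interval are illustrative.

```python
import time


def wait_until(check, timeout_s=60.0, interval_s=2.0, sleep=time.sleep):
    """Call check() every interval_s seconds until it returns True.

    Returns the number of attempts made and raises TimeoutError once the
    waiting budget is used up.
    """
    attempts = 0
    waited = 0.0
    while True:
        attempts += 1
        if check():
            return attempts
        if waited >= timeout_s:
            raise TimeoutError(f"not ready after {waited:.0f}s ({attempts} attempts)")
        sleep(interval_s)
        waited += interval_s


# Stand-in for a real status check: reports completion on the third call.
answers = iter([False, False, True])
print(wait_until(lambda: next(answers), sleep=lambda _seconds: None))
```

The sleep parameter is injectable only so the loop can be exercised without real delays; in a script the default time.sleep is what you would use.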

Vector search on an Array property

This example runs a vector search using the knn operator on the segmented and vectorized abstract.passages.vector property. One highlight is returned for each of the three most relevant (highest-scoring) passages.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "knn": {
      "abstract.passages.vector": {
        "value": "circadian inflammation"
      }
    }
  },
  "highlighting": [
    "abstract.passages.vector"
  ]
}'

// response shortened to .hits[0].highlighting
{
  "highlighting": {
    "abstract.passages": [
      {
        "data": "Researchers proposing an integrated “Metabolic Resilience Index” argue that subtle shifts in glucose variability precede overt fasting hyperglycemia. In their conceptual framework, circadian misalignment amplifies low‑grade inflammation through maladaptive cortisol rhythms. They describe how wearable sensor data—heart rate variability, peripheral temperature, and sleep fragmentation—can triangulate emerging autonomic imbalance. The model asserts that postprandial spikes combined with elevated nocturnal glucose plateaus predict mitochondrial oxidative stress. Mitochondrial efficiency is inferred indirectly via delayed recovery of resting heart rate after mild exertion. The authors layer in gut microbiome diversity metrics, noting decreased short‑chain fatty acid proxy scores in tandem with rising inflammatory cytokine panels. They suggest that composite biomarker clustering outperforms any single lab value for early metabolic syndrome detection. Beta cell “whisper distress” is depicted as a phase where insulin pulsatility dampens before fasting labs appear abnormal. Subjective fatigue ratings correlate with sleep efficiency dips on days of higher glycemic excursions. A proposed dashboard flags when rolling 7‑day variability in glucose exceeds a personalized threshold. Cortisol awakening response flattening is labeled a sentinel of hypothalamic–pituitary axis strain. The narrative links microglial priming to systemic inflammatory tone during chronic circadian disruption. A feedback loop is illustrated where inflammation impairs mitochondrial turnover, worsening energetic flexibility.",
        "path": "abstract.passages.0",
        "score": 0.9013,
        "type": "vector"
      },
      {
        "data": " Proposed interventions include light timing hygiene and meal distribution realignment rather than immediate pharmacology. The concept paper also mentions that residual post-lunch glycemic tails predict evening cravings. A pilot simulation shows that reducing late eating narrows nocturnal glucose variability bands. The framework highlights that continuous metrics enable detection of inflection points, not just static abnormalities. An uncertainty layer is added to prevent overconfidence in noisy wearable signals. Finally, the authors call for federated learning to refine the composite inflammation and glucose variability signatures across diverse populations.",
        "path": "abstract.passages.2",
        "score": 0.8862,
        "type": "vector"
      },
      {
        "data": " They emphasize that early intervention windows are missed when clinicians focus solely on annual fasting labs.",
        "path": "abstract.passages.1",
        "score": 0.8699,
        "type": "vector"
      }
    ]
  }
}
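
Since each highlight carries the index of its passage, a client can mark the highlighted passages inside the full array. The Python sketch below assumes the complete list of passages is available from the resource itself; the passage texts are placeholders, while the paths and scores are taken from the response above. The function name is illustrative.

```python
def annotate_passages(passages, highlights):
    """Pair every passage with its highlight score, or None if it was not highlighted."""
    scores = {}
    for h in highlights:
        if h.get("type") == "vector":
            # The last path segment is the index of the passage in the array.
            scores[int(h["path"].rsplit(".", 1)[1])] = h["score"]
    return [(text, scores.get(i)) for i, text in enumerate(passages)]


# Placeholder passage texts; a real client would read them from the resource.
passages = ["first passage", "second passage", "third passage", "fourth passage"]
highlights = [
    {"path": "abstract.passages.0", "score": 0.9013, "type": "vector"},
    {"path": "abstract.passages.2", "score": 0.8862, "type": "vector"},
    {"path": "abstract.passages.1", "score": 0.8699, "type": "vector"},
]

for text, score in annotate_passages(passages, highlights):
    marker = "*" if score is not None else " "
    print(marker, text, score)
```

This keeps the passages in reading order while still exposing which ones the query matched and how strongly.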

Textual search on an Array property

Textual search can also be highlighted in a segmented property. In that case, a separate highlight is returned for every segment that matches the query.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "match": {
      "abstract.passages": {
        "value": "glucose",
        "type": "word"
      }
    }
  },
  "highlighting": [
    "abstract.passages"
  ]
}'

// response shortened to .hits[0].highlighting
{
  "highlighting": {
    "abstract.passages": [
      {
        "highlight": "Researchers proposing an integrated “Metabolic Resilience Index” argue that subtle shifts in <em>glucose</em>",
        "type": "textual"
      },
      {
        "highlight": "The model asserts that postprandial spikes combined with elevated nocturnal <em>glucose</em> plateaus predict",
        "type": "textual"
      },
      {
        "highlight": "A proposed dashboard flags when rolling 7‑day variability in <em>glucose</em> exceeds a personalized threshold",
        "type": "textual"
      },
      {
        "highlight": "A pilot simulation shows that reducing late eating narrows nocturnal <em>glucose</em> variability bands.",
        "type": "textual"
      },
      {
        "highlight": "Finally, the authors call for federated learning to refine the composite inflammation and <em>glucose</em> variability",
        "type": "textual"
      }
    ]
  }
}

Textual search on a Text property

Highlights also work on entire (not segmented) properties. There will always be a single textual highlight per full property.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "match": {
      "title": {
        "value": "metabolic",
        "type": "word"
      }
    }
  },
  "highlighting": [
    "title"
  ]
}'

// response shortened to .hits[0].highlighting
{
  "highlighting": {
    "title": [
      {
        "highlight": "<em>Metabolic</em> Resilience Index: Continuous Multi-Sensor Signatures for Early Detection of Dysmetabolic Risk",
        "type": "textual"
      }
    ]
  }
}

Multiple textual queries on the same Text property

Textual highlights combine all match queries on the same property.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "and": [
      {
        "match": {
          "title": {
            "value": "metabolic",
            "type": "word"
          }
        }
      },
      {
        "match": {
          "title": {
            "value": "Dysmetabolic",
            "type": "word"
          }
        }
      }
    ]
  },
  "highlighting": [
    "title"
  ]
}'

// response shortened to .hits[0].highlighting
{
  "highlighting": {
    "title": [
      {
        "highlight": "<em>Metabolic</em> Resilience Index: Continuous Multi-Sensor Signatures for Early Detection of <em>Dysmetabolic</em> Risk",
        "type": "textual"
      }
    ]
  }
}
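
Because the combined highlight marks every matched term, a client can recover the list of terms that matched. A minimal Python sketch, assuming the <em> tags are never nested; the function name is illustrative.

```python
import re


def matched_terms(highlight):
    """Return the distinct terms wrapped in <em> tags, in order of first appearance."""
    seen = []
    for term in re.findall(r"<em>(.*?)</em>", highlight):
        if term not in seen:
            seen.append(term)
    return seen


# Highlight string copied from the response above.
title = (
    "<em>Metabolic</em> Resilience Index: Continuous Multi-Sensor Signatures "
    "for Early Detection of <em>Dysmetabolic</em> Risk"
)
print(matched_terms(title))
```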

Multiple vector queries on the same Array property

The following query does not work. You cannot request highlights for a vector property that is queried by two different knn operators, since there would be no way to know which highlight corresponds to which knn operator.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "or": [
      {
        "knn": {
          "abstract.passages.vector": {
            "value": "circadian inflammation"
          }
        }
      },
      {
        "knn": {
          "abstract.passages.vector": {
            "value": "glucose"
          }
        }
      }
    ]
  },
  "highlighting": [
    "abstract.passages.vector"
  ]
}'
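
A client can catch this situation before sending the request. The Python sketch below walks a request body shaped like the one above and reports the highlighted properties that are targeted by more than one knn operator. It assumes operators nest only through JSON objects and arrays, as in the and/or examples in this guide; the function names are illustrative.

```python
from collections import Counter


def knn_targets(query):
    """Count how many knn operators target each property in a query tree."""
    counts = Counter()
    if isinstance(query, dict):
        for operator, body in query.items():
            if operator == "knn":
                counts.update(body.keys())  # one count per targeted property
            else:
                counts.update(knn_targets(body))
    elif isinstance(query, list):
        for item in query:
            counts.update(knn_targets(item))
    return counts


def ambiguous_highlights(request):
    """Return highlighted properties that are targeted by more than one knn operator."""
    counts = knn_targets(request.get("query", {}))
    return [prop for prop in request.get("highlighting", []) if counts[prop] > 1]


# Same request body as the curl command above.
request = {
    "query": {
        "or": [
            {"knn": {"abstract.passages.vector": {"value": "circadian inflammation"}}},
            {"knn": {"abstract.passages.vector": {"value": "glucose"}}},
        ]
    },
    "highlighting": ["abstract.passages.vector"],
}
print(ambiguous_highlights(request))
```

One possible way around the limitation, not covered by this guide, is to send each knn operator in its own request so that every set of highlights maps to a single query value.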

Textual search on an Array of objects

You can also request highlights on an array of objects as long as your operator matches a textual property inside the object! In this example, the content property is an array of objects, each object having two symbol properties: sectionTitle and text.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/abstracts/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "match": {
      "content.text": {
        "value": "glucose",
        "type": "word"
      }
    }
  },
  "highlighting": [
    "content.text"
  ]
}'

// response shortened to .hits[0].highlighting
{
  "highlighting": {
    "content.text": [
      {
        "highlight": "participants monitored over 12 months using wearable sensors that tracked heart rate variability, <em>glucose</em>",
        "type": "textual"
      },
      {
        "highlight": "Preliminary findings indicate that individuals with lower MRI scores exhibited higher variability in <em>glucose</em>",
        "type": "textual"
      }
    ]
  }
}
