Hybrid Search

Hybrid search blends multiple retrieval methods in a single query—typically lexical (keyword) and semantic (vector/embedding) search—to return results that are both precise and comprehensive. Instead of choosing one approach, you combine signals so users get exact matches for explicit terms plus conceptually related content for intent. This page focuses specifically on the match parameter of the query object. For a higher‑level overview of how search works in Clinia, see Search Parameters.

Hybrid search is most effective when your data contains both well‑structured fields (names, codes, IDs) and unstructured text (notes, descriptions), or when queries mix exact terms and fuzzy natural language.

Why use hybrid search

Broader recall for natural language. Vectors capture synonyms and intent (“heart attack” ≈ “myocardial infarction”).
Higher precision for critical terms. Lexical matching anchors on exact fields (identifiers, codes, brands, dosages).
Resilience to query drift. If a user’s terms don’t exist verbatim, semantic matching still retrieves plausible candidates, while keyword matching prevents off‑topic drift.
Better ranking quality. Fusing multiple relevance signals typically outperforms either signal alone on heterogeneous data.

When to use it

Consumer or clinical search where language varies (“stomach bug” vs ICD/LOINC terms).
Knowledge, directory, and chart search where documents blend metadata and narrative.
Safety‑critical workflows where you must honor exact constraints (coverage, specialty, location) yet still support natural‑language discovery.

If your corpus is small, highly structured, and queries are consistent, pure lexical search may suffice. If queries are open‑ended with few exact identifiers, semantic‑only may be competitive. Most real‑world health data benefits from hybrid.

Choosing `or` vs `and` in your query logic

Hybrid queries commonly wrap sub‑queries in boolean operators. Choosing `or` vs `and` controls the trade-off between recall and precision:

Use `or` to broaden recall. Results match either the lexical or the semantic criteria. This is best for exploration or conversational search, where missing a relevant item is more costly than including a few loosely related results that a downstream agent can disregard.
Use `and` to enforce precision. Results must satisfy both the lexical and the semantic conditions (for example, match a specialty code and be semantically similar to the narrative).

Guidelines:

Start with `or` at the query stage for discovery, and combine it with strict filter constraints (coverage, geography) to keep results focused.
Use `and` when you must guarantee a hard match (e.g., an identifier, specialty, or vocabulary binding) and want semantic relevance inside that slice.
Prefer `and` for short, ambiguous queries that otherwise yield too many broad semantic matches; prefer `or` for longer, specific queries where either signal could be sufficient.
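
The two shapes can be sketched as plain request bodies. The snippet below is a minimal illustration in Python, assuming the clause layout used in the examples on this page (`match` with `value` and `type`, `knn` with `value`). The helper name `hybrid_query` is hypothetical and is not part of any Clinia SDK; it only builds a dictionary.

```python
import json


def hybrid_query(operator, lexical_clause, semantic_clause):
    """Wrap one lexical and one semantic clause in a boolean operator.

    Hypothetical helper for illustration only; it builds a dict and
    does not call any API.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {operator: [lexical_clause, semantic_clause]}}


# Clause shapes copied from the examples on this page.
lexical = {"match": {"specialty.text": {"value": "cardio", "type": "wordPrefix"}}}
semantic = {"knn": {"bio.text.vector": {"value": "heart doctor for chest pain"}}}

discovery = hybrid_query("or", lexical, semantic)   # either clause may match
precision = hybrid_query("and", lexical, semantic)  # both clauses must match

print(json.dumps(discovery, indent=2))
print(json.dumps(precision, indent=2))
```

Only the wrapping operator differs between the two bodies, which makes it cheap to start with `or` during discovery and switch to `and` once a hard match is required.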

How `knn` interacts with boolean logic

Roughly speaking, `knn` returns a fixed‑size nearest‑neighbor candidate set (top‑K) from the chosen vector field, ordered by semantic similarity to the query vector. When you combine `knn` with `and`, you are filtering that candidate bucket: only items in the `knn` set that also satisfy the other clauses remain eligible, and their semantic similarity continues to influence ranking alongside any lexical scores. With `or`, you take the union of candidates from all clauses and blend ranking signals, so results that satisfy both semantic and lexical conditions typically rank higher than those matching only one. Top‑level `filter` constraints gate eligibility; they do not add positive score.
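
This behavior can be pictured with a toy model. The Python sketch below is a conceptual illustration only, not the engine's implementation: the document ids and scores are made up, and blending scores by simple addition is an assumption, since the actual fusion formula is not described on this page.

```python
def top_k(similarity, k):
    """Return the ids of the k most similar documents (the knn candidate set)."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    return set(ranked[:k])


def combine(operator, knn_ids, lexical_ids, similarity, lexical_score, allowed=None):
    """Toy model: 'and' intersects candidate sets, 'or' unions them.

    `allowed` plays the role of a top-level filter: it removes candidates
    but never contributes to the score.
    """
    if operator == "and":
        eligible = knn_ids & lexical_ids
    elif operator == "or":
        eligible = knn_ids | lexical_ids
    else:
        raise ValueError("operator must be 'and' or 'or'")
    if allowed is not None:
        eligible &= allowed

    scores = {}
    for doc in eligible:
        # Assumption: a document outside the top-K gets no semantic score.
        semantic = similarity[doc] if doc in knn_ids else 0.0
        scores[doc] = semantic + lexical_score.get(doc, 0.0)
    # Highest blended score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up scores for four documents.
similarity = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.1}
lexical_score = {"b": 1.0, "d": 1.0}

knn_ids = top_k(similarity, k=2)   # the two nearest neighbors
lexical_ids = set(lexical_score)   # documents matched by the lexical clause

print(combine("and", knn_ids, lexical_ids, similarity, lexical_score))
print(combine("or", knn_ids, lexical_ids, similarity, lexical_score))
```

In this model, document `b` is the only survivor under `and` because it is both a nearest neighbor and a lexical match, and it also ranks first under `or` because it collects both scores.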

Where hybrid search fits in Clinia

Use lexical operators (e.g., match, term, range) over structured fields to anchor precision.
Use semantic operators (e.g., knn over vector fields) to capture intent and synonyms.
Use hybrid search to improve recall when there is a lexical gap.

Examples

The examples below show explicit query objects with operators and values written out manually. In a real app, when users type natural language, you may be better served by a semanticQuery. When specifying operators manually, you need to already know “what” you are looking for. The knn operator will work best with good match operators and filter constraints that you already know.

Provider directory — discovery (OR)

Mix consumer phrasing with structured constraints to maximize recall. Patients rarely use taxonomy codes; they type “heart doctor near me that takes Blue Cross.” Hybrid search lets you keep hard eligibility filters (plan, distance, status) while still understanding lay language via semantic similarity, which reduces zero‑result queries and time‑to‑find.

Using `or` here increases recall by allowing semantic matches on the bio property even if the specialty term is not explicitly mentioned. In practice, providers that match both clauses (specialty + bio) rank higher, but more candidates are considered. This is especially useful when there are very few exact matches for the specialty term alone.

Set up

This example uses a minimal provider directory dataset with vectorized bios.

Create a source

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/sources/provider-demo" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "registry"
}'

Create a profile

curl -X PUT "https://$CLINIA_WORKSPACE/sources/provider-demo/v1/profiles/providers" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "ROOT",
  "properties": {
    "name": {
      "type": "symbol"
    },
    "specialty": {
      "type": "object",
      "properties": {
        "code": {
          "type": "symbol"
        },
        "text": {
          "type": "symbol"
        }
      }
    },
    "bio": {
      "type": "object",
      "properties": {
        "text": {
          "type": "symbol"
        }
      }
    },
    "insurances": {
      "type": "array",
      "items": {
        "type": "symbol"
      }
    },
    "location": {
      "type": "geopoint"
    },
    "status": {
      "type": "symbol"
    }
  }
}'

Create an embedding pipeline

curl -X PUT "https://$CLINIA_WORKSPACE/sources/provider-demo/v1/collections/providers/pipeline" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "steps": [
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "bio.text",
        "modelId": "mte-base-v1",
        "propertyKey": "vector"
      }
    }
  ]
}'

Create a partition

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/partitions/directory" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "modules": {
    "search": "HEALTH_GRADE_SEARCH"
  },
  "source": {
    "type": "DATA_SOURCE",
    "key": "provider-demo",
    "collections": [
      {
        "key": "providers"
      }
    ]
  }
}'

Add sample data

curl -X POST "https://$CLINIA_WORKSPACE/sources/provider-demo/v1/resources/bulk" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "operations": [
    {
      "action": "CREATE",
      "create": {
        "type": "providers",
        "data": {
          "name": "Dr. Alice Smith",
          "specialty": {
            "code": "207RC0000X",
            "text": "Cardiology"
          },
          "bio": {
            "text": "Cardiologist focusing on prevention and cardiac rehab."
          },
          "insurances": [
            "Blue Cross",
            "ACME Health"
          ],
          "location": {
            "latitude": 45.5,
            "longitude": -73.56
          },
          "status": "active"
        }
      }
    },
    {
      "action": "CREATE",
      "create": {
        "type": "providers",
        "data": {
          "name": "Dr. Bob Chen",
          "specialty": {
            "code": "207RC0000X",
            "text": "Cardiology"
          },
          "bio": {
            "text": "Interventional cardiologist treating chest pain and CAD."
          },
          "insurances": [
            "Contoso",
            "Blue Cross"
          ],
          "location": {
            "latitude": 45.49,
            "longitude": -73.58
          },
          "status": "active"
        }
      }
    },
    {
      "action": "CREATE",
      "create": {
        "type": "providers",
        "data": {
          "name": "Dr. Carla Nguyen",
          "specialty": {
            "code": "207N00000X",
            "text": "Dermatology"
          },
          "bio": {
            "text": "Dermatologist managing eczema, psoriasis, and chronic skin rashes."
          },
          "insurances": [
            "Blue Cross"
          ],
          "location": {
            "latitude": 45.5,
            "longitude": -73.55
          },
          "status": "active"
        }
      }
    }
  ]
}'
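
Before querying, it can help to check locally which sample providers fall inside the 25 km radius used by the `geoDistance` filter in the query that follows. The Python sketch below uses the haversine formula. This is an assumption made for illustration: the page does not say how the engine computes distance, and the radius is taken to be in meters based on the "25km" comment next to the value `25000`.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Center and radius taken from the geoDistance filter in the discovery query.
CENTER = (45.5017, -73.5673)
RADIUS_M = 25_000

# Coordinates copied from the sample data.
providers = {
    "Dr. Alice Smith": (45.5, -73.56),
    "Dr. Bob Chen": (45.49, -73.58),
    "Dr. Carla Nguyen": (45.5, -73.55),
}

for name, (lat, lon) in providers.items():
    distance = haversine_m(*CENTER, lat, lon)
    print(f"{name}: {distance:.0f} m, within radius: {distance <= RADIUS_M}")
```

All three sample providers sit within a few kilometers of the center, so the geographic filter alone does not exclude any of them; the insurance and status filters and the query clauses do the remaining work.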

curl -X POST "https://$CLINIA_WORKSPACE/partitions/directory/v1/collections/providers/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    // Eligibility and safety constraints
    "filter": {
      "and": [
        { "any": { "insurances": ["Blue Cross"] } }, // partial intersection with plan networks
        // 25 km radius around Montreal, Canada
        { "geoDistance": { "location": { "coordinates": { "latitude": 45.5017, "longitude": -73.5673 }, "radius": 25000 } } },
        // exclude inactive providers
        { "eq": { "status": "active" } }
      ]
    },
    "query": {
      // Hybrid recall: keyword specialty OR semantic bio intent
      "or": [
        { "match": { "specialty.text": { "value": "cardio", "type": "wordPrefix" } } },
        { "knn": { "bio.text.vector": { "value": "heart doctor for chest pain" } } }
      ]
    }
  }'
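
The `//` annotations in the request body above are explanatory; standard JSON has no comment syntax, so a strict parser may reject them if they are sent as-is. The Python sketch below builds the same body as a dictionary, with the notes moved into Python comments, and prepares the request using only the standard library. The function names are hypothetical, the `example.invalid` fallback host is a placeholder, and the environment variable names mirror those in the curl commands.

```python
import json
import os
import urllib.request


def build_discovery_payload(plan, latitude, longitude, radius_m, specialty_prefix, intent):
    """Build the discovery (OR) request body as a plain dict."""
    return {
        # Eligibility and safety constraints
        "filter": {
            "and": [
                {"any": {"insurances": [plan]}},
                {"geoDistance": {"location": {
                    "coordinates": {"latitude": latitude, "longitude": longitude},
                    "radius": radius_m,
                }}},
                {"eq": {"status": "active"}},  # exclude inactive providers
            ]
        },
        # Hybrid recall: keyword specialty OR semantic bio intent
        "query": {
            "or": [
                {"match": {"specialty.text": {"value": specialty_prefix, "type": "wordPrefix"}}},
                {"knn": {"bio.text.vector": {"value": intent}}},
            ]
        },
    }


def build_request(payload):
    """Prepare (but do not send) the POST request for the providers collection."""
    workspace = os.environ.get("CLINIA_WORKSPACE", "example.invalid")  # placeholder host
    token = os.environ.get("CLINIA_TOKEN", "")
    url = f"https://{workspace}/partitions/directory/v1/collections/providers/query"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Clinia-API-Key": token, "Content-Type": "application/json"},
        method="POST",
    )


payload = build_discovery_payload(
    "Blue Cross", 45.5017, -73.5673, 25000, "cardio", "heart doctor for chest pain"
)
request = build_request(payload)
# To actually send it: urllib.request.urlopen(request)
```

Building the body from a dictionary guarantees that what goes over the wire is valid JSON, while the explanations stay next to the clauses they describe.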

Provider directory — precision (AND)

The `and` operator ensures that all results match the requested specialty (here, a `wordPrefix` match on `specialty.text`) while `knn` finds the most semantically relevant providers within that set. Using `and` here gives us precision at the cost of some recall (semantically related specialties will be excluded). In referral routing, you must honor exact specialty taxonomy codes for compliance and payer rules. Hybrid search narrows results to the right specialty first, then uses semantic signals to surface the most relevant providers within that compliant set.

Set up

Same setup as the previous example.

curl -X POST "https://$CLINIA_WORKSPACE/partitions/directory/v1/collections/providers/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    // Hard constraints: only return active providers
    "filter": {
      "and": [
        { "eq": { "status": "active" } }
      ]
    },
    // Inside the slice, rank by semantic fit
    "query": {
      "and": [
        // Filter out anything outside of cardio* specialties
        { "match": { "specialty.text": { "value": "cardio", "type": "wordPrefix" } } },
        // Topical alignment. Only specialists that have a semantically related bio will match
        { "knn": { "bio.text.vector": { "value": "cardiac rehabilitation and prevention" } } }
      ]
    }
  }'

Knowledge search — clinical Q&A

Combine precise mentions with conceptual similarity over structured or unstructured content. Clinical content and biomedical literature mix attributes that carry canonical entities (title, tags, concepts, related terms, acronyms, etc.) with attributes that contain answers and insights, often relying on varying appellations and synonyms to convey meaning. Hybrid search lets you catch these exact entity mentions while retrieving conceptually related content, improving answer quality by balancing recall and precision.

Set up

This example uses a lightweight knowledge base with chunked content.

Create a source

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/sources/knowledge-base" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "registry"
}'

Create a profile

curl -X PUT "https://$CLINIA_WORKSPACE/sources/knowledge-base/v1/profiles/articles" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "type": "ROOT",
  "properties": {
    "title": {
      "type": "symbol"
    },
    "paragraphs": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "text": {
            "type": "symbol"
          }
        }
      }
    }
  }
}'

Create an embedding pipeline

curl -X PUT "https://$CLINIA_WORKSPACE/sources/knowledge-base/v1/collections/articles/pipeline" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "steps": [
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "paragraphs.text",
        "modelId": "mte-base-v1",
        "propertyKey": "vector"
      }
    }
  ]
}'

Create a partition

curl -X PUT "https://$CLINIA_WORKSPACE/catalog/v1/partitions/knowledge" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "modules": {
    "search": "HEALTH_GRADE_SEARCH"
  },
  "source": {
    "type": "DATA_SOURCE",
    "key": "knowledge-base",
    "collections": [
      {
        "key": "articles"
      }
    ]
  }
}'

Add sample data

curl -X POST "https://$CLINIA_WORKSPACE/sources/knowledge-base/v1/resources/bulk" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "operations": [
    {
      "action": "CREATE",
      "create": {
        "type": "articles",
        "data": {
          "title": "Atrial Fibrillation Management",
          "paragraphs": [
            {
              "text": "Rate control vs rhythm control in elderly patients."
            },
            {
              "text": "Anticoagulation decision support using CHA2DS2-VASc."
            }
          ]
        }
      }
    },
    {
      "action": "CREATE",
      "create": {
        "type": "articles",
        "data": {
          "title": "Heart Failure Guidelines",
          "paragraphs": [
            {
              "text": "Initiation of GDMT and titration strategy."
            },
            {
              "text": "Diuretic dosing and monitoring of congestion."
            }
          ]
        }
      }
    }
  ]
}'

curl -X POST "https://$CLINIA_WORKSPACE/partitions/knowledge/v1/collections/articles/query" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      // Exact entity in title OR conceptual match in body
      "or": [
        { "match": { "title": { "value": "Atrial Fibrillation", "type": "phrase" } } },
        { "knn": { "paragraphs.text.vector": { "value": "how to manage afib in elderly patients" } } }
      ]
    }
  }'
