Journey Overview

Knowledge Search supports providers in retrieving longer-form, evidence-based content aimed at building or updating clinical understanding. Knowledge Search surfaces current research, synthesized concepts, and educational materials not directly tied to step-by-step clinical decisions. It is particularly useful for staying up to date with emerging knowledge, informing clinical reasoning, and exploring topics beyond rigid care protocols.
If the above resonates with your use case, you are in the right place. Let’s dive into how you can deploy your very own article search experience using Clinia’s technology.

Getting Started

To get a broad understanding of the components within our data fabric, you can refer to our platform overview. To get started in this journey, you will need:
  1. A Clinia workspace
  2. A Clinia service account (API Key)
  3. Ability to execute HTTP requests
  4. Some data to ingest
We will go over a simple single-collection use case for now. Refer to the documentation if you want to tailor the Data Fabric configuration to your exact use case. For the sake of this journey, we’ll use the following sample article data:
{
    "doi": "10.1186/1750-1172-8-79",
    "abstract": "# Olmsted Syndrome: Exploration of the Immunological Phenotype\n\nThis study explores the immunological aspects of **Olmsted Syndrome (OS)**, a rare congenital skin disorder characterized by severe, mutilating keratoderma and periorificial hyperkeratotic lesions. The condition is typically associated with recurrent skin infections.\n\n## Key Findings\n\n- **Genetic Mutation:**  \n  The patient, an 18-year-old male, exhibited a previously unreported de novo TRPV3 mutation (*Gly573Ala*), known to cause hyperactivation of the TRPV3 ion channel. This supports TRPV3's central role in OS.\n\n- **Clinical Presentation:**  \n  - Early-onset, progressive dermatological symptoms  \n  - Severe physical impairment (e.g., digit contractures, alopecia)  \n  - Lesions responsive to environmental triggers  \n  - Resistance to various treatment modalities\n\n- **Immunological Profile:**  \n  - Frequent dermal infections (*e.g., Candida albicans*)  \n  - Elevated serum IgE (hyper IgE)  \n  - Chronic peripheral eosinophilia  \n  - Increased follicular helper T (Tfh) cells in peripheral blood  \n  - Normal T cell responses and granulocyte function\n\n## Conclusions\n\nThis is the first comprehensive analysis showing systemic immune dysregulation in Olmsted Syndrome. The findings raise two potential disease models:\n\n1. **Primary keratinocyte-driven pathology:**  \n   Hyperactive TRPV3 disrupts the skin barrier, leading to secondary immune dysregulation.\n\n2. **Primary immunological dysfunction:**  \n   TRPV3 mutations affect Langerhans cells, triggering localized autoimmunity and subsequent skin symptoms.\n\nThe study emphasizes the need for further research to clarify OS mechanisms and guide potential immunomodulatory treatments.",
    "content": "...",
    "title": "Olmsted syndrome: exploration of the immunological phenotype"
}

Workspace Configuration

To leverage semantic search and Clinia’s query understanding capabilities, you will need a collection with a vectorizer ingestion pipeline.

Create a Data Source

Currently, the only data source type available is a Registry. To create a data source named my-data, run the following request:
curl --location --request PUT 'https://api.{workspaceId}.clinia.cloud/catalog/v1/sources/my-data' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
    "type": "registry"
}'
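For readers who prefer a script over curl, the request above can be sketched in Python with only the standard library. The helper name build_request and the my-workspace placeholder are illustrative assumptions, not part of Clinia's tooling; the URL, headers, and body simply mirror the curl command. The request is built but not sent.

```python
import json
import urllib.request


def build_request(workspace_id: str, api_key: str, path: str,
                  body: dict, method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) a JSON request mirroring the curl examples."""
    # Hypothetical helper: the URL shape is copied from the curl command above.
    url = f"https://api.{workspace_id}.clinia.cloud{path}"
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Clinia-Api-Key": api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Same path and payload as the curl command above.
request = build_request(
    workspace_id="my-workspace",  # placeholder, replace with your workspace ID
    api_key="YOUR_API_KEY_HERE",
    path="/catalog/v1/sources/my-data",
    body={"type": "registry"},
)

# To actually send it, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and body before anything reaches your workspace.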

Create your article Profile

Before ingesting your data, we need to define the schema of the properties representing your data model. Given our sample article, here is what the article profile should look like:
curl --location --globoff --request PUT 'https://api.{workspaceId}.clinia.cloud/sources/my-data/v1/collections/article' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
    "type": "resources",
    "profile": {
      "properties": {
        "doi": { "type": "symbol" },
        "title": { "type": "symbol" },
        "content": { "type": "markdown" },
        "abstract": { "type": "markdown" }
      }
    }
  }'
Source profile definitions use the Clinia Data Types System and rely on composition to allow a truly flexible data modelling experience.
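Before ingesting, it can help to check locally that each record carries the properties declared in the profile. The sketch below is only a client-side sanity check under two assumptions that the source does not confirm: that every declared property is expected on each record, and that symbol and markdown values are plain strings (as in the sample article). The function name check_record is hypothetical, and this does not describe Clinia's server-side validation.

```python
# Profile properties copied from the request above.
ARTICLE_PROFILE = {
    "doi": {"type": "symbol"},
    "title": {"type": "symbol"},
    "content": {"type": "markdown"},
    "abstract": {"type": "markdown"},
}


def check_record(record: dict, properties: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record fits."""
    problems = []
    for key in properties:
        if key not in record:
            problems.append(f"missing property: {key}")
        elif not isinstance(record[key], str):
            # Assumption: symbol and markdown values are strings.
            problems.append(f"property {key} should be a string")
    for key in record:
        if key not in properties:
            problems.append(f"unknown property: {key}")
    return problems


sample = {
    "doi": "10.1186/1750-1172-8-79",
    "title": "Olmsted syndrome: exploration of the immunological phenotype",
    "content": "...",
    "abstract": "...",  # shortened here for readability
}
print(check_record(sample, ARTICLE_PROFILE))
```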

Ingestion Pipeline

Now for the fun part. To leverage semantic search capabilities, you will need to augment your raw data using our various processors. To properly support semantic search, we will need a Vectorizer processor to create semantic representations of the article passages. The Vectorizer takes symbol data types as input and returns vectors (arrays of floating-point values) representing your data in the vector space. This vector space is built so that semantically related ideas or sentences (e.g. “diabetes” and “hyperglycemia”) are closer together and dissimilar ideas (e.g. “banana” and “psychologist”) are farther apart.
In the context of knowledge search, we will focus on processing the content and abstract properties so that their meaning, not just their keywords, can be searched. Think of it this way: for which attributes is keyword search limiting? Here is an example of what that might look like for article search:
{
  "steps": [
    // will create a semantic representation of the 'content' into 'content.vector'
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "content",
        "propertyKey": "vector",
        "modelId": "text-embedding-004",
        "provider": "google",
        "dimensions": 768
      }
    },
    // will create a semantic representation of the 'abstract' into 'abstract.vector'
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "abstract",
        "propertyKey": "vector",
        "modelId": "text-embedding-004",
        "provider": "google",
        "dimensions": 768
      }
    }
  ]
}
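The two steps above differ only in their inputProperty, so the pipeline body can also be generated rather than written by hand. The following is a minimal sketch; the function names vectorizer_step and build_pipeline are hypothetical, while the field names and values are copied from the example above.

```python
import json


def vectorizer_step(input_property: str) -> dict:
    """One VECTORIZER step, using the same settings as the example above."""
    return {
        "type": "VECTORIZER",
        "vectorizer": {
            "inputProperty": input_property,
            "propertyKey": "vector",
            "modelId": "text-embedding-004",
            "provider": "google",
            "dimensions": 768,
        },
    }


def build_pipeline(input_properties: list) -> dict:
    """Build a pipeline body with one vectorizer step per property."""
    return {"steps": [vectorizer_step(name) for name in input_properties]}


pipeline = build_pipeline(["content", "abstract"])
print(json.dumps(pipeline, indent=2))
```

Because the output goes through json.dumps, it is strict JSON without the explanatory comments shown in the annotated example.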
You can add this pipeline to your collection with the following request:
curl --location --globoff --request PUT 'https://api.{workspaceId}.clinia.cloud/catalog/sources/my-data/v1/collections/article/pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
    "steps": [
      {
        "type": "VECTORIZER",
        "vectorizer": {
          "inputProperty": "content",
          "propertyKey": "vector",
          "provider": "google", 
          "modelId": "text-embedding-004",
          "dimensions": 768
        }
      },
      {
        "type": "VECTORIZER",
        "vectorizer": {
          "inputProperty": "abstract",
          "propertyKey": "vector",
          "provider": "google", 
          "modelId": "text-embedding-004",
          "dimensions": 768
        }
      }
    ]
  }'
Now that your ingestion pipeline and steps are set up, your data source is ready to receive data. Incoming records will be processed through the pipeline and their data augmented before being persisted in the system.
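To make the idea of "closer together" in the vector space more concrete, here is a toy illustration using cosine similarity, a common way to compare embedding vectors. The source does not state which distance measure Clinia uses, so treat the metric as an assumption; the three-dimensional vectors are hand-made for the example and are not real model output (the real vectors in this journey have 768 dimensions).

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors, NOT produced by any embedding model.
toy_vectors = {
    "diabetes": [0.9, 0.1, 0.0],
    "hyperglycemia": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

related = cosine_similarity(toy_vectors["diabetes"], toy_vectors["hyperglycemia"])
unrelated = cosine_similarity(toy_vectors["diabetes"], toy_vectors["banana"])

# Related concepts score higher than unrelated ones.
print(related > unrelated)
```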

Ingesting data

Once everything is configured, you can create your records using our Standard or Bulk API. Using the Bulk API, here is what that can look like:
curl --location --globoff --request PUT 'https://api.{workspaceId}.clinia.cloud/catalog/sources/my-data/v1/resources/bulk' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
  "operations": [
    {
      "action": "CREATE",
      "create": {
        "type": "article",
        "doi": "10.1186/1750-1172-8-79",
        "abstract": "# Olmsted Syndrome: Exploration of the Immunological Phenotype\n\nThis study explores the immunological aspects of **Olmsted Syndrome (OS)**, a rare congenital skin disorder characterized by severe, mutilating keratoderma and periorificial hyperkeratotic lesions. The condition is typically associated with recurrent skin infections.\n\n## Key Findings\n\n- **Genetic Mutation:**  \n  The patient, an 18-year-old male, exhibited a previously unreported de novo TRPV3 mutation (*Gly573Ala*), known to cause hyperactivation of the TRPV3 ion channel. This supports TRPV3s central role in OS.\n\n- **Clinical Presentation:**  \n  - Early-onset, progressive dermatological symptoms  \n  - Severe physical impairment (e.g., digit contractures, alopecia)  \n  - Lesions responsive to environmental triggers  \n  - Resistance to various treatment modalities\n\n- **Immunological Profile:**  \n  - Frequent dermal infections (*e.g., Candida albicans*)  \n  - Elevated serum IgE (hyper IgE)  \n  - Chronic peripheral eosinophilia  \n  - Increased follicular helper T (Tfh) cells in peripheral blood  \n  - Normal T cell responses and granulocyte function\n\n## Conclusions\n\nThis is the first comprehensive analysis showing systemic immune dysregulation in Olmsted Syndrome. The findings raise two potential disease models:\n\n1. **Primary keratinocyte-driven pathology:**  \n   Hyperactive TRPV3 disrupts the skin barrier, leading to secondary immune dysregulation.\n\n2. **Primary immunological dysfunction:**  \n   TRPV3 mutations affect Langerhans cells, triggering localized autoimmunity and subsequent skin symptoms.\n\nThe study emphasizes the need for further research to clarify OS mechanisms and guide potential immunomodulatory treatments.",
        "content": "...",
        "title": "Olmsted syndrome: exploration of the immunological phenotype"
      }
    }
  ]
}'
The response to the request above includes a taskId, which you can use to track the status of the bulk ingestion task:
curl -X GET "https://$CLINIA_WORKSPACE/sources/my-data/v1/tasks/{taskId}?withReceipts=true" \
  -H "X-Clinia-API-Key: $CLINIA_TOKEN" \
  -H "Content-Type: application/json"
Once fully processed, the task will be marked as successful and the records will be available for search in your collection.
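Hand-writing a bulk body is error-prone: long Markdown abstracts need escaped newlines, and an apostrophe inside a single-quoted shell string ends the string early. One way around this is to generate the body with a JSON serializer. The sketch below assumes the operation shape used in the bulk request in this section (record fields placed next to "type" inside "create"); the function name build_bulk_payload is hypothetical and the abstract is shortened for readability.

```python
import json


def build_bulk_payload(articles: list) -> str:
    """Serialize articles into a bulk body with one CREATE operation each."""
    operations = [
        # Shape mirrors the bulk request above: fields sit beside "type".
        {"action": "CREATE", "create": {"type": "article", **article}}
        for article in articles
    ]
    return json.dumps({"operations": operations}, ensure_ascii=False)


articles = [
    {
        "doi": "10.1186/1750-1172-8-79",
        # Shortened abstract; note the apostrophe and newline are handled safely.
        "abstract": "## Key Findings\n\nThis supports TRPV3's central role in OS.",
        "content": "...",
        "title": "Olmsted syndrome: exploration of the immunological phenotype",
    }
]

payload = build_bulk_payload(articles)
print(payload)
```

If you save the output to a file such as bulk.json, curl can read the body from it with --data-binary @bulk.json, which sidesteps shell quoting entirely.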

Searching your collection

Once ingestion is complete, you are ready to search your collection using the Search API. Here is an example of a query that uses the knn operator on your semantic fields:
curl --request POST \
     --url https://api.{workspaceId}.clinia.cloud/sources/my-data/v1/collections/article/query \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
     --data '
  {
    "perPage": 20,
    "query": {
      "or": [
        {
          "knn": {
            "content.vector": {
              "value": "What is the disease of Olmsted?"
            }
          }
        },
        {
          "knn": {
            "abstract.vector": {
              "value": "What is the disease of Olmsted?"
            }
          }
        }
      ]
    },
    "highlighting": ["content.vector", "abstract.vector"]
  }'
You can find details about the API response here. The API also supports highlighting, which tells you why a given result was relevant: you can see which fields or passages within each article hit were most relevant. This is particularly useful for display purposes, but also for generating the best possible answer using our Summarization API.
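The query body in this section repeats the same knn clause for each vector field and lists those fields again under highlighting. A small builder keeps the three in sync. This is a sketch that only reproduces the body structure shown in the request above; the function name build_knn_query is hypothetical, and no other query options are assumed.

```python
import json


def build_knn_query(question: str, vector_fields: list, per_page: int = 20) -> dict:
    """Build a query body with one knn clause per vector field, combined with 'or'."""
    return {
        "perPage": per_page,
        "query": {
            "or": [{"knn": {field: {"value": question}}} for field in vector_fields]
        },
        # Request highlights on the same fields that are searched.
        "highlighting": list(vector_fields),
    }


query = build_knn_query(
    "What is the disease of Olmsted?",
    ["content.vector", "abstract.vector"],
)
print(json.dumps(query, indent=2))
```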