
Journey Overview

Knowledge Search supports providers in retrieving longer-form, evidence-based content aimed at building or updating clinical understanding. Knowledge Search surfaces current research, synthesized concepts, and educational materials not directly tied to step-by-step clinical decisions. It is particularly useful for staying up to date with emerging knowledge, informing clinical reasoning, and exploring topics beyond rigid care protocols.
If the above resonates with your use case, you're in the right place. Let's dive into how you can deploy your very own article search experience using Clinia's Health-Grade Search (HGS) technology.

Getting Started

To get a broad understanding of the components within our data fabric, you can refer to our platform overview. To get started in this journey, you will need:
  1. A Clinia workspace
  2. A Clinia service account (API Key)
  3. Ability to execute HTTP requests
  4. Some data to ingest
We will go over a simple, single-collection use case for now. Make sure to refer to the documentation if you want to tailor the Data Fabric configuration to your exact use case. For the sake of this journey, we'll leverage the following sample article data:
{
    "doi": "10.1186/1750-1172-8-79",
    "abstract": "# Olmsted Syndrome: Exploration of the Immunological Phenotype\n\nThis study explores the immunological aspects of **Olmsted Syndrome (OS)**, a rare congenital skin disorder characterized by severe, mutilating keratoderma and periorificial hyperkeratotic lesions. The condition is typically associated with recurrent skin infections.\n\n## Key Findings\n\n- **Genetic Mutation:**  \n  The patient, an 18-year-old male, exhibited a previously unreported de novo TRPV3 mutation (*Gly573Ala*), known to cause hyperactivation of the TRPV3 ion channel. This supports TRPV3's central role in OS.\n\n- **Clinical Presentation:**  \n  - Early-onset, progressive dermatological symptoms  \n  - Severe physical impairment (e.g., digit contractures, alopecia)  \n  - Lesions responsive to environmental triggers  \n  - Resistance to various treatment modalities\n\n- **Immunological Profile:**  \n  - Frequent dermal infections (*e.g., Candida albicans*)  \n  - Elevated serum IgE (hyper IgE)  \n  - Chronic peripheral eosinophilia  \n  - Increased follicular helper T (Tfh) cells in peripheral blood  \n  - Normal T cell responses and granulocyte function\n\n## Conclusions\n\nThis is the first comprehensive analysis showing systemic immune dysregulation in Olmsted Syndrome. The findings raise two potential disease models:\n\n1. **Primary keratinocyte-driven pathology:**  \n   Hyperactive TRPV3 disrupts the skin barrier, leading to secondary immune dysregulation.\n\n2. **Primary immunological dysfunction:**  \n   TRPV3 mutations affect Langerhans cells, triggering localized autoimmunity and subsequent skin symptoms.\n\nThe study emphasizes the need for further research to clarify OS mechanisms and guide potential immunomodulatory treatments.",
    "content": "...",
    "title": "Olmsted syndrome: exploration of the immunological phenotype"
}

Workspace Configuration

To leverage semantic search and Clinia's query understanding capabilities, you will need an HGS partition instead of our Standard partition offering.

Create a Data Source

Currently, the only data source type available is a Registry. To create your {name} data source, run the following request:
curl --location --request PUT 'https://{workspaceId}.clinia.cloud/catalog/v1/sources/{name}' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
    "type": "registry"
}'
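Every request in this journey shares the same base-URL shape and headers. A small helper (a sketch; the URL pattern and header names simply mirror the curl examples in this guide) keeps later calls consistent:

```python
# Sketch of helpers for building Clinia request URLs and headers.
# The URL shape and header names mirror the curl examples in this guide.

def clinia_url(workspace_id: str, path: str) -> str:
    """Build a full endpoint URL for a given workspace and API path."""
    return f"https://{workspace_id}.clinia.cloud/{path.lstrip('/')}"

def clinia_headers(api_key: str) -> dict:
    """Standard JSON headers plus the Clinia API key header."""
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Clinia-Api-Key": api_key,
    }
```

You can then pass these to whichever HTTP client you use to execute the requests shown below.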

Create your article Profile

Before ingesting your data, we need to define the schema of the properties representing your data model. Given our sample article, here is what the article profile should look like:
curl --location --globoff --request PUT 'https://{workspaceId}.clinia.cloud/catalog/sources/my_data/v1/profiles/article' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
    "type": "ROOT",
    "properties": {
        "doi": {
            "type": "symbol"
        },
        "title": {
            "type": "symbol"
        },
        "content": {
            "type": "markdown"
        },
        "abstract": {
            "type": "markdown"
        }
    }
}'
Source profile definitions use the Clinia Data Types System and rely on composition to allow a truly flexible data-modelling experience.

Create your Data Partition

Once your profile is created, we can now create the data partition, which is a virtual, searchable view of your data. Let's create an article_search partition for your article search application. Specifying HEALTH_GRADE_SEARCH as the search module indicates that we want this partition to be built so that we can use semantic search functionalities instead of regular keyword search.
curl --location --globoff --request PUT 'https://{workspaceId}.clinia.cloud/catalog/v1/partitions/article_search' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
  "modules": {
    "search": "HEALTH_GRADE_SEARCH"
  },
  "source": {
    "type": "DATA_SOURCE",
    "key": "my_data",
    "collections": [
      {
        "key": "article"
      }
    ]
  }
}'

Ingestion Pipeline

Now for the fun stuff. To leverage Clinia's HGS engine, you will need to augment your raw data using our various processors.

For knowledge search, we usually deal with very large pieces of content. While dense retrievers keep getting stronger on long documents, the challenge in a RAG system is not to return the right article, but to refer to the most relevant chunk or passage of text within the article to feed to the LLM in a token-efficient manner. You can use the Segmenter processor to do just that. The Segmenter takes as input a symbol or markdown data type and returns a list of chunks. Internally, chunks are object data types with a text symbol property that we can then search on, or apply other processors to!

To properly support semantic search, we will also need a Vectorizer processor to create semantic representations of the article passages. We recommend our mte-base-knowledge model for this, as it was expressly trained on medical knowledge and designed to work well in clinical workflows. The Vectorizer takes as input symbol data types and returns vectors (arrays of float values) representing your data in the vector space. This vector space is built in such a way that semantically related ideas or sentences (e.g. "diabetes" and "hyperglycemia") are closer together, while dissimilar ideas (e.g. "banana" and "psychologist") are farther apart.

In the context of knowledge search, we will focus on processing the content and the abstract into meaningful and interpretable chunks. Think of it this way: for which attributes is keyword search limiting? Here is an example of what that might look like for article search:
{
  "steps": [
    // will chunk the 'content' into 'content.chunks'
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "content",
        "propertyKey": "chunks",
        "modelID": "clinia-chunk.1"
      }
    },
    // will chunk the 'abstract' into 'abstract.chunks'
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "propertyKey": "chunks",
        "modelID": "clinia-chunk.1"
      }
    },
    // will create a semantic representation of the 'content.chunks' into 'content.chunks.vector'
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "content.chunks",
        "propertyKey": "vector",
        "modelID": "mte-base-knowledge.1"
      }
    },
    // will create a semantic representation of the 'abstract.chunks' into 'abstract.chunks.vector'
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "abstract.chunks",
        "propertyKey": "vector",
        "modelID": "mte-base-knowledge.1"
      }
    }
  ]
}
You can add this pipeline to your collection with the following request:
curl --location --globoff --request PUT 'https://{workspaceId}.clinia.cloud/catalog/sources/my_data/v1/collections/article/pipeline' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
  "steps": [
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "content",
        "propertyKey": "chunks",
        "modelID": "clinia-chunk.1"
      }
    },
    {
      "type": "SEGMENTER",
      "segmenter": {
        "inputProperty": "abstract",
        "propertyKey": "chunks",
        "modelID": "clinia-chunk.1"
      }
    },
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "content.chunks",
        "propertyKey": "vector",
        "modelID": "mte-base-knowledge.1"
      }
    },
    {
      "type": "VECTORIZER",
      "vectorizer": {
        "inputProperty": "abstract.chunks",
        "propertyKey": "vector",
        "modelID": "mte-base-knowledge.1"
      }
    }
  ]
}'
Now that your ingestion pipeline and steps are set up, your data source is ready to receive data. Incoming records will be processed through the pipeline and their data augmented before being persisted in the system.
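To build intuition for what the Vectorizer gives you, here is a toy sketch of how "closeness" in a vector space is typically measured with cosine similarity. The vectors below are hand-made, three-dimensional illustrations only; real embeddings from a model like mte-base-knowledge have many more dimensions and are produced by the model, not by hand.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: values near 1.0 mean
    the vectors point in nearly the same direction (similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (NOT real model output), for illustration only:
diabetes      = [0.9, 0.1, 0.0]
hyperglycemia = [0.8, 0.2, 0.1]
banana        = [0.0, 0.1, 0.9]

# Related medical concepts end up closer together than unrelated ones.
assert cosine_similarity(diabetes, hyperglycemia) > cosine_similarity(diabetes, banana)
```

This is the property the knn operator exploits at query time: your query is vectorized with the same model, and the closest chunk vectors are returned.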

Ingesting data

Once everything is configured, you can create your records using our Standard or Bulk API. Using the Bulk API, here is what that can look like:
curl --location --globoff --request PUT 'https://{workspaceId}.clinia.cloud/catalog/sources/my_data/v1/resources/bulk' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
--data '{
  "operations": [
    {
      "action": "CREATE",
      "create": {
        "type": "article",
        "doi": "10.1186/1750-1172-8-79",
        "abstract": "# Olmsted Syndrome: Exploration of the Immunological Phenotype\n\nThis study explores the immunological aspects of **Olmsted Syndrome (OS)**, a rare congenital skin disorder characterized by severe, mutilating keratoderma and periorificial hyperkeratotic lesions. The condition is typically associated with recurrent skin infections.\n\n## Key Findings\n\n- **Genetic Mutation:**  \n  The patient, an 18-year-old male, exhibited a previously unreported de novo TRPV3 mutation (*Gly573Ala*), known to cause hyperactivation of the TRPV3 ion channel. This supports TRPV3s central role in OS.\n\n- **Clinical Presentation:**  \n  - Early-onset, progressive dermatological symptoms  \n  - Severe physical impairment (e.g., digit contractures, alopecia)  \n  - Lesions responsive to environmental triggers  \n  - Resistance to various treatment modalities\n\n- **Immunological Profile:**  \n  - Frequent dermal infections (*e.g., Candida albicans*)  \n  - Elevated serum IgE (hyper IgE)  \n  - Chronic peripheral eosinophilia  \n  - Increased follicular helper T (Tfh) cells in peripheral blood  \n  - Normal T cell responses and granulocyte function\n\n## Conclusions\n\nThis is the first comprehensive analysis showing systemic immune dysregulation in Olmsted Syndrome. The findings raise two potential disease models:\n\n1. **Primary keratinocyte-driven pathology:**  \n   Hyperactive TRPV3 disrupts the skin barrier, leading to secondary immune dysregulation.\n\n2. **Primary immunological dysfunction:**  \n   TRPV3 mutations affect Langerhans cells, triggering localized autoimmunity and subsequent skin symptoms.\n\nThe study emphasizes the need for further research to clarify OS mechanisms and guide potential immunomodulatory treatments.",
        "content": "...",
        "title": "Olmsted syndrome: exploration of the immunological phenotype"
      }
    }
  ]
}'
The response to the request above contains a bulkId that you can use to track the status of the bulk ingestion. Use this request to do so:
curl --location --globoff --request GET 'https://{workspaceId}.clinia.cloud/catalog/sources/my_data/v1/resources/bulk/{bulkId}?withReceipts=true' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE'
Once fully processed, the bulk task will be marked as successful and the records propagated to the relevant partitions.
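If you want to automate this, a client can poll the status endpoint until the task reaches a terminal state. The sketch below only shows the decision logic; the `status` field and its values are assumptions for illustration, so check the Bulk API response documentation for the actual shape.

```python
import json

# Hypothetical terminal status values, for illustration only;
# the real Bulk API response shape may differ.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}

def is_bulk_done(status_response_body: str) -> bool:
    """Return True when a bulk task (parsed from a JSON status
    response body) has reached a terminal state."""
    payload = json.loads(status_response_body)
    return payload.get("status") in TERMINAL_STATUSES

# Example usage with mocked response bodies:
assert is_bulk_done('{"status": "SUCCEEDED"}') is True
assert is_bulk_done('{"status": "PROCESSING"}') is False
```

In a real polling loop you would call the GET request above, pass its body to `is_bulk_done`, and sleep between attempts.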

Querying your HGS Partition

Once the ingestion is complete, you are ready to use your HGS partition through the Search API. Here is an example of a query that uses the knn operator on your semantic vector fields:
curl --request POST \
     --url https://{workspaceId}.clinia.cloud/partitions/article_search/v1/collections/article/query \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
     --data '
{
  "perPage": 20,
  "query": {
    "or": [
      {
        "knn": {
          "content.chunks.vector": {
            "value": "What is the disease of Olmsted?"
          }
        }
      },
      {
        "knn": {
          "abstract.chunks.vector": {
            "value": "What is the disease of Olmsted?"
          }
        }
      }
    ]
  },
  "highlighting": ["content.chunks.vector", "abstract.chunks.vector"]
}
'
You can find details about the API response in the documentation. The API also comes with highlighting support, which tells you why a given result was relevant. Using highlighting, you can tell which of the chunks or passages within each article hit was most relevant. This is particularly useful for display purposes, but also for generating the best possible answer using our Summarization API.
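For example, you might gather the highlighted passages from each hit to use as summarization context. The response shape used below (hits carrying a `highlights` list) is an assumption for illustration, not the documented schema, so adapt the field names to the actual Search API response.

```python
# Sketch: collect highlighted passages across search hits to feed
# into a summarization call. The "hits"/"highlights" field names
# are assumptions for illustration, not the documented shape.

def collect_passages(search_response: dict, limit: int = 3) -> list[str]:
    """Return up to `limit` highlighted passages across all hits,
    preserving hit order (most relevant first)."""
    passages: list[str] = []
    for hit in search_response.get("hits", []):
        for highlight in hit.get("highlights", []):
            passages.append(highlight)
            if len(passages) == limit:
                return passages
    return passages

# Example with a mocked response:
mock = {"hits": [{"highlights": ["passage A", "passage B"]},
                 {"highlights": ["passage C", "passage D"]}]}
assert collect_passages(mock) == ["passage A", "passage B", "passage C"]
```

Capping the number of passages keeps the downstream prompt token-efficient, which is the whole point of chunk-level retrieval.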

Retrieval-Augmented Generation

IN PREVIEW. API SUBJECT TO CHANGE.
Retrieval-Augmented Generation (RAG) is the process of feeding context cues previously retrieved by your system to an LLM in order to get an answer summary grounded in your content. To do so, you can use our stateless Summarization API. The Summarization API abstracts away the pains of prompt engineering by providing pre-defined, health-grade generative recipes. In this case, we are interested in our knowledge-answer recipe. This recipe supports different parameters that will be used to fill in the underlying prompt template. Currently, the knowledge-answer recipe supports three parameters: Persona, Passages, and, of course, the user Query. Let's call the API using some results fetched by the Search API:
curl --request POST \
     --url https://{workspaceId}.clinia.cloud/ai/v1/summarize \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
     --data '
{
    "task": "knowledge-answer",
    "params": {
        "Persona": "doctor",
        "Passages": [
            "# Olmsted Syndrome: Exploration of the Immunological Phenotype\n\nThis study explores the immunological aspects of **Olmsted Syndrome (OS)**, a rare congenital skin disorder characterized by severe, mutilating keratoderma and periorificial hyperkeratotic lesions",
	          "...",
	          "..."
        ],
        "Query": "What is the disease of Olmsted?"
    }
}
'
Using this synchronous endpoint, you can expect to wait a few seconds to get the full answer. Fear not, the same endpoint also supports streaming; simply add the accept: text/event-stream header to your API call and you are all set! We typically recommend streaming as it feels much better in terms of user experience, but we understand that some use cases might not benefit from it (e.g. if you want to generate record summaries nightly).
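With streaming enabled, the response body arrives as server-sent events. A minimal sketch of pulling the payloads out of a raw event stream, assuming only the standard `data:` line framing of text/event-stream (the structure of each payload itself is not shown here):

```python
def parse_sse_data(stream_text: str) -> list[str]:
    """Extract the payload of each `data:` line from a raw
    text/event-stream body (standard SSE framing)."""
    payloads = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            payloads.append(line[len("data:"):].strip())
    return payloads

# Example with a mocked stream body:
raw = "data: Olmsted\n\ndata: syndrome is...\n\n"
assert parse_sse_data(raw) == ["Olmsted", "syndrome is..."]
```

In practice you would read the stream incrementally and render each payload as it arrives, rather than waiting for the full body.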

Query Understanding

IN PREVIEW. API SUBJECT TO CHANGE.
You might be asking yourself how to determine which search operator to use, or which path you should target. Clinia's Search API also offers a semanticQuery parameter: select the type of search experience you are implementing at query time, and let Clinia's Query Intelligence do the rest using the provided query and profile properties.
curl --request POST \
     --url https://{workspaceId}.clinia.cloud/partitions/article_search/v1/collections/article/query \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
     --data '
{
  "semanticQuery": {
    "text": "give me all articles about diabetes treatment between 2012 and 2018"
  }
}
'

Conversational Experience

IN PREVIEW. API SUBJECT TO CHANGE.
Typically, providers want fast, evidence-based insights delivered in a timely manner to optimize care in point-of-care settings. However, you might have a different use case that benefits from a conversational experience. This is especially interesting if your users are medical researchers who might not have straightforward search intents and can benefit from a chatbot context. We also offer a Conversational API that sits on top of our Data Partition API. It allows the chatbot to intelligently detect when searching your content is required, and provides the same timely summaries you would get with the Summarization API. The complete endpoint requires two parameters: the agent to use and the tools made available to the agent. Currently, we only support the data-partition-retriever tool. This tool requires you to define the available partitions, as well as descriptions that will guide the agent in its decisions during user interactions. Here is a payload example of our complete endpoint sitting on top of the partition we built above:
curl --request POST \
     --url https://{workspaceId}.clinia.cloud/ai/v1/complete \
     --header 'accept: text/event-stream' \
     --header 'content-type: application/json' \
     --header 'X-Clinia-Api-Key: YOUR_API_KEY_HERE' \
     --data '
{
    "agent": "medical-librarian",
    "tools": {
        "data-partition-retriever":{
            "partitions":[
                {
                    "key": "article_search",
                    "description": "Data partition that contains medically relevant information"
                }
            ]
        }
    },
    "messages": [
        {
            "role": "user",
            "content": "Can you guide me in my research? I am doing a literature review focused around congenital diseases and I dont know where to start"
        }
    ]
}
'
Currently, the API is stateless, which means you need to provide previous messages every time you want to continue the conversation.
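Because the API is stateless, your client keeps the transcript. A minimal sketch of accumulating messages between turns (the role/content shape mirrors the payload above; how you extract the assistant's reply from the event stream is up to your client):

```python
# Sketch: client-side conversation state for the stateless complete
# endpoint. Each turn appends to the same messages list, which is
# re-sent in full with every request.

def add_turn(messages: list[dict], role: str, content: str) -> list[dict]:
    """Append one message in the {role, content} shape used by the API."""
    messages.append({"role": role, "content": content})
    return messages

history: list[dict] = []
add_turn(history, "user", "Where should I start my literature review?")
add_turn(history, "assistant", "You could begin with recent reviews on congenital diseases.")
add_turn(history, "user", "Great, can you search for articles on Olmsted syndrome?")

# The full history becomes the "messages" field of the next request.
assert len(history) == 3
assert history[-1]["role"] == "user"
```

Keep an eye on transcript length: since the full history travels with every call, very long conversations may eventually need truncation on the client side.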