Define the RAG Strategy

The RAG (Retrieval-Augmented Generation) strategy relies on a set of parameters that define how the search results collected for a given question are transformed into a suitable context for the generative model. This context is then used to generate the answer.

The Default Context

The default context is the list of the 20 best paragraphs that have been found for the question. This context is passed to the generative model without any transformation.

Note: You can change the number of paragraphs that are passed to the generative model by using the top_k parameter.
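For example, to pass 30 paragraphs instead of the default 20, the request body could look like this (the query shown is illustrative; only top_k is the parameter discussed here):

```json
{
  "query": "What is the best Sci-Fi saga?",
  "top_k": 30
}
```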

Specific RAG Strategies

Include Textual Hierarchy

Id: hierarchy

The "Textual hierarchy" RAG strategy prepends each matched paragraph with the title of the resource it comes from and, if it exists, the resource's summary. It may also optionally extend the paragraph's text with the text that follows it.

It is useful when the paragraphs are good semantic matches but lack the context needed to generate a proper answer. For example, if the question is "What is the best Sci-Fi saga?", a paragraph like "This saga is the best Sci-Fi saga ever" will be an excellent semantic match, but it is not enough to generate a proper answer because it does not explicitly tell which saga it is. If the paragraph is extended with the title of the resource it comes from ("Star Wars"), the generative model has a better chance of generating a proper answer.
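A request enabling this strategy might look like the following sketch. Note that the count parameter, assumed here to control how much extra text is appended after the paragraph, is not described in this section; check the API reference for the exact parameter names:

```json
{
  "query": "What is the best Sci-Fi saga?",
  "rag_strategies": [
    { "name": "hierarchy", "count": 200 }
  ]
}
```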

Neighbouring paragraphs

Id: neighbouring_paragraphs

The "Neighbouring paragraphs" RAG strategy appends to the matched paragraph the paragraphs that precede and follow it in the resource it comes from. You can define how many paragraphs to append before and after the matched paragraph.

It is useful when the matched paragraph is not enough to generate a proper answer, but the previous and next paragraphs might contain the necessary information.

A typical use case is a question about a specific event, like "When and where will the next Star Wars fan convention happen?", where the matched paragraph is a general description of the event. The previous paragraph might contain the date of the event, and the next paragraph its location.
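A request using this strategy could look like the sketch below. The before and after parameter names are assumptions based on the description above ("how many paragraphs you want to append before and after"); verify them against the API reference:

```json
{
  "query": "When and where will the next Star Wars fan convention happen?",
  "rag_strategies": [
    { "name": "neighbouring_paragraphs", "before": 2, "after": 2 }
  ]
}
```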

Pre-queries

Id: prequeries

The "Pre-queries" RAG strategy will run a set of queries before the main query to gather additional information that will be used to generate the answer. The results of these queries will be appended to the context.

This is a very powerful tool for answering complex questions. One known weakness of the RAG approach is that it tries to find the answer within a set of indexed paragraphs. If the question covers topics that do not appear together in any single indexed paragraph, none of the results is a good match, and the generative model will not be able to generate a proper answer.

Example: Imagine a Knowledge Box containing information about famous female inventors and scientists. If the question is "What do Cecilia Payne and Hedy Lamarr have in common?", running a semantic search directly with this question will not gather interesting results. You need to ask two separate questions: "What are the main inventions of Hedy Lamarr?" and "What are the main discoveries of Cecilia Payne?" and then use the results to generate the answer:

{
  "query": "What do Cecilia Payne and Hedy Lamarr have in common?",
  "rag_strategies": [
    {
      "name": "prequeries",
      "queries": [
        { "request": { "query": "What are the main discoveries of Cecilia Payne?" } },
        { "request": { "query": "What are the main inventions of Hedy Lamarr?" } }
      ]
    }
  ]
}

It can also be used to boost some results using the weight parameter. Imagine you have a Knowledge Box containing information about your products: product user manuals and customer reviews. With the default behavior, when asking a question about a product, the customer reviews might produce excellent semantic matches but provide a poor context for generating the answer. You can use the prequeries strategy to run the same query on the user manuals on one side and on the customer reviews on the other, and give a higher weight to the user manual results:

{
  "query": "How to replace the battery on C-3PO?",
  "rag_strategies": [
    {
      "name": "prequeries",
      "queries": [
        {
          "request": {
            "query": "How to replace the battery on C-3PO?",
            "filters": ["/classification.labels/doctype/product_manuals"]
          },
          "weight": 5
        },
        {
          "request": {
            "query": "How to replace the battery on C-3PO?",
            "filters": ["/classification.labels/doctype/customer_reviews"]
          },
          "weight": 1
        }
      ]
    }
  ]
}

Note: the results for the pre-queries are returned in a specific entry called prequeries, separate from the main results (which are in retrieval_results).

Pass Entire Resources as Context

Id: full_resource

This strategy passes the entire resources containing the matched paragraphs as context to the generative model. It is similar to the Textual hierarchy strategy but more radical: passing the entire resource maximizes the context semantically (which is good) but also in terms of size (which might be a problem, because depending on the model, you might reach the maximum token limit very quickly).

It is a good fit when the resources are small.
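A minimal request enabling this strategy might look like the following sketch (the strategy takes no parameters in this sketch; any options it supports should be checked in the API reference):

```json
{
  "query": "What is the plot of the short story?",
  "rag_strategies": [
    { "name": "full_resource" }
  ]
}
```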

Add metadata

This strategy will append to each matched paragraph the metadata associated with the resource it comes from. There are several types of metadata that can be used:

  • origin: metadata about the original resource (like the URL, the title, the author, the publication date, etc.)
  • classification_labels: contains all the labels that have been assigned to the resource
  • ners: contains all the named entities that have been extracted from the resource
  • extra_metadata: additional custom metadata that you can store in your resources

This metadata can then be used in the prompt. Examples:

  • "Use the origin url to provide a markdown link to the corresponding result."
  • "Use preferably results that have been published after 2020."

This metadata can also be used through the JSON output to produce, next to the generated answer, a rich info card showing images, buttons, or any kind of call-to-action.
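This section does not state the strategy id, so the request below is only a sketch: both the metadata_extension id and the types parameter are assumptions to verify against the API reference. The metadata type names come from the list above:

```json
{
  "query": "Who invented frequency hopping?",
  "rag_strategies": [
    {
      "name": "metadata_extension",
      "types": ["origin", "classification_labels"]
    }
  ]
}
```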

Pass Specific Field(s) as Context

Id: field_extension

In most use cases, resources consist of a single main field (a File field, a Text field or a Link field). But Nuclia lets you store multiple fields in a single resource. With the "Pass Specific Field(s) as Context" strategy, you can decide to append to any matched paragraph the content of one or several specific fields of the resource it comes from.

For example, imagine a Knowledge Box containing contracts. The main field of each resource is the contract itself, but there is also an optional extra field called updates that contains the list of updates made to the contract. If the question is "When does clause 3.2.1 of the contract apply?", the matched paragraph in the original contract might be "Clause 3.2.1 applies before the 1st of January 2023". But if this date was later modified, the change may be mentioned in the updates field; even if the corresponding paragraph is not a better semantic match than the original paragraph, it will still be provided as context to the generative model, which will then be able to generate the correct answer.
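For the contracts example above, a request might look like the following sketch. The fields parameter name and the "t/updates" field identifier (a text field named updates) are assumptions; check the API reference for the exact field-addressing syntax:

```json
{
  "query": "When does clause 3.2.1 of the contract apply?",
  "rag_strategies": [
    { "name": "field_extension", "fields": ["t/updates"] }
  ]
}
```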

Include Images

Ids: page_image and paragraph_image

If you use a visual LLM, like OpenAI ChatGPT-Vision or Google Gemini Pro Vision, appending images to the context might be valuable.

Nuclia offers two strategies to include images in the context:

  • Page Images: Append images present on the same page as the matched paragraph.
  • Paragraph Images: Append images present next to the matched paragraph.
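A request combining both image strategies might look like the sketch below. The rag_images_strategies entry and the count parameter are assumptions not stated in this section; verify them against the API reference:

```json
{
  "query": "What does the wiring diagram in the manual show?",
  "rag_images_strategies": [
    { "name": "page_image", "count": 2 },
    { "name": "paragraph_image" }
  ]
}
```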

Add extra context

The extra_context parameter lets you add a list of text snippets that will be appended to the context. In this case, you define part of the context yourself, which is useful in many situations, such as:

  • Your Knowledge Box does not contain all the relevant information to generate the answer. For example, if the Knowledge Box contains your product catalog but nothing about the user preferences, and you need both to answer the user's question properly, you can add the current user's preferences as extra context.
  • You need to run several distinct searches to gather all the necessary information to generate the answer. In that case, you can use the extra_context parameter to pass the results.
  • You need to chain several questions to generate the answer. In that case, you can use the extra_context parameter to pass the results of the previous questions.
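Following the first use case above, a request passing user preferences as extra context might look like this (the snippet contents are illustrative; extra_context is the parameter discussed here):

```json
{
  "query": "Which of your products would suit me best?",
  "extra_context": [
    "User preferences: the customer prefers protocol droids fluent in many languages.",
    "User location: Tatooine."
  ]
}
```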

Ask a specific resource

Calling the /ask endpoint directly on a resource URL (rather than on the Knowledge Box URL) bypasses the RAG mechanism: no /find call is made, and the full resource content is used as context for the generative model.

note

If your resources are too large for this strategy but you know which resource should be used as context, call /ask on the Knowledge Box URL and include the resource ID in the resource_filters parameter.
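For example, a request restricted to a single known resource might look like this sketch, where the resource id is a placeholder (resource_filters is the parameter mentioned above):

```json
{
  "query": "When does clause 3.2.1 of the contract apply?",
  "resource_filters": ["<resource-id>"]
}
```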