How to get generative answers
The `/chat` endpoint allows you to get generative answers from a Knowledge Box.
For example, if you store information about Hedy Lamarr in your Knowledge Box, you can ask questions like:
Who is Hedy Lamarr?
You will get a generative answer like:
Hedy Lamarr was an actress and inventor known for her contributions to the development of wireless communication technology.
Then, you can continue chatting with the Knowledge Box, based on the context of the previous question:
What did she do during the war?
Here, "she" is understood as "Hedy Lamarr", because it refers to the first question.
Data structure
As answer generation is a slow process, the `/chat` endpoint delivers a readable HTTP stream.
The stream is structured in up to 5 different blocks:

- 1st block: the first 4 bytes indicate the size of the 2nd block.
- 2nd block: a base64-encoded JSON containing the sources used to build the answer. It uses the same data model as the `/find` endpoint search results.
- 3rd block: the answer text, ended by `_CIT_` (or `_END_` if there are no citations).
- 4th block (optional):
  - the first 4 bytes indicate the size of the block,
  - followed by a base64-encoded JSON containing the citations,
  - ended by `_END_`.
- 5th block (optional): a base64-encoded JSON containing the entities.
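The block layout above can be sketched as a decoder. This is a minimal sketch: it assumes the whole stream has already been read into memory, that the 4-byte size prefixes are big-endian (an assumption; the byte order is not specified on this page), and it leaves out the optional 5th entities block for brevity. A real client would decode incrementally to display the answer while it is being generated.

```python
import base64
import json
import struct

def parse_chat_stream(data: bytes) -> dict:
    """Decode a fully-read /chat stream following the block layout above."""
    # 1st block: 4 bytes giving the size of the sources block
    # (big-endian is an assumption).
    sources_size = struct.unpack(">I", data[:4])[0]

    # 2nd block: base64-encoded JSON with the sources (same model as /find).
    sources = json.loads(base64.b64decode(data[4:4 + sources_size]))

    rest = data[4 + sources_size:]
    citations = None

    # 3rd block: the answer text, ended by _CIT_ or _END_.
    cit_pos = rest.find(b"_CIT_")
    if cit_pos != -1:
        answer = rest[:cit_pos].decode("utf-8")
        rest = rest[cit_pos + len(b"_CIT_"):]
        # 4th block: 4-byte size, base64-encoded JSON citations, then _END_.
        cit_size = struct.unpack(">I", rest[:4])[0]
        citations = json.loads(base64.b64decode(rest[4:4 + cit_size]))
    else:
        answer = rest[:rest.find(b"_END_")].decode("utf-8")

    return {"sources": sources, "answer": answer, "citations": citations}
```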
Usage
- You can get a fully decoded response directly using the Nuclia Python CLI/SDK.

- To get generative answers in the Nuclia search widget, you need to enable the `answers` feature:

  ```html
  <script src="https://cdn.nuclia.cloud/nuclia-video-widget.umd.js"></script>
  <nuclia-search-bar
    knowledgebox="YOUR-KB"
    zone="ZONE"
    features="answers"
  ></nuclia-search-bar>
  <nuclia-search-results></nuclia-search-results>
  ```

- For testing, you can use it with `curl`:

  ```sh
  curl 'https://<ZONE>.nuclia.cloud/api/v1/kb/<YOUR-KB>/chat' \
    -H 'content-type: application/json' \
    -H 'x-synchronous: true' \
    --data-raw '{"query":"Who is Hedy Lamarr?","context":[]}'
  ```
  Note: the `x-synchronous` header on the `/chat` endpoint is mostly meant for testing purposes. The default behavior is to return a readable stream, as it allows displaying the beginning of the answer without waiting for the end of the generation. The `x-synchronous` header makes the query slower, as it waits for the generation to complete before returning the answer.
- To implement your own chat widget, you can get inspiration from the Nuclia search widget implementation:
  - reading a readable HTTP stream (check the `getStream` method),
  - decoding the result.
Citations
By default, the `/chat` endpoint makes a `find` query to retrieve relevant paragraphs, and the 20 best ones are passed to the generative model to produce the answer.
The `sources` block contains the list of paragraphs used to generate the answer. So you know what was provided to the generative model as input, but you have no information about the output:

- you do not know which paragraphs were used to produce which part of the answer,
- you do not know whether some of the paragraphs were not used at all.
The `/chat` endpoint accepts a boolean parameter named `citations`. When set to `true`, the answer contains a `citations` block indicating the ids of the paragraphs actually used to generate the answer, along with the start and end positions of the part of the answer that was generated from each paragraph.
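Once decoded, the `citations` block can be used to map parts of the answer back to their source paragraphs. A sketch, assuming the block decodes to a mapping from paragraph id to `[start, end]` character positions in the answer (the exact JSON shape is an assumption; inspect your own decoded stream to confirm it):

```python
def cited_snippets(answer: str, citations: dict) -> dict:
    """Return, for each cited paragraph id, the answer fragments it supports.

    Assumes `citations` maps paragraph ids to lists of [start, end] character
    positions in the answer; this shape is an assumption, not confirmed here.
    """
    return {
        pid: [answer[start:end] for start, end in spans]
        for pid, spans in citations.items()
    }
```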
Filters and other parameters
The `/chat` endpoint accepts broadly the same parameters as the `/find` endpoint.
Typically, filtering works exactly the same way as in the `/find` endpoint; refer to the filtering documentation for more details. By filtering on the `/chat` endpoint, you can control the sources used to generate the answer.
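As a sketch, a filtered request body could look like the following; both the `filters` field and the facet string shown are illustrative assumptions, so check the filtering documentation for the exact syntax supported by your Knowledge Box:

```python
import json

# Hypothetical filtered /chat payload. The "filters" field and the facet
# string are illustrative; see the filtering documentation for the exact
# syntax. Filtering here restricts which paragraphs can become sources.
payload = {
    "query": "Who is Hedy Lamarr?",
    "context": [],
    "filters": ["/classification.labels/topic/inventors"],
}
body = json.dumps(payload)
```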