Generative answer with JSON output

Objective

Until recently, when interacting with a machine, humans had to adapt to the machine's language to ask a question (typically using SQL queries), and then interpret the machine's answer. That answer usually comes in a structured data format (like a table or a JSON string) which respects a given schema.

Generative answers are great because they let you ask the machine a question phrased the way any human would, and get a human-readable answer.

Nevertheless, a human-readable answer is not ideal when you want the answer to be processed by another machine. In this case, a structured data format is more appropriate.

For example, let's say you want to know which book is the last in Isaac Asimov's Foundation cycle: you can ask the LLM and you'll get the answer. But if you want to order this book automatically, the answer needs to be in a structured format so it can be sent to the ordering system.

In that case, you would like the best of both worlds: ask the query in human language ("Please, order the last book in Isaac Asimov's Foundation cycle"), and get a structured data answer.

This is what the JSON output option of Nuclia's /ask endpoint is about.

Principle

To obtain a JSON output from the LLM, you need to define the expected data structure. With the book ordering example, you would probably expect a JSON object with the following fields:

  • title: the title of the book,
  • author: the author of the book,
  • ISBN: the ISBN of the book,
  • price: the price of the book.

To do so, you need to specify the JSON schema you want the answer to respect. This schema is passed in the answer_json_schema parameter of the request. It uses the JSON Schema format. Here is an example of a schema for the book ordering use case:

{
  "name": "book_ordering",
  "description": "Structured answer for a book to order",
  "parameters": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "The title of the book"
      },
      "author": {
        "type": "string",
        "description": "The author of the book"
      },
      "ref_num": {
        "type": "string",
        "description": "The ISBN of the book"
      },
      "price": {
        "type": "number",
        "description": "The price of the book"
      }
    },
    "required": ["title", "author", "ref_num", "price"]
  }
}

The most important attributes are the description attributes because that's what the LLM will use to understand what you expect.

Sometimes the description is not needed: author, for example, is explicit enough as an attribute name, and the LLM will understand it properly. Sometimes you have to be specific, as with ref_num, which cannot easily be related to the ISBN of the book without a proper description. Or say you want the publication date: publication_date alone would be sufficient for the LLM, but if you add the following description:

"Publication date in ISO format"

Then the LLM will format the date in ISO format for you.
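
This matters downstream: an ISO-formatted value can be parsed directly by standard libraries. Here is a minimal sketch of what that enables (the answer_json content below is a made-up example of what the LLM could return):

from datetime import date

# Hypothetical structured answer containing an ISO-formatted publication date
answer_json = {"title": "Forward the Foundation", "publication_date": "1993-04-01"}

# Because the description asked for ISO format, the value parses directly
published = date.fromisoformat(answer_json["publication_date"])
print(published.year)  # 1993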

The JSON Schema format is quite powerful: you can define complex structures, like nested objects and arrays, or even add constraints on the values (like a minimum or maximum value for a number, or the list of required properties):

{
  "name": "book_ordering",
  "description": "Structured answer for a book to order",
  "parameters": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "The title of the book"
      },
      "price": {
        "type": "number",
        "description": "The price of the book"
      },
      "details": {
        "type": "object",
        "description": "Details of the book",
        "properties": {
          "publication_date": {
            "type": "string",
            "description": "Publication date in ISO format"
          },
          "pages": {
            "type": "number",
            "description": "Number of pages"
          },
          "rating": {
            "type": "number",
            "description": "Rating of the book",
            "minimum": 0,
            "maximum": 5
          }
        },
        "required": ["publication_date"]
      }
    },
    "required": ["title", "details"]
  }
}
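
Because the value of parameters is a standard JSON Schema, you can also validate the returned answer_json on your side before acting on it. Here is a minimal sketch using the third-party jsonschema Python package (this client-side check is optional and not part of the Nuclia API; the schema is the one defined above):

from jsonschema import validate, ValidationError

# The same structure as the one sent to the /ask endpoint: the actual
# JSON Schema lives under the "parameters" key
book_ordering_schema = {
    "name": "book_ordering",
    "description": "Structured answer for a book to order",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "price": {"type": "number"},
            "details": {
                "type": "object",
                "properties": {
                    "publication_date": {"type": "string"},
                    "pages": {"type": "number"},
                    "rating": {"type": "number", "minimum": 0, "maximum": 5},
                },
                "required": ["publication_date"],
            },
        },
        "required": ["title", "details"],
    },
}

def check_answer(answer_json: dict) -> bool:
    """Return True if the structured answer respects the schema."""
    try:
        validate(instance=answer_json, schema=book_ordering_schema["parameters"])
        return True
    except ValidationError as err:
        print(f"Invalid structured answer: {err.message}")
        return False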

Use cases

Automation

The use cases are endless, but one of the most obvious is to offer your users a conversational interface where they can ask questions in natural language and obtain both a regular plain-text answer and a set of actions to perform based on that answer.

In most cases, you do not want to entirely bypass the human-readable answer, because it is a good way to check whether the LLM understood the query correctly.

In our example, if the LLM is a proper Isaac Asimov fan, it will probably answer that the last book of the Foundation cycle is either "Forward the Foundation" if you meant last by publication date, or "Foundation and Earth" if you meant last in the story's chronology.

So you can add a human-readable answer field to the JSON output, allowing the user to keep interacting with the LLM until the query is correctly understood:

{
  "name": "book_ordering",
  "description": "Structured answer for a book to order",
  "parameters": {
    "type": "object",
    "properties": {
      "answer": {
        "type": "string",
        "description": "Text responding to the user's query with the given context."
      }
      // then the rest of the schema
    }
  }
}

And you can use the structured data to automate the next steps.
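
Here is a minimal sketch of that flow in Python, assuming a hypothetical place_order() function standing in for your ordering system and the book_ordering schema extended with the answer field:

def place_order(title: str, author: str, isbn: str, price: float) -> None:
    """Hypothetical integration point with your ordering system."""
    print(f"Ordering '{title}' by {author} (ISBN {isbn}) at {price}")

def handle_ask_response(response: dict) -> None:
    answer_json = response.get("answer_json") or {}
    # Show the human-readable part first, so the user can confirm the LLM
    # understood the query before anything is actually ordered
    print(answer_json.get("answer", ""))
    # Then use the structured fields to drive the automation
    if all(key in answer_json for key in ("title", "author", "ref_num", "price")):
        place_order(
            title=answer_json["title"],
            author=answer_json["author"],
            isbn=answer_json["ref_num"],
            price=answer_json["price"],
        )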

Extracting structured information from images

As Nuclia offers a RAG strategy providing images along with the text, it is possible to extract structured information from images if you select a visual LLM.

Typically, you can ask the LLM to read data from a chart:

{
  "query": "How much carbon is stored in the soil?",
  "generative_model": "chatgpt-vision",
  "rag_images_strategies": [{ "name": "paragraph_image", "count": 1 }],
  "answer_json_schema": {
    "name": "carbon_cycle",
    "description": "Structured answer for the carbon cycle",
    "parameters": {
      "type": "object",
      "properties": {
        "carbon_in_soil": {
          "type": "number",
          "description": "Amount of carbon stored in the soil in gigatons"
        }
      },
      "required": ["carbon_in_soil"]
    }
  }
}

and you will get the following JSON output:

{
  "answer": "",
  "answer_json": {
    "carbon_in_soil": 2300
  },
  "status": "success",
  ...
}

Usage

With the API

Pass the JSON schema in the answer_json_schema parameter of the request:

POST /api/v1/kb/<KB_ID>/ask
{
  "query": "What is the last book of the Foundation cycle by Isaac Asimov?",
  "features": ["vectors"],
  "answer_json_schema": {
    "name": "book_ordering",
    "description": "Structured answer for a book to order",
    "parameters": {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "description": "The title of the book"
        },
        "author": {
          "type": "string",
          "description": "The author of the book"
        },
        "ref_num": {
          "type": "string",
          "description": "The ISBN of the book"
        }
      },
      "required": ["title", "author", "ref_num"]
    }
  }
}

You will obtain a response containing an answer_json attribute:

{
  "answer": "",
  "answer_json": {
    "title": "Forward the Foundation",
    "author": "Isaac Asimov",
    "ref_num": "0-385-24793-1"
  },
  "status": "success",
  "retrieval_results": {
    ...
  },
  ...
}

Note: the regular answer attribute is empty when the JSON output is requested.
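
If you prefer calling the endpoint from code without the SDK, any HTTP client works. Below is a minimal sketch using Python's requests library; the zone URL is only an example, and the X-NUCLIA-SERVICEACCOUNT and x-synchronous headers are assumptions to adapt to your own authentication and streaming setup:

import requests

KB_URL = "https://europe-1.nuclia.cloud/api/v1/kb/<KB_ID>"  # adapt zone and Knowledge Box id
API_KEY = "<YOUR_API_KEY>"

payload = {
    "query": "What is the last book of the Foundation cycle by Isaac Asimov?",
    "features": ["vectors"],
    "answer_json_schema": {
        "name": "book_ordering",
        "description": "Structured answer for a book to order",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "The title of the book"},
                "author": {"type": "string", "description": "The author of the book"},
                "ref_num": {"type": "string", "description": "The ISBN of the book"},
            },
            "required": ["title", "author", "ref_num"],
        },
    },
}

response = requests.post(
    f"{KB_URL}/ask",
    json=payload,
    headers={
        # Assumption: API key authentication via this header
        "X-NUCLIA-SERVICEACCOUNT": f"Bearer {API_KEY}",
        # Assumption: request a single JSON response instead of a stream
        "x-synchronous": "true",
    },
)
book = response.json()["answer_json"]
print(book["title"], book["ref_num"])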

With the Python SDK

from nuclia import sdk

# Use the search client to ask a question and get a structured (JSON) answer
search = sdk.NucliaSearch()
answer = search.ask_json(
    query="How much carbon is stored in the soil?",
    generative_model="chatgpt-vision",
    rag_images_strategies=[{"name": "paragraph_image", "count": 1}],
    answer_json_schema={
        "name": "carbon_cycle",
        "description": "Structured answer for the carbon cycle",
        "parameters": {
            "type": "object",
            "properties": {
                "carbon_in_soil": {
                    "type": "number",
                    "description": "Amount of carbon stored in the soil in gigatons"
                }
            },
            "required": ["carbon_in_soil"]
        }
    }
)

print(answer.object["carbon_in_soil"])  # 2300

With the CLI

nuclia search ask_json \
  --query="How much carbon is stored in the soil?" \
  --generative_model="chatgpt-vision" \
  --rag_images_strategies='[{"name": "paragraph_image", "count": 1}]' \
  --answer_json_schema='{"name": "carbon_cycle", "description": "Structured answer for the carbon cycle", "parameters": {"type": "object", "properties": {"carbon_in_soil": {"type": "number", "description": "Amount of carbon stored in the soil in gigatons"}}, "required": ["carbon_in_soil"]}}'