Data Augmentation Agents

Nuclia provides a set of data augmentation agents that can create new data based on the original resources in your Knowledge Box:

Labelers
Text/JSON generators
Questions/answers generators
Graph extractors

They can be run as a one-shot process, applying to existing resources, and/or as a continuous process, applying to new resources as they are added to the Knowledge Box. They can apply to all resources or only to resources that match some specific criteria (resource type, field type, keywords).

Agents are managed from the section called Agents in the left menu of the Knowledge Box.

An agent can also trigger a webhook when it has processed a resource. This webhook can be used to trigger other processes, for example to send a notification, or to store the processed resource in another system.
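As a rough illustration of the receiving side, here is a minimal Python sketch of a webhook endpoint built on the standard library. It assumes the agent sends an HTTP POST with a JSON object as the body; neither the HTTP method nor the payload shape is described in this page, so treat both as assumptions and check the API documentation for the actual format. The names `parse_notification`, `AgentWebhookHandler` and `serve` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_notification(body):
    """Return (status_code, payload) for a raw request body.

    Assumption: the notification is a JSON object. The real payload
    format is not documented here.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, None
    if not isinstance(payload, dict):
        return 400, None
    return 204, payload


class AgentWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: accepts POST requests and logs the payload."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = parse_notification(self.rfile.read(length))
        if payload is not None:
            # Placeholder for the follow-up process: send a notification,
            # store the processed resource in another system, etc.
            print("processed resource notification:", payload)
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Start the receiver on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), AgentWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver locally. Keeping the parsing in a separate function makes it testable without opening a socket.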

Labelers

These agents can automatically label resources or text blocks based on a short description of each label.

For example, imagine a customer support platform where you want to identify the severity level of support requests. You can create a labelset called Severity with the following labels:

Low
Moderate
High

Then, create a labeler agent that will label resources. You will have to provide accurate descriptions for each label:

Low: "The text describes a problem that is not urgent and can be handled during normal business hours."
Moderate: "The text describes a problem that is urgent and should be handled within a few hours."
High: "The text describes a problem that is critical and should be handled immediately."
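Because the labeler relies entirely on these descriptions, it can help to sanity-check them before creating the agent. The Python sketch below is only an illustration: the dictionary layout and the `check_labelset` helper are hypothetical and do not reflect the Nuclia API schema, and the five-word minimum is an arbitrary threshold.

```python
# Hypothetical in-memory representation of the labelset above.
# The field names are illustrative, not the Nuclia API schema.
SEVERITY_LABELSET = {
    "labelset": "Severity",
    "labels": {
        "Low": "The text describes a problem that is not urgent and can be "
               "handled during normal business hours.",
        "Moderate": "The text describes a problem that is urgent and should "
                    "be handled within a few hours.",
        "High": "The text describes a problem that is critical and should "
                "be handled immediately.",
    },
}


def check_labelset(labelset, min_words=5):
    """Return a list of problems found in the label descriptions."""
    problems = []
    seen = {}  # normalized description -> first label that used it
    for label, description in labelset["labels"].items():
        normalized = " ".join(description.split()).lower()
        if len(normalized.split()) < min_words:
            problems.append(f"{label}: description is too short")
        if normalized in seen:
            problems.append(f"{label}: same description as {seen[normalized]}")
        seen.setdefault(normalized, label)
    return problems
```

An empty list means every label has a reasonably long description and no two labels share the same one.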

Generators

These agents can automatically generate new text fields based on existing resources. They apply a given query to each resource and store the result in a new field.

The most common use case is to generate a summary of the resource. You can do it by providing the following query:

Provide a summary of the text.

You can be more specific by providing a query that will extract a specific part of the text, for example:

If the text mentions a scientific discovery, provide a short description of the discovery, its implications, and the authors.

You might want to obtain structured data from unstructured text. In this case, you can use a JSON query providing the schema of the expected output:

{
  "name": "book",
  "description": "Structured answer for a book",
  "parameters": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "The title of the book"
      },
      "author": {
        "type": "string",
        "description": "The author of the book"
      }
    },
    "required": ["title", "author"]
  }
}
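Once the agent has stored its JSON output, you may want to check that it matches what the schema asks for. The following Python sketch does a minimal check (required properties and string types only) using the standard library. The `validate_answer` helper is hypothetical, and how you read the generated field from the Knowledge Box is left out because it depends on the API or SDK you use.

```python
import json

# The schema from the example above, as a Python dictionary.
BOOK_SCHEMA = {
    "name": "book",
    "description": "Structured answer for a book",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "The title of the book"},
            "author": {"type": "string", "description": "The author of the book"},
        },
        "required": ["title", "author"],
    },
}


def validate_answer(schema, raw):
    """Return a list of problems found in a generated JSON answer.

    Only checks required properties and "string" types; this is not a
    full JSON Schema validator.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    params = schema["parameters"]
    errors = []
    for key in params.get("required", []):
        if key not in data:
            errors.append(f"missing required property: {key}")
    for key, spec in params["properties"].items():
        if key in data and spec.get("type") == "string":
            if not isinstance(data[key], str):
                errors.append(f"property {key} should be a string")
    return errors
```

For example, `validate_answer(BOOK_SCHEMA, '{"title": "Dune"}')` reports that the `author` property is missing.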

Questions/answers generators

These agents can automatically generate questions and answers based on existing resources. They do not require any specific configuration.

Graph extractors

These agents can automatically extract named entities and identify relations between them, based on a short description of the expected entities plus examples of such entities and their relations.

For example, in a legal context, you could define the following entity types:

PLAINTIFF: The person or entity that initiates a lawsuit
DEFENDANT: The person or entity against whom a lawsuit is filed

And you would provide examples like:

"John Doe has filed a lawsuit against ABC Corporation for breach of contract.":
- PLAINTIFF = John Doe
- DEFENDANT = ABC Corporation

Then you would provide relation examples like:

"John Doe has filed a lawsuit against ABC Corporation for breach of contract.":
- source = "John Doe"
- target = "ABC Corporation"
- relation = "Plaintiff sues Defendant"
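The entity types and examples above can be written down as plain data and checked for consistency before configuring the agent. This Python sketch is illustrative only: the dictionary layout and the `check_examples` helper are hypothetical and are not the format expected by the Nuclia API.

```python
# Hypothetical representation of the legal example above.
ENTITY_TYPES = {
    "PLAINTIFF": "The person or entity that initiates a lawsuit",
    "DEFENDANT": "The person or entity against whom a lawsuit is filed",
}

SENTENCE = (
    "John Doe has filed a lawsuit against ABC Corporation "
    "for breach of contract."
)

ENTITY_EXAMPLE = {
    "text": SENTENCE,
    "entities": {"PLAINTIFF": "John Doe", "DEFENDANT": "ABC Corporation"},
}

RELATION_EXAMPLE = {
    "text": SENTENCE,
    "source": "John Doe",
    "target": "ABC Corporation",
    "relation": "Plaintiff sues Defendant",
}


def check_examples(entity_types, entity_example, relation_example):
    """Return a list of inconsistencies between the examples and their text."""
    problems = []
    for entity_type, value in entity_example["entities"].items():
        if entity_type not in entity_types:
            problems.append(f"unknown entity type: {entity_type}")
        if value not in entity_example["text"]:
            problems.append(f"entity not found in example text: {value}")
    for role in ("source", "target"):
        value = relation_example[role]
        if value not in relation_example["text"]:
            problems.append(f"relation {role} not found in example text: {value}")
    return problems
```

The check only verifies that each entity and each relation endpoint appears verbatim in its example sentence, which is the single-text-block assumption the examples rely on.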

Improving the graph extraction with generator agents

The graph extractor examples assume that the entities you want to connect are mentioned in a single text block. That is not always the case: sometimes the entities are mentioned in different text blocks, and sometimes the relation is not stated explicitly at all (a sentiment, for example, is typically conveyed by the tone of the text rather than by a specific word).

In these cases, you can use a generator agent to generate a text field that phrases the relation between the entities explicitly. Then you can use the graph extractor to extract the entities and relations from these generated text fields. To inject these relations in your graph, see the example detailed here.

Note: For programmatic management of these tasks, refer to the API documentation or the SDK/CLI documentation. To see examples of creating these agents, check out this how-to tutorial.
