
Data augmentation agents

Nuclia provides a set of data augmentation agents that can create new data based on the original resources in your Knowledge Box:

  • Labelers
  • Text/JSON generators
  • Questions/answers generators
  • Graph extractors

They can be run as a one-shot process, applying to existing resources, and/or as a continuous process, applying to new resources as they are added to the Knowledge Box. They can apply to all resources or only to resources that match some specific criteria (resource type, field type, keywords).

Agents are managed from the section called Agents in the left menu of the Knowledge Box.

An agent can also trigger a webhook when it has processed a resource. This webhook can be used to trigger other processes, for example to send a notification, or to store the processed resource in another system.
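
To illustrate the webhook idea, here is a minimal sketch of a receiver built on Python's standard library. The payload shape is an assumption: this page does not document the fields sent by the webhook, so `resource_id` and `agent` are hypothetical names, and `summarize_event` and `serve` are illustrative helpers rather than part of any Nuclia SDK.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(body: bytes) -> str:
    """Turn a webhook body into a one-line notification message.

    ASSUMPTION: the body is a JSON object; the "resource_id" and "agent"
    keys are hypothetical placeholders, not documented field names.
    """
    event = json.loads(body)
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    resource_id = event.get("resource_id", "<unknown>")
    agent = event.get("agent", "<unknown>")
    return f"Agent {agent} processed resource {resource_id}"


class AgentWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            message = summarize_event(body)
        except ValueError:
            # Covers malformed JSON and non-object payloads.
            self.send_response(400)
            self.end_headers()
            return
        print(message)  # Replace with a notification or a write to another system.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), AgentWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener on port 8000. The URL you register as the agent's webhook would then point at that host and port.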

Labelers

These agents can automatically label resources or text blocks based on a short description of each label.

For example, imagine you run a customer support platform and want to identify the severity level of each support request. You can create a labelset called Severity with the following labels:

  • Low
  • Moderate
  • High

Then, create a labeler agent that will label resources. You will have to provide an accurate description for each label:

  • Low: "The text describes a problem that is not urgent and can be handled during normal business hours."
  • Moderate: "The text describes a problem that is urgent and should be handled within a few hours."
  • High: "The text describes a problem that is critical and should be handled immediately."
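
The labelset above can be kept as plain data, which makes it easy to review the descriptions before entering them in the agent configuration. The sketch below is illustrative only: `build_labeling_prompt` is a hypothetical helper showing how label descriptions could drive a classification instruction, and it does not represent how Nuclia uses the descriptions internally.

```python
# The Severity labelset from the example above, as label -> description.
SEVERITY_LABELS = {
    "Low": "The text describes a problem that is not urgent and can be "
           "handled during normal business hours.",
    "Moderate": "The text describes a problem that is urgent and should be "
                "handled within a few hours.",
    "High": "The text describes a problem that is critical and should be "
            "handled immediately.",
}


def build_labeling_prompt(labelset: str, labels: dict[str, str], text: str) -> str:
    """Assemble a classification instruction from label descriptions.

    Hypothetical helper: the prompt wording is an assumption made for
    illustration, not the prompt used by the labeler agent.
    """
    lines = [f"Assign exactly one {labelset} label to the text below.", "", "Labels:"]
    for name, description in labels.items():
        lines.append(f"- {name}: {description}")
    lines += ["", "Text:", text]
    return "\n".join(lines)


prompt = build_labeling_prompt(
    "Severity", SEVERITY_LABELS, "Our production database is down."
)
```

Keeping each description short and mutually exclusive, as in the three above, makes it easier to see where two labels might overlap.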

Generators

These agents can automatically generate new text fields based on existing resources. They apply a given query to each resource and store the result in a new field.

The most common use case is to generate a summary of the resource. You can do it by providing the following query:

Provide a summary of the text.

You can be more specific by providing a query that will extract a specific part of the text, for example:

If the text mentions a scientific discovery, provide a short description of the discovery, its implications, and the authors.

You might want to obtain structured data from unstructured text. In this case, you can use a JSON query providing the schema of the expected output:

{
  "name": "book",
  "description": "Structured answer for a book",
  "parameters": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "The title of the book"
      },
      "author": {
        "type": "string",
        "description": "The author of the book"
      }
    },
    "required": ["title", "author"]
  }
}
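
When you consume the generated field downstream, it can help to check that the output actually follows the schema. The following is a minimal sketch using only the standard library; `check_against_schema` is a hypothetical helper that covers just the features used in the book schema above (required properties and string types), not the full JSON Schema specification.

```python
import json

# The schema from the example above, as a Python dict.
BOOK_SCHEMA = {
    "name": "book",
    "description": "Structured answer for a book",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "The title of the book"},
            "author": {"type": "string", "description": "The author of the book"},
        },
        "required": ["title", "author"],
    },
}


def check_against_schema(raw: str, schema: dict) -> list[str]:
    """Return a list of problems found in `raw`; an empty list means it passed.

    Simplified on purpose: only "required" and "type": "string" are checked.
    """
    params = schema["parameters"]
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = []
    for key in params.get("required", []):
        if key not in data:
            problems.append(f"missing required property: {key}")
    for key, spec in params.get("properties", {}).items():
        if key in data and spec.get("type") == "string" and not isinstance(data[key], str):
            problems.append(f"property {key} should be a string")
    return problems


ok = check_against_schema('{"title": "Dune", "author": "Frank Herbert"}', BOOK_SCHEMA)
missing = check_against_schema('{"title": "Dune"}', BOOK_SCHEMA)
```

Here `ok` is an empty list and `missing` reports the absent `author` property. For anything beyond this simple case, a complete JSON Schema validator is a better choice than a hand-written check.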

Questions/answers generators

These agents can automatically generate questions and answers based on existing resources. They do not require any specific configuration.

Graph extractors

These agents can automatically extract named entities and identify relations between them, based on a short description of the expected entities plus examples of such entities and their relations.

For example, in a legal context, you could define the following entity types:

  • PLAINTIFF: The person or entity that initiates a lawsuit
  • DEFENDANT: The person or entity against whom a lawsuit is filed

And you would provide examples like:

  • "John Doe has filed a lawsuit against ABC Corporation for breach of contract.":
    • PLAINTIFF = John Doe
    • DEFENDANT = ABC Corporation

Then you would provide relation examples like:

  • "John Doe has filed a lawsuit against ABC Corporation for breach of contract.":
    • source = "John Doe"
    • target = "ABC Corporation"
    • relation = "Plaintiff sues Defendant"
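
Before entering examples like these in the agent configuration, it can be useful to check that they are self-consistent: every example entity should use a defined type, and every entity or relation endpoint should appear verbatim in its example sentence. The sketch below models the legal example with plain dataclasses. All class and function names are hypothetical and chosen for illustration; they do not mirror a Nuclia API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    name: str
    description: str


@dataclass(frozen=True)
class EntityExample:
    text: str
    entities: dict[str, str]  # entity type name -> entity value


@dataclass(frozen=True)
class RelationExample:
    text: str
    source: str
    target: str
    relation: str


def validate_examples(types, entity_examples, relation_examples) -> list[str]:
    """Return a list of inconsistencies; an empty list means the examples agree."""
    known = {t.name for t in types}
    problems = []
    for example in entity_examples:
        for type_name, value in example.entities.items():
            if type_name not in known:
                problems.append(f"unknown entity type: {type_name}")
            if value not in example.text:
                problems.append(f"entity not found in text: {value}")
    for example in relation_examples:
        for endpoint in (example.source, example.target):
            if endpoint not in example.text:
                problems.append(f"relation endpoint not found in text: {endpoint}")
    return problems


SENTENCE = "John Doe has filed a lawsuit against ABC Corporation for breach of contract."
TYPES = [
    EntityType("PLAINTIFF", "The person or entity that initiates a lawsuit"),
    EntityType("DEFENDANT", "The person or entity against whom a lawsuit is filed"),
]
ENTITY_EXAMPLES = [
    EntityExample(SENTENCE, {"PLAINTIFF": "John Doe", "DEFENDANT": "ABC Corporation"}),
]
RELATION_EXAMPLES = [
    RelationExample(SENTENCE, "John Doe", "ABC Corporation", "Plaintiff sues Defendant"),
]

problems = validate_examples(TYPES, ENTITY_EXAMPLES, RELATION_EXAMPLES)
```

For the legal example above, `problems` is empty. A misspelled entity value or an undefined type such as `JUDGE` would show up as an entry in the list.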