Data Augmentation Agents: Automate Tasks for Better Search Performance

Improve the performance and precision of your search experience by leveraging automated data augmentation agents on your Knowledge Box. These agents enrich your data by creating additional content to optimize search results.

Warning: This feature is in beta and under active development.

Available Data Augmentation Agents

Below is a list of data augmentation agents you can configure via the dashboard, API, SDK, or CLI tools.

Labeler

Automatically labels text blocks or resources based on predefined rules, descriptions, and examples. This helps you classify and organize your content efficiently and lets you filter by label at search time.

Why use it?
Manual labeling is time-consuming and error-prone. With the Labeler agent, you can streamline the process by defining rules that label resources automatically.
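To show why labels are useful at search time, here is a minimal Python sketch of label filtering. The `Resource` class, the `filter_by_label` helper, and the label strings are hypothetical and not part of any Nuclia API. The sketch assumes labels are stored as plain strings on each resource.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a Knowledge Box resource."""
    title: str
    labels: set[str] = field(default_factory=set)


def filter_by_label(results: list[Resource], required: str) -> list[Resource]:
    """Keep only the search results that carry the required label."""
    return [r for r in results if required in r.labels]


results = [
    Resource("Q3 invoice", {"finance"}),
    Resource("Onboarding guide", {"hr"}),
    Resource("Budget forecast", {"finance", "planning"}),
]

finance_only = filter_by_label(results, "finance")
print([r.title for r in finance_only])  # ['Q3 invoice', 'Budget forecast']
```

Once the Labeler agent has applied labels, a filter like this narrows results to a category without any manual tagging.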

Generator

Generate additional text or metadata to enrich your resources, such as summaries or related content.

Example Use Case:
Creating concise summaries of resources can help deliver short, relevant answers and improve search outcomes.

Graph extraction

Use a large language model (LLM) to generate a Knowledge Graph by extracting entities and relationships from your resources.

What it does:
Automatically performs Named Entity Recognition (NER) and identifies relationships between entities to enrich resource metadata.
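A knowledge graph like this is commonly represented as (subject, relation, object) triples. The Python sketch below assumes that shape, which illustrates the idea and is not necessarily the agent's actual output format. It shows how extracted triples can be indexed so that related entities can be looked up in either direction. The triples and the `build_graph` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples, as an entity/relationship extractor might produce them.
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Pierre Curie", "married_to", "Marie Curie"),
]


def build_graph(edges):
    """Index triples by entity so neighbours can be found in either direction."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
        # Record the reverse edge so the object also knows about the subject.
        graph[obj].append((f"inverse:{relation}", subject))
    return graph


graph = build_graph(triples)
for relation, neighbour in graph["Marie Curie"]:
    print(relation, "->", neighbour)
```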

Generate questions & answers

Automatically create question-and-answer pairs from your documents to enrich your resources.

Benefit:
Enhancing resources with Q&A pairs improves their usability and provides additional context for your queries.
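As a rough illustration of how generated Q&A pairs add context, the sketch below attaches hypothetical pairs to a resource and matches an incoming query against the stored questions by word overlap. Both the `QAPair` layout and the overlap scoring are simplifying assumptions. A production search engine would typically rely on semantic similarity instead.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAPair:
    question: str
    answer: str


def tokens(text: str) -> set[str]:
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_answer(query: str, pairs: list[QAPair]) -> Optional[str]:
    """Return the answer whose question shares the most words with the query."""
    query_tokens = tokens(query)
    best, best_score = None, 0
    for pair in pairs:
        score = len(query_tokens & tokens(pair.question))
        if score > best_score:
            best, best_score = pair, score
    return best.answer if best is not None else None


# Hypothetical pairs generated from a support document.
pairs = [
    QAPair("What is the refund period?", "Refunds are accepted within 30 days."),
    QAPair("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
]

print(best_answer("how long is the refund period", pairs))
# Refunds are accepted within 30 days.
```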

LLM security

Detect and flag jailbreak-related content in your resources or paragraphs. This agent uses a specialized model to label such content with a "jailbreak_safety" tag.

How it works:
Similar to the Labeler agent, but designed to identify content that could compromise LLM prompt security.

Content Safety

Automatically label text blocks or resources that are identified as inappropriate. This agent uses a specialized model to label unsafe content with a "safety" tag.

Use Case:
Ensure that inappropriate or harmful content is flagged and categorized so that it can be deleted or filtered out at search time.
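Because the LLM security and Content Safety agents both express their findings as labels, excluding flagged content at search time comes down to filtering on those tags. The sketch below uses the "safety" and "jailbreak_safety" tag names from this page but assumes each result exposes its labels as a plain set of strings. How the tags are actually stored (for example, inside a label set) may differ, and the `results` data is invented for illustration.

```python
# Tags named in this documentation; the storage layout around them is assumed.
FLAGGED_TAGS = {"safety", "jailbreak_safety"}

results = [
    {"id": "doc-1", "labels": {"faq"}},
    {"id": "doc-2", "labels": {"safety"}},
    {"id": "doc-3", "labels": {"jailbreak_safety", "faq"}},
    {"id": "doc-4", "labels": set()},
]


def exclude_flagged(items):
    """Drop any result carrying at least one flagged tag."""
    return [item for item in items if not (item["labels"] & FLAGGED_TAGS)]


print([item["id"] for item in exclude_flagged(results)])  # ['doc-1', 'doc-4']
```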

Configuration Options

You can choose how to apply these agents to your resources:

1. Run once on existing resources only
   Apply the agent to resources already present in the Knowledge Box.
2. Apply automatically to future resources
   Configure the agent to process new resources as they are added.
3. Apply automatically to all existing and future resources
   Ensure both current and new resources are processed by the agent (options 1 and 2 combined).
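The three options differ only in which resources the agent processes. The Python sketch below models that choice. The `ApplyMode` enum and the `resources_to_process` function are hypothetical and do not correspond to names in the Nuclia API, SDK, or CLI.

```python
from enum import Enum


class ApplyMode(Enum):
    EXISTING_ONLY = "existing"   # option 1: run once on existing resources
    FUTURE_ONLY = "future"       # option 2: process new resources as they arrive
    EXISTING_AND_FUTURE = "all"  # option 3: options 1 and 2 combined


def runs_on_existing(mode: ApplyMode) -> bool:
    return mode in (ApplyMode.EXISTING_ONLY, ApplyMode.EXISTING_AND_FUTURE)


def runs_on_new(mode: ApplyMode) -> bool:
    return mode in (ApplyMode.FUTURE_ONLY, ApplyMode.EXISTING_AND_FUTURE)


def resources_to_process(mode, existing, incoming):
    """Return the resources an agent would handle under the given mode."""
    selected = []
    if runs_on_existing(mode):
        selected.extend(existing)
    if runs_on_new(mode):
        selected.extend(incoming)
    return selected


existing = ["report.pdf", "faq.html"]
incoming = ["new-contract.docx"]
for mode in ApplyMode:
    print(mode.name, resources_to_process(mode, existing, incoming))
```

Modeling option 3 as the union of options 1 and 2 mirrors the description above: it adds no behavior of its own.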

Note: For programmatic management of these tasks, refer to the API documentation or the SDK/CLI documentation. For examples of creating these agents, see this how-to tutorial.
