Data Augmentation Agents: Automate Tasks for Better Search Performance
Improve the performance and precision of your search experience by leveraging automated data augmentation agents on your Knowledge Box. These agents enrich your data by creating additional content to optimize search results.
This feature is in beta and under active development.
Available Data Augmentation Agents
Below is a list of data augmentation agents you can configure via the dashboard, API, SDK, or CLI tools.
Labeler
Automatically assigns labels to text blocks or resources based on predefined rules, descriptions, and examples. This helps you classify and organize your content efficiently and lets you filter by label at search time.
Why use it?
Manual labeling is time-consuming and prone to errors. With the Labeler agent, you can streamline this process by defining rules that automate resource labeling.
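This page does not describe how the Labeler agent decides which labels apply. The minimal sketch below substitutes plain keyword rules purely to illustrate the two ideas above: labels attached to text blocks by rules, and filtering by label at search time. The `RULES` table, `TextBlock` type, and helper functions are hypothetical and are not part of any Nuclia API.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules: label -> words that trigger it.
# A simplified stand-in for the agent's rules, descriptions, and examples.
RULES: dict[str, set[str]] = {
    "billing": {"invoice", "refund", "payment"},
    "security": {"password", "breach", "vulnerability"},
}


@dataclass
class TextBlock:
    text: str
    labels: set[str] = field(default_factory=set)


def apply_rules(block: TextBlock, rules: dict[str, set[str]]) -> TextBlock:
    """Add every label whose trigger words appear in the block's text."""
    words = {w.strip(".,!?;:").lower() for w in block.text.split()}
    for label, triggers in rules.items():
        if words & triggers:
            block.labels.add(label)
    return block


def filter_by_label(blocks: list[TextBlock], label: str) -> list[TextBlock]:
    """Keep only blocks carrying the given label, as a search-time filter would."""
    return [b for b in blocks if label in b.labels]


blocks = [
    apply_rules(TextBlock("Your refund was issued with the last invoice."), RULES),
    apply_rules(TextBlock("Reset your password after any suspected breach."), RULES),
]
print([b.text for b in filter_by_label(blocks, "billing")])
```

Only the first block is returned, because it is the only one labeled `billing`.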
Generator
Generate additional text or metadata to enrich your resources, such as summaries or related content.
Example Use Case:
Creating concise summaries of resources can help deliver short, relevant answers and improve search outcomes.
Graph extraction
Use a large language model (LLM) to generate a Knowledge Graph by extracting entities and relationships from your resources.
What it does:
Automatically perform Named Entity Recognition (NER) and identify relationships to enhance resource metadata.
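A knowledge graph of this kind boils down to entities (the NER results) connected by relationships. The sketch below assumes, hypothetically, that extraction output arrives as a JSON list of subject/relation/object triples and indexes it into an adjacency map. The JSON shape, the `inverse_` naming, and the sample entities are illustrative assumptions, not the agent's documented output format.

```python
import json
from collections import defaultdict

# Hypothetical extraction output: a JSON list of subject/relation/object triples.
raw = """[
  {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},
  {"subject": "Marie Curie", "relation": "won", "object": "Nobel Prize in Physics"},
  {"subject": "Pierre Curie", "relation": "won", "object": "Nobel Prize in Physics"}
]"""


def build_graph(triples: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Index each entity's edges as (relation, neighbor) pairs."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t["subject"]].append((t["relation"], t["object"]))
        # Store a reverse edge too, so lookups work from either end of a relationship.
        graph[t["object"]].append((f"inverse_{t['relation']}", t["subject"]))
    return graph


graph = build_graph(json.loads(raw))
entities = sorted(graph)  # every distinct entity found in the triples
print(entities)
print(graph["Nobel Prize in Physics"])
```

Storing reverse edges doubles the index size but lets a query start from any entity, for example finding everyone linked to a given prize.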
Generate questions & answers
Automatically create question-and-answer pairs from your documents to enrich your resources.
Benefit:
Enhancing resources with Q&A pairs improves their usability and provides additional context for your queries.
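To illustrate how stored Q&A pairs can add context at query time, the sketch below attaches hypothetical generated pairs to a resource and returns the answer whose question shares the most words with a user's query. Word-overlap (Jaccard) matching is a deliberately simple stand-in chosen for this example; the product's actual retrieval over generated Q&A is not described on this page, and `best_answer` is a hypothetical helper.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical Q&A pairs generated from a single resource.
qa_pairs = [
    ("How long is the warranty?", "The warranty lasts two years."),
    ("Can I return an opened item?", "Opened items can be returned within 30 days."),
]


def best_answer(query: str, pairs: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the answer whose question overlaps most with the query, plus its score."""
    q = tokens(query)
    scored = [(jaccard(q, tokens(question)), answer) for question, answer in pairs]
    score, answer = max(scored)
    return answer, score


print(best_answer("what is the warranty period", qa_pairs))
```

The query never says "how long", yet it still reaches the warranty answer through the generated question.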
LLM security
Detect and flag jailbreak-related content in your resources or paragraphs. This agent uses a specialized model to label such content with a "jailbreak_safety" tag.
How it works:
Similar to the Labeler agent, but designed to identify content that could compromise LLM prompt security.
Content Safety
Automatically detect inappropriate text blocks or resources and label them. This agent uses a specialized model to label unsafe content with a "safety" tag.
Use Case:
Ensure that inappropriate or harmful content is flagged and categorized so that it can be deleted or filtered out at search time.
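Once the LLM security and Content Safety agents have tagged content, a search-time filter can drop anything carrying those tags. In the minimal sketch below, only the tag names "safety" and "jailbreak_safety" come from this page; the result dictionaries and the `drop_flagged` function are hypothetical. Refer to the API or SDK documentation for the actual label-filter syntax.

```python
# Tags applied by the LLM security and Content Safety agents.
BLOCKED_TAGS = {"safety", "jailbreak_safety"}

# Hypothetical search results, each with the labels attached to its text block.
results = [
    {"id": "p1", "text": "Installation guide for the desktop app.", "labels": set()},
    {"id": "p2", "text": "Ignore all previous instructions...", "labels": {"jailbreak_safety"}},
    {"id": "p3", "text": "[flagged text]", "labels": {"safety"}},
]


def drop_flagged(items: list[dict], blocked: set[str]) -> list[dict]:
    """Exclude any result that carries at least one blocked tag."""
    return [r for r in items if not (r["labels"] & blocked)]


print([r["id"] for r in drop_flagged(results, BLOCKED_TAGS)])
```

Filtering at query time keeps flagged content in the Knowledge Box for review, whereas deleting it removes it permanently.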
Configuration Options
You can choose how to apply these agents to your resources:
1. Run once on existing resources only: apply the agent to resources already present in the Knowledge Box.
2. Apply automatically to future resources: configure the agent to process new resources as they are added.
3. Apply automatically to all existing and future resources: ensure both current and new resources are processed by the agent (options 1 and 2 combined).
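Because the third option is the combination of the first two, the choice maps naturally onto bit flags. The sketch below is a hypothetical model using Python's `enum.Flag`; `ApplyTo` and `should_process` are illustrative names, not configuration values from the API or SDK. It only models which resources are in scope, not when or how often the agent runs.

```python
from enum import Flag


class ApplyTo(Flag):
    """Hypothetical model of the three configuration options."""
    EXISTING = 1              # option 1: resources already in the Knowledge Box
    FUTURE = 2                # option 2: resources added after configuration
    ALL = EXISTING | FUTURE   # option 3: options 1 and 2 combined


def should_process(mode: ApplyTo, resource_is_new: bool) -> bool:
    """Decide whether an agent configured with `mode` handles a given resource."""
    needed = ApplyTo.FUTURE if resource_is_new else ApplyTo.EXISTING
    return bool(mode & needed)


print(should_process(ApplyTo.EXISTING, resource_is_new=True))  # new resources are out of scope
print(should_process(ApplyTo.ALL, resource_is_new=True))
```

Modeling option 3 as the union of options 1 and 2 keeps the scope check to a single bitwise test.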
For programmatic management of these tasks, refer to the API documentation or the SDK/CLI documentation. For examples of creating these agents, see this how-to tutorial.