Label search intent training
Label search intent training allows users to build a phrase-based classifier using all labeled resources and text blocks within their KB (Knowledge Box). This process creates a model that suggests labels during searches.
The images used in this guide are meant to provide a visual reference for the user. However, they may become outdated due to changes in the user interface.
Upload your information
This guide covers the entire process of building our model: uploading resources in various formats, labeling data, and training a machine learning model.
We will use an empty KB to work through the process.
First, we need to go to the Resource List section which can be found in the left-hand sidebar of our knowledge base. Once we select the third option labeled Resource List, we will be directed to the page that displays all available resources.
Our resource list is currently empty, which means we need to upload resources and label them with the appropriate Label set to proceed with our training exercises.
Our next step is to upload resources from a dataset of food recipes that we have curated from freely available sources. To start, we click on the pink button labeled Upload. This button offers several ways to upload resources in different data formats. For our purposes, we will use the first option (Upload files) and the third option (Add links), since we plan to upload mainly PDF, DOCX, TXT, and MP4 files, plus some links.
Once we click on the Upload files option, we will be redirected to a page where we can upload our files for preprocessing. Nuclia performs several background tasks while the files are being uploaded. These include vectorization, information summarization, entity and relationship extraction, and even automatic classification. If you want to know how to add automatic classification to your KB, consult the guide on Automatic label training. In this guide, we will focus on building a label suggester for our searches, which is something a KB does not offer without some training.
It is quite easy to upload our cooking recipes in various formats through our web app. We can simply drag and drop the files into the Upload your file section and click on Add. This way we can upload multiple files simultaneously. In addition, we can use the option available to upload multiple links since we can work with all types of unstructured information.
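Before dragging anything into the web app, it can help to separate the dataset into the items that belong under Upload files and those that belong under Add links. The following is a minimal sketch, assuming the dataset is a flat list of local file names and URLs. The function name `split_uploads` and the sample entries are hypothetical, and the code does not call any Nuclia API; it only sorts the list locally.

```python
from urllib.parse import urlparse


def split_uploads(items):
    """Separate web URLs (for Add links) from local paths (for Upload files)."""
    files, links = [], []
    for item in items:
        # Anything with an http or https scheme is treated as a link.
        scheme = urlparse(item).scheme.lower()
        if scheme in ("http", "https"):
            links.append(item)
        else:
            files.append(item)
    return files, links


# Hypothetical recipe dataset mixing documents, a video, and a link.
dataset = [
    "brownies.pdf",
    "https://example.com/onion-soup",
    "pasta.docx",
    "pancakes.mp4",
]
files, links = split_uploads(dataset)
```

Here `files` holds the entries to drag into the Upload your file section, and `links` holds the entries to paste into the links option.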
It is worth mentioning that any of these options can also be used to label the files in groups. However, we want to demonstrate how to create a Label set in the resource view, so we will not show that here.
Once we have clicked on Add, we will see how all of our files are being loaded.
Upon returning to the resource list, we will notice a clock icon in the upper right-hand corner. This icon serves to indicate how many of our files are currently being processed.
Clicking on this icon will take us to a view where we can see the resources that are about to be processed and how they will be processed.
Create a Label set
While our files are being processed, we will create a Label set. This set of labels will be used to categorize our resources. To create a Label set, we open the left sidebar and select the Classification tab.
At the moment this page is devoid of any content. However, by clicking on the Add new option, we can begin constructing a Label set.
At this stage, we can configure the Label set based on our specific needs by adjusting the following options:
Labelset name: The name we want to use for our Label set.
Color: The color that will represent our Label set.
Classification type: Labels can be associated with resources or text blocks. By labeling a resource, we take into account all of its content. However, we can also label only specific text blocks within a file.
Exclusive label: Our classifiers are multilabel by default. This means a resource or text block can have multiple labels from a Label set associated with it. Enabling this option restricts each resource or text block to a single label from the Label set.
Labels: A list of labels within our Label set.
Our Label set will be titled "file recipes" and represented by the color blue. We will set its classification type to resource. We will check the Add only one label by label set option because we want each resource to be classified as either sweet or savory, but not both; this checkbox ensures that each resource is assigned a single label. Finally, we will add our two labels, sweet and savory.
In this situation, whether our Label set targets resources or text blocks is of no consequence, as Label search intent training uses all sentences labeled through either resource or text block annotation.
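The configuration options described above can be summarized as a small data structure. This is only an illustrative sketch: the `LabelSet` class, its field names, and the `validate` method are hypothetical and are not part of Nuclia's API. It shows how the exclusive option changes which label assignments are acceptable.

```python
from dataclasses import dataclass, field


@dataclass
class LabelSet:
    """Hypothetical model of the options shown in the Label set form."""
    name: str
    color: str
    kind: str          # "resource" or "text block"
    exclusive: bool    # True means only one label per resource or text block
    labels: list = field(default_factory=list)

    def validate(self, assigned):
        """Return the assigned labels if allowed, otherwise raise ValueError."""
        unknown = [label for label in assigned if label not in self.labels]
        if unknown:
            raise ValueError(f"unknown labels: {unknown}")
        if self.exclusive and len(assigned) > 1:
            raise ValueError("an exclusive Label set allows only one label")
        return list(assigned)


# The Label set configured in this guide.
file_recipes = LabelSet(
    name="file recipes",
    color="blue",
    kind="resource",
    exclusive=True,
    labels=["sweet", "savory"],
)
```

With this configuration, `file_recipes.validate(["sweet"])` is accepted, while assigning both sweet and savory to the same resource raises an error, mirroring the behavior of the checkbox.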
Label your data
We can now return to our list of resources, where the documents we uploaded will have finished processing. Upon inspection, we can verify that they have all been added to the list. While we could browse through them and explore different Nuclia features, let's focus on the classification task for now.
Let's start by labeling our resources using the Label set we just created. There are several ways to do this. One option is to select any of our files and then click on the Add labels feature. A dropdown menu containing every Label set we have generated will appear. In our specific case, only one set should appear, allowing us to easily label our files as either sweet or savory.
Now we can simply select the documents we want and the label to which they belong.
Another way to label our resources would be by clicking on the option represented by three vertical dots in the last column. Here, we will see a dropdown menu of options among which is Classify.
By clicking on Classify, we can view the resource with additional information, including its text blocks. Here we can switch between different views. Furthermore, if we navigate to Resource at the top left we can also tag the resource from there. In this particular case, we have selected a resource that we had previously labeled, so a label already appears.
Train our own model
Once we have all our labels, we will begin training. To do this, we will navigate to the left sidebar and click on the Training option.
Here we have various options based on the type of training we wish to utilize. In our case, we will select the Label search intent training, since we are looking to create a classification model at the phrase level that suggests labels for us. If you would like to explore other options, you can refer to the other tutorials in this section. These tutorials teach you how to train classifiers for text resources and text blocks, as well as entity extractors.
Clicking on the Choose One Label set option will display a range of Label sets to choose from. This training method is capable of handling multiple Label sets by training all labels in a single session. In our case we only have one Label set, which is the one we previously created, so we will select it.
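To make the idea of a phrase-level label suggester more concrete, here is a toy sketch. It is not Nuclia's model, whose internals this guide does not describe; it swaps in a simple word-count scorer purely for illustration. The function names and the example sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_sentences):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for sentence, label in labeled_sentences:
        counts[label].update(tokenize(sentence))
    return counts


def suggest(counts, query):
    """Rank labels by how often they have seen the query words; drop zero scores."""
    words = tokenize(query)
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return [label for label in ranked if scores[label] > 0]


# Hypothetical sentences taken from labeled recipe resources.
examples = [
    ("Whisk sugar and cocoa into the melted chocolate", "sweet"),
    ("Dust the cake with icing sugar", "sweet"),
    ("Season the chicken with salt and pepper", "savory"),
    ("Simmer the onion soup with garlic", "savory"),
]
model = train(examples)
```

In this sketch, a query such as "chocolate cake recipe" is matched against the words seen under each label, and only labels with at least one matching word are suggested. A query with no known words yields no suggestions.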
In addition to what has been presented so far, Nuclia lets users perform various tasks, including training and prediction, through our API. To use this functionality, refer to the official documentation available under API References.
Once each of our trainings has completed, we will be able to see information about it, such as the execution date and time.
To utilize our new model, we need to activate its widget. To achieve this we need to go to the left sidebar again and navigate to the Widget tab.
In this section, we can access a list of widgets that help with various Nuclia functionalities. Before activating the widget we want to use, we need to pay attention to the yellow warning message displayed at the top of the screen. This message explains that the KB must be public for the widgets to function correctly.
Therefore, we need to make our KB public. To do this, we go back to the sidebar and open Home Dashboard, where we can make the KB public by clicking on the Publish option on the right side of the screen.
Once we have made our KB public, we can go back to the Widget tab and activate the Suggest labels option, which shows label suggestions based on our queries.
Now that we have our widget, we can go to the left sidebar and move to the Search tab where we can test our model.
If we now run a search in the text box, we will not only receive relevant results but also be presented with label suggestions.
In this way we can easily train a classification model using Nuclia.