import Badge from '@site/src/components/Badge';
API Access processed data
Once processed, a resource can be accessed using the Nuclia API:
curl https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resources/<resource-id>
If the Knowledge Box is not public, you will need to provide an API key:
curl https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resources/<resource-id> \
-H "X-NUCLIA-SERVICEACCOUNT: Bearer <your-API-key>"
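The same call can be made from a script. The following is a minimal sketch using only the Python standard library; the function names `build_resource_request` and `fetch_resource` are hypothetical helpers introduced here for illustration, and the zone, Knowledge Box id, resource id and API key are placeholders you must supply.

```python
import json
import urllib.request


def build_resource_request(zone, kb_id, resource_id, api_key=None):
    """Build (but do not send) a GET request for a single resource.

    Hypothetical helper: it only mirrors the curl commands above.
    """
    url = f"https://{zone}.nuclia.cloud/api/v1/kb/{kb_id}/resources/{resource_id}"
    headers = {}
    if api_key is not None:
        # Only needed when the Knowledge Box is not public.
        headers["X-NUCLIA-SERVICEACCOUNT"] = f"Bearer {api_key}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_resource(zone, kb_id, resource_id, api_key=None):
    """Send the request and return the decoded JSON body as a dict."""
    request = build_resource_request(zone, kb_id, resource_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to check the URL and header before any network call is made.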
This call returns only minimal data about the resource. Here is a typical response:
{
"id": "8529644d58549d024606d69b74f30620",
"title": "Lamarr%20Lesson%20plan.pdf",
"summary": "",
"icon": "application/pdf",
"layout": "",
"thumbnail": "/kb/56d442bb-85f1-470c-b27b-6cb33309ba17/resource/8529644d58549d024606d69b74f30620/file/8529644d58549d024606d69b74f30620/download/extracted/file_thumbnail",
"metadata": {
"metadata": {},
"language": "",
"languages": [],
"status": "PROCESSED"
},
"user_metadata": {
"classifications": [],
"relations": []
},
"created": "2022-05-25T14:25:03.999979",
"modified": "2022-05-25T14:25:03.999960"
}
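As an illustration of how this minimal response might be consumed, here is a short sketch that pulls out a few fields from the decoded JSON. The helper name `summarize_resource` is an assumption made for this example; the field names come from the response shown above, and the title is percent-decoded because the sample title contains `%20`.

```python
from urllib.parse import unquote


def summarize_resource(resource):
    """Return a few fields from a minimal resource response (a dict)."""
    metadata = resource.get("metadata") or {}
    return {
        "id": resource.get("id"),
        # The sample title is percent-encoded ("Lamarr%20Lesson%20plan.pdf").
        "title": unquote(resource.get("title", "")),
        # Status value taken from the sample response above.
        "processed": metadata.get("status") == "PROCESSED",
    }


# Abridged copy of the sample response above.
sample = {
    "id": "8529644d58549d024606d69b74f30620",
    "title": "Lamarr%20Lesson%20plan.pdf",
    "metadata": {"status": "PROCESSED"},
}
print(summarize_resource(sample))
```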
To get more of the data Nuclia extracted from the resource, add extra parameters to the call.
Get text
For example, if you have videos and want to use Nuclia to extract the text from these videos, you can use the parameters ?show=extracted&extracted=text.
This returns the entire text that has been extracted from a resource, whatever the original file is (including video or audio files).
curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resources/<resource-id>?show=extracted&extracted=text"
Example:
{
"id": "5b83d66181436cd272b81065655124a4",
"data": {
"files": {
"5b83d66181436cd272b81065655124a4": {
"extracted": {
"text": {
"text": " Ada Lovelace is considered one of the first computer programmers, which may sound a little strange because she lived in the early 1800s. Well, before the invention of the computer, it was the daughter of famous English poet, Lord Byron and Lady Anne Isabella Milbank, her mom was known for being incredibly smart, which Lord Byron found very attractive. Calling her, his princess of parallelograms."
"split_text": {},
"deleted_splits": []
}
}
}
}
}
}
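The extracted text sits several levels deep in the response. Here is a minimal sketch of how it could be collected, assuming the response has the shape of the example above (`data` → `files` → field id → `extracted` → `text` → `text`); the helper name `extracted_texts` is hypothetical, and other field types are not handled.

```python
def extracted_texts(resource):
    """Map each file field id to its extracted text, if any.

    Assumes the response layout shown in the example above.
    """
    files = (resource.get("data") or {}).get("files") or {}
    texts = {}
    for field_id, field in files.items():
        extracted = (field or {}).get("extracted") or {}
        text = (extracted.get("text") or {}).get("text")
        if text is not None:
            texts[field_id] = text
    return texts


# Abridged copy of the example response above.
response = {
    "id": "5b83d66181436cd272b81065655124a4",
    "data": {
        "files": {
            "5b83d66181436cd272b81065655124a4": {
                "extracted": {
                    "text": {
                        "text": " Ada Lovelace is considered one of the first computer programmers",
                        "split_text": {},
                        "deleted_splits": [],
                    }
                }
            }
        }
    },
}
for field_id, text in extracted_texts(response).items():
    print(field_id, len(text.split()), "words")
```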
Get metadata
If you use Nuclia to collect metadata from files, you can use the parameters ?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file. This returns all the metadata of a file stored in the resource.
Because the returned information is very rich (see the list below), the same call can serve many other use cases.
curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resources/<resource-id>?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file"
It provides the following information:
- file metadata
- nested texts (like text in an embedded image)
- a summary
- text blocks (under the paragraphs property) and sentences (defined by the position of their first and last characters, plus start time and end time for a video or audio file)
- named entities (people, dates, places, organizations, etc.)
- links
- a thumbnail
- embedded files
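Note that the metadata call repeats the `show` and `extracted` parameters several times. A minimal sketch of building such a query string in Python follows; `resource_url` is a hypothetical helper name, and the zone and ids are placeholders.

```python
from urllib.parse import urlencode


def resource_url(zone, kb_id, resource_id, show=(), extracted=()):
    """Build a resource URL with repeated show/extracted parameters."""
    base = f"https://{zone}.nuclia.cloud/api/v1/kb/{kb_id}/resources/{resource_id}"
    # A list of pairs lets the same key appear more than once.
    params = [("show", value) for value in show]
    params += [("extracted", value) for value in extracted]
    return f"{base}?{urlencode(params)}" if params else base


url = resource_url(
    "<zone>",
    "<your-knowledge-box-id>",
    "<resource-id>",
    show=["basic", "origin", "extracted"],
    extracted=["metadata", "file"],
)
```

Passing the parameters as lists keeps each call readable and avoids hand-assembling long query strings.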
Get vectors
Using the parameters ?show=extracted&extracted=vectors, you can get the vector representation of the entire text.
curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resources/<resource-id>?show=extracted&extracted=vectors"
This can be useful if you want to use a third-party vector database.