Access processed data

Once processed, a resource can be accessed using the Nuclia API:

curl https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id>

Note: if the Knowledge Box is not public, you need to provide an API key:

curl https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id> \
-H "X-NUCLIA-SERVICEACCOUNT: Bearer <your-API-key>"
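
The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the function name `build_resource_request` and the placeholder values are illustrative assumptions, while the URL layout and the `X-NUCLIA-SERVICEACCOUNT` header are taken from the curl commands above.

```python
from typing import Optional
from urllib.request import Request

# URL layout copied from the curl commands above.
API_ROOT = "https://{zone}.nuclia.cloud/api/v1"


def build_resource_request(zone: str, kb_id: str, resource_id: str,
                           api_key: Optional[str] = None) -> Request:
    """Prepare a GET request for one resource (hypothetical helper).

    Nothing is sent here; the caller decides when to open the request.
    """
    url = f"{API_ROOT.format(zone=zone)}/kb/{kb_id}/resource/{resource_id}"
    request = Request(url, method="GET")
    if api_key is not None:
        # Only needed when the Knowledge Box is not public.
        request.add_header("X-NUCLIA-SERVICEACCOUNT", f"Bearer {api_key}")
    return request


# Placeholder values, not real identifiers.
request = build_resource_request("my-zone", "my-kb-id", "my-resource-id",
                                 api_key="my-api-key")
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` and decoding the body with `json.load` would then give the response as a Python dictionary.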

Without extra parameters, this endpoint returns only minimal data about the resource. Here is a typical response:

{
  "id": "8529644d58549d024606d69b74f30620",
  "title": "Lamarr%20Lesson%20plan.pdf",
  "summary": "",
  "icon": "application/pdf",
  "layout": "",
  "thumbnail": "/kb/56d442bb-85f1-470c-b27b-6cb33309ba17/resource/8529644d58549d024606d69b74f30620/file/8529644d58549d024606d69b74f30620/download/extracted/file_thumbnail",
  "metadata": {
    "metadata": {},
    "language": "",
    "languages": [],
    "status": "PROCESSED"
  },
  "user_metadata": {
    "classifications": [],
    "relations": []
  },
  "created": "2022-05-25T14:25:03.999979",
  "modified": "2022-05-25T14:25:03.999960"
}
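
As an illustration of how this minimal response might be used, the sketch below checks the processing status and decodes the percent-encoded title. The helper names are hypothetical, and it assumes that a fully processed resource reports `"PROCESSED"` under `metadata.status`, as in the sample; other status values are not documented here.

```python
from urllib.parse import unquote

# Trimmed copy of the typical response shown above.
resource = {
    "id": "8529644d58549d024606d69b74f30620",
    "title": "Lamarr%20Lesson%20plan.pdf",
    "icon": "application/pdf",
    "metadata": {"language": "", "languages": [], "status": "PROCESSED"},
}


def is_processed(resource: dict) -> bool:
    """Return True when metadata.status equals "PROCESSED" (assumption)."""
    return resource.get("metadata", {}).get("status") == "PROCESSED"


def readable_title(resource: dict) -> str:
    """Decode percent-escapes such as %20 in the title."""
    return unquote(resource.get("title", ""))


print(is_processed(resource))    # True
print(readable_title(resource))  # Lamarr Lesson plan.pdf
```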

To get more of the data Nuclia has extracted from the resource, add extra parameters to the call.

Get text

For example, if you have videos and want Nuclia to extract their text, use the parameters ?show=extracted&extracted=text. The response contains the entire text extracted from the resource, whatever the original file type (including video and audio files).

curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id>?show=extracted&extracted=text"

Example:

{
  "id": "5b83d66181436cd272b81065655124a4",
  "data": {
    "files": {
      "5b83d66181436cd272b81065655124a4": {
        "extracted": {
          "text": {
            "text": " Ada Lovelace is considered one of the first computer programmers, which may sound a little strange because she lived in the early 1800s. Well, before the invention of the computer, it was the daughter of famous English poet, Lord Byron and Lady Anne Isabella Milbank, her mom was known for being incredibly smart, which Lord Byron found very attractive. Calling her, his princess of parallelograms.",
            "split_text": {},
            "deleted_splits": []
          }
        }
      }
    }
  }
}
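
The extracted text sits several levels deep in this response. A minimal sketch for pulling it out is shown below; the function name `collect_extracted_text` is hypothetical, and it only walks the `data.files` structure visible in the example, so other kinds of fields are not covered.

```python
def collect_extracted_text(resource: dict) -> dict:
    """Map each key under data.files to its extracted text.

    Entries without extracted text are skipped rather than raising.
    """
    texts = {}
    files = resource.get("data", {}).get("files", {})
    for file_key, field in files.items():
        text = field.get("extracted", {}).get("text", {}).get("text")
        if text is not None:
            # The sample text starts with a space, so trim whitespace.
            texts[file_key] = text.strip()
    return texts


# Shortened version of the example response above.
response = {
    "id": "5b83d66181436cd272b81065655124a4",
    "data": {
        "files": {
            "5b83d66181436cd272b81065655124a4": {
                "extracted": {
                    "text": {
                        "text": " Ada Lovelace is considered one of the first computer programmers.",
                        "split_text": {},
                        "deleted_splits": [],
                    }
                }
            }
        }
    },
}

for file_key, text in collect_extracted_text(response).items():
    print(file_key, "->", text)
```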

Get metadata

If you use Nuclia to collect metadata from files, use the parameters ?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file. The response contains all the metadata of a file stored in the resource.

Because the returned information is very rich (see the list below), this call can serve many other use cases.

curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id>?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file"

It provides the following information:

  • file metadata
  • nested texts (like text in an embedded image)
  • a summary
  • text blocks (under the paragraphs property) and sentences (defined by the positions of their first and last characters, plus start and end times for a video or audio file)
  • named entities (people, dates, places, organizations, etc.)
  • links
  • a thumbnail
  • embedded files
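
The metadata call repeats the `show` and `extracted` parameters several times, which is easy to get wrong by hand. The sketch below builds such a query string with the standard library; the function name `resource_url` is an illustrative assumption, and the parameter values are the ones used in this section.

```python
from typing import Iterable
from urllib.parse import urlencode


def resource_url(zone: str, kb_id: str, resource_id: str,
                 show: Iterable[str] = (),
                 extracted: Iterable[str] = ()) -> str:
    """Build a resource URL with repeated show/extracted parameters."""
    base = f"https://{zone}.nuclia.cloud/api/v1/kb/{kb_id}/resource/{resource_id}"
    # doseq=True turns each list into repeated key=value pairs.
    query = urlencode({"show": list(show), "extracted": list(extracted)},
                      doseq=True)
    return f"{base}?{query}" if query else base


# Placeholder identifiers; the parameter values match the metadata call.
print(resource_url("my-zone", "my-kb-id", "my-resource-id",
                   show=["basic", "origin", "extracted"],
                   extracted=["metadata", "file"]))
```

The same helper covers the text and vectors calls by passing `show=["extracted"]` with `extracted=["text"]` or `extracted=["vectors"]`.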

Get vectors

Using the parameters ?show=extracted&extracted=vectors you can get the vector representation of the entire text.

curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id>?show=extracted&extracted=vectors"

This can be useful if you want to use a third-party vector database.