
Get processed data through the API

Once a resource has been processed, it can be accessed through the Nuclia API:

curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>"

If the knowledge box is not public, you will need to provide a service access token:

curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>" \
  -H "X-STF-Serviceaccount: Bearer <your-service-access-token>"
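To avoid repeating the URL and header in every call, the request can be wrapped in a small shell function. This is a minimal sketch, not part of the Nuclia tooling: `nuclia_get`, `KB_URL` and `NUCLIA_SERVICE_TOKEN` are hypothetical names, and `KB_URL` is assumed to hold the `https://<zone><your-knowledge-box-id>` part of the URL above. The header is only sent when a token is set, so the same function covers public and private knowledge boxes.

```shell
# Hypothetical helper around the calls above (not an official Nuclia tool).
# KB_URL:               assumed to hold https://<zone><your-knowledge-box-id>
# NUCLIA_SERVICE_TOKEN: service access token; leave empty for a public knowledge box
# $1:                   resource id, optionally followed by a query string
nuclia_get() {
  if [ -n "${NUCLIA_SERVICE_TOKEN:-}" ]; then
    # Private knowledge box: send the service access token header
    curl --silent --show-error "${KB_URL:?KB_URL is not set}/resources/$1" \
      -H "X-STF-Serviceaccount: Bearer $NUCLIA_SERVICE_TOKEN"
  else
    # Public knowledge box: no header needed
    curl --silent --show-error "${KB_URL:?KB_URL is not set}/resources/$1"
  fi
}
```

For example, `nuclia_get "<resource-id>?show=extracted&extracted=text"` requests the extracted text. Quoting the argument matters: unquoted, the shell would treat `&` as a control operator and cut the URL short.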

This call returns only minimal data about the resource. Here is a typical response:

{
  "id": "8529644d58549d024606d69b74f30620",
  "title": "Lamarr%20Lesson%20plan.pdf",
  "summary": "",
  "icon": "application/pdf",
  "layout": "",
  "thumbnail": "/kb/56d442bb-85f1-470c-b27b-6cb33309ba17/resource/8529644d58549d024606d69b74f30620/file/8529644d58549d024606d69b74f30620/download/extracted/file_thumbnail",
  "metadata": {
    "metadata": {},
    "language": "",
    "languages": [],
    "status": "PROCESSED"
  },
  "user_metadata": {
    "classifications": [],
    "relations": []
  },
  "created": "2022-05-25T14:25:03.999979",
  "modified": "2022-05-25T14:25:03.999960"
}
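Since extracted data is only available once processing is finished, a script can check the `status` field under `metadata` before asking for more. A minimal sketch, assuming `jq` is installed; `is_processed` is a hypothetical name, and it only tests for the `PROCESSED` value shown in the response above.

```shell
# Hypothetical helper: reads a resource response (JSON) on stdin and
# succeeds only if metadata.status is "PROCESSED". Requires jq.
is_processed() {
  # jq -e sets the exit status from the last output:
  # 0 when the comparison is true, 1 when it is false or null
  jq -e '.metadata.status == "PROCESSED"' > /dev/null
}
```

For example, `curl --silent "https://<zone><your-knowledge-box-id>/resources/<resource-id>" | is_processed && echo "ready"` prints `ready` only for a processed resource.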

To get the data that Nuclia extracted from the resource, add query parameters to the call.

Get text

If, for example, you have videos and want Nuclia to extract their text, use the parameters ?show=extracted&extracted=text. The call returns the entire text extracted from the resource, whatever the original file type (including video and audio files).

curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>?show=extracted&extracted=text"


{
  "id": "5b83d66181436cd272b81065655124a4",
  "data": {
    "files": {
      "5b83d66181436cd272b81065655124a4": {
        "extracted": {
          "text": {
            "text": " Ada Lovelace is considered one of the first computer programmers, which may sound a little strange because she lived in the early 1800s. Well, before the invention of the computer, it was the daughter of famous English poet, Lord Byron and Lady Anne Isabella Milbank, her mom was known for being incredibly smart, which Lord Byron found very attractive. Calling her, his princess of parallelograms.",
            "split_text": {},
            "deleted_splits": []
          }
        }
      }
    }
  }
}
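To keep only the text itself, the response can be filtered with `jq` (assumed to be installed). The key under `files` is an identifier (in the example above it happens to match the resource id), so the filter iterates over every entry instead of hard-coding it. `extract_text` is a hypothetical name used for illustration.

```shell
# Hypothetical helper: reads the response of a
# ?show=extracted&extracted=text call on stdin and prints the extracted
# text of every file in the resource. Requires jq.
extract_text() {
  # .data.files[]? visits each file entry whatever its key is (and yields
  # nothing if there are no files); "// empty" skips entries without text
  jq -r '.data.files[]?.extracted.text.text // empty'
}
```

For example, `curl --silent "https://<zone><your-knowledge-box-id>/resources/<resource-id>?show=extracted&extracted=text" | extract_text` prints the transcript of a video resource as plain text.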

Get metadata

If you use Nuclia to collect metadata from files, use the parameters ?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file. The call returns all the metadata of the file stored in the resource.

This call is useful for many other use cases as well, since the returned information is rich (see the list below).

curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>?show=basic&show=origin&show=extracted&extracted=metadata&extracted=file"

It provides the following information:

  • the file metadata
  • nested texts (like text in an embedded image)
  • a summary
  • the paragraphs and sentences (defined by the position of their first and last characters, plus start time and end time for a video or audio file)
  • the named entities (people, dates, places, organizations, etc.)
  • the links
  • a thumbnail
  • the embedded files
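Because the metadata call repeats the `show` and `extracted` keys, it can be handy to build the query string from a list of `key=value` pairs. A minimal POSIX shell sketch; `build_query` is a hypothetical helper, and it does not URL-encode values, which is fine for the plain parameter values used on this page.

```shell
# Hypothetical helper: joins key=value pairs into a query string and keeps
# repeated keys, e.g. "build_query show=basic show=origin" prints
# "?show=basic&show=origin". Values are NOT URL-encoded.
#
# Usage (placeholders as in the calls above):
#   curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>$(build_query show=basic show=origin)"
build_query() {
  query=""
  for pair in "$@"; do
    if [ -z "$query" ]; then
      query="?$pair"          # first pair starts the query string
    else
      query="$query&$pair"    # later pairs are joined with &
    fi
  done
  printf '%s\n' "$query"
}
```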

Get vectors

Using the parameters ?show=extracted&extracted=vectors you can get the vector representation of the entire text.

curl "https://<zone><your-knowledge-box-id>/resources/<resource-id>?show=extracted&extracted=vectors"

This is useful if you want to store the vectors in a third-party vector database.