Index a batch of text resources
This guide shows how to index a batch of text resources, together with their metadata, from a CSV file.
Prerequisites
- API key: Obtain a contributor or writer API key, as described in the Nuclia API key documentation.
- Python 3: Make sure Python 3 is installed on your system:
python3 --version
- Nuclia SDK: Install the nuclia package in your environment:
pip install nuclia
Run the script
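This example assumes that each row of your CSV file has an id, a text body, and the following metadata (these fields are only an example; adapt them to your own data):
- a country
- a URL
The script reads these values from the id, text, country, and url columns of a semicolon-delimited file with a header row.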
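To check that your file matches the layout the upload script reads (semicolons between fields, double quotes around values that contain a semicolon), the sketch below writes a small sample file with the same four columns and reads it back with the same `csv.DictReader` settings as the script. The file name `sample.csv` and the row values are hypothetical placeholders.

```python
import csv

# Hypothetical sample rows; replace them with your own data.
ROWS = [
    {"id": "doc-001", "text": "Paris is the capital of France.",
     "country": "France", "url": "https://example.com/paris"},
    {"id": "doc-002", "text": "Madrid; capital of Spain.",
     "country": "Spain", "url": "https://example.com/madrid"},
]
FIELDS = ["id", "text", "country", "url"]


def write_sample(path):
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=";", quotechar='"')
        writer.writeheader()
        writer.writerows(ROWS)


def read_sample(path):
    # Same delimiter and quote character as the upload script.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter=";", quotechar='"'))


if __name__ == "__main__":
    write_sample("sample.csv")
    for row in read_sample("sample.csv"):
        print(row["id"], row["country"], row["url"])
```

Because the second text contains the delimiter, the writer wraps it in double quotes, and the reader returns it intact.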
Use the following Python script to upload your text contents to your Nuclia Knowledge Box:
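import csv
import sys

from nuclia import sdk

# Replace the placeholders with your Knowledge Box URL and API key.
KNOWLEDGE_BOX = "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>"
API_KEY = "<your-api-key-with-contributor-access>"

# Authenticate against the Knowledge Box used by the calls below.
sdk.NucliaAuth().kb(url=KNOWLEDGE_BOX, token=API_KEY)


def upload(row):
    # Create one resource per CSV row, using the row id as slug and title.
    sdk.NucliaResource().create(
        slug=row["id"],
        title=row["id"],
        texts={"text": {"format": "PLAIN", "body": row["text"]}},
        # Store the country as a label in the "country" label set.
        usermetadata={
            "classifications": [{"labelset": "country", "label": row["country"]}]
        },
        # Keep the source URL as origin metadata.
        origin={
            "url": row["url"],
        },
    )


def read_file(path):
    # The file is semicolon-delimited, with a header row naming the columns.
    with open(path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=";", quotechar='"')
        for row in reader:
            upload(row)


if __name__ == "__main__":
    file_path = sys.argv[1]
    read_file(file_path)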
Save the script as script.py, then run it with the path to your CSV file as its only argument:
python3 script.py /path/to/csv
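The upload script stops at the first row that raises an error, so rows before it are already uploaded and rows after it are not. Before uploading a large batch, you may want a dry run that checks every row without contacting Nuclia. The sketch below is one way to do this; `REQUIRED_COLUMNS`, `build_payload`, and `validate` are hypothetical helpers, and the payload dictionary only mirrors the keyword arguments the upload script passes to `sdk.NucliaResource().create()`, not an official Nuclia schema. Duplicate ids are flagged because the script uses each id as a resource slug.

```python
import csv
import sys

# Columns the upload script reads from every row (hypothetical constant).
REQUIRED_COLUMNS = ("id", "text", "country", "url")


def build_payload(row):
    """Mirror the keyword arguments the upload script passes to create()."""
    return {
        "slug": row["id"],
        "title": row["id"],
        "texts": {"text": {"format": "PLAIN", "body": row["text"]}},
        "usermetadata": {
            "classifications": [{"labelset": "country", "label": row["country"]}]
        },
        "origin": {"url": row["url"]},
    }


def validate(path):
    """Check every row and return (payloads, errors); nothing is uploaded."""
    payloads, errors = [], []
    with open(path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=";", quotechar='"')
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            return [], ["header is missing columns: " + ", ".join(missing)]
        seen_ids = set()
        for row in reader:
            # line_num is the number of physical lines read so far.
            line = reader.line_num
            # Short rows get None for absent fields, so treat None as empty.
            empty = [c for c in REQUIRED_COLUMNS if not (row.get(c) or "").strip()]
            if empty:
                errors.append(f"line {line}: empty {', '.join(empty)}")
                continue
            if row["id"] in seen_ids:
                errors.append(f"line {line}: duplicate id {row['id']!r}")
                continue
            seen_ids.add(row["id"])
            payloads.append(build_payload(row))
    return payloads, errors


# Only run the check when a file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    found_payloads, found_errors = validate(sys.argv[1])
    for error in found_errors:
        print(error)
    print(f"{len(found_payloads)} rows ready, {len(found_errors)} problems found")
```

You could save it as check_csv.py (a hypothetical name), run python3 check_csv.py /path/to/csv, and fix any reported lines before starting the real upload.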
For more information on the Nuclia Python SDK, see the Nuclia Python SDK documentation.