
Index a batch of text resources

This guide will help you index a batch of text resources with their associated metadata, listed in a CSV file.

Prerequisites

  1. API Key: Obtain a contributor or writer API key as detailed here.

  2. Python 3: Ensure Python 3 is installed on your system:

    python --version

  3. Nuclia SDK: Install the Nuclia package in your environment:

    pip install nuclia

Run the script

This example assumes that each text comes with the following metadata:

  • a country
  • a URL (this is only an example; you can adapt it to your own data)
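
As a rough illustration of the input, the sketch below writes and re-reads a small CSV file in the layout the upload script reads: a header row with the columns `id`, `text`, `country` and `url`, fields separated by semicolons. The column names come from the keys used in the script; the sample rows and the helper names `write_sample_csv` and `read_rows` are made up for this example.

```python
import csv
import io

# Column names taken from the keys the upload script reads from each row.
FIELDNAMES = ["id", "text", "country", "url"]

# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {
        "id": "doc-1",
        "text": "First sample text; it contains a semicolon.",
        "country": "France",
        "url": "https://example.com/doc-1",
    },
    {
        "id": "doc-2",
        "text": 'Second sample text with "quotes".',
        "country": "Spain",
        "url": "https://example.com/doc-2",
    },
]


def write_sample_csv(stream):
    """Write the sample rows as semicolon-delimited CSV with a header row."""
    writer = csv.DictWriter(
        stream, fieldnames=FIELDNAMES, delimiter=";", quotechar='"'
    )
    writer.writeheader()
    writer.writerows(SAMPLE_ROWS)


def read_rows(stream):
    """Read rows back with the same dialect settings the upload script uses."""
    return list(csv.DictReader(stream, delimiter=";", quotechar='"'))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_sample_csv(buffer)
    print(buffer.getvalue())
```

Letting the `csv` module do the writing means that fields containing semicolons or double quotes are quoted automatically, so they survive the round trip.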

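Use the following Python script to upload your text contents to your Nuclia Knowledge Box. It expects a semicolon-delimited CSV file with a header row containing the columns id, text, country and url: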

import csv
import sys

from nuclia import sdk

KNOWLEDGE_BOX = "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>"
API_KEY = "<your-api-key-with-contributor-access>"

# Authenticate against the Knowledge Box with the API key.
sdk.NucliaAuth().kb(url=KNOWLEDGE_BOX, token=API_KEY)


def upload(row):
    """Create one resource from a CSV row (columns: id, text, country, url)."""
    sdk.NucliaResource().create(
        slug=row["id"],
        title=row["id"],
        texts={"text": {"format": "PLAIN", "body": row["text"]}},
        usermetadata={
            "classifications": [{"labelset": "country", "label": row["country"]}]
        },
        origin={
            "url": row["url"],
        },
    )


def read_file(path):
    """Read a semicolon-delimited CSV file and upload each row."""
    # newline="" is the setting the csv module recommends for input files.
    with open(path, newline="") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=";", quotechar='"')
        for row in reader:
            upload(row)


if __name__ == "__main__":
    file_path = sys.argv[1]
    read_file(file_path)
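
Because each row is sent as soon as it is read, a malformed CSV file is only noticed partway through the upload. A possible pre-flight check is sketched below; it only uses the standard library and does not call the Nuclia SDK. The function name `find_problems` and the specific checks are assumptions made for this example: they reflect the columns the script reads, and duplicate ids are flagged on the assumption that each slug should be unique.

```python
import csv

# Columns the upload script reads from every row.
REQUIRED_COLUMNS = ("id", "text", "country", "url")


def find_problems(path):
    """Return a list of human-readable problems found in the CSV file."""
    problems = []
    with open(path, newline="") as csvfile:
        reader = csv.DictReader(csvfile, delimiter=";", quotechar='"')

        # Check the header first: without these columns no row can be uploaded.
        header = reader.fieldnames or []
        missing = [name for name in REQUIRED_COLUMNS if name not in header]
        if missing:
            return ["missing column(s): " + ", ".join(missing)]

        seen_ids = set()
        for row in reader:
            # reader.line_num is the last source line consumed for this row.
            line = reader.line_num
            row_id = (row.get("id") or "").strip()
            if not row_id:
                problems.append(f"line {line}: empty id")
            elif row_id in seen_ids:
                problems.append(f"line {line}: duplicate id {row_id!r}")
            seen_ids.add(row_id)
            if not (row.get("text") or "").strip():
                problems.append(f"line {line}: empty text")
    return problems
```

One way to use it would be to call `find_problems(file_path)` before `read_file(file_path)` and stop if the returned list is not empty.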

To execute the script and upload the text resources listed in your CSV file, run the following command:

python3 script.py /path/to/csv
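
The script stops at the first row whose upload raises an exception. For large batches it may be preferable to continue and report the failed rows at the end. The sketch below shows one way to do that; `upload_all` is a hypothetical helper, not part of the Nuclia SDK, and it catches `Exception` broadly because no specific SDK exception types are assumed here.

```python
def upload_all(rows, upload):
    """Call upload(row) for every row and collect failures instead of stopping.

    `rows` is any iterable of dict-like rows (for example a csv.DictReader)
    and `upload` is a callable such as the upload() function of the script.
    Returns a list of (row id, error message) tuples for the rows that failed.
    """
    failures = []
    for row in rows:
        try:
            upload(row)
        except Exception as error:  # broad on purpose, see the note above
            failures.append((row.get("id"), str(error)))
    return failures
```

In the script, the loop in `read_file` could be replaced with `failures = upload_all(reader, upload)`, followed by printing the returned list so that the failed rows can be fixed and retried.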

For more information on the Nuclia Python SDK, see the Nuclia Python SDK documentation.