Introduction

NucliaDB is the database platform Nuclia uses to store and index data.

Core features:

  • Easily compare the vectors from different models.
  • Store text, files and vectors, labels and annotations.
  • Access and modify your resources efficiently.
  • Perform semantic, keyword, fulltext and graph searches.
  • Export your data in a format compatible with most NLP pipelines (Hugging Face datasets, PyTorch, etc.).

Quick start

1. Install NucliaDB and run it locally

With docker:

docker pull nuclia/nucliadb:latest
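
Then run it, exposing the API on port 8080 and persisting data to a named volume (this mirrors the run command shown later in this guide, minus the NUA key):

docker run -it -p 8080:8080 -v nucliadb-standalone:/data nuclia/nucliadb:latest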

Or with Python pip:

pip install nucliadb
nucliadb

2. Create your first Knowledge Box

A Knowledge Box is a data container in NucliaDB.

To help you interact with NucliaDB, install the Python SDK:

pip install nucliadb_sdk

Then with just a few lines of code, you can start filling NucliaDB with data:

from nucliadb_sdk import NucliaDB, Region

sdk = NucliaDB(region=Region.ON_PREM, url="http://localhost:8080/api")

kb = sdk.create_knowledge_box(slug="my_new_kb")
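
On subsequent runs you can retrieve the same Knowledge Box by its slug instead of creating it again (a sketch assuming your nucliadb_sdk version exposes get_knowledge_box_by_slug):

kb = sdk.get_knowledge_box_by_slug(slug="my_new_kb")
print(kb.uuid)  # the id used in the rest of the examples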

3. Upload data

To generate vectors for the data you upload, you can use the sentence_transformers Python package:

pip install sentence_transformers
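
To see what the encoder produces before inserting anything, you can run a quick check (all-MiniLM-L6-v2 outputs 384-dimensional embeddings):

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
vector = encoder.encode(["I'm Sierra, a very happy dog"])[0]
print(len(vector))  # 384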

Use it to encode your text and insert the resulting vectors:

from sentence_transformers import SentenceTransformer
import base64

encoder = SentenceTransformer("all-MiniLM-L6-v2")
sdk.create_resource(
    kbid=kb.uuid,
    texts={"text": {"body": "I'm Sierra, a very happy dog"}},
    slug="mykey1",
    files={
        "file": {
            "file": {
                "filename": "data.txt",
                "payload": base64.b64encode(b"asd"),
            }
        }
    },
    usermetadata={
        "classifications": [{"labelset": "emotion", "label": "positive"}]
    },
    fieldmetadata=[
        {
            "field": {
                "field": "text",
                "field_type": "text",
            },
            "token": [{"token": "Sierra", "klass": "NAME", "start": 4, "end": 9}],
        }
    ],
    uservectors=[
        {
            "field": {
                "field": "text",
                "field_type": "text",
            },
            "vectors": {
                "base": {
                    "vectors": {"vector": encoder.encode(["I'm Sierra, a very happy dog"])[0].tolist()},
                }
            },
        }
    ],
)
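
To verify the write, you can fetch the resource back by the slug you assigned (a sketch assuming the SDK exposes get_resource_by_slug; adjust to your SDK version):

resource = sdk.get_resource_by_slug(kbid=kb.uuid, slug="mykey1")
print(resource.id, resource.slug)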

Then insert more data to improve your search index:

sentences = [
    "She's having a terrible day", "what a delightful day",
    "Dog in catalan is gos", "he is heartbroken",
    "He said that the race is quite tough", "love is tough"
]
labels = [
    ("emotion", "negative"),
    ("emotion", "positive"),
    ("emotion", "neutral"),
    ("emotion", "negative"),
    ("emotion", "neutral"),
    ("emotion", "negative")
]
for i in range(len(sentences)):
    sdk.create_resource(
        kbid=kb.uuid,
        texts={"text": {"body": sentences[i]}},
        files={
            "file": {
                "file": {
                    "filename": "data.txt",
                    "payload": base64.b64encode(b"asd"),
                }
            }
        },
        usermetadata={
            "classifications": [{"labelset": labels[i][0], "label": labels[i][1]}]
        },
        uservectors=[
            {
                "field": {
                    "field": "text",
                    "field_type": "text",
                },
                "vectors": {
                    "base": {
                        "vectors": {"vector": encoder.encode([sentences[i]])[0].tolist()},
                    }
                },
            }
        ],
    )

Finally, you can perform a search on your data:

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

query_vectors = encoder.encode(["To be in love"])[0].tolist()

results = sdk.search(kbid=kb.uuid, vector=query_vectors, vectorset="base", min_score=0.25)
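
For a vector query, matches come back as scored sentences alongside the resources they belong to. A minimal way to inspect them (a sketch assuming the response exposes a sentences.results list with score and text fields; check the response model shipped with your nucliadb_models version):

for sentence in results.sentences.results:
    print(f"{sentence.score:.2f}  {sentence.text}")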

Connecting the SDK to Nuclia Cloud

You can also connect the SDK to Nuclia Cloud by providing an API key:

from nucliadb_sdk import NucliaDB, Region

sdk = NucliaDB(api_key="<fill in your api key here>")
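
Depending on where your Knowledge Box is hosted, you may also need to pass the region of your Nuclia Cloud account. A sketch assuming a Knowledge Box in the EUROPE1 region (substitute your own region and API key):

sdk = NucliaDB(region=Region.EUROPE1, api_key="<fill in your api key here>")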

Connecting your database to Nuclia Cloud

Connecting your database to Nuclia Cloud lets you own your data while using Nuclia's Understanding API™. First, get your NUA API key.

Nuclia's Understanding API™ provides data extraction, enrichment, and inference.

With it, Nuclia does all the heavy lifting for you while your data stays in your own database.

To enable it, set the NUA_API_KEY environment variable when you run NucliaDB:

docker run -it -e NUA_API_KEY=<YOUR-NUA-API-KEY> \
-p 8080:8080 -v nucliadb-standalone:/data nuclia/nucliadb:latest
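
If you are driving NucliaDB purely over HTTP, you will also need a Knowledge Box and its UUID for the calls below. A sketch of creating one through the management API, assuming the standalone /api/v1/kbs endpoint and the MANAGER role:

curl "http://localhost:8080/api/v1/kbs" \
-X POST \
-H "X-NUCLIADB-ROLES: MANAGER" \
-H "Content-Type: application/json" \
-d '{"slug": "my_new_kb"}'

The JSON response should include the new Knowledge Box's uuid, which is the KB_UUID used in the following requests.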

Then, upload a file into your Knowledge Box:

curl "http://localhost:8080/api/v1/kb/<KB_UUID>/upload" \
-X POST \
-H "X-NUCLIADB-ROLES: WRITER" \
-H "X-FILENAME: `echo -n "myfile" | base64`"
-T /path/to/file

After the data has been processed, you will be able to search against it:

curl "http://localhost:8080/api/v1/kb/${KB_UUID}/search?query=your+own+query" \
-H "X-NUCLIADB-ROLES: READER"