Setting up NucliaDB in the Cloud

This document focuses on setting up NucliaDB to work on the following cloud providers:

  • AWS
  • GCP

Whichever cloud provider you use, running NucliaDB requires at minimum a PostgreSQL database and a VM with an attached persistent disk.

AWS

Requirements:

  • RDS (or Aurora)
    • PostgreSQL version 12+
  • S3 Bucket Creation Access
    • NucliaDB needs access to create new S3 buckets
    • Access key ID and secret access key

NucliaDB Environment Variable Configuration:

  • DATA_PATH=/mnt/data: Path to mounted persistent disk to store indexes on
  • DRIVER=pg: Configure NucliaDB with PostgreSQL for metadata storage
  • DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string
  • FILE_BACKEND=s3: Configure NucliaDB with S3 blob storage
  • S3_CLIENT_ID=<AWS_CRED_CLIENT_ID>: S3 Client Id
  • S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET>: S3 Client Secret
  • S3_REGION_NAME=<AWS_REGION>: S3 Region
  • NUA_API_KEY=<API_KEY_VALUE>: Nuclia Understanding API Key. This is the authentication key for using Nuclia's processing engine. Check out this page to learn how to obtain one.
  • CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
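Before starting NucliaDB, it can help to confirm that the settings above are actually present in the environment. The following is a minimal sketch, not part of NucliaDB: the helper name missing_settings is hypothetical, and treating every variable in the list as required is an assumption made for illustration.

```python
import os

# Variable names are taken from the AWS list above. Treating all of them
# as required is an assumption of this sketch, not documented behaviour.
AWS_SETTINGS = (
    "DATA_PATH",
    "DRIVER",
    "DRIVER_PG_URL",
    "FILE_BACKEND",
    "S3_CLIENT_ID",
    "S3_CLIENT_SECRET",
    "S3_REGION_NAME",
    "NUA_API_KEY",
    "CORS_ORIGINS",
)


def missing_settings(env, names=AWS_SETTINGS):
    """Return the names that are absent or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All AWS settings are present.")
```

Run it in the same shell or container that will launch NucliaDB, so that it sees the same environment.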

GCP

Requirements:

  • Cloud SQL (PostgreSQL)

  • GCS (Google Cloud Storage) Bucket Creation Access

    • NucliaDB needs access to create new GCS Buckets: each KnowledgeBox will have its own bucket where the binaries of the pushed data will be stored.
    • Service credential file: needed to grant NucliaDB access to the GCS service. It must be base-64 encoded and set in GCS_BASE64_CREDS.
  • Storage class

    • If you install NucliaDB in Kubernetes with the Helm chart, you will need to create a storage class in your GCP project and reference it in the chart's values.yaml file. Check out the GCP documentation on how to create a storage class.
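The base-64 value for GCS_BASE64_CREDS can be produced with Python's standard library. This is a minimal sketch: the function name encode_credentials and the file name in the usage comment are hypothetical, and it assumes NucliaDB expects standard (not URL-safe) base-64 of the raw file contents.

```python
import base64
from pathlib import Path


def encode_credentials(path):
    """Return the file's bytes as a single-line base-64 ASCII string."""
    # Assumption: standard base-64 alphabet over the raw file contents.
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Usage (hypothetical file name):
#   print(encode_credentials("service-account.json"))
```

The output has no line breaks, so it can be pasted directly as the value of the environment variable.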

NucliaDB Environment Variable Configuration:

  • DATA_PATH=/mnt/data: Path to mounted persistent disk to store indexes on
  • DRIVER=pg: Configure NucliaDB with PostgreSQL for metadata storage
  • DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string. Make sure your password does not contain characters that are invalid in a URL. To check whether the connection string parses, you can run python -c 'from urllib.parse import urlparse; urlparse("<YOUR-DSN-HERE>")'
  • FILE_BACKEND=gcs: Configure NucliaDB with GCS blob storage
  • GCS_PROJECT=<PROJECT ID>: Project ID for your Google Cloud Account
  • GCS_LOCATION=<GOOGLE CLOUD REGION>: Google Cloud region
  • GCS_BUCKET=nucliadb_{kbid}: GCS bucket naming format
  • GCS_BASE64_CREDS=<B64_ENCODED_CREDS>: Base-64 encoded GCS credentials.
  • NUA_API_KEY=<API_KEY_VALUE>: Nuclia Understanding API Key. This is the authentication key for using Nuclia's processing engine. Check out this page to learn how to obtain one.
  • CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
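If the PostgreSQL password contains characters such as @, / or #, one option is to percent-encode it before building DRIVER_PG_URL. The sketch below uses only urllib.parse; the function name build_pg_url and the sample password are hypothetical, and it assumes the PostgreSQL driver decodes percent-encoded credentials, as standard PostgreSQL connection URIs allow.

```python
from urllib.parse import quote, unquote, urlparse


def build_pg_url(user, password, host, port=5432, database="postgres"):
    """Build a PostgreSQL URL with percent-encoded credentials."""
    # safe="" also encodes "/", which quote() leaves untouched by default.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing characters that would break the URL.
url = build_pg_url("postgres", "p@ss/word#1", "HOSTNAME")
parsed = urlparse(url)

# The encoded password round-trips back to the original value.
assert unquote(parsed.password) == "p@ss/word#1"
print(url)
```

Without the encoding, the @ in the password would be read as the end of the credentials and the # as the start of a URL fragment.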

Check out the tutorial on how to install NucliaDB on GCP for a more in-depth walkthrough of the process.

Cluster Support

If you are manually setting up multiple nodes, you will need to configure them to be able to speak to each other:

  • CLUSTER_DISCOVERY_MODE=manual: Manually specify the addresses of nodes in the cluster
  • CLUSTER_DISCOVERY_MANUAL_ADDRESSES: JSON-compatible list of node addresses in the cluster
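A JSON-compatible list is easiest to get right by serializing it rather than typing it by hand. The following sketch uses json.dumps; the helper name manual_addresses_value is hypothetical, and the addresses shown (including the host:port form) are placeholders, since this document does not specify the address format.

```python
import json


def manual_addresses_value(addresses):
    """Serialize node addresses as a JSON array string."""
    return json.dumps(list(addresses))


# Placeholder addresses: the host:port form is an assumption, so
# substitute the real addresses of your nodes.
value = manual_addresses_value(["10.0.0.11:9000", "10.0.0.12:9000"])
print(f"CLUSTER_DISCOVERY_MANUAL_ADDRESSES={value}")
```

When setting the variable in a shell, wrap the value in single quotes so the double quotes inside the JSON survive.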

If you are installing NucliaDB via Kubernetes, make sure that the values.yaml file has the following values set:

replicas: 2

env:
  cluster_discovery_mode: kubernetes
  cluster_discovery_kubernetes_namespace: nucliadb
  cluster_discovery_kubernetes_selector: 'app.kubernetes.io/name=node'

so that each node is able to automatically join the cluster.

Kubernetes

The easiest way to install and manage NucliaDB is to use Kubernetes.

Each cloud provider has its own managed Kubernetes implementation. This document does not cover cloud-specific Kubernetes details, but we recommend using Kubernetes for your on-prem NucliaDB install if it is available to you.