Setting up NucliaDB in the Cloud
This document focuses on setting up NucliaDB to work on the following cloud providers:
- AWS
- GCP
On any cloud provider, running NucliaDB requires at a minimum a PostgreSQL database and a VM with an attached persistent disk.
AWS
Requirements:
- RDS (or Aurora)
  - PostgreSQL version 12+
- S3 bucket creation access
  - Need access to create new S3 buckets
  - Access key and ID
NucliaDB Environment Variable Configuration:
DATA_PATH=/mnt/data
: Path to mounted persistent disk to store indexes on
DRIVER=pg
: Configure NucliaDB with PostgreSQL for metadata storage
DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres
: PostgreSQL connection string
FILE_BACKEND=s3
: Configure NucliaDB with S3 blob storage
S3_CLIENT_ID=<AWS_CRED_CLIENT_ID>
: S3 Client Id
S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET>
: S3 Client Secret
S3_REGION_NAME=<AWS_REGION>
: S3 Region
NUA_API_KEY=<API_KEY_VALUE>
: Nuclia Understanding API Key. This is the authentication key for using Nuclia's processing engine. Check out this page to know how to obtain one.
CORS_ORIGINS=["http://localhost:8080"]
: CORS configuration for your service
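Before starting NucliaDB it can help to confirm that every variable in the list above is present. The following is a minimal sketch of such a check, not part of NucliaDB itself: the `check_settings` helper is hypothetical, treating every listed variable as required is an assumption, and so is the expectation that `CORS_ORIGINS` holds a JSON list of strings (inferred from the example value shown above).

```python
import json

# Names taken from the AWS list above. Treating all of them as
# required is an assumption made for this sketch.
AWS_SETTINGS = (
    "DATA_PATH", "DRIVER", "DRIVER_PG_URL", "FILE_BACKEND",
    "S3_CLIENT_ID", "S3_CLIENT_SECRET", "S3_REGION_NAME",
    "NUA_API_KEY", "CORS_ORIGINS",
)


def check_settings(env):
    """Return a list of human-readable problems found in the mapping."""
    problems = [f"{name} is not set" for name in AWS_SETTINGS if not env.get(name)]
    if env.get("DRIVER") not in (None, "", "pg"):
        problems.append("DRIVER should be 'pg'")
    if env.get("FILE_BACKEND") not in (None, "", "s3"):
        problems.append("FILE_BACKEND should be 's3'")
    cors = env.get("CORS_ORIGINS")
    if cors:
        try:
            origins = json.loads(cors)
        except json.JSONDecodeError:
            origins = None
        # Assumption: the value is a JSON list of origin strings.
        if not (isinstance(origins, list) and all(isinstance(o, str) for o in origins)):
            problems.append("CORS_ORIGINS should be a JSON list of strings")
    return problems


# Placeholder values mirroring the list above.
example = {
    "DATA_PATH": "/mnt/data",
    "DRIVER": "pg",
    "DRIVER_PG_URL": "postgresql://postgres:password@HOSTNAME:5432/postgres",
    "FILE_BACKEND": "s3",
    "S3_CLIENT_ID": "<AWS_CRED_CLIENT_ID>",
    "S3_CLIENT_SECRET": "<AWS_CRED_CLIENT_SECRET>",
    "S3_REGION_NAME": "<AWS_REGION>",
    "NUA_API_KEY": "<API_KEY_VALUE>",
    "CORS_ORIGINS": '["http://localhost:8080"]',
}
print(check_settings(example))
```

In a real deployment the same function could be called with `os.environ` instead of the placeholder dictionary.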
GCP
Requirements:
- SQL (PostgreSQL)
  - PostgreSQL version 12+
  - We recommend either using the managed PostgreSQL from GCP or installing it via Helm with this helm chart.
- GCS (Google Cloud Storage) bucket creation access
  - NucliaDB needs access to create new GCS buckets: each KnowledgeBox has its own bucket, where the binaries of the pushed data are stored.
  - Service credential file: needed to grant NucliaDB access to the GCS service. It must be base-64 encoded and set in GCS_BASE64_CREDS.
- Storage class
  - If you install NucliaDB in Kubernetes with the helm chart, you will need to create a storage class in your GCP project and reference it in the values.yaml file of the chart. Check out GCP documentation on how to create a storage class.
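The base-64 encoding of the service credential file mentioned in the requirements above can be sketched as follows. This is an illustration only: the `encode_credentials` helper is hypothetical, and the sample bytes stand in for the JSON key file downloaded from GCP.

```python
import base64
import json


def encode_credentials(raw: bytes) -> str:
    """Return the base-64 text for the raw bytes of a credential file."""
    json.loads(raw)  # fail early if the content is not valid JSON
    return base64.b64encode(raw).decode("ascii")


# Stand-in content; in practice, read the bytes of the downloaded key file,
# for example with pathlib.Path("<path to key file>").read_bytes().
sample = json.dumps({"type": "service_account", "project_id": "my-project"}).encode()
encoded = encode_credentials(sample)
print(f"GCS_BASE64_CREDS={encoded}")
```

The printed value is a single line of ASCII text, which is convenient for an environment variable or a Helm values entry.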
NucliaDB Environment Variable Configuration:
DATA_PATH=/mnt/data
: Path to mounted persistent disk to store indexes on
DRIVER=pg
: Configure NucliaDB with PostgreSQL for metadata storage
DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres
: PostgreSQL connection string. Make sure your password does not contain URL-invalid characters. To check if it is valid you can run python -c 'from urllib.parse import urlparse; urlparse("<YOUR-DSN-HERE>")'
FILE_BACKEND=gcs
: Configure NucliaDB with GCS blob storage
GCS_PROJECT=<PROJECT ID>
: Project ID for your Google Cloud account
GCS_LOCATION=<GOOGLE CLOUD REGION>
: Google Cloud region
GCS_BUCKET=nucliadb_{kbid}
: GCS bucket naming format
GCS_BASE64_CREDS=<B64_ENCODED_CREDS>
: Base-64 encoded GCS credentials
NUA_API_KEY=<API_KEY_VALUE>
: Nuclia Understanding API Key. This is the authentication key for using Nuclia's processing engine. Check out this page to know how to obtain one.
CORS_ORIGINS=["http://localhost:8080"]
: CORS configuration for your service
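The advice about URL-invalid characters in the PostgreSQL password can be made concrete. A bare `urlparse` call accepts some DSNs whose password contains characters such as `/` or `#` and silently splits them in the wrong place, so the sketch below percent-encodes the credentials and then checks that parsing recovers the intended values. Both helper names are hypothetical, and the sketch assumes the PostgreSQL driver accepts percent-encoded credentials, as libpq-style connection URIs generally do.

```python
from urllib.parse import quote, unquote, urlparse


def build_pg_dsn(user, password, host, port=5432, database="postgres"):
    """Return a PostgreSQL DSN with user and password percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


def dsn_round_trips(dsn, user, password, host, port):
    """Check that urlparse recovers the intended credentials and host."""
    parts = urlparse(dsn)
    return (
        unquote(parts.username or "") == user
        and unquote(parts.password or "") == password
        and parts.hostname == host.lower()  # urlparse lowercases hostnames
        and parts.port == port
    )


# Example password chosen to contain characters that break a raw DSN.
dsn = build_pg_dsn("postgres", "p@ss/w#rd", "HOSTNAME")
print(dsn)
print(dsn_round_trips(dsn, "postgres", "p@ss/w#rd", "HOSTNAME", 5432))
```

Placing the same password into the DSN without encoding makes `dsn_round_trips` return `False`, even though `urlparse` raises no error for it.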
Check out the tutorial on how to install NucliaDB on GCP for a more in-depth walkthrough of the process.
Cluster Support
If you are manually setting up multiple nodes, you will need to configure them to be able to speak to each other:
CLUSTER_DISCOVERY_MODE=manual
: Manually specify the addresses of nodes in the cluster
CLUSTER_DISCOVERY_MANUAL_ADDRESSES
: JSON-compatible value for the list of node addresses in the cluster
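As an illustration of the JSON-compatible value described above, the list of node addresses can be serialized with the standard `json` module. This is a sketch under stated assumptions: the `host:port` address format and the example addresses are hypothetical, so check what your nodes actually expose before using them.

```python
import json


def manual_addresses_value(addresses):
    """Serialize node addresses into a JSON list string."""
    return json.dumps(list(addresses))


# Hypothetical node addresses; the host:port form is an assumption.
nodes = ["10.0.0.11:4444", "10.0.0.12:4444"]

env = {
    "CLUSTER_DISCOVERY_MODE": "manual",
    "CLUSTER_DISCOVERY_MANUAL_ADDRESSES": manual_addresses_value(nodes),
}
for name, value in env.items():
    print(f"{name}={value}")
```

Generating the value this way avoids quoting mistakes that are easy to make when writing a JSON list by hand.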
If you are installing NucliaDB via Kubernetes, make sure that the values.yaml file has the following values set:
replicas: 2
env:
  cluster_discovery_mode: kubernetes
  cluster_discovery_kubernetes_namespace: nucliadb
  cluster_discovery_kubernetes_selector: 'app.kubernetes.io/name=node'
so that each node is able to automatically join the cluster.
Kubernetes
The easiest way to install and manage NucliaDB is to use Kubernetes.
Each cloud provider has its own managed Kubernetes implementation. This document does not cover cloud-specific Kubernetes details, but we recommend using Kubernetes for your NucliaDB on-prem install if it is available to you.