Deploying a full NucliaDB cluster on Virtual Machines

This guide will help you deploy a NucliaDB cluster consisting of the following machines:

  • One PostgreSQL server
  • One MinIO storage server
  • Two NucliaDB servers
  • One nginx reverse proxy / load balancer

The MinIO storage server can be replaced by any storage system that exposes an S3 or GCS API. Any other load balancer can be used.

NOTE: This guide only configures a single server each for the database (PostgreSQL) and blob storage (MinIO). For production use, we strongly recommend a proper High-Availability setup for these services, either through a managed solution or by configuring it manually. This is outside the scope of this guide; check the documentation of those software packages instead.

This guide has been tested on the following operating systems, starting from a base installation with the default packages:

  • Debian 12
  • Ubuntu 22.04 LTS
  • Fedora 39
  • CentOS Stream 9
  • RHEL 9

However, it should be possible to make it work in any recent Linux distribution with minor adaptations.

PostgreSQL

The PostgreSQL server is used for storing all metadata about uploaded resources.

This guide covers the bare minimum for a demonstration. It does not cover performance tuning, security, backups, etc. For more information, check out the PostgreSQL documentation or consider using a hosted database service.

Installation

apt install postgresql

Creating a database and user

We recommend creating a database and user for exclusive use by NucliaDB. You can do this by running the following commands (remember to change the password):

su postgres
psql -c "CREATE USER nucliadb PASSWORD '12345678';"
psql -c "CREATE DATABASE nucliadb OWNER nucliadb;"
exit

Allowing connection from external machines

# Make the server accept connections from any external address
echo "host nucliadb nucliadb all md5" >> /etc/postgresql/1*/main/pg_hba.conf
echo "listen_addresses = '*'" >> /etc/postgresql/1*/main/postgresql.conf
systemctl restart postgresql
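
The rule above accepts connections from any address. If the NucliaDB servers share a known subnet, a narrower rule is usually preferable. The following is a minimal sketch and not part of the tested setup: pg_hba_line is a hypothetical helper and 10.0.0.0/24 is an assumed subnet. It relies only on the fact that the address field of pg_hba.conf accepts CIDR notation.

```shell
# Hypothetical helper: print a pg_hba.conf rule for one database/user pair.
# Arguments: database, user, address (CIDR or "all"), auth method (default md5).
pg_hba_line() {
    printf 'host %s %s %s %s\n' "$1" "$2" "$3" "${4:-md5}"
}

# Example with an assumed subnet: allow only hosts in 10.0.0.0/24.
pg_hba_line nucliadb nucliadb 10.0.0.0/24
```

To use it, append the printed line to pg_hba.conf in place of the "all" rule and restart PostgreSQL as shown above.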

You can test that the connection works correctly by trying to connect:

psql -h <server_ip> -U nucliadb nucliadb

MinIO

The MinIO storage server acts as the file backend and stores all the documents (binary files) uploaded to NucliaDB. The data for each Knowledge Box is stored in a separate bucket.

This guide covers the bare minimum for a demonstration. It does not cover performance tuning, security, backups, etc. For more information, check out the MinIO documentation or consider using a hosted storage service.

Installation

wget https://dl.min.io/server/minio/release/linux-amd64/minio.deb
dpkg -i minio.deb

Configuration

Create and edit /etc/default/minio with the user/password to use for the admin user and the data directory:

MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=12345678
MINIO_VOLUMES="/mnt/data"

Make sure the minio-user has permissions to write in the data directory:

groupadd -r minio-user
useradd -M -r -g minio-user minio-user
mkdir /mnt/data
chown minio-user:minio-user /mnt/data

Finally, start the service:

systemctl enable --now minio
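
Before moving on, it can help to confirm that MinIO is answering requests. The sketch below polls MinIO's liveness endpoint (/minio/health/live) with curl. The helper names, the retry count and the 2-second delay are assumptions chosen for illustration; port 9000 is the one used for S3_ENDPOINT later in this guide.

```shell
# Hypothetical helper: build the liveness URL for a MinIO host (default port 9000).
minio_health_url() {
    printf 'http://%s:%s/minio/health/live\n' "$1" "${2:-9000}"
}

# Hypothetical helper: poll the liveness URL until it answers or we give up.
wait_for_minio() {
    url=$(minio_health_url "$@")
    tries=0
    # curl -f makes HTTP errors count as failures; -sS hides progress but shows errors.
    until curl -fsS -o /dev/null "$url"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            echo "MinIO not healthy at $url" >&2
            return 1
        fi
        sleep 2
    done
    echo "MinIO is healthy at $url"
}

# Usage (replace the placeholder with the real address):
#   wait_for_minio <minio_server_ip>
```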

NucliaDB

The NucliaDB machines provide the Nuclia search functionality and manage the search indexes. They expose an API to implement all search endpoints as well as the ability to push your resources into the database.

They also provide a Web UI for both searching and administrative tasks.

We recommend using two nodes for the NucliaDB cluster, so run the following installation instructions on two servers. Then configure CLUSTER_DISCOVERY_MANUAL_ADDRESSES so that each server knows the IP addresses of both servers.

Installation

wget https://raw.githubusercontent.com/nuclia/nucliadb/main/scripts/install-vm.sh -O - | bash

Configuration is found in /etc/default/nucliadb. Edit it so that it includes the following configuration:

LOG_OUTPUT_TYPE=STDOUT

DRIVER=pg
DRIVER_PG_URL=postgresql://nucliadb:12345678@<postgresql_server_ip>/nucliadb

FILE_BACKEND=s3
S3_CLIENT_ID=admin
S3_CLIENT_SECRET=12345678
S3_BUCKET=nucliadb-{kbid}
S3_ENDPOINT=http://<minio_server_ip>:9000

NUA_API_KEY=<nua_api_key>

CLUSTER_DISCOVERY_MODE=manual
CLUSTER_DISCOVERY_MANUAL_ADDRESSES='["<this_server_ip>:10009", "<other_server_ip>:10009"]'

A quick description of settings:

  • LOG_OUTPUT_TYPE=STDOUT sends the logs to the system journal via stdout
  • DRIVER=pg enables the PostgreSQL metadata driver
  • DRIVER_PG_URL is the connection URL/DSN of the PostgreSQL server
  • FILE_BACKEND=s3 enables the S3 file backend. An alternative is gcs
  • S3_CLIENT_ID is the MinIO username
  • S3_CLIENT_SECRET is the MinIO password
  • S3_BUCKET is a pattern to generate bucket names for each Knowledge Box
  • S3_ENDPOINT is the URL of the MinIO server
  • NUA_API_KEY is an API Key for the cloud-hosted NUA API. Follow this guide to obtain it
  • CLUSTER_DISCOVERY_MODE=manual enables the NucliaDB cluster mode
  • CLUSTER_DISCOVERY_MANUAL_ADDRESSES is the list of endpoints of all NucliaDB servers in the cluster, in JSON format
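
The JSON list for CLUSTER_DISCOVERY_MANUAL_ADDRESSES is easy to mistype by hand. As a small sketch, the hypothetical helper below builds it from a list of server IPs, assuming every server uses port 10009 as in the configuration above. The IP addresses in the usage line are made up.

```shell
# Hypothetical helper: print a JSON array of "<ip>:10009" endpoints.
cluster_addresses() {
    port=10009
    out=""
    for ip in "$@"; do
        # Prefix a separator for every entry except the first.
        out="${out}${out:+, }\"${ip}:${port}\""
    done
    printf '[%s]\n' "$out"
}

# Usage with made-up addresses; paste the printed line into /etc/default/nucliadb:
#   echo "CLUSTER_DISCOVERY_MANUAL_ADDRESSES='$(cluster_addresses 10.0.0.11 10.0.0.12)'"
```

The same line can be used on both servers, since each one lists itself and the other.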

Then, start the nucliadb service:

systemctl enable --now nucliadb.service

Finally, check its status and view its logs with:

systemctl status nucliadb.service
journalctl -u nucliadb.service

Allowing connection from external machines

Nothing to do

Configuration

Full list of configuration options

nginx

nginx acts as a load balancer to route requests to NucliaDB. Additionally, it can be extended to implement authentication.

This guide covers the bare minimum for a demonstration. It does not cover TLS, security, performance, etc. For that, check out the nginx documentation.

Installation

apt install nginx

Basic configuration

Add the upstream servers (the NucliaDB servers) to a new file /etc/nginx/conf.d/nucliadb.conf:

upstream nucliadb_vm {
    server <nucliadb_server_ip>:8080;
    server <nucliadb_other_server_ip>:8080;
}
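
If the list of NucliaDB servers changes often, the upstream block can be generated instead of edited by hand. This is only a sketch: render_upstream is a hypothetical helper, and it assumes every NucliaDB server listens on port 8080 as in the block above.

```shell
# Hypothetical helper: print an nginx upstream block with one server per argument.
render_upstream() {
    echo "upstream nucliadb_vm {"
    for ip in "$@"; do
        printf '    server %s:8080;\n' "$ip"
    done
    echo "}"
}

# Usage with made-up addresses; nginx -t checks the configuration syntax:
#   render_upstream 10.0.0.11 10.0.0.12 > /etc/nginx/conf.d/nucliadb.conf
#   nginx -t
```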

Edit the default site to add a location to route all resources. In /etc/nginx/sites-available/default find the existing location / block, and replace it with the following.

location / {
    proxy_pass http://nucliadb_vm;
}

Finally, reload the config with systemctl reload nginx.

You can now access NucliaDB by visiting http://<nginx_server_ip>/admin.

Basic authorization

This sets up the common use case where you want to keep NucliaDB private while exposing some Knowledge Boxes to the public (e.g., for use with a public widget). Private access is controlled by authorization on nginx, which passes the corresponding security headers to the NucliaDB hosts.

First, create a file /etc/nginx/users.auth with one line per user, in the format username:digested_password, where the digested password can be obtained with openssl passwd. For example:

$ openssl passwd
Password: 12345678
Verifying - Password: 12345678
$1$RzDpaG9V$L99glseI5KE5wfx.6z2eJ0

$ echo 'nucliadb:$1$RzDpaG9V$L99glseI5KE5wfx.6z2eJ0' >> /etc/nginx/users.auth
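
The same entry can be produced non-interactively, which is handy in provisioning scripts. The sketch below is an assumption-laden variant of the example above: auth_line is a hypothetical helper, and it uses the -apr1 (Apache MD5) scheme, which nginx also accepts, instead of the default shown in the transcript. The password is read from stdin so that it does not appear in the process list.

```shell
# Hypothetical helper: print a "user:hash" line for an nginx auth_basic_user_file.
# The password is read from standard input; the user name is the first argument.
auth_line() {
    user="$1"
    hash=$(openssl passwd -apr1 -stdin)
    printf '%s:%s\n' "$user" "$hash"
}

# Usage (the password here is a placeholder):
#   printf '%s\n' 'change-me' | auth_line nucliadb >> /etc/nginx/users.auth
```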

Then, configure nginx to use this file as an authorization database.

In /etc/nginx/sites-available/default, find the existing location / block and replace it with the following:

location / {
    auth_basic "NucliaDB";
    auth_basic_user_file users.auth;

    proxy_set_header x-nucliadb-roles "READER;WRITER;MANAGER";
    proxy_pass http://nucliadb_vm;
}

And then reload the configuration with systemctl reload nginx.

You should now be asked for a username/password in order to access NucliaDB.

Public access (e.g: widget)

To allow public read access to a single Knowledge Box (e.g., a publicly accessible widget), add a dedicated location for it.

In /etc/nginx/sites-available/default, find the existing location / block and add a new one below it:

location /public_api/v1/kb/<your_knowledge_box_id>/ {
    proxy_set_header x-nucliadb-roles "READER";
    proxy_pass http://nucliadb_vm/api/v1/kb/<your_knowledge_box_id>/;
    add_header Access-Control-Allow-Origin *;
}

Feel free to adjust the CORS-related headers (Access-Control-*) as needed for your use case. The configuration shown allows hosting the widget anywhere, but you can restrict use of the widget to a specific domain by specifying it, e.g.:

add_header Access-Control-Allow-Origin "https://my-company.com";

Finally, in the HTML code generated by the widget, replace backend="http://some.domain/api" with backend="http://some.domain/public_api".

Users should now be able to use the public widget without authentication.
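
If the widget snippet is stored in a file, the backend replacement described above can be scripted. This is a sketch only: rewrite_backend is a hypothetical helper, and the file name in the usage line is made up. It changes a backend attribute ending in /api to end in /public_api and leaves already rewritten snippets untouched.

```shell
# Hypothetical helper: read HTML on stdin and point backend="…/api" at "…/public_api".
rewrite_backend() {
    sed 's#backend="\([^"]*\)/api"#backend="\1/public_api"#'
}

# Usage with a made-up file name:
#   rewrite_backend < widget.html > widget.public.html
```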

Upgrade NucliaDB

In order to upgrade the NucliaDB machines to the latest version, you can run:

/usr/bin/upgrade-nucliadb

This script will take care of stopping the services, updating to the latest version and restarting the services.
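
Because the script stops the services on the machine where it runs, one way to keep the cluster answering requests is to upgrade one server at a time. The sketch below is a suggestion rather than a documented procedure: rolling_upgrade and the RUN hook are hypothetical, and it assumes root SSH access to each NucliaDB server. Setting RUN=echo prints the commands instead of running them.

```shell
# Hypothetical hook: set RUN=echo for a dry run that only prints the commands.
RUN="${RUN:-}"

# Hypothetical helper: upgrade each host in turn and stop at the first failure.
rolling_upgrade() {
    for host in "$@"; do
        # Upgrade, then confirm the service is active before touching the next host.
        $RUN ssh "root@${host}" '/usr/bin/upgrade-nucliadb && systemctl is-active --quiet nucliadb.service' || {
            echo "upgrade failed on ${host}, stopping" >&2
            return 1
        }
    done
}

# Usage with made-up addresses:
#   RUN=echo; rolling_upgrade 10.0.0.11 10.0.0.12   # dry run
#   RUN=;     rolling_upgrade 10.0.0.11 10.0.0.12   # real run
```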