Deploying a full NucliaDB cluster on Virtual Machines
This guide will help you deploy a NucliaDB cluster consisting of the following machines:
- One PostgreSQL server
- One MinIO storage server
- Two NucliaDB servers
- One nginx reverse proxy / load balancer
The MinIO storage server can be replaced by any storage system that exposes an S3 or GCS API. Any other load balancer can be used.
NOTE: This guide only configures a single server for the database (PostgreSQL) and for blob storage (MinIO). For production use, a proper high-availability setup for these services is strongly recommended, either through a managed solution or by configuring it manually. That is outside the scope of this guide; refer to the documentation of those software packages instead.
This guide has been tested on the following operating systems, starting from a base installation with the default packages:
- Debian 12
- Ubuntu 22.04 LTS
- Fedora 39
- CentOS Stream 9
- RHEL 9
However, it should be possible to make it work on any recent Linux distribution with minor adaptations.
PostgreSQL
The PostgreSQL server is used for storing all metadata about uploaded resources.
This guide covers the bare minimum for a demonstration. It does not cover performance tuning, security, backups, etc. For more information, check the PostgreSQL documentation or consider using a hosted database service.
Installation
Debian/Ubuntu:
apt install postgresql
Fedora/RHEL:
dnf install postgresql-server
postgresql-setup --initdb
systemctl enable --now postgresql
Creating a database and user
It is recommended to create a database and user for exclusive use by NucliaDB. You can do so by running the following commands (remember to change the password):
su postgres
psql -c "CREATE USER nucliadb PASSWORD '12345678'";
psql -c "CREATE DATABASE nucliadb OWNER nucliadb";
exit
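Since the password in the commands above is only a placeholder, a random one can be generated before running CREATE USER. The following is a minimal sketch, assuming a Linux system with /dev/urandom and the standard head, od and tr utilities; the gen_password name is hypothetical and not part of PostgreSQL or NucliaDB. It emits only hexadecimal characters, so the result can later be placed in a connection URL without escaping.

```shell
# Hypothetical helper: print a random 32-character hexadecimal password.
gen_password() {
    # 16 random bytes, dumped as hex, with spaces and newlines stripped
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

# Example (run as the postgres user):
#   PGPASS="$(gen_password)"
#   psql -c "CREATE USER nucliadb PASSWORD '$PGPASS'"
#   echo "$PGPASS"   # note it down for the NucliaDB configuration
```

The same value then replaces 12345678 wherever the PostgreSQL password is used later in this guide.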
Allowing connection from external machines
Debian/Ubuntu:
# Make the server accept connections from any external address
echo "host nucliadb nucliadb all md5" >> /etc/postgresql/1*/main/pg_hba.conf
echo "listen_addresses = '*'" >> /etc/postgresql/1*/main/postgresql.conf
systemctl restart postgresql
Fedora/RHEL:
# Make the server accept connections from any external address
echo "host nucliadb nucliadb all md5" >> /var/lib/pgsql/data/pg_hba.conf
echo "listen_addresses = '*'" >> /var/lib/pgsql/data/postgresql.conf
systemctl restart postgresql
# Make the firewall allow connections to PostgreSQL
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --add-port=5432/tcp
You can test that the connection works correctly by trying to connect:
psql -h <server_ip> -U nucliadb nucliadb
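If the psql test fails, or psql is not installed on the client machine, a plain TCP check can help tell a firewall or listen_addresses problem apart from an authentication problem. This is a minimal sketch, assuming bash and the coreutils timeout command are available; port_open is a hypothetical helper name, not part of any of the packages in this guide.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 3 seconds. Relies on bash's built-in /dev/tcp redirection.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (replace the placeholder with the PostgreSQL server address):
#   port_open <server_ip> 5432 && echo "reachable" || echo "blocked or down"
```

The same check works for the other ports opened in this guide, such as 9000 for MinIO or 8080 and 10009 for NucliaDB.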
MinIO
The MinIO storage server acts as the file backend and stores all the documents (binary files) uploaded to NucliaDB. The data for each Knowledge Box will be stored into a different bucket.
This guide covers the bare minimum for a demonstration. It does not cover performance tuning, security, backups, etc. For more information, check the MinIO documentation or consider using a hosted storage service.
Installation
Debian/Ubuntu:
wget https://dl.min.io/server/minio/release/linux-amd64/minio.deb
dpkg -i minio.deb
Fedora/RHEL:
wget https://dl.min.io/server/minio/release/linux-amd64/minio.rpm
dnf localinstall minio.rpm
# Make the firewall allow connections to MinIO
firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --add-port=9000/tcp
Configuration
Create and edit /etc/default/minio with the user/password to use for the admin user and the data directory:
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=12345678
MINIO_VOLUMES="/mnt/data"
Make sure the minio-user account exists and has permission to write to the data directory:
groupadd -r minio-user
useradd -M -r -g minio-user minio-user
mkdir /mnt/data
chown minio-user:minio-user /mnt/data
Finally, start the service:
systemctl enable --now minio
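Once the service is running, its liveness endpoint can be probed to confirm that the server answers on port 9000. The sketch below assumes curl is installed and that MinIO serves its health check at /minio/health/live; the helper names are hypothetical.

```shell
# Hypothetical helper: build the liveness URL for a MinIO server (default port 9000).
minio_health_url() {
    printf 'http://%s:%s/minio/health/live\n' "$1" "${2:-9000}"
}

# Hypothetical helper: succeed if the liveness endpoint answers with a 2xx status.
minio_is_live() {
    curl -fsS -o /dev/null --max-time 5 "$(minio_health_url "$@")"
}

# Example:
#   minio_is_live <minio_server_ip> && echo "MinIO is up"
```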
NucliaDB
The NucliaDB machines provide the Nuclia search functionality and manage the search indexes. They expose an API to implement all search endpoints as well as the ability to push your resources into the database.
They also provide a Web UI for both searching and administrative tasks.
We recommend using two nodes for the NucliaDB cluster, so run the following installation instructions on two servers. Then configure CLUSTER_DISCOVERY_MANUAL_ADDRESSES so that each server knows the IP addresses of both servers.
Installation
wget https://raw.githubusercontent.com/nuclia/nucliadb/main/scripts/install-vm.sh -O - | bash
Configuration is found in /etc/default/nucliadb. Edit it so that it includes the following configuration:
LOG_OUTPUT_TYPE=STDOUT
DRIVER=pg
DRIVER_PG_URL=postgresql://nucliadb:12345678@<postgresql_server_ip>/nucliadb
FILE_BACKEND=s3
S3_CLIENT_ID=admin
S3_CLIENT_SECRET=12345678
S3_BUCKET=nucliadb-{kbid}
S3_ENDPOINT=http://<minio_server_ip>:9000
NUA_API_KEY=<nua_api_key>
CLUSTER_DISCOVERY_MODE=manual
CLUSTER_DISCOVERY_MANUAL_ADDRESSES='["<this_server_ip>:10009", "<other_server_ip>:10009"]'
A quick description of the settings:
- LOG_OUTPUT_TYPE=STDOUT sends the logs to the system journal via stdout
- DRIVER=pg enables the PostgreSQL metadata driver
- DRIVER_PG_URL is the connection URL/DSN of the PostgreSQL server
- FILE_BACKEND=s3 enables the S3 file backend. An alternative is gcs
- S3_CLIENT_ID is the MinIO username
- S3_CLIENT_SECRET is the MinIO password
- S3_BUCKET is a pattern to generate bucket names for each Knowledge Box
- S3_ENDPOINT is the URL of the MinIO server
- NUA_API_KEY is an API Key for the cloud-hosted NUA API. Follow this guide to obtain it
- CLUSTER_DISCOVERY_MODE=manual enables the NucliaDB cluster mode
- CLUSTER_DISCOVERY_MANUAL_ADDRESSES is the list of endpoints of all NucliaDB servers in the cluster, in JSON format
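Because CLUSTER_DISCOVERY_MANUAL_ADDRESSES must be a JSON list wrapped in single quotes, it is easy to mistype by hand. The following sketch generates the line from a list of IP addresses, assuming the cluster port 10009 used in this guide; cluster_addresses is a hypothetical helper name, not something shipped with NucliaDB.

```shell
# Hypothetical helper: print the CLUSTER_DISCOVERY_MANUAL_ADDRESSES line
# for the given NucliaDB server IPs, using port 10009 as in this guide.
cluster_addresses() {
    port=10009
    out=""
    for ip in "$@"; do
        # Prefix a comma separator for every entry after the first
        out="${out:+$out, }\"$ip:$port\""
    done
    printf "CLUSTER_DISCOVERY_MANUAL_ADDRESSES='[%s]'\n" "$out"
}

# Example:
#   cluster_addresses 10.0.0.11 10.0.0.12
```

The printed line is identical on every server in the cluster, so it can be generated once and copied into /etc/default/nucliadb on each machine.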
Then, start the nucliadb service:
systemctl enable --now nucliadb.service
Finally, check its status and view its logs with:
systemctl status nucliadb.service
journalctl -u nucliadb.service
Allowing connection from external machines
Debian/Ubuntu:
Nothing to do
Fedora/RHEL:
# Make the firewall allow connections to internal cluster port
firewall-cmd --permanent --add-port=10009/tcp
firewall-cmd --add-port=10009/tcp
# Make the firewall allow connections to HTTP API/WebUI port
firewall-cmd --permanent --add-port=8080/tcp
firewall-cmd --add-port=8080/tcp
Configuration
Full list of configuration options
nginx
nginx acts as a load balancer to route requests to NucliaDB. Additionally, it can be extended to implement authentication.
This guide covers the bare minimum for a demonstration. It does not cover TLS security, performance, etc. For that, check the nginx documentation.
Installation
Debian/Ubuntu:
apt install nginx
Fedora/RHEL:
dnf install nginx
systemctl enable --now nginx
# Make the firewall allow connections to HTTP API/WebUI port
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --add-port=80/tcp
# Allow nginx to make HTTP requests
setsebool -P httpd_can_network_connect 1
Basic configuration
Debian/Ubuntu:
Add the upstream servers (the NucliaDB servers) to a new file /etc/nginx/conf.d/nucliadb.conf:
upstream nucliadb_vm {
    server <nucliadb_server_ip>:8080;
    server <nucliadb_other_server_ip>:8080;
}
Edit the default site to add a location to route all resources. In /etc/nginx/sites-available/default, find the existing location / block and replace it with the following:
location / {
    proxy_pass http://nucliadb_vm;
}
Fedora/RHEL:
Add the upstream servers (the NucliaDB servers) to a new file /etc/nginx/conf.d/nucliadb.conf:
upstream nucliadb_vm {
    server <nucliadb_server_ip>:8080;
    server <nucliadb_other_server_ip>:8080;
}
Create a location to route all resources in a new file /etc/nginx/default.d/nucliadb.conf:
location / {
    proxy_pass http://nucliadb_vm;
}
Finally, reload the config with systemctl reload nginx.
You can now access NucliaDB by visiting http://<nginx_server_ip>/admin.
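A syntax error in the edited files can leave nginx with a configuration it cannot load, so it can be worth running the built-in configuration test (nginx -t) before every reload. The wrapper below is a minimal sketch; safe_reload_nginx is a hypothetical name and the function must be run as root, like the other commands in this guide.

```shell
# Hypothetical helper: reload nginx only if the configuration passes its syntax test.
safe_reload_nginx() {
    if nginx -t; then
        systemctl reload nginx
    else
        echo "nginx configuration test failed; not reloading" >&2
        return 1
    fi
}

# Example:
#   safe_reload_nginx
```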
Basic authorization
This sets up the common use case where you want to keep NucliaDB private while exposing some Knowledge Boxes to the public (e.g. for use with a public widget). Private access is controlled by authorization in nginx, which passes the corresponding security headers to the NucliaDB hosts.
First, create a file /etc/nginx/users.auth with one line per user, in the format username:digested_password, where the digested password can be obtained with openssl passwd. For example:
$ openssl passwd
Password: 12345678
Verifying - Password: 12345678
$1$RzDpaG9V$L99glseI5KE5wfx.6z2eJ0
$ echo 'nucliadb:$1$RzDpaG9V$L99glseI5KE5wfx.6z2eJ0' >> /etc/nginx/users.auth
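For scripted setups, the same line can be produced without the interactive prompt. This is a sketch under a few assumptions: openssl is installed, the apr1 (Apache MD5) variant is used instead of the $1$ format shown above (the nginx documentation lists apr1 among the hash formats accepted in password files), and htpasswd_line is a hypothetical helper name.

```shell
# Hypothetical helper: print a "username:hash" line for an nginx password file.
# The password is read from stdin so it does not show up in the process list.
htpasswd_line() {
    printf '%s:%s\n' "$1" "$(openssl passwd -apr1 -stdin)"
}

# Example:
#   printf '%s\n' 'my-password' | htpasswd_line nucliadb >> /etc/nginx/users.auth
```

The salt is random, so the hash differs on every run even for the same password.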
Then, let's configure nginx to use this file as an authorization database:
Debian/Ubuntu:
Edit the default site to add a location to route all resources. In /etc/nginx/sites-available/default, find the existing location / block and replace it with the following:
location / {
    auth_basic "NucliaDB";
    auth_basic_user_file users.auth;
    proxy_set_header x-nucliadb-roles "READER;WRITER;MANAGER";
    proxy_pass http://nucliadb_vm;
}
Fedora/RHEL:
Edit /etc/nginx/default.d/nucliadb.conf so it looks like:
location / {
    auth_basic "NucliaDB";
    auth_basic_user_file users.auth;
    proxy_set_header x-nucliadb-roles "READER;WRITER;MANAGER";
    proxy_pass http://nucliadb_vm;
}
And then reload the configuration with systemctl reload nginx.
You should now be asked for a username/password in order to access NucliaDB.
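The prompt can also be confirmed from the command line by looking at the HTTP status code: with auth_basic enabled, nginx answers 401 to a request without credentials. The sketch below assumes curl is installed; http_status is a hypothetical helper name, and any extra arguments are passed straight to curl.

```shell
# Hypothetical helper: print only the HTTP status code of a request.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$@"
}

# Example:
#   http_status http://<nginx_server_ip>/admin    # 401 expected without credentials
#   http_status -u nucliadb:<password> http://<nginx_server_ip>/admin
```

With valid credentials the second call should return something other than 401.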
Public access (e.g: widget)
To allow public read access to a single Knowledge Box (e.g. a publicly accessible widget), you can edit the configuration like so:
Debian/Ubuntu:
In /etc/nginx/sites-available/default, find the existing location / block and add a new one below it:
location /public_api/v1/kb/<your_knowledge_box_id>/ {
    proxy_set_header x-nucliadb-roles "READER";
    proxy_pass http://nucliadb_vm/api/v1/kb/<your_knowledge_box_id>/;
    add_header Access-Control-Allow-Origin *;
}
Fedora/RHEL:
Edit /etc/nginx/default.d/nucliadb.conf and add a new location block:
location /public_api/v1/kb/<your_knowledge_box_id>/ {
    proxy_set_header x-nucliadb-roles "READER";
    proxy_pass http://nucliadb_vm/api/v1/kb/<your_knowledge_box_id>/;
    add_header Access-Control-Allow-Origin *;
}
Feel free to adjust the CORS-related headers (Access-Control-*) as needed for your use case. The configuration shown allows hosting the widget anywhere, but you can restrict use of the widget to a specific domain by specifying it, e.g.:
add_header Access-Control-Allow-Origin "https://my-company.com";
Finally, in the HTML code generated by the widget, replace
backend="http://some.domain/api"
with
backend="http://some.domain/public_api"
Users should now be able to use the public widget without authentication.
Upgrade NucliaDB
In order to upgrade the NucliaDB machines to the latest version, you can run:
/usr/bin/upgrade-nucliadb
This script will take care of stopping the services, updating to the latest version and restarting the services.