What is Nuclia?

Nuclia is an end-to-end product that ingests, extracts, and stores your data so that you can search it and use Retrieval-Augmented Generation (RAG). With Nuclia, you can leverage any type of private data (documents, videos, etc.) to add powerful search to your applications. Nuclia uses natural language processing and machine learning to understand the searcher's intent and return results that are more relevant to the user's needs.

Nuclia is an API-first service that also aims to make the complex implementation of these technologies a hassle-free experience for technical and non-technical users. We provide a platform that lets you test and implement all these powerful capabilities without having to write a single line of code.

This guide will explain how to set up Nuclia in your application.

Nuclia integrates five main components:

  • Ingest: Nuclia can take any type of unstructured data, such as documents, PDFs, text, videos, audio, links, and others. It can also process some structured data in small files such as XML, CSV, and JSON. These files, or "resources" as we call them, can come from different sources: directly through the API, from the Nuclia Dashboard, or through direct integrations with storage products such as Google Drive, OneDrive, Dropbox, Confluence, Sitemaps, SharePoint, local file storage, and others.

  • Process: Nuclia runs several steps to process data. First, it extracts the text from your data, whatever the format. It runs Speech-to-Text on video and audio, and Optical Character Recognition (OCR) on videos and images when needed. Second, the text is split into text blocks and, using Natural Language Processing (NLP) to understand the meaning of the text, Nuclia runs a Named Entity Recognition (NER) process to extract the key concepts each text block contains (such as people, places, and dates). Third, it generates the embeddings for the text blocks. Lastly, it generates a Knowledge Graph that illustrates the relationships between the extracted entities. All this processing is done in the cloud.

  • Index & Store: Two main things happen here. First, the data is indexed based on the components explained above; then it is stored in NucliaDB, our vector database. Nuclia indexes all metadata plus four layers of information: the full extracted text of the document, each split text block, all the embeddings, and the generated Knowledge Graph. All extracted and original data is stored in NucliaDB, an open source vector database that is set up to run by default in Nuclia's cloud. It can also be deployed on an on-premises or cloud instance, by preference or to meet the security and data governance requirements of the user or organization.

  • Search & RAG: Although search is part of a RAG framework, Nuclia can seamlessly separate the two. On one hand, there is Search: Nuclia takes any natural language query and, based on its meaning and wording, runs a keyword and semantic search to retrieve the results most relevant to the user's question. On the other hand, there is Retrieval-Augmented Generation (RAG): Nuclia takes those search results and provides them to Large Language Models (LLMs) (such as GPT, Anthropic, or Gemini) to generate an answer that not only helps avoid the hallucinations LLMs can produce, but also provides citations and the original documentation for further exploration and auditing. It is also worth mentioning the potential of customizable LLM prompts: they can improve the accuracy of the answer and also modify its structure and tone, providing a more specific experience for the end user.

  • Display & Output: Nuclia is keen on letting technical and non-technical users implement and test the platform quickly, so there are three paths you can take when it comes to UI display. The easiest and fastest is Nuclia's widget: you customize the behavior you want out of the box, and Nuclia provides a code snippet with an HTML component that can be used in any HTML/frontend framework, and even in WordPress. The more technical approach is the API route, which lets you build your own UI, customize it as you see fit, and further leverage the API's capabilities for your specific needs. The third way, let's call it the hybrid approach, is to further customize the widget with our open source UI starter repository (because we at Nuclia are firm believers in open source projects).
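
To make the flow through the components above more concrete, here is a minimal, self-contained sketch of the general idea: split text into blocks, embed each block, run a combined keyword and semantic search, and build a RAG prompt that carries citations. This is not Nuclia's implementation and does not use the Nuclia API or NucliaDB. Every function name, the sample resources, the `alpha` weighting, and the prompt wording are assumptions made for illustration, and the bag-of-words "embedding" is only a toy stand-in for the neural embeddings a real system would generate.

```python
import math
import re
from collections import Counter


def split_into_blocks(text: str, max_words: int = 12) -> list[str]:
    """Split text into blocks of at most max_words words (toy splitter)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query: str, block: str) -> float:
    """Fraction of distinct query terms that appear in the block."""
    q_terms = set(tokenize(query))
    b_terms = set(tokenize(block))
    return len(q_terms & b_terms) / len(q_terms) if q_terms else 0.0


def build_index(resources: dict[str, str], max_words: int = 12) -> list[dict]:
    """Index every text block of every resource together with its vector."""
    index = []
    for resource_id, text in resources.items():
        for block in split_into_blocks(text, max_words):
            index.append({"resource": resource_id, "block": block, "vector": embed(block)})
    return index


def hybrid_search(query: str, index: list[dict], top_k: int = 2, alpha: float = 0.5) -> list[dict]:
    """Rank blocks by a weighted mix of keyword and vector similarity.

    alpha is a hypothetical weight: 1.0 means keyword only, 0.0 means vector only.
    """
    q_vec = embed(query)
    scored = []
    for entry in index:
        score = alpha * keyword_score(query, entry["block"]) + (1 - alpha) * cosine(q_vec, entry["vector"])
        scored.append({**entry, "score": score})
    scored.sort(key=lambda e: e["score"], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, hits: list[dict]) -> str:
    """Build an LLM prompt whose context blocks are numbered so answers can cite them."""
    context = "\n".join(
        f"[{i}] ({hit['resource']}) {hit['block']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered context and cite sources like [1].\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical sample resources, invented for this example.
resources = {
    "handbook.txt": "Employees may work remotely two days per week. Requests go to the team lead.",
    "travel.txt": "Travel expenses are refunded within thirty days after the receipt is submitted.",
}
question = "How many days can employees work remotely?"

index = build_index(resources)
hits = hybrid_search(question, index, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

In this sketch the prompt would then be sent to an LLM of your choice. Because each context block is numbered and tagged with its resource name, the generated answer can point back to the original document, which is the same idea behind the citations described in the Search & RAG component.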