Building a RAG engine with Next.js and Nuclia
Next.js is a React framework for building server-side rendered applications. It is a great tool for building static websites, but it also supports dynamic websites, combining server-side rendering and static generation.
What are your options if you want to offer a search feature in your application?
With Next.js and Node.js, you can certainly implement a very naive search engine able to find a word in any page of your website. But if you expect anything smarter than exact, case-insensitive word matching, it will be tough. Now, imagine your website contains videos, PDFs or audio files and you want their content to be searchable as well… That's where Nuclia comes in!
Nuclia is an API that indexes and processes any kind of data, including audio and video files, to add powerful search and RAG capabilities to your applications.
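For comparison, here is a minimal sketch of the naive approach mentioned above. The `pages` array and the `naiveSearch` helper are hypothetical names used only for illustration; a real site would read its pages from disk.

```javascript
// Hypothetical in-memory pages, standing in for the content of a website.
const pages = [
  { slug: 'intro', body: 'Next.js renders React on the server.' },
  { slug: 'search', body: 'Searching PDFs and videos needs more than string matching.' },
];

// Case-insensitive whole-word match: no stemming, no synonyms, no ranking.
const naiveSearch = (query, docs) => {
  // Escape regular expression metacharacters in the user query
  const escaped = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`\\b${escaped}\\b`, 'i');
  return docs.filter((doc) => pattern.test(doc.body)).map((doc) => doc.slug);
};

console.log(naiveSearch('react', pages)); // [ 'intro' ]
console.log(naiveSearch('search', pages)); // []
```

The second query returns nothing even though one page is clearly about searching, because "Searching" is not the exact word "search". This is the kind of gap a semantic index is meant to close.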
Indexing page contents automatically
The Nuclia Dashboard is an easy way to index files or web pages by yourself. That's nice for testing purposes, but it is definitely better to index your Next.js pages automatically.
Let's write a Node.js script that will collect all the Markdown files in your Next.js application and index them in your knowledge box.
First you need to install the Nuclia SDK and the dependencies that allow it to run in Node.js:
npm install @nuclia/core localstorage-polyfill isomorphic-unfetch
# OR
yarn add @nuclia/core localstorage-polyfill isomorphic-unfetch
Here is how a typical Node.js script can use the Nuclia API:
const { Nuclia } = require('@nuclia/core');
require('localstorage-polyfill');
require('isomorphic-unfetch');

const nuclia = new Nuclia({
  backend: 'https://nuclia.cloud/api',
  zone: 'europe-1',
  knowledgeBox: '<YOUR-KB-ID>',
  apiKey: '<YOUR-API-KEY>',
});
// code to push data to Nuclia (detailed later)
As you can see, you need to provide a Nuclia API key. An API key is necessary when adding or modifying contents in a knowledge box. You can get your API key in the Nuclia Dashboard, in the "API keys" section:
- Create a new API key (name it nodejs-upload for example) with the Writer role
- Click on the + sign to generate a new token for this service access
- Copy the generated token and paste it in your Node.js script
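Since the token grants write access, you may prefer not to hardcode it in the script. Here is a minimal sketch that builds the SDK options from environment variables instead. The variable names `NUCLIA_KB_ID`, `NUCLIA_API_KEY` and `NUCLIA_ZONE`, as well as the `getNucliaOptions` helper, are assumptions made for this example and not something Nuclia defines.

```javascript
// Build the Nuclia SDK options from environment variables.
// NUCLIA_KB_ID, NUCLIA_API_KEY and NUCLIA_ZONE are hypothetical names.
const getNucliaOptions = (env = process.env) => {
  const missing = ['NUCLIA_KB_ID', 'NUCLIA_API_KEY'].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    backend: 'https://nuclia.cloud/api',
    zone: env.NUCLIA_ZONE || 'europe-1', // same default zone as in the script
    knowledgeBox: env.NUCLIA_KB_ID,
    apiKey: env.NUCLIA_API_KEY,
  };
};

// Usage in the upload script:
// const nuclia = new Nuclia(getNucliaOptions());
```

Failing early with an explicit message avoids sending requests with an undefined key.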
Then you can write a script named upload-posts.js that will index all the Markdown (.mdx) files from ./pages/posts:
const fs = require('fs');
const path = require('path');
const { Nuclia } = require('@nuclia/core');
require('localstorage-polyfill');
require('isomorphic-unfetch');

const nuclia = new Nuclia({
  backend: 'https://nuclia.cloud/api',
  zone: 'europe-1',
  knowledgeBox: '<YOUR-KB-ID>',
  apiKey: '<YOUR-API-KEY>',
});

const uploadPosts = (kb) => {
  // Get posts
  const postsDir = path.join(process.cwd(), 'pages', 'posts');
  const posts = fs.readdirSync(postsDir);
  posts
    .filter((post) => post.endsWith('.mdx'))
    .forEach((post) => {
      const postPath = path.join(postsDir, post);
      const postContent = fs.readFileSync(postPath, 'utf8');
      const postTitle = postContent.split('\n')[0].replace('# ', '');
      const postSlug = post.replace('.mdx', '');
      // Upload post to Nuclia
      const resource = {
        title: postTitle,
        slug: postSlug,
        texts: {
          text: {
            format: 'MARKDOWN',
            body: postContent,
          },
        },
      };
      kb.createResource(resource).subscribe({
        next: () => console.log(`Uploaded ${postSlug} to Nuclia`),
        error: (err) => console.error(`Error with ${postSlug}`, err),
      });
    });
};

nuclia.db.getKnowledgeBox().subscribe((kb) => uploadPosts(kb));
This script does the following:
- iterate over the .mdx files in ./pages/posts,
- for each file, extract its Markdown content, and get its title from its first line,
- and then upload it to Nuclia using the createResource method.
You can run this script with:
node upload-posts.js
Now if you check your Nuclia Dashboard, you should see your posts uploaded to your knowledge box! They appear in the resource list, and as soon as they are fully processed, you can start asking your Nuclia search engine questions.
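The script assumes that the first line of each post is its level-1 heading. If some of your posts start with front matter or a blank line, a slightly more tolerant variant could look like the sketch below. The `describePost` helper is a hypothetical name, and falling back to the slug when no heading is found is a choice made for this example.

```javascript
const path = require('path');

// Derive the slug from the file name and the title from the first "# " heading.
// Note: a line starting with "# " inside a code sample would also be picked up.
const describePost = (fileName, content) => {
  const slug = path.basename(fileName, '.mdx');
  const heading = content.split('\n').find((line) => line.startsWith('# '));
  const title = heading ? heading.slice(2).trim() : slug;
  return { title, slug };
};

console.log(describePost('hello.mdx', '---\ndraft: false\n---\n# Hello world\nSome text'));
// { title: 'Hello world', slug: 'hello' }
```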
Indexing external links and media files
Nuclia can index any kind of data, not just text. Let's say you have some posts containing links to external pages or to local PDF files.
It would be nice to make their content searchable too.
So how about finding every link to a local file or an external web page in the blog posts and indexing its target?
First you need to find the links in the Markdown files. They are always written like [some-title](some-url), so you can use a regular expression to extract them:
const markdownLinks = [...postContent.matchAll(/\[.*?\]\((.*?)\)/g)].map((match) => match[1]);
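To see what this expression captures, here is a small sketch that runs it on a sample string. The `extractLinks` wrapper and the sample content are made up for the illustration.

```javascript
// Same regular expression as above, wrapped in a reusable function.
const extractLinks = (markdown) =>
  [...markdown.matchAll(/\[.*?\]\((.*?)\)/g)].map((match) => match[1]);

const sample = [
  'Read the [docs](https://example.com/docs) or the [slides](/media/slides.pdf).',
  '![diagram](/media/diagram.png)',
].join('\n');

console.log(extractLinks(sample));
// [ 'https://example.com/docs', '/media/slides.pdf', '/media/diagram.png' ]
```

Note that image references match too, since they use the same bracket syntax, and that a link written with a title, like [text](url "title"), would be captured together with its title.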
Then there are two cases to handle (any other link is ignored):
- The link starts with http: it is an external link, so you will add it to the resource as a link field.
- The link starts with /media: it is a media file, so you will add it to the resource as a file field.
The link fields can be added directly in the creation payload, just like you did with the text field in the previous step:
const links = markdownLinks
  .filter((link) => link.startsWith('http'))
  .reduce((all, link, index) => {
    all[`link-${index}`] = { uri: link };
    return all;
  }, {});

const resource = {
  title: postTitle,
  slug: postSlug,
  texts: {
    text: {
      format: 'MARKDOWN',
      body: postContent,
    },
  },
  links,
};
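To make the shape of the links object concrete, here is a sketch of the same reduction applied to sample URLs. The `toLinkFields` name is hypothetical, and removing duplicates with a Set is an addition made for this example, useful when a post mentions the same page several times.

```javascript
// Turn a list of URLs into an object of link fields keyed link-0, link-1, ...
const toLinkFields = (urls) =>
  [...new Set(urls)] // drop repeated URLs (an addition for this example)
    .filter((url) => url.startsWith('http'))
    .reduce((all, url, index) => {
      all[`link-${index}`] = { uri: url };
      return all;
    }, {});

console.log(
  toLinkFields([
    'https://example.com/a',
    '/media/slides.pdf',
    'https://example.com/a',
    'https://example.com/b',
  ]),
);
// { 'link-0': { uri: 'https://example.com/a' }, 'link-1': { uri: 'https://example.com/b' } }
```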
By contrast, file fields cannot be added directly because they are binaries, so you need to get the resource once it is created and then use its upload() method.
As this involves chaining asynchronous operations, you need to install rxjs:
npm install rxjs
# OR
yarn add rxjs
Then you can write the following code:
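const { forkJoin, of } = require('rxjs');
const { switchMap } = require('rxjs/operators');

// Links pointing to local files stored under ./public
const localFiles = markdownLinks.filter((link) => link.startsWith('/media'));
kb.createResource(resource, true)
  .pipe(
    switchMap((data) =>
      localFiles.length > 0
        ? kb.getResource(data.uuid, [], []).pipe(
            switchMap((resource) =>
              forkJoin(
                localFiles.map((file) => {
                  const filePath = path.join(process.cwd(), 'public', file);
                  const fileBuffer = fs.readFileSync(filePath);
                  // Copy only this file's bytes: a Buffer can be a view on a larger ArrayBuffer
                  const fileContent = fileBuffer.buffer.slice(
                    fileBuffer.byteOffset,
                    fileBuffer.byteOffset + fileBuffer.byteLength,
                  );
                  const fileName = file.split('/').pop();
                  return resource.upload(fileName, fileContent);
                }),
              ),
            ),
          )
        : of(true),
    ),
  )
  // Nothing runs until the observable is subscribed to
  .subscribe({
    next: () => console.log(`Uploaded ${postSlug} to Nuclia`),
    error: (err) => console.error(`Error with ${postSlug}`, err),
  });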
And now, by running the script again, you should see your posts with their links and media files indexed in Nuclia.
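One detail worth checking when passing file contents as an ArrayBuffer: in Node.js, a Buffer can be a view over a larger, shared ArrayBuffer, so its buffer property may hold more bytes than the file itself. The sketch below, with a hypothetical `toArrayBuffer` helper, copies exactly the bytes that belong to the Buffer.

```javascript
// Copy exactly the bytes of a Buffer into a standalone ArrayBuffer.
const toArrayBuffer = (buf) =>
  buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

// Small Buffers are often carved out of a shared pool,
// so small.buffer may be larger than 5 bytes.
const small = Buffer.from('hello');
const exact = toArrayBuffer(small);

console.log(exact.byteLength); // 5
console.log(Buffer.from(exact).toString('utf8')); // hello
```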
Provide the Nuclia search & RAG feature in your Next.js application
To let your users ask questions about your indexed content, you can use the Nuclia widget.
In the Nuclia Dashboard, go to the "Widgets" section and create a new widget. You can customize it as you want, and then copy the corresponding HTML snippet directly into your Next.js application. Since it is a web component, it works independently of the technology you are using.
If you prefer, you can also implement your own search interface in React, using the Nuclia JS SDK to query your knowledge box.