How to build a RAG engine with Strapi and Nuclia

Strapi is an open-source headless CMS. It offers a nice admin panel to manage content and a powerful API to fetch it. But what if you want to provide a search engine for your content?

Nuclia is the best search API to do it!

Nuclia is an API that can index and process any kind of data, including audio and video files, to add powerful search capabilities to your applications. It uses natural language processing and machine learning to understand the searcher's intent and return results that better match what they need.

The steps are simple:

Use lifecycle events to index text content.
Index video and audio files too.
Use the Nuclia widget to let visitors search your content.

Let's go!

Indexing your content

Whenever content is published in Strapi, you want it to be indexed in Nuclia. Conversely, when content is deleted or unpublished in Strapi, you want it to be removed from Nuclia.

Fortunately, Strapi lets you hook into the content lifecycle events to execute custom code.

Let's say your content type is called article. You need to create a file named lifecycles.js in the src/api/article/content-types/article folder of your Strapi project.

By adding the following code, you will call either index() or unindex() on the relevant events:

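module.exports = {
  afterUpdate(event) {
    if (event.params.data.publishedAt === null) {
      // The entry has just been unpublished
      unindex(event.result.id);
    } else if (!!event.params.data.publishedAt || !!event.result.publishedAt) {
      // The entry is being published, or is already published and was edited
      index(event.result);
    }
  },
  async beforeDelete(event) {
    // Load the entry before it is deleted to know whether it was published
    const entry = await strapi.db.query('api::article.article').findOne({
      where: { id: event.params.where.id },
    });
    if (entry && entry.publishedAt) {
      unindex(event.params.where.id);
    }
  },
  async beforeDeleteMany(event) {
    const entries = await strapi.db.query('api::article.article').findMany({
      where: event.params.where,
    });
    // Only published entries have been indexed, so only those need to be removed
    entries.filter((entry) => entry.publishedAt).forEach((entry) => unindex(entry.id));
  },
};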
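The branching in afterUpdate can be easier to follow (and to test) when it is isolated in a small pure function. The sketch below is only an illustration: decideAction is a hypothetical helper, not part of Strapi, and it assumes the event has the same shape as in the lifecycle code above.

```javascript
// Hypothetical helper: decide what to do with an entry after an update.
// Returns 'unindex', 'index' or 'skip'.
const decideAction = (event) => {
  const data = (event.params && event.params.data) || {};
  const result = event.result || {};
  if (data.publishedAt === null) {
    // The update explicitly cleared publishedAt: the entry was unpublished
    return 'unindex';
  }
  if (data.publishedAt || result.publishedAt) {
    // The entry is being published, or was already published and got edited
    return 'index';
  }
  // A draft that has never been published: nothing to do
  return 'skip';
};
```

With such a helper, afterUpdate would only need to call index() or unindex() depending on the returned value.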

That's a good start. Now you need to implement the index() and unindex() functions. That's where the Nuclia API comes into play.

First you need to install the following dependencies:

npm install @nuclia/core isomorphic-unfetch localstorage-polyfill
# OR
yarn add @nuclia/core isomorphic-unfetch localstorage-polyfill

Now you can create the Nuclia object:

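// Polyfills needed to run the Nuclia SDK in Node.js
require('isomorphic-unfetch');
require('localstorage-polyfill');
const { Nuclia } = require('@nuclia/core');
// RxJS operator used by the index() and unindex() functions
const { switchMap } = require('rxjs/operators');

const nuclia = new Nuclia({
  backend: 'https://nuclia.cloud/api',
  zone: 'europe-1',
  knowledgeBox: '6700692b-704e-4eb3-8558-5c2ba036c0bd',
  apiKey: 'YOUR-API-KEY',
});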

As you can see, you need to provide a Nuclia API key. An API key is required to add or modify content in a knowledge box. You can get your API key in the Nuclia Dashboard, in the "API keys" section:

Create a new API key (name it strapi for example) with Writer role
Click on the + sign to generate a new token for this service access
Copy the generated token and paste it in the previous script
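If you would rather not paste the token into a source file, one option is to read it from the environment. This is a minimal sketch under assumptions: the variable names NUCLIA_API_KEY, NUCLIA_KB and NUCLIA_ZONE are made up for this example, and nucliaOptionsFromEnv is a hypothetical helper, not something provided by Nuclia or Strapi.

```javascript
// Hypothetical helper: build the options passed to `new Nuclia(...)`
// from environment variables (the variable names are illustrative).
const nucliaOptionsFromEnv = (env = process.env) => {
  if (!env.NUCLIA_API_KEY) {
    // Fail early rather than sending unauthenticated requests later
    throw new Error('NUCLIA_API_KEY is not set');
  }
  return {
    backend: 'https://nuclia.cloud/api',
    zone: env.NUCLIA_ZONE || 'europe-1',
    knowledgeBox: env.NUCLIA_KB,
    apiKey: env.NUCLIA_API_KEY,
  };
};
```

The result could then be passed as new Nuclia(nucliaOptionsFromEnv()).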

Now you are ready to use the Nuclia API to index your content. Here is the index() function:

const index = (content) => {
  const resource = {
    title: content.Title,
    slug: `article-${content.id}`,
    texts: {
      text: {
        format: 'MARKDOWN',
        body: content.Body,
      },
    },
  };
  nuclia.db
    .getKnowledgeBox()
    .pipe(switchMap((kb) => kb.createOrUpdateResource(resource)))
    .subscribe({
      next: () => console.log(`Uploaded article ${content.id} to Nuclia`),
      error: (err) => console.error(`Error with article ${content.id}`, err),
    });
};

It creates the resource data structure expected by Nuclia, passing the title and the markdown content of the article provided by Strapi (assuming your Article content type has a Title field and a Body field). The data structure also contains a slug which will be used to identify the resource in Nuclia. It is important to make it unique, so we prefix it with the content type name.
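To illustrate why the prefix matters, here is a small sketch of the slug convention. buildSlug and toResource are hypothetical helpers written for this example. They only build plain objects and do not call Nuclia.

```javascript
// Hypothetical helper: prefix the Strapi id with the content type name,
// so entries of different content types never share a slug.
const buildSlug = (contentType, id) => `${contentType}-${id}`;

// Hypothetical helper: build the same resource structure as index() above,
// assuming the content type has a Title field and a Body field.
const toResource = (contentType, content) => ({
  title: content.Title,
  slug: buildSlug(contentType, content.id),
  texts: {
    text: {
      format: 'MARKDOWN',
      body: content.Body,
    },
  },
});
```

An article and a video that both have the id 12 in Strapi get the slugs article-12 and video-12, so they cannot overwrite each other in the knowledge box.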

createOrUpdateResource() creates the resource if the slug does not exist yet, and updates it otherwise.

The unindex() function is similar:

const unindex = (id) => {
  nuclia.db
    .getKnowledgeBox()
    .pipe(switchMap((kb) => kb.getResourceFromData({ id: '', slug: `article-${id}` }).delete()))
    .subscribe({
      next: () => console.log(`article ${id} deleted`),
      error: (err) => console.error(`Error when deleting article ${id}`, err),
    });
};

Indexing media files

Nuclia is really good at indexing video or audio files. But for now you are only providing the text content of your articles. Let's fix that!

Let's imagine you have a video content type with a Title field and a Video field. You want to index the video file in Nuclia.

You will implement a lifecycles.js file in the src/api/video/content-types/video folder very similar to the one you have just created for the article content type.

The difference is in the index() function:

const fs = require('fs');
const { of } = require('rxjs');

const index = (content) => {
  const filePath = `./public${content.Video.url}`;
  const filename = filePath.split('/').pop();
  const contentType = content.Video.mime;
  const id = `video-${content.id}`;
  const resourceData = {
    title: content.Title,
    slug: id,
  };
  nuclia.db
    .getKnowledgeBox()
    .pipe(
      switchMap((kb) =>
        kb.createOrUpdateResource(resourceData).pipe(
          switchMap(() => kb.getResourceBySlug(id, ['values'])),
          switchMap((res) => {
            const fileContent = fs.readFileSync(filePath);
            if (hasFileChanged(id, res, fileContent)) {
              return uploadFile(kb, id, filename, fileContent, contentType);
            } else {
              return of(null);
            }
          }),
        ),
      ),
    )
    .subscribe({
      next: () => console.log(`Uploaded ${id} to Nuclia`),
      error: (err) => console.error(`Error with ${id}`, err),
    });
};

In Strapi, media files are in the /public folder, and the Video field contains the path to the file and its MIME type. That's a good start.
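The path handling can also be written with Node's path module instead of string concatenation. This is just a sketch: describeMedia is a hypothetical helper, and the media object used in the usage note is a made-up example of what a media field might contain.

```javascript
const path = require('path');

// Hypothetical helper: derive the local file path, the file name and the
// MIME type from a Strapi media field, assuming files live under ./public.
const describeMedia = (media, publicDir = './public') => ({
  filePath: path.join(publicDir, media.url),
  filename: path.basename(media.url),
  contentType: media.mime,
});
```

For example, describeMedia({ url: '/uploads/clip.mp4', mime: 'video/mp4' }) gives a filename of clip.mp4 and a filePath pointing to uploads/clip.mp4 inside the public folder.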

The createOrUpdateResource() call is the same as before, passing the title and the slug. But the file might be big, so you want to make sure it is worth uploading.

By calling getResourceBySlug(), you get the current content of the resource, which contains the MD5 of the stored file (if any). That way you can compare it with the MD5 of the current file content and know whether it is a different file or not. That's what happens in hasFileChanged:

const crypto = require('crypto');

const hasFileChanged = (id, resource, fileContent) => {
  if (resource.data.files && resource.data.files[id]) {
    const md5 = crypto.createHash('md5').update(fileContent).digest('hex');
    return md5 !== resource.data.files[id].file?.md5;
  }
  // No file is stored yet for this resource, so it has to be uploaded
  return true;
};
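The MD5 comparison itself can be tried without Nuclia. The sketch below uses only Node's built-in crypto module. md5Of and needsUpload are hypothetical names chosen for this example.

```javascript
const crypto = require('crypto');

// Compute the hexadecimal MD5 digest of a Buffer
const md5Of = (buffer) => crypto.createHash('md5').update(buffer).digest('hex');

// Hypothetical helper: an upload is needed when no MD5 is stored yet,
// or when the stored MD5 differs from the MD5 of the local file content.
const needsUpload = (storedMd5, fileContent) =>
  !storedMd5 || storedMd5 !== md5Of(fileContent);

const stored = md5Of(Buffer.from('hello'));
console.log(needsUpload(stored, Buffer.from('hello'))); // false, same content
console.log(needsUpload(stored, Buffer.from('hello!'))); // true, content changed
console.log(needsUpload(undefined, Buffer.from('hello'))); // true, nothing stored yet
```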

If the file has changed, you upload it with the uploadFile() function, passing the file content and the MIME type:

const uploadFile = (kb, id, filename, fileContent, contentType) => {
  return kb.getResourceFromData({ id: '', slug: id }).upload(id, fileContent.buffer, false, {
    contentType,
    filename,
  });
};
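One detail to keep in mind about fileContent.buffer: in Node.js, the ArrayBuffer behind a Buffer can be larger than the Buffer itself, because small Buffers may share a pooled allocation. Large files read with readFileSync are unlikely to be affected, but if you want to be safe, a helper like the one below (toArrayBuffer is a hypothetical name) copies exactly the bytes of the Buffer.

```javascript
// Hypothetical helper: return an ArrayBuffer containing exactly the bytes
// of the given Node.js Buffer, whatever the size of its underlying memory.
const toArrayBuffer = (buf) =>
  buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);

const small = Buffer.from('abc');
console.log(toArrayBuffer(small).byteLength); // 3
```

In uploadFile(), this would mean passing toArrayBuffer(fileContent) instead of fileContent.buffer.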

And here you go, you can now index your video files in Nuclia!

The full code example discussed here is available on GitHub.

Provide the Nuclia search & RAG feature in your Strapi application

To let your users ask questions about your indexed content, you can use the Nuclia widget.

In the Nuclia Dashboard, go to the "Widgets" section and create a new widget. You can customize it as you want, and then copy the corresponding HTML snippet directly into your application. It is a web component, so it works regardless of the technology you are using.

If you prefer, you can also implement your own search interface, using the Nuclia JS SDK to query your knowledge box.
