
Playing with different embedding models

Many embedding models are available, and choosing one for your data can be difficult. Luckily, Nuclia lets you try and use several models at the same time.

To add a new embedding model to your KB, you only need to use the NucliaDB Vectorsets API. For example, to add multilingual-2024-05-06, send a POST to https://nuclia.cloud/api/v1/kb/<your-kb-id>/vectorsets/multilingual-2024-05-06 and this embedding model will be added to your Knowledge Box.
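As a rough illustration, the request above could be built in Python with the standard library. This is only a sketch: the function name, the placeholder KB id and API key, and the X-NUCLIA-SERVICEACCOUNT authentication header are assumptions that do not come from this page, so check them against your own setup before sending anything.

```python
import urllib.request

# Base URL exactly as written in this guide.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_add_vectorset_request(kb_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that adds an embedding model to a KB."""
    url = f"{BASE_URL}/kb/{kb_id}/vectorsets/{model}"
    # Assumption: authentication with a service account key in this header.
    headers = {"X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="POST", headers=headers)


if __name__ == "__main__":
    # "my-kb-id" and "my-api-key" are placeholders, not real values.
    req = build_add_vectorset_request("my-kb-id", "multilingual-2024-05-06", "my-api-key")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and method before touching a real Knowledge Box.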

Once added, every new resource will be processed with all your embedding models. To search with a specific one, pass the vectorset parameter set to the name of the embedding model.
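A search that targets one embedding model might look like the sketch below. Only the vectorset parameter and the model name come from this page. The /find endpoint path, the JSON body shape with a "query" field, the authentication header, and the function name are assumptions made for illustration.

```python
import json
import urllib.request

# Base URL exactly as written in this guide.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_search_request(kb_id: str, query: str, vectorset: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a search request that selects one embedding model."""
    # Assumption: the search endpoint is /find and accepts a JSON body.
    url = f"{BASE_URL}/kb/{kb_id}/find"
    # "vectorset" carries the name of the embedding model to search with.
    body = json.dumps({"query": query, "vectorset": vectorset}).encode("utf-8")
    headers = {
        # Assumption: authentication with a service account key in this header.
        "X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


if __name__ == "__main__":
    # "my-kb-id" and "my-api-key" are placeholders, not real values.
    req = build_search_request("my-kb-id", "how do refunds work?", "multilingual-2024-05-06", "my-api-key")
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Running the same query once per vectorset name is a simple way to compare how the models rank results for your data.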

What happens with old data? If you want to reprocess all your existing data, you can start a new semantic-model-migrator task to do so. Once it finishes, all your existing data will be searchable with both your previous embedding models and the newly added one. That easy!

In a large corpus, migrating to another embedding model can be costly, which is why this step is explicit. If you're only testing, you can skip it and test with new data only.

I found a better model, can I remove the previous one? Sure! There's no point in having more embedding models than necessary. If you want to delete an embedding model, just send a DELETE to https://nuclia.cloud/api/v1/kb/<your-kb-id>/vectorsets/multilingual-2024-05-06 and the embedding model will be removed from your Knowledge Box.
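The removal call can be sketched the same way in Python. As before, the function name, the placeholder values, and the X-NUCLIA-SERVICEACCOUNT authentication header are assumptions rather than something this page specifies.

```python
import urllib.request

# Base URL exactly as written in this guide.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_delete_vectorset_request(kb_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE that removes an embedding model from a KB."""
    url = f"{BASE_URL}/kb/{kb_id}/vectorsets/{model}"
    # Assumption: authentication with a service account key in this header.
    headers = {"X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


if __name__ == "__main__":
    # "my-kb-id" and "my-api-key" are placeholders, not real values.
    req = build_delete_vectorset_request("my-kb-id", "multilingual-2024-05-06", "my-api-key")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```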