Playing with different embedding models
Many embedding models are available today, and choosing the right one for your data can be difficult. Luckily, Nuclia lets you try and use several models at the same time.
To add a new embedding model to your KB, you only need to use the NucliaDB Vectorsets API. For example, if you want to add multilingual-2024-05-06, a POST to https://nuclia.cloud/api/v1/kb/<your-kb-id>/vectorsets/multilingual-2024-05-06 will add this new embedding model to your Knowledge Box.
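As a minimal sketch, the POST above could be built in Python as follows. The helper name build_add_vectorset_request is hypothetical, and the X-NUCLIA-SERVICEACCOUNT authentication header is an assumption about how your API key is passed; check your account settings for the exact authentication scheme. The request is only built here, not sent.

```python
import urllib.request

# Base URL as written in this guide; your account may use a different host.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_add_vectorset_request(kb_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that adds an embedding model to a KB."""
    url = f"{BASE_URL}/kb/{kb_id}/vectorsets/{model}"
    # Assumption: the API key is sent as a service account bearer token.
    headers = {"X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="POST", headers=headers)


request = build_add_vectorset_request("my-kb-id", "multilingual-2024-05-06", "my-api-key")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```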
Once added, every new resource will be processed with all your embedding models. To search with a specific model, pass the vectorset parameter with the name of the embedding model.
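A search that selects a specific embedding model might look like the sketch below. Only the vectorset parameter comes from this guide; the /find endpoint path, the JSON body shape, the authentication header, and the helper name build_search_request are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Base URL as written in this guide; your account may use a different host.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_search_request(kb_id: str, query: str, vectorset: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a search request that targets one embedding model."""
    # Assumption: search is exposed as a POST on the KB's /find endpoint.
    url = f"{BASE_URL}/kb/{kb_id}/find"
    # The vectorset parameter carries the name of the embedding model to search with.
    body = json.dumps({"query": query, "vectorset": vectorset}).encode("utf-8")
    headers = {
        "X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


request = build_search_request("my-kb-id", "how do I rotate keys?", "multilingual-2024-05-06", "my-api-key")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Building the same query once per vectorset is a simple way to compare how two models rank results for your data.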
What happens with old data?
If you want to reprocess all your existing data, you can start a new semantic-model-migrator task to do so. Once it finishes, all your existing data will be searchable with both your existing and your recently added embedding models. It's that easy!
In a big corpus of data, migrating to another embedding model can be costly, which is why this step is explicit. If you're only testing, you can skip it and test with new data only.
I found a better model, can I remove the previous one?
Sure! There's no point in having more embedding models than necessary. If you want to delete an embedding model, just send a DELETE to https://nuclia.cloud/api/v1/kb/<your-kb-id>/vectorsets/multilingual-2024-05-06 and the embedding model will be removed from your Knowledge Box.
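The removal call can be sketched the same way. As before, the helper name build_delete_vectorset_request is hypothetical and the authentication header is an assumption; the request is built but not sent.

```python
import urllib.request

# Base URL as written in this guide; your account may use a different host.
BASE_URL = "https://nuclia.cloud/api/v1"


def build_delete_vectorset_request(kb_id: str, model: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE that removes an embedding model from a KB."""
    url = f"{BASE_URL}/kb/{kb_id}/vectorsets/{model}"
    # Assumption: the API key is sent as a service account bearer token.
    headers = {"X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="DELETE", headers=headers)


request = build_delete_vectorset_request("my-kb-id", "multilingual-2024-05-06", "my-api-key")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```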