Pinecone integration
This integration is currently in beta phase, and users must be granted explicit access to the feature by Nuclia. Please contact support@nuclia.com if you are interested in being on the early access list.
Pinecone is a specialized cloud-based vector database designed to support machine learning, AI, and other applications that require efficient storage, retrieval, and management of high-dimensional vector data.
When creating a Knowledge Box in Nuclia, you can configure it to use Pinecone as its vector database provider. This is especially useful for large datasets where full-text search is not essential in the retrieval phase.
The Nuclia-Pinecone integration allows you to leverage Nuclia's seamless APIs to implement your RAG application without worrying about scale.
Configuration
In the Nuclia dashboard's Knowledge Box creation page, you can select "Use external Pinecone vector database" and enter your Pinecone API key.
The API key will be encrypted and stored in NucliaDB's persistence layers.
You can also choose in which Pinecone serverless cluster your indexes will be created (AWS, GCP or Azure).
Nuclia will create indexes on your Pinecone account with the "nuclia-" prefix.
After that, you can push your data to Nuclia using the Nuclia dashboard, the CLI, or the API directly. Once Nuclia has processed your data, all the extracted vectors will be stored in Pinecone and your data will be ready for search and RAG.
Upon Knowledge Box deletion, the corresponding Pinecone indexes are also deleted automatically.
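Because Nuclia-created indexes share the "nuclia-" prefix, that prefix can be used to tell them apart from other indexes in the same Pinecone account. The following is a minimal sketch, not part of the Nuclia or Pinecone APIs: the helper name nuclia_managed_indexes and the sample index names are hypothetical, and the list of names is assumed to have been fetched separately (for example, with the Pinecone client).

```python
NUCLIA_INDEX_PREFIX = "nuclia-"  # prefix stated in the Configuration section


def nuclia_managed_indexes(index_names):
    """Return, sorted, the index names that start with the Nuclia prefix."""
    return sorted(
        name for name in index_names if name.startswith(NUCLIA_INDEX_PREFIX)
    )


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = ["nuclia-example-b", "my-own-index", "nuclia-example-a"]
    print(nuclia_managed_indexes(names))
```

Since these indexes are deleted automatically together with their Knowledge Box, a check like this is mainly useful for auditing rather than for manual cleanup.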
Supported features
- All data ingestion features and data types are supported.
- The /find and /ask endpoints support querying the Pinecone indexes with most of the existing search parameters, with the exceptions listed in the Limitations section below. Some of the implemented features are:
  - Label filtering: metadata, classification and origin label filtering
  - Date filtering
  - Retrieval and answer generation on a specific document
  - Minimum semantic score filtering
  - All the RAG strategies
  - Filtering by security groups
  - Search in a specific field
- Multiple semantic models for a single Knowledge Box: this allows you to easily compare accuracy and results across various models for the same data. This Nuclia feature is currently in beta.
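As an illustration of the supported search parameters, the sketch below builds a request body for a semantic-only /find query with an optional minimum score and label filters. The field names ("query", "features", "min_score", "filters") and the label path format are assumptions about the request schema and should be checked against the Nuclia API reference; the helper name build_find_request is hypothetical.

```python
import json


def build_find_request(query, min_score=None, filters=None):
    """Build a JSON-serializable body for a semantic-only /find request.

    Field names are assumptions about the request schema, not confirmed
    by this page.
    """
    # Always request the semantic feature, the only one supported
    # on Pinecone Knowledge Boxes.
    body = {"query": query, "features": ["semantic"]}
    if min_score is not None:
        body["min_score"] = min_score
    if filters:
        body["filters"] = list(filters)
    return body


if __name__ == "__main__":
    body = build_find_request(
        "How do I rotate an API key?",
        min_score=0.7,
        filters=["/classification.labels/topic/security"],  # hypothetical label
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a POST to the Knowledge Box's /find endpoint; the URL and authentication header are omitted here because they depend on your account and zone.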
Limitations
- Only the semantic search feature is supported. If the features parameter does not include the semantic option, no results are returned; all other options are ignored.
- Filtering by /entities labels is not supported; entity labels are ignored at the filtering step.
- The /suggest endpoint does not work on Pinecone Knowledge Boxes, because it currently only suggests text blocks based on keyword searches and entities based on graph search.
- The highlight feature does not work, because the retrieval phase is purely semantic.
- The deprecated /search endpoint is not integrated with Pinecone.
- The autofilter feature is not supported, because it relies on a graph search for entities detected in the query.
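The limitations above can be turned into a small client-side check that runs before a request is sent. This is a sketch under stated assumptions, not an official validator: the function name pinecone_request_warnings is hypothetical, and the body field names ("features", "filters", "highlight", "autofilter") are assumed to match the request schema.

```python
def pinecone_request_warnings(body):
    """Return warnings for a /find or /ask body sent to a Pinecone Knowledge Box."""
    warnings = []

    # Only inspect "features" when it is given explicitly; the server-side
    # default for an omitted parameter is not assumed here.
    features = body.get("features")
    if features is not None:
        if "semantic" not in features:
            warnings.append("'features' lacks 'semantic': no results will be returned")
        else:
            extra = [f for f in features if f != "semantic"]
            if extra:
                warnings.append(f"non-semantic features are ignored: {extra}")

    # Entity label filters are silently ignored at the filtering step.
    entity_filters = [
        f for f in body.get("filters", [])
        if isinstance(f, str) and f.startswith("/entities")
    ]
    if entity_filters:
        warnings.append(f"entity label filters are ignored: {entity_filters}")

    # Assumed boolean flags for the highlight and autofilter features.
    if body.get("highlight"):
        warnings.append("highlight has no effect: retrieval is purely semantic")
    if body.get("autofilter"):
        warnings.append("autofilter is not supported")

    return warnings


if __name__ == "__main__":
    request = {"query": "q", "features": ["keyword"], "highlight": True}
    for message in pinecone_request_warnings(request):
        print("warning:", message)
```

Returning warnings instead of raising keeps the check advisory, which suits parameters that the backend ignores rather than rejects.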