How to query a Knowledge Box

Nuclia supports two search endpoints:

  • /search: returns several result sets, one per search technique (full-text, fuzzy, semantic).
  • /find: returns a single result set where all different results are merged into a hierarchical structure.

Both endpoints support the same query parameters.

Search parameters

Query

Simple query

A simple text search can be performed using a plain text value.

Example:

This query would return results containing the words little and prince:

Little Prince

By putting words into quotes you can search for an exact match:

"Little Prince"

In this case, you will only get results containing the word sequence little prince.

Put a minus sign (-) in front of a word to exclude it from the search:

Little Prince -sheep

This query would return results containing the words little and prince but not sheep.
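When these queries are sent as a URL parameter, the quotes and spaces have to be URL-encoded. The following is a minimal sketch using only the Python standard library; the helper name build_query_string is hypothetical and not part of any Nuclia client.

```python
from urllib.parse import urlencode


def build_query_string(query: str) -> str:
    """URL-encode a query so that quotes and spaces survive the request."""
    # urlencode uses quote_plus by default: spaces become '+',
    # double quotes become %22, and '-' is left untouched.
    return urlencode({"query": query})


examples = ['Little Prince', '"Little Prince"', 'Little Prince -sheep']
encoded = [build_query_string(q) for q in examples]
for original, result in zip(examples, encoded):
    print(f"{original!r} -> {result}")
```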

Filters

The filters parameter allows you to filter the results depending on the value of different properties provided on the resource.

The following attributes are supported:

  • /origin.tags: tags defined in the resource's origin property. Example: /origin.tags/blue, /origin.tags/green
  • /classification.labels: labels, in the form /classification.labels/{labelset}/{label}. Example: /classification.labels/movie-genre/science-fiction
  • /icon: MIME type of the resource. Example: /icon/application/pdf or /icon/video/mp4
  • /metadata.status: processing status. Example: /metadata.status/PROCESSED, /metadata.status/PENDING or /metadata.status/ERROR
  • /entities: resource entities, in the form /entities/{entity-type}/{entity-id}. Example: /entities/CITY/Barcelona
  • /metadata.language: primary language of the document. Example: /metadata.language/ca for Catalan
  • /metadata.languages: all other detected languages. Example: /metadata.languages/tr for Turkish
  • /field: keyword field values, in the form /field/{fieldname}/{value}. Example: /field/countries/Slovenia
  • /field-values: flattened keyword field values, in the form /field-values/{value}. Example: /field-values/Slovenia
  • /origin.metadata: metadata provided by the user. Example: /origin.metadata/fieldname/value

Examples:

  • To retrieve PNG images only, use:

    filters=/icon/image/png
  • To retrieve results in which the principal language is Italian, use:

    filters=/metadata.language/it
  • To retrieve results referring to the UNESCO organization, use:

    filters=/entities/ORG/UNESCO

Filters can be combined by repeating the filters parameter. This example retrieves results that are PDF files and that refer to the UNESCO organization:

filters=/icon/application/pdf&filters=/entities/ORG/UNESCO
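Repeating a parameter is easy to get wrong when the query string is assembled by hand. As a sketch, assuming only the Python standard library (the helper name build_filter_params is hypothetical), a list of key/value pairs can be encoded so that filters appears once per value:

```python
from urllib.parse import urlencode


def build_filter_params(query: str, filters: list[str]) -> str:
    """Build a query string that repeats `filters` once per filter value."""
    pairs = [("query", query)] + [("filters", f) for f in filters]
    # safe="/" keeps the slashes of the filter paths readable.
    return urlencode(pairs, safe="/")


query_string = build_filter_params(
    "heritage",
    ["/icon/application/pdf", "/entities/ORG/UNESCO"],
)
print(query_string)
```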

Date filtering

You can filter on the creation date using:

  • range_creation_start
  • range_creation_end

Examples:

  • To get all resources created between 2023-01-01 and 2023-12-31:

    range_creation_start=2023-01-01T00:00:00.000Z&range_creation_end=2023-12-31T23:59:59.000Z
  • To get all resources created after 2023-01-01:

    range_creation_start=2023-01-01T00:00:00.000Z

Filtering is based on the origin.created value if the resource provides it; otherwise it falls back to the resource creation date (created).

Note: all resources created before 2023-11-02 must be reprocessed before origin.created becomes filterable.

Similarly, you can filter on the modification date using:

  • range_modification_start
  • range_modification_end
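The timestamps in the examples above use UTC with millisecond precision and a Z suffix. The sketch below shows one way to produce that format from Python datetime objects; the helper names to_utc_timestamp and date_range_params are hypothetical, and whether the API accepts other ISO 8601 variants is not covered here.

```python
from datetime import datetime, timezone


def to_utc_timestamp(dt: datetime) -> str:
    """Format an aware datetime like 2023-01-01T00:00:00.000Z."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = dt.astimezone(timezone.utc)
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{utc.microsecond // 1000:03d}Z"


def date_range_params(kind: str, start=None, end=None) -> dict:
    """Build range_<kind>_start / range_<kind>_end parameters."""
    if kind not in ("creation", "modification"):
        raise ValueError(f"unsupported range kind: {kind}")
    params = {}
    if start is not None:
        params[f"range_{kind}_start"] = to_utc_timestamp(start)
    if end is not None:
        params[f"range_{kind}_end"] = to_utc_timestamp(end)
    return params


params = date_range_params(
    "creation",
    start=datetime(2023, 1, 1, tzinfo=timezone.utc),
    end=datetime(2023, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(params)
```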

Search in a specific field

To restrict the search to a specific field, use the fields parameter. It supports the following field types:

  • a: generic fields (= basic attributes, like title or summary)
  • t: text fields
  • f: file fields
  • u: link fields

Example:

fields=a/title

To search in several fields, the parameter can be repeated:

fields=a/title&fields=a/summary

For content fields, the effect depends on the endpoint. On the resource /search endpoint, the parameter restricts the search to a single piece of content. On the main /search endpoint, it restricts the search to all content with the given id, across all resources.
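As an illustration, the type prefixes listed above can be checked before a request is built. This is a sketch with hypothetical helper names (field_param, fields_query_string), not part of any Nuclia client.

```python
from urllib.parse import urlencode

# Field type prefixes as listed in this section.
FIELD_TYPES = {"a": "generic", "t": "text", "f": "file", "u": "link"}


def field_param(field_type: str, field_id: str) -> str:
    """Return a value such as 'a/title' for the fields parameter."""
    if field_type not in FIELD_TYPES:
        raise ValueError(f"unknown field type: {field_type!r}")
    return f"{field_type}/{field_id}"


def fields_query_string(fields: list[tuple[str, str]]) -> str:
    """Repeat the fields parameter once per (type, id) pair."""
    pairs = [("fields", field_param(t, i)) for t, i in fields]
    return urlencode(pairs, safe="/")


fields_qs = fields_query_string([("a", "title"), ("a", "summary")])
print(fields_qs)
```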

Result options

Features

A search query can be executed against different targets. The target is defined by the features parameter, which supports four values:

  • document: the query is executed as full-text search against all resource texts (including attributes like title or summary, and all content fields). The result will contain fulltext (listing text sequences matching the query) and resources (listing resources matching the query).
  • paragraph: the query is executed as fuzzy search against the text of all text blocks. The result will contain paragraphs (listing text blocks matching the query) and resources (listing resources containing these text blocks).
  • vector: the query is executed as semantic search against all resource texts. The result will contain sentences (listing sentences semantically close to the query) and resources (listing resources containing these sentences).
  • relations: the query is executed as graph search against all resource entities. The result will contain resources (listing resources related to the entities identified in the query).

These features can be combined by repeating the features parameter:

features=document&features=vector
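The mapping between features and result sets described above can be written down as a small lookup table, which helps when deciding which keys to read from a /search response. This is a sketch based only on the list above; the names RESULT_KEYS, feature_params and expected_result_keys are hypothetical.

```python
# Result sets per feature, as described in this section (for /search).
RESULT_KEYS = {
    "document": {"fulltext", "resources"},
    "paragraph": {"paragraphs", "resources"},
    "vector": {"sentences", "resources"},
    "relations": {"resources"},
}


def feature_params(features: list[str]) -> list[tuple[str, str]]:
    """Repeat the features parameter once per requested feature."""
    unknown = [f for f in features if f not in RESULT_KEYS]
    if unknown:
        raise ValueError(f"unsupported features: {unknown}")
    return [("features", f) for f in features]


def expected_result_keys(features: list[str]) -> set[str]:
    """Union of the result sets the requested features should produce."""
    keys: set[str] = set()
    for feature in features:
        keys |= RESULT_KEYS[feature]
    return keys


pairs = feature_params(["document", "vector"])
keys = expected_result_keys(["document", "vector"])
print(pairs, sorted(keys))
```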

Facets

By using the faceted parameter, you will get a facets attribute in paragraphs, sentences and fulltext.

This parameter accepts the same values as the filters parameter.

Examples:

  • To get the total number of matches for each image file type (like jpg, png, gif, etc.), use:

    faceted=/icon/image
  • To get the total number of matches for each language (like en, it, fr, etc.), use:

    faceted=/metadata.language

Highlight matching words

By setting the split parameter to true, you will get the start and end positions of each matching word in text blocks and fulltext results.

If you additionally set the highlight parameter to true, the matching words are enclosed in <mark> tags.
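Once the matching words are wrapped in <mark> tags, a client may want to list them or recover the plain text. The sketch below works on a made-up sample string, not on a real API response, and the helper names are hypothetical.

```python
import re

# Non-greedy match so that each <mark>...</mark> pair is handled separately.
MARK_RE = re.compile(r"<mark>(.*?)</mark>")


def marked_words(text: str) -> list[str]:
    """Return the words enclosed in <mark> tags, in order of appearance."""
    return MARK_RE.findall(text)


def strip_marks(text: str) -> str:
    """Remove the <mark> tags and keep the enclosed words."""
    return MARK_RE.sub(r"\1", text)


# Hypothetical highlighted text, for illustration only.
sample = "The <mark>Little</mark> <mark>Prince</mark> lives on a small planet."
words = marked_words(sample)
plain = strip_marks(sample)
print(words)
print(plain)
```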

How to call the search endpoint

API

To search in all resources, the search endpoints are:

https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/search
https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/find

Search endpoints can be called with a GET or a POST request.

A typical curl command to call the search endpoint is:

curl "https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/search?query=Batman&features=document&features=paragraph"

If your Knowledge Box is not public, you must provide the X-NUCLIA-SERVICEACCOUNT header with an API token or an Authorization header.

To search in a specific resource, the search endpoint path is:

https://<zone>.nuclia.cloud/api/v1/kb/<your-knowledge-box-id>/resource/<resource-id>/search
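The URL patterns above can be assembled programmatically. The following sketch builds a GET request with the Python standard library without sending it. Several things here are assumptions: the helper name build_search_request, the placeholder zone and Knowledge Box id used in the usage lines, and the "Bearer " prefix in the X-NUCLIA-SERVICEACCOUNT header value, which should be checked against the reference documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(zone, kb_id, params, api_key=None,
                         endpoint="search", resource_id=None):
    """Build (but do not send) a GET request for /search or /find."""
    if endpoint not in ("search", "find"):
        raise ValueError(f"unsupported endpoint: {endpoint}")
    base = f"https://{zone}.nuclia.cloud/api/v1/kb/{kb_id}"
    if resource_id is not None:
        # This page only documents /search for a single resource.
        if endpoint != "search":
            raise ValueError("resource-level requests use /search here")
        base += f"/resource/{resource_id}"
    url = f"{base}/{endpoint}?{urlencode(params, safe='/')}"
    headers = {}
    if api_key is not None:
        # Assumption: the token is sent with a "Bearer " prefix.
        headers["X-NUCLIA-SERVICEACCOUNT"] = f"Bearer {api_key}"
    return Request(url, headers=headers, method="GET")


# Placeholder zone, Knowledge Box id and token, for illustration only.
req = build_search_request(
    "my-zone",
    "my-kb-id",
    [("query", "Batman"), ("features", "document"), ("features", "paragraph")],
    api_key="MY_TOKEN",
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body; that step is left out so the sketch runs without network access.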

Reference documentation

The Nuclia API documentation is available here.