Graph search

Nuclia leverages a powerful feature called the knowledge graph. Whether its relations are extracted automatically or annotated manually, a KB stores a directed graph composed of nodes and relations. Nodes in the graph can be any kind of entity in your data (a person, a city, a concept...) but can also represent a resource, a collaborator... Relations can be arbitrary connections between entities, but also resource relations, synonym relations...

This graph can be leveraged in multiple ways in order to enrich your search experience. Here, we'll explain the graph API.

Note we define a graph to be a set of nodes connected by relations. Other graph literature may call these vertices and edges.

Graph API

The Graph API allows you to directly explore the knowledge graph in a KB. There are usually three reasons to query your graph:

explore nodes
explore relations
explore paths

Node and relation exploration are usually the first entry point to the graph. In both cases, we want to know which nodes or relations exist in the graph that are exactly like or similar to X.

Path exploration dives deeper into the knowledge graph and can answer questions like:

given two nodes X and Y, which relations Z connect them?
which pairs of nodes are directly related by relation Z?

Or, more formally, which triplets of source, relation, destination exist in the graph that satisfy a triplet condition, i.e., any known or unknown pair of nodes and a relation.
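To make the triplet condition concrete, here is a minimal in-memory sketch in Python. It is not how Nuclia stores or queries the graph; the `Triplet` type, the `match_triplets` helper and the sample data are all hypothetical and only illustrate the idea of matching on known parts while leaving unknown parts open.

```python
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    source: str
    relation: str
    destination: str


def match_triplets(
    graph: list[Triplet],
    source: Optional[str] = None,
    relation: Optional[str] = None,
    destination: Optional[str] = None,
) -> list[Triplet]:
    """Return the triplets whose known parts match; None means "unknown"."""
    return [
        t
        for t in graph
        if (source is None or t.source == source)
        and (relation is None or t.relation == relation)
        and (destination is None or t.destination == destination)
    ]


# Hypothetical sample graph
graph = [
    Triplet("Erin", "born_in", "UK"),
    Triplet("Erin", "live_in", "UK"),
    Triplet("Alice", "live_in", "France"),
]

# Given two nodes X and Y, which relations connect them?
relations = [t.relation for t in match_triplets(graph, source="Erin", destination="UK")]

# Which pairs of nodes are directly related by relation Z?
pairs = [(t.source, t.destination) for t in match_triplets(graph, relation="live_in")]

print(relations)
print(pairs)
```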

Node exploration: `/graph/nodes` endpoint

Any node in the graph is composed of three parts:

value
type
group/family

A value is just the textual representation of the node. For example, the name of a person, of a company or a concept. Erin, Nuclia and philosophy would all be valid node values.

Each node has a type. Usually it's an entity, although others like resource or user are also valid (see the API reference for the full list).

Optionally, nodes can also have a group. Groups are arbitrary categories one can use to cluster nodes. For example: person, company or concept.

When querying graph nodes, you can use any of the node parts to find matches. Node values can be searched with different strategies, while types are limited to a set of built-in values and groups are matched exactly. Let's see some examples!

A simple example would be searching for an exact node existence. Does a person named Erin exist in the graph?

{
    "query": {
        "prop": "node",
        "value": "Erin",
        "type": "entity",
        "group": "person"
    }
}

What if we don't know Erin's type or group? We can omit them and find any node with value Erin:

{
    "query": {
        "prop": "node",
        "value": "Erin"
    }
}

We can also omit the value and search only for a given type or group:

{
    "query": {
        "prop": "node",
        "group": "person"
    }
}
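If you build these bodies from code, a small helper that leaves out the unknown parts keeps things tidy. The sketch below is an illustration only: the base URL is a placeholder, the use of POST with a JSON body is an assumption based on the examples above, and authentication is omitted. The request is prepared but never sent.

```python
import json
import urllib.request

# Hypothetical base URL: replace it with the URL of your own KB.
KB_URL = "https://example.invalid/kb/my-kb"


def node_query(value=None, node_type=None, group=None):
    """Build a node query body, leaving out the parts we don't know."""
    parts = {"value": value, "type": node_type, "group": group}
    known = {key: val for key, val in parts.items() if val is not None}
    return {"query": {"prop": "node", **known}}


def build_request(endpoint, body):
    """Prepare (but do not send) a JSON request for a graph endpoint.

    Assumption: the endpoint accepts a POST with a JSON body.
    """
    return urllib.request.Request(
        KB_URL + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("/graph/nodes", node_query(group="person"))
print(request.full_url)
print(request.data.decode("utf-8"))
```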

As mentioned, values can be searched using different strategies. So far, we've relied on the implicit exact match, which can also be written explicitly:

{
    "query": {
        "prop": "node",
        "value": "Erin",
        "match": "exact"
    }
}

But we can become typo tolerant with fuzzy search:

{
    "query": {
        "prop": "node",
        "value": "Arin",
        "match": "fuzzy"
    }
}

This will return the same node Erin and any other nodes with similar values.

Fuzzy search is a useful tool, but it can quickly return an excess of results, so we recommend using it carefully.
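To get a feel for why fuzzy matching returns extra results, here is a small sketch based on edit distance (Levenshtein). This page does not say which algorithm or threshold Nuclia uses, so treat the distance function, the `max_distance=1` threshold and the sample values as assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(query, values, max_distance=1):
    """Return the values within max_distance edits of the query (case-insensitive)."""
    return [
        value
        for value in values
        if edit_distance(query.lower(), value.lower()) <= max_distance
    ]


# Hypothetical node values
values = ["Erin", "Eric", "Aron", "Nuclia"]
matches = fuzzy_matches("Arin", values)
print(matches)
```

With these sample values, the typo Arin matches Erin as intended, but also Aron, which is exactly the kind of extra result to keep in mind.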

Relation exploration: `/graph/relations` endpoint

Relations, in turn, are composed of:

label
type

A label is the textual representation of the relation. For example: friendship, knowledge about...

Each relation has a type that classifies different relations. Usually, that will be an ENTITY relation, but other types like SYNONYM or ABOUT are also available (see API reference for the full list).

The relations API is more limited than the nodes API, as relations without nodes lose context. Let's see some examples as we did before!

Is there any relation named live_in?

{
    "query": {
        "prop": "relation",
        "label": "live_in"
    }
}

Get all synonym relationships:

{
    "query": {
        "prop": "relation",
        "type": "SYNONYM"
    }
}

Path exploration: `/graph` endpoint

Once we know any combination of source and destination nodes and/or relation, we can actually explore paths between nodes.

A path is a triplet composed of a source node, a relation and a destination node. Path queries are built from any of those known parts and the response is a set of triplets satisfying the query.

A path query where we know some information of every part would look like:

{
    "query": {
        "prop": "path",
        "source": {
            "group": "person"
        },
        "relation": {
            "label": "born_in"
        },
        "destination": {
            "value": "UK",
            "group": "place"
        }
    }
}

However, we may not know some part of it. We can skip nodes and relations:

{
    "query": {
        "prop": "path",
        "destination": {
            "value": "UK",
            "group": "place"
        }
    }
}

For simplicity, the graph API provides you with some common properties to search for. Instead of a path with only a source or destination, we can specify a source_node or destination_node respectively. For relations, we can use the relation prop as we were using it before.

Therefore, the previous query can be rewritten as:

{
    "query": {
        "prop": "destination_node",
        "value": "UK",
        "group": "place"
    }
}
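The equivalence between the shorthand and the full path form can be sketched as a small client-side rewrite. The `expand_shorthand` helper is hypothetical and not part of any Nuclia SDK; it only shows how a source_node or destination_node query maps onto the corresponding path query.

```python
# Maps each shorthand prop to the part of the path it stands for.
SHORTHAND_TO_PART = {
    "source_node": "source",
    "destination_node": "destination",
}


def expand_shorthand(query: dict) -> dict:
    """Rewrite a source_node/destination_node query as an explicit path query.

    Any other query is returned unchanged.
    """
    part = SHORTHAND_TO_PART.get(query.get("prop"))
    if part is None:
        return query
    node = {key: value for key, value in query.items() if key != "prop"}
    return {"prop": "path", part: node}


shorthand = {"prop": "destination_node", "value": "UK", "group": "place"}
expanded = expand_shorthand(shorthand)
print(expanded)
```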

Fuzzy search can also be used as before, defining the type of match:

{
    "query": {
        "prop": "destination_node",
        "value": "France",
        "group": "place",
        "match": "fuzzy"
    }
}

Undirected paths

Sometimes, we know about two nodes being connected by a relation but we don't know the direction of the relation. Path queries have a special field called undirected that can be set to search for paths in any direction.

List all friendship relations between people:

{
    "query": {
        "prop": "path",
        "source": {
            "group": "person"
        },
        "relation": {
            "label": "friend"
        },
        "destination": {
            "group": "person"
        },
        "undirected": true
    }
}

Or get all triplets involving the UK, whether it appears as source or destination:

{
    "query": {
        "prop": "path",
        "source": {
            "value": "UK",
            "group": "place"
        },
        "undirected": true
    }
}

As before, there is a shorthand for undirected paths where we only know a node but not its position. A query equivalent to the one above would be:

{
    "query": {
        "prop": "node",
        "value": "UK",
        "group": "place"
    }
}
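The effect of undirected can be illustrated with a tiny in-memory matcher. This is a sketch of the semantics described above, not Nuclia's implementation: the `find_paths` helper, the triplet layout and the sample graph (including the part_of relation) are hypothetical.

```python
def matches_node(node: dict, condition: dict) -> bool:
    """A node matches when every known part of the condition is equal."""
    return all(node.get(key) == value for key, value in condition.items())


def find_paths(graph, source=None, destination=None, undirected=False):
    """Return the triplets matching the source/destination conditions.

    With undirected=True, a triplet also matches when the conditions
    hold with source and destination swapped.
    """
    source = source or {}
    destination = destination or {}
    results = []
    for triplet in graph:
        forward = matches_node(triplet["source"], source) and matches_node(
            triplet["destination"], destination
        )
        backward = matches_node(triplet["destination"], source) and matches_node(
            triplet["source"], destination
        )
        if forward or (undirected and backward):
            results.append(triplet)
    return results


# Hypothetical sample graph
uk = {"value": "UK", "group": "place"}
graph = [
    {"source": {"value": "Erin", "group": "person"}, "relation": "born_in", "destination": uk},
    {"source": uk, "relation": "part_of", "destination": {"value": "Europe", "group": "place"}},
    {
        "source": {"value": "Alice", "group": "person"},
        "relation": "live_in",
        "destination": {"value": "France", "group": "place"},
    },
]

directed = find_paths(graph, source=uk)
undirected = find_paths(graph, source=uk, undirected=True)
print([t["relation"] for t in directed])
print([t["relation"] for t in undirected])
```

The directed query only finds the triplet where the UK is the source, while the undirected one also finds the triplet where the UK is the destination.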

Boolean expressions

The queries explained so far are a good way to start exploring the graph, but they are not very expressive. That's why the graph API also offers boolean expressions. All three endpoints accept and, or and not expressions in their query, and these can be nested as deeply as needed.

Let's see a more complex example that finds any person who was born or lives in any place other than the UK:

{
    "query": {
        "and": [
            {
                "prop": "source_node",
                "group": "person"
            },
            {
                "or": [
                    {
                        "prop": "relation",
                        "label": "born_in"
                    },
                    {
                        "prop": "relation",
                        "label": "live_in"
                    }
                ]
            },
            {
                "prop": "destination_node",
                "group": "place"
            },
            {
                "not": {
                    "prop": "destination_node",
                    "value": "UK"
                }
            }
        ]
    }
}

Although boolean expressions give us great power, remember that paths are built from triplets of source, relation and destination, and multi-hop queries are not supported at the moment.

Therefore, even if we have a triplet for Erin born in UK, and Erin lives in UK, querying:

{
    "query": {
        "and": [
            {
                "prop": "relation",
                "label": "born_in"
            },
            {
                "prop": "relation",
                "label": "live_in"
            }
        ]
    }
}

won't give us any result, as there's no triplet satisfying this condition (a triplet has a single relation).
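The reason becomes clear if we think of a boolean expression as a predicate evaluated against one triplet at a time. The evaluator below is a simplified sketch of that idea with a hypothetical in-memory graph; it only supports exact matches and is not Nuclia's query engine.

```python
# Maps each leaf prop to the part of the triplet it inspects.
PROP_TO_PART = {
    "source_node": "source",
    "relation": "relation",
    "destination_node": "destination",
}


def evaluate(expr: dict, triplet: dict) -> bool:
    """Evaluate a boolean expression against a single triplet."""
    if "and" in expr:
        return all(evaluate(sub, triplet) for sub in expr["and"])
    if "or" in expr:
        return any(evaluate(sub, triplet) for sub in expr["or"])
    if "not" in expr:
        return not evaluate(expr["not"], triplet)
    # Leaf: compare every known field with the matching triplet part.
    part = triplet[PROP_TO_PART[expr["prop"]]]
    return all(part.get(k) == v for k, v in expr.items() if k != "prop")


def make_triplet(person, label, place):
    return {
        "source": {"value": person, "group": "person"},
        "relation": {"label": label},
        "destination": {"value": place, "group": "place"},
    }


# Hypothetical sample graph
graph = [
    make_triplet("Erin", "born_in", "UK"),
    make_triplet("Erin", "live_in", "UK"),
    make_triplet("Alice", "live_in", "France"),
]

born = {"prop": "relation", "label": "born_in"}
lives = {"prop": "relation", "label": "live_in"}

both = [t for t in graph if evaluate({"and": [born, lives]}, t)]
either = [t for t in graph if evaluate({"or": [born, lives]}, t)]
outside_uk = [
    t
    for t in graph
    if evaluate(
        {
            "and": [
                {"prop": "source_node", "group": "person"},
                {"or": [born, lives]},
                {"prop": "destination_node", "group": "place"},
                {"not": {"prop": "destination_node", "value": "UK"}},
            ]
        },
        t,
    )
]

print(len(both), len(either), [t["source"]["value"] for t in outside_uk])
```

No single triplet carries both labels, so the and query is empty, while the or query matches every triplet in this sample.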

Top K

As in other search endpoints, graph API results are limited to the best K. To change the default number of results returned, specify top_k:

{
    "query": {
        ...
    },
    "top_k": 100
}

Filtering

Querying the whole knowledge graph is nice, but sometimes we get too many results or we want to restrict the search to a subset of the knowledge graph. As in other endpoints, the graph API supports filter_expression, a boolean expression of filters that preselects the fields in which the search is performed.

As a simple example, let's see a filter to search only in sweet recipes written in English:

{
    "query": {
        ...
    },
    "filter_expression": {
        "field": {
            "and": [
                {
                    "prop": "label",
                    "labelset": "recipes",
                    "label": "sweet"
                },
                {
                    "prop": "language",
                    "language": "en"
                }
            ]
        }
    }
}

(See filtering docs for more examples)

In addition to a field filter expression, security and show_hidden are also supported, letting you include or exclude results based on their security requirements or hidden status.

There is also a special filter that can be combined with graph queries: generated. It is useful for querying only the part of the graph generated by users, by the processor or by data augmentation tasks:

{
    "query": {
        "and": [
            {
                "prop": "relation",
                "label": "live_in"
            },
            {
                "prop": "generated",
                "by": "data-augmentation"
            }
        ]
    }
}

As the filter is also a prop, it can be used in boolean expressions as any other property.

Graph integration for /find endpoint

Exploring graphs is great, but sometimes paragraphs are better. That's why Nuclia leverages graph search in /find.

Some graph paths (entity-relation-entity) are extracted from a specific paragraph. We can retrieve those paragraphs in a /find call and merge them with keyword and semantic results, giving you another way to find answers in your unstructured data.

Let's see a complex example and break it down into its parts:

{
    "query": "Who is Alice?",
    "features": ["keyword", "semantic", "graph"],
    "graph_query": {
        "prop": "path",
        "source": {
            "match": "exact",
            "value": "Alice",
            "group": "person"
        },
        "undirected": true
    },
    "rank_fusion": {
        "name": "rrf",
        "boosting": {
            "keyword": 1,
            "semantic": 2,
            "graph": 0.5
        },
        "window": 50
    },
    "top_k": 20
}

First of all, we use query for the keyword/semantic question. features includes keyword, semantic and graph. Including graph requires us to also define graph_query, which is a graph path query (identical to the ones explained before).

Reciprocal rank fusion (RRF) can also boost graph results. In this case, we want semantic results to weigh twice as much as keyword results, and graph results half as much.

Finally, top_k limits the response to the best 20 results.
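To see how boosting affects the merged ranking, here is a sketch of weighted reciprocal rank fusion. It uses the textbook formula, in which each retriever contributes boost / (k + rank) to a result's score. The constant k=60 is the conventional default, not a value taken from this page, the paragraph ids are made up, and the window parameter is not modeled, so Nuclia's actual scoring may differ.

```python
from collections import defaultdict


def weighted_rrf(rankings: dict, boosting: dict, k: int = 60) -> list:
    """Fuse several ranked lists into one, best result first.

    Each retriever adds boost / (k + rank) to the score of every
    result it returned, with rank starting at 1.
    """
    scores = defaultdict(float)
    for retriever, ranked_ids in rankings.items():
        boost = boosting.get(retriever, 1.0)
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += boost / (k + rank)
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)


# Hypothetical paragraph ids returned by each retriever, best first
rankings = {
    "keyword": ["p1", "p2", "p3"],
    "semantic": ["p2", "p4", "p1"],
    "graph": ["p5", "p1"],
}
boosting = {"keyword": 1, "semantic": 2, "graph": 0.5}

fused = weighted_rrf(rankings, boosting)
print(fused)
```

In this sample, p1 ends up first because all three retrievers returned it, even though only keyword search ranked it at the top.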
