Graph search
Nuclia leverages a powerful feature called the knowledge graph. Using either automatically extracted or manually annotated relations, a KB stores a directed graph composed of nodes and relations. Nodes can be any kind of entity in your data (a person, a city, a concept, etc.), but they can also represent a resource, a collaborator, and so on. Relations can be arbitrary connections between entities, but also resource relations, synonym relations, etc.
This graph can be leveraged in multiple ways to enrich your search experience. Here, we explain the graph API.
Note that we define a graph as a set of nodes connected by relations. Other graph literature may call these vertices and edges.
Graph API
The Graph API allows you to directly explore the knowledge graph of a KB. There are usually three use cases in which you may want to query your graph:
- explore nodes
- explore relations
- explore paths
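Each use case maps to its own endpoint: /graph/nodes, /graph/relations and /graph. The sketch below only prepares (and does not send) a JSON request to one of them. The base URL pattern, the zone and KB identifiers, the use of POST and the X-NUCLIA-SERVICEACCOUNT header name are assumptions for illustration; check the API reference before relying on them.

```python
import json
import urllib.request

# Assumed URL pattern; zone and kbid are placeholders you must replace.
BASE_URL = "https://{zone}.nuclia.cloud/api/v1/kb/{kbid}"

# One endpoint per use case, as listed above.
ENDPOINTS = {
    "nodes": "/graph/nodes",
    "relations": "/graph/relations",
    "paths": "/graph",
}


def build_graph_request(zone: str, kbid: str, api_key: str,
                        use_case: str, body: dict) -> urllib.request.Request:
    """Prepare (without sending) a JSON POST to the endpoint for `use_case`."""
    url = BASE_URL.format(zone=zone, kbid=kbid) + ENDPOINTS[use_case]
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header name for a service-account key.
            "X-NUCLIA-SERVICEACCOUNT": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_graph_request("europe-1", "my-kb", "my-key", "nodes",
                              {"query": {"prop": "node", "value": "Erin"}})
# Sending it would look like: json.load(urllib.request.urlopen(request))
```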
Node and relation exploration are usually the first entry point to the graph. In both cases, we want to know which nodes or relations exist in the graph that are exactly like or similar to X.
Path exploration dives deeper into the knowledge graph and can answer questions like:
- given two nodes X and Y, which relations Z connect them?
- which pairs of nodes are directly related by relation Z?
Or, more formally: which (source, relation, destination) triplets exist in the graph that satisfy a triplet condition, where any of the source node, the relation and the destination node may be known or unknown.
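To make the idea of a triplet condition concrete, here is a toy in-memory model (not Nuclia's implementation): every unknown part of the condition is None and acts as a wildcard, and the result is every triplet that agrees with the known parts. The graph contents are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triplet:
    source: str
    relation: str
    destination: str


def match_triplets(graph: List[Triplet], source: Optional[str] = None,
                   relation: Optional[str] = None,
                   destination: Optional[str] = None) -> List[Triplet]:
    """Return the triplets satisfying a triplet condition; None means 'unknown'."""
    return [
        t for t in graph
        if (source is None or t.source == source)
        and (relation is None or t.relation == relation)
        and (destination is None or t.destination == destination)
    ]


graph = [
    Triplet("Erin", "born_in", "UK"),
    Triplet("Erin", "friend", "Sam"),
    Triplet("Sam", "born_in", "France"),
]

# Given two nodes, which relations connect them?
print([t.relation for t in match_triplets(graph, source="Erin", destination="Sam")])
# Which pairs of nodes are directly related by born_in?
print([(t.source, t.destination) for t in match_triplets(graph, relation="born_in")])
```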
Node exploration: /graph/nodes endpoint
Any node in the graph is composed of three parts:
- value
- type
- group/family
A value is just the textual representation of the node: for example, the name of a person, a company or a concept. Erin, Nuclia and philosophy would all be valid node values.
Each node has a type, usually entity, although others like resource or user are also valid (see the API reference for the full list).
Optionally, nodes can also have a group. Groups are arbitrary categories one can use to cluster nodes. For example: person, company or concept.
When querying graph nodes, you can use any of the node parts to find matches. Node values can be searched with different strategies, while types are limited to a set of built-in values and groups are matched exactly. Let's see some examples!
A simple example would be searching for an exact node existence. Does a person named Erin exist in the graph?
{
"query": {
"prop": "node",
"value": "Erin",
"type": "entity",
"group": "person"
}
}
What if we don't know Erin's type or group? We can omit them and find any node with value Erin:
{
"query": {
"prop": "node",
"value": "Erin"
}
}
We can also omit the value and search only for a given type or group:
{
"query": {
"prop": "node",
"group": "person"
}
}
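The queries above differ only in which node parts are present, so a small helper can build them by leaving out whatever is unknown. The function name node_query and its parameters are hypothetical; only the JSON keys (prop, value, type, group, match) come from this page.

```python
import json
from typing import Optional


def node_query(value: Optional[str] = None, node_type: Optional[str] = None,
               group: Optional[str] = None, match: Optional[str] = None) -> dict:
    """Build a /graph/nodes body; any part left as None is omitted (i.e. unknown)."""
    query = {"prop": "node"}
    for key, part in (("value", value), ("type", node_type),
                      ("group", group), ("match", match)):
        if part is not None:
            query[key] = part
    return {"query": query}


# Exact node with all parts known, then "any node in the person group".
print(json.dumps(node_query(value="Erin", node_type="entity", group="person")))
print(json.dumps(node_query(group="person")))
```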
As mentioned, values can be searched using different strategies. Until now, we've used the implicit exact match:
{
"query": {
"prop": "node",
"value": "Erin",
"match": "exact"
}
}
But we can make the search typo-tolerant with fuzzy matching:
{
"query": {
"prop": "node",
"value": "Arin",
"match": "fuzzy"
}
}
This returns the same Erin node and any other node with a similar value.
Fuzzy search is a useful tool, but it can quickly lead to an excess of results, so we recommend using it carefully.
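To see why fuzzy matching can return many nodes, the sketch below approximates it with Levenshtein edit distance and a tolerance of one edit. This page does not state which algorithm or tolerance Nuclia uses, so both are assumptions; the node values are made up.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def fuzzy_nodes(values: List[str], query: str, max_distance: int = 1) -> List[str]:
    """Keep node values within `max_distance` edits of the query (case-insensitive)."""
    return [v for v in values if levenshtein(v.lower(), query.lower()) <= max_distance]


values = ["Erin", "Karin", "Marin", "Aaron", "Nuclia"]
# The typo "Arin" finds Erin, but also Karin and Marin.
print(fuzzy_nodes(values, "Arin"))
```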
Relation exploration: /graph/relations endpoint
Relations, in turn, are composed of:
- label
- type
A label is the textual representation of the relation. For example: friendship, knowledge about...
Each relation has a type that classifies different relations. Usually, that will be an ENTITY relation, but other types like SYNONYM or ABOUT are also available (see API reference for the full list).
The relations API is more limited than the nodes API, as relations without nodes lose context. Let's see some examples, as we did before!
Is there any relation named live_in?
{
"query": {
"prop": "relation",
"label": "live_in"
}
}
Get all synonym relationships:
{
"query": {
"prop": "relation",
"type": "SYNONYM"
}
}
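Without its nodes, a relation is reduced to its label and type, which is why these queries can only filter on those two parts. The toy model below (not Nuclia's implementation) treats None as a wildcard; returning each distinct relation once is an assumption of this sketch, and the relation names are made up.

```python
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    label: str  # e.g. "live_in"
    type: str   # e.g. "ENTITY" or "SYNONYM"


def find_relations(relations: List[Relation], label: Optional[str] = None,
                   rel_type: Optional[str] = None) -> List[Relation]:
    """Distinct relations matching label and/or type; None acts as a wildcard."""
    return sorted({
        r for r in relations
        if (label is None or r.label == label)
        and (rel_type is None or r.type == rel_type)
    })


# One entry per triplet in a toy graph, so the same relation can repeat.
relations = [
    Relation("live_in", "ENTITY"),
    Relation("born_in", "ENTITY"),
    Relation("live_in", "ENTITY"),
    Relation("same_as", "SYNONYM"),
]

print(find_relations(relations, label="live_in"))   # is there a live_in relation?
print(find_relations(relations, rel_type="SYNONYM"))
```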
Path exploration: /graph endpoint
Once we know any combination of source and destination nodes and/or relation, we can actually explore paths between nodes.
A path is a triplet composed of a source node, a relation and a destination node. Path queries are built from any of those known parts, and the response is the set of triplets satisfying the query.
A path query where we know some information of every part would look like:
{
"query": {
"prop": "path",
"source": {
"group": "person"
},
"relation": {
"label": "born_in"
},
"destination": {
"value": "UK",
"group": "place"
}
}
}
However, we may not know every part of the path. We can skip nodes and relations:
{
"query": {
"prop": "path",
"destination": {
"value": "UK",
"group": "place"
}
}
}
For simplicity, the graph API provides you with some common properties to search for. Instead of a path with only a source or destination, we can specify a source_node or destination_node respectively. For relations, we can use the relation prop as we were using it before.
Therefore, the previous query can be rewritten as:
{
"query": {
"prop": "destination_node",
"value": "UK",
"group": "place"
}
}
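The equivalence between the shorthand props and full path queries can be written down as a small rewrite. This is a hypothetical helper that mirrors the equivalences described on this page; it is not the API's own normalization code.

```python
def expand_shorthand(query: dict) -> dict:
    """Rewrite a source_node/destination_node/relation shorthand as a full path query."""
    sides = {
        "source_node": "source",
        "destination_node": "destination",
        "relation": "relation",
    }
    prop = query.get("prop")
    if prop not in sides:
        return query  # already a full path query, or a prop this sketch ignores
    part = {key: value for key, value in query.items() if key != "prop"}
    return {"prop": "path", sides[prop]: part}


shorthand = {"prop": "destination_node", "value": "UK", "group": "place"}
print(expand_shorthand(shorthand))
# {'prop': 'path', 'destination': {'value': 'UK', 'group': 'place'}}
```

Keys such as match stay attached to the node they describe, so a fuzzy shorthand expands to a fuzzy destination node.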
Fuzzy search can also be used as before, defining the type of match:
{
"query": {
"prop": "destination_node",
"value": "France",
"group": "place",
"match": "fuzzy"
}
}
Undirected paths
Sometimes we know that two nodes are connected by a relation, but we don't know the direction of the relation. Path queries have a special field called undirected that can be set to search for paths in either direction.
List all friendship relations between people:
{
"query": {
"prop": "path",
"source": {
"group": "person"
},
"relation": {
"label": "friend"
},
"destination": {
"group": "person"
},
"undirected": true
}
}
Or get all triplets related to the UK:
{
"query": {
"prop": "path",
"source": {
"value": "UK",
"group": "place"
},
"undirected": true
}
}
As before, there is a shorthand for undirected paths where we only know a node but not its position. An equivalent query to the one above would be:
{
"query": {
"prop": "node",
"value": "UK",
"group": "place"
}
}
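Undirected matching can be pictured as checking each stored triplet in both orientations. The toy model below (plain string triplets, exact matching, made-up data; not Nuclia's implementation) shows how setting undirected turns "UK as source" into "UK on either side", which is what the node shorthand expresses.

```python
from typing import List, Optional, Tuple

Triplet = Tuple[str, str, str]  # (source, relation, destination) values only


def match_path(graph: List[Triplet], source: Optional[str] = None,
               relation: Optional[str] = None, destination: Optional[str] = None,
               undirected: bool = False) -> List[Triplet]:
    """Directed match by default; with undirected=True also try each triplet reversed."""

    def fits(s: str, r: str, d: str) -> bool:
        return ((source is None or s == source)
                and (relation is None or r == relation)
                and (destination is None or d == destination))

    return [t for t in graph
            if fits(*t) or (undirected and fits(t[2], t[1], t[0]))]


graph = [
    ("Erin", "born_in", "UK"),
    ("UK", "borders", "Ireland"),
    ("Sam", "live_in", "France"),
]

print(match_path(graph, source="UK"))                   # only UK -borders-> Ireland
print(match_path(graph, source="UK", undirected=True))  # also Erin -born_in-> UK
```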
Boolean expressions
All queries explained so far are powerful for starting to explore the graph, but they don't leave room for much expressivity. That's why the graph API also offers boolean expressions. All three endpoints support and, or and not expressions in their queries, and these can be nested as deeply as needed.
Let's see a more complex example: find any person who was born in or lives in any place other than the UK:
{
"query": {
"and": [
{
"prop": "source_node",
"group": "person"
},
{
"or": [
{
"prop": "relation",
"label": "born_in"
},
{
"prop": "relation",
"label": "live_in"
}
]
},
{
"prop": "destination_node",
"group": "place"
},
{
"not": {
"prop": "destination_node",
"value": "UK"
}
}
]
}
}
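The key point of the example above is that the whole boolean expression is evaluated against one triplet at a time. The toy evaluator below (exact matching only, only the props used in the example, made-up data; not Nuclia's implementation) makes that explicit.

```python
def parts_match(part: dict, cond: dict) -> bool:
    """Every key in the condition (except "prop") must equal the part's value."""
    return all(part.get(key) == expected
               for key, expected in cond.items() if key != "prop")


def holds(expr: dict, triplet: tuple) -> bool:
    """Evaluate a (possibly nested) boolean expression against ONE triplet."""
    source, relation, destination = triplet
    if "and" in expr:
        return all(holds(sub, triplet) for sub in expr["and"])
    if "or" in expr:
        return any(holds(sub, triplet) for sub in expr["or"])
    if "not" in expr:
        return not holds(expr["not"], triplet)
    targets = {"source_node": source, "relation": relation,
               "destination_node": destination}
    if expr.get("prop") not in targets:
        raise ValueError(f"unsupported prop in this sketch: {expr.get('prop')!r}")
    return parts_match(targets[expr["prop"]], expr)


graph = [
    ({"value": "Erin", "group": "person"}, {"label": "born_in"}, {"value": "UK", "group": "place"}),
    ({"value": "Sam", "group": "person"}, {"label": "live_in"}, {"value": "France", "group": "place"}),
    ({"value": "Sam", "group": "person"}, {"label": "friend"}, {"value": "Erin", "group": "person"}),
]

query = {"and": [
    {"prop": "source_node", "group": "person"},
    {"or": [{"prop": "relation", "label": "born_in"},
            {"prop": "relation", "label": "live_in"}]},
    {"prop": "destination_node", "group": "place"},
    {"not": {"prop": "destination_node", "value": "UK"}},
]}

results = [t for t in graph if holds(query, t)]  # only Sam -live_in-> France
```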
Although boolean expressions give us great power, remember that paths are built from triplets of source, relation and destination, and multi-hop queries are not supported at the moment.
Therefore, even if the graph contains the triplets Erin born_in UK and Erin live_in UK, querying:
{
"query": {
"and": [
{
"prop": "relation",
"label": "born_in"
},
{
"prop": "relation",
"label": "live_in"
}
]
}
}
won't return any results, as no single triplet satisfies this condition (a triplet has a single relation).
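One possible workaround, which is an assumption about client usage rather than a built-in feature, is to run one single-hop path query per relation and intersect the sources on the client. The result lists below are simplified stand-ins (plain tuples, made-up data), not the actual response format of /graph.

```python
# Two single-hop path query bodies, one per relation.
born_in_query = {"query": {"and": [
    {"prop": "relation", "label": "born_in"},
    {"prop": "destination_node", "value": "UK"},
]}}
live_in_query = {"query": {"and": [
    {"prop": "relation", "label": "live_in"},
    {"prop": "destination_node", "value": "UK"},
]}}

# Stand-in results; in practice these would come from two calls to /graph.
born_in_results = [("Erin", "born_in", "UK"), ("Sam", "born_in", "UK")]
live_in_results = [("Erin", "live_in", "UK"), ("Ana", "live_in", "UK")]


def sources(triplets):
    """Collect the source values of (source, relation, destination) triplets."""
    return {source for source, _relation, _destination in triplets}


born_and_living_in_uk = sources(born_in_results) & sources(live_in_results)
print(born_and_living_in_uk)  # {'Erin'}
```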
Top K
As in other search endpoints, graph API results are limited to the best K. To change the number of results returned by default, specify top_k:
{
"query": {
...
},
"top_k": 100
}
Filtering
Querying the whole knowledge graph is nice, but sometimes we get too many results, or we want to restrict the search to a subset of the knowledge graph. As in other endpoints, the graph API supports filter_expression, a boolean expression of filters used to prefilter the fields in which the search is performed.
As a simple example, let's see a filter to search only in sweet recipes written in English:
{
"query": {
...
},
"filter_expression": {
"field": {
"and": [
{
"prop": "label",
"labelset": "recipes",
"label": "sweet"
},
{
"prop": "language",
"language": "en"
}
]
}
}
}
(See filtering docs for more examples)
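The request body options described so far (query, top_k and filter_expression) can be combined in a single body. The helper names below are hypothetical; only the JSON keys come from this page. Optional keys are sent only when given, so the endpoint's own defaults apply otherwise.

```python
import json
from typing import Optional


def graph_request_body(query: dict, top_k: Optional[int] = None,
                       filter_expression: Optional[dict] = None) -> dict:
    """Assemble a graph search body; optional keys are only included when given."""
    body: dict = {"query": query}
    if top_k is not None:
        body["top_k"] = top_k
    if filter_expression is not None:
        body["filter_expression"] = filter_expression
    return body


def field_filter_all(*filters: dict) -> dict:
    """Require every field filter to hold (an "and" under "field")."""
    return {"field": {"and": list(filters)}}


body = graph_request_body(
    query={"prop": "relation", "label": "live_in"},
    top_k=100,
    filter_expression=field_filter_all(
        {"prop": "label", "labelset": "recipes", "label": "sweet"},
        {"prop": "language", "language": "en"},
    ),
)
print(json.dumps(body, indent=2))
```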
In addition to a field filter expression, security and show_hidden are also supported, giving you the ability to filter results in or out based on their security requirements or on whether they are hidden.
There is also a special filter that can be combined with graph queries: generated. This is really useful to query the parts of the graph generated by users, the processor or data augmentation tasks:
{
"query": {
"and": [
{
"prop": "relation",
"label": "live_in"
},
{
"prop": "generated",
"by": "data-augmentation"
}
]
}
}
As the filter is also a prop, it can be used in boolean expressions like any other property.
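Because generated is just another prop, it can wrap any existing query with and or not. The helpers below are hypothetical; the generator names user and processor are assumptions based on the prose above (only data-augmentation appears literally in the example), so check the API reference for the exact accepted values.

```python
# Assumed generator names; only "data-augmentation" is confirmed by the example above.
KNOWN_GENERATORS = {"user", "processor", "data-augmentation"}


def generated_by(query: dict, by: str) -> dict:
    """Keep only results produced by `by`, by ANDing a `generated` prop."""
    if by not in KNOWN_GENERATORS:
        raise ValueError(f"unknown generator: {by!r}")
    return {"and": [query, {"prop": "generated", "by": by}]}


def not_generated_by(query: dict, by: str) -> dict:
    """Exclude results produced by `by`, using a negated `generated` prop."""
    if by not in KNOWN_GENERATORS:
        raise ValueError(f"unknown generator: {by!r}")
    return {"and": [query, {"not": {"prop": "generated", "by": by}}]}


live_in = {"prop": "relation", "label": "live_in"}
only_augmented = generated_by(live_in, "data-augmentation")  # same as the example above
without_manual = not_generated_by(live_in, "user")
```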