gStore:
gStore是由北京大学计算机所数据管理实验室研发面向RDF知识图谱的开源图数据库系统(通常称为Triple Store)。不同于传统基于关系数据库的知识图谱数据管理方法,gStore是一种原生基于图数据模型(Native Graph Model)的RDF数据管理系统,维持了原始RDF知识图谱的图结构;其数据模型是有标签、有向的多边图,每个顶点对应着一个主体或客体。我们将面向RDF的SPARQL查询,转换为面向RDF图的子图匹配查询,利用所提出的基于图结构的索引(VS*-tree)来加速查询的性能。
gStore 拥有以下特性:
  1. gStore从图数据库角度存储和检索RDF知识图谱数据;
  2. gStore支持W3C定义的SPARQL 1.1标准,包括含有Union,OPTIONAL,FILTER和聚集函数的查询;gStore支持有效的增删改操作;
  3. gStore单机可以支持1Billion(十亿)三元组规模的RDF知识图谱的数据管理任务;
我们组准备将用户商品评论数据使用gStore存储,在此基础上开发出适合用于检索用户特性的功能,这篇博客是对gStore中用于加速查询效率的VS*-tree树进行介绍,内容来自北大邹磊教授2014 VLDB论文《gStore:a graph-based SPARQL query engine》
(博客虽然大多英文,但稍有总结概述)
VS*-tree:
VS*-tree is 是一个作用于G*上的索引结构,用于高效地完成SPARQL查询,即找到Q*在G*上的matches;
Background:
Definition of match of Q* over G*:Q*= {v1… vn},A set of n distinct vertices {u1… un} in G*;

这里写图片描述

Solution:
There is a straightforward method consisting of two steps,its first step is a classical inclusion query and the second step is a multiway join process;
Problem need to be solved:reduce the search space for inclusion query
S-tree is proposed:
  1. A height-balanced tree similar to B+ tree;
  2. Each intermediate node is formed by ORing all child signatures in S-tree;
    这里写图片描述
Problem need to be solved: S-tree cannot support the second step, which is a NP-hard;
VS*-tree(based on S-tree) is proposed:
VS*-tree is a multi-resolution summary graph based on S-tree that can be used to reduce the search space of subgraph query processing;
Properties:
  1. Each path from the root to any leaf node has the same length h, which is the height of the VS∗-tree;
  2. Each leaf node corresponds to a vertex in G∗ and has a pointer to it.
  3. … …
    … …
    这里写图片描述
The example of building “super-edge”:

这里写图片描述

Definition of summary match of Q* over G^I:这里写图片描述
Function:
Summary matches can be used to reduce the search space for subgraph search over G*;
Problem need to be solved:

The vertex encoding strategy may lead to some vertex signatures having too many 1,these
vertices can match any query vertex signature;

Solution:

We partition all of u’s neighbors into n groups {g1,…, gn} and each group gi corresponds to one instance of vertex u, denoted as u[i];
这里写图片描述
这里写图片描述

Updates to VS*-tree:
1. Inseration:
The process begins at the root of VS*-tree and iteratively chooses a child node until it reaches a suitable leaf node location where u is inserted;
Then, the summary graph at the leaf level is updated and new super edge may need to be indtoduced;
The change of leaf signature and leaf summary graph must be propagated upwards within the VS*-tree;
Rule:
The location of new node depend on both node signature & G*’s signature;

这里写图片描述

After inserting, the signature of that leaf node and super edges adjacent to it may be updated, and the change must be propagated upwards within the VS*-tree;
2. Split:
Like other height balanced trees, insertion into a node that is already full will invoke a node split;
Rule:
At each iteration, we select as the seed nodes two entities from among the B + 1. We allocate the remaining entities to these two nodes according to Equation 1.
Then, we have to update the signatures & the super edges associated with the two new nodes, and the change will be also propagated to the root of the VS∗-tree;
3. Deleation
Rule:
To delete a vertex u from VS*-tree, we find the leaf node d where u is stored, and delete u, and then adopt a bottom-up strategy to update the signature of and super edges associated with the nodes;
If some node d has less than b entries, then d is deleted and its entries are reinserted into VS*-tree;
Logo

瓜分20万奖金 获得内推名额 丰厚实物奖励 易参与易上手

更多推荐