Neo4j GDS Algorithm Demo
Setting up a Neo4j GDS (Graph Data Science) environment, with usage examples for common graph algorithms.
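- Installing Neo4j Server and GDS
Download neo4j-community-4.4.16.zip and a JDK 11 zip package (it must be JDK 11; other versions will not work).
Download the matching GDS jar, version 2.2.7.
1. Installing Neo4j Server and the JDK
Unzip neo4j-community-4.4.16.zip into the Neo4j installation directory.
Unzip the JDK 11 package and set JAVA_HOME. (This may conflict with a JDK already installed on the machine; you can restore the original JAVA_HOME setting after Neo4j has started.)
1.1 Installing GDS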
Copy neo4j-graph-data-science-2.2.7.jar into the $NEO4J_HOME/plugins directory, then edit $NEO4J_HOME/conf/neo4j.conf:
# This setting is required because the GDS library accesses low-level Neo4j components to maximize performance.
dbms.security.procedures.unrestricted=gds.*
# Check whether the procedure allowlist is enabled in $NEO4J_HOME/conf/neo4j.conf and, if it is, add the GDS library to it.
dbms.security.procedures.allowlist=gds.*
Installation is now complete. To start the Neo4j server on Windows, open a terminal, change to the $NEO4J_HOME/bin directory, and run:
neo4j.bat console
2. Querying data
2.1 Getting the demo walkthrough
In Neo4j Browser, run:
:play https://guides.neo4j.com/airport-routes/index.html
This opens the airport routes guide, whose steps are followed below.
2.1.1 Importing the sample data
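First, copy the sample data files airport-node-list.csv and iroutes-edges.csv into the import directory under the Neo4j installation directory (the default location that file:/// URLs in LOAD CSV resolve against).
2.1.1.1 Creating constraints and an index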
CREATE CONSTRAINT airports IF NOT EXISTS ON (a:Airport) ASSERT a.iata IS UNIQUE;
CREATE CONSTRAINT cities IF NOT EXISTS ON (c:City) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT regions IF NOT EXISTS ON (r:Region) ASSERT r.name IS UNIQUE;
CREATE CONSTRAINT countries IF NOT EXISTS ON (c:Country) ASSERT c.code IS UNIQUE;
CREATE CONSTRAINT continents IF NOT EXISTS ON (c:Continent) ASSERT c.code IS UNIQUE;
CREATE INDEX locations IF NOT EXISTS FOR (air:Airport) ON (air.location);
2.1.1.2 Importing the node data
WITH
'file:///airport-node-list.csv'
AS url
LOAD CSV WITH HEADERS FROM url AS row
MERGE (a:Airport {iata: row.iata})
MERGE (ci:City {name: row.city})
MERGE (r:Region {name: row.region})
MERGE (co:Country {code: row.country})
MERGE (con:Continent {name: row.continent})
MERGE (a)-[:IN_CITY]->(ci)
MERGE (a)-[:IN_COUNTRY]->(co)
MERGE (ci)-[:IN_COUNTRY]->(co)
MERGE (r)-[:IN_COUNTRY]->(co)
MERGE (a)-[:IN_REGION]->(r)
MERGE (ci)-[:IN_REGION]->(r)
MERGE (a)-[:ON_CONTINENT]->(con)
MERGE (ci)-[:ON_CONTINENT]->(con)
MERGE (co)-[:ON_CONTINENT]->(con)
MERGE (r)-[:ON_CONTINENT]->(con)
SET a.id = row.id,
a.icao = row.icao,
a.city = row.city,
a.descr = row.descr,
a.runways = toInteger(row.runways),
a.longest = toInteger(row.longest),
a.altitude = toInteger(row.altitude),
a.location = point({latitude: toFloat(row.lat), longitude: toFloat(row.lon)});
2.1.1.3 Importing the relationship data
LOAD CSV WITH HEADERS FROM 'file:///iroutes-edges.csv' AS row
MATCH (source:Airport {iata: row.src})
MATCH (target:Airport {iata: row.dest})
MERGE (source)-[r:HAS_ROUTE]->(target)
ON CREATE SET r.distance = toInteger(row.dist);
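To make the behavior of MATCH, MERGE, and ON CREATE SET in the import query easier to see, here is a minimal Python sketch that mimics it on a few rows. Only the column names src, dest, and dist come from the query above; the sample rows, the distances, and the function name load_routes are hypothetical, and the sketch is not how Neo4j executes the statement.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the import query.
SAMPLE_CSV = """src,dest,dist
DEN,LHR,4655
DEN,LHR,9999
LHR,MLE,5307
DEN,XXX,100
"""


def load_routes(csv_text, known_airports):
    """Return {(src, dest): distance}, mimicking MATCH + MERGE + ON CREATE SET."""
    routes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        src, dest = row["src"], row["dest"]
        # MATCH: a row produces nothing when either airport node is missing.
        if src not in known_airports or dest not in known_airports:
            continue
        # MERGE + ON CREATE SET: the distance is written only when the
        # relationship is first created; later duplicates leave it unchanged.
        routes.setdefault((src, dest), int(row["dist"]))
    return routes


routes = load_routes(SAMPLE_CSV, {"DEN", "LHR", "MLE"})
print(routes)
```

In this sketch the second DEN to LHR row does not overwrite the first distance, and the row that refers to the unknown airport XXX is skipped.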
2.1.1.4 Checking the import result
CALL db.schema.visualization()
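2.1.2 Creating a graph projection
The first step in running any GDS algorithm is to create a graph projection (also called an in-memory graph) under a user-defined name. The projection is stored in the graph catalog under that name and is a subset of the full graph, used by GDS algorithms to compute results. Working on projections is what lets GDS compute quickly and efficiently. While a projection is being created, the nature of the graph elements may change in the following ways:
The direction of relationships may be changed
Node labels and relationship types may be renamed
Parallel relationships may be aggregated
Native projections offer the best performance for creating graph projections. They take three mandatory parameters: graphName, nodeProjection, and relationshipProjection. Optional configuration parameters are available to configure the graph further. In general, the syntax for creating a native projection is: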
CALL gds.graph.project(
graphName: String,
nodeProjection: String or List or Map,
relationshipProjection: String or List or Map,
configuration: Map
)
YIELD
graphName: String,
nodeProjection: Map,
nodeCount: Integer,
relationshipProjection: Map,
relationshipCount: Integer,
projectMillis: Integer
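The changes a projection can make to relationships (changing direction, aggregating parallel relationships) can be illustrated with a small Python sketch. The option names NATURAL, REVERSE, UNDIRECTED and NONE, SUM, MIN follow the values GDS uses for orientation and aggregation, but the function project and the sample data are hypothetical, and this is only a model of the idea, not the GDS implementation.

```python
from collections import defaultdict


def project(relationships, orientation="NATURAL", aggregation="NONE"):
    """Project (source, target, weight) triples into an in-memory edge list.

    orientation: NATURAL keeps the direction, REVERSE flips it,
                 UNDIRECTED stores each relationship in both directions.
    aggregation: NONE keeps parallel relationships, SUM/MIN merge them.
    """
    edges = []
    for source, target, weight in relationships:
        if orientation == "NATURAL":
            edges.append((source, target, weight))
        elif orientation == "REVERSE":
            edges.append((target, source, weight))
        elif orientation == "UNDIRECTED":
            edges.append((source, target, weight))
            edges.append((target, source, weight))
        else:
            raise ValueError(f"unknown orientation: {orientation}")

    if aggregation == "NONE":
        return sorted(edges)

    # Merge parallel relationships that share the same (source, target) pair.
    combine = {"SUM": sum, "MIN": min}[aggregation]
    grouped = defaultdict(list)
    for source, target, weight in edges:
        grouped[(source, target)].append(weight)
    return sorted((s, t, combine(ws)) for (s, t), ws in grouped.items())


# Hypothetical data: two parallel DEN -> LHR relationships.
rels = [("DEN", "LHR", 10), ("DEN", "LHR", 5), ("LHR", "MLE", 7)]
print(project(rels, aggregation="MIN"))
print(project(rels, orientation="REVERSE", aggregation="SUM"))
```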
2.1.2.1 Creating the graph projection routes
CALL gds.graph.project(
'routes',
'Airport',
'HAS_ROUTE'
)
YIELD
graphName, nodeProjection, nodeCount, relationshipProjection, relationshipCount
2.1.2.2 Checking the created projection:
CALL gds.graph.list('routes')
2.2 Example algorithm queries
General algorithm syntax:
CALL gds[.<tier>].<algorithm>.<execution-mode>[.<estimate>](
graphName: String,
configuration: Map
)
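The examples below run on the routes graph projection created earlier. If you have not created this projection, or have already dropped it, you will need to create it again. Trying to create a projection under a name that already exists results in the following error: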
Failed to invoke procedure `gds.graph.project`: Caused by: java.lang.IllegalArgumentException: A graph with name 'routes' already exists.
Some common algorithms follow:
PageRank
CALL gds.pageRank.stream('routes')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score AS pageRank
RETURN n.iata AS iata, n.descr AS description, pageRank
ORDER BY pageRank DESC, iata ASC
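For intuition about what the score column means, the following is a minimal sketch of the textbook PageRank power iteration in plain Python. It assumes a damping factor of 0.85 and a fixed number of iterations; the toy edges and the function name pagerank are hypothetical. GDS may scale and normalize scores differently, so absolute values from this sketch should not be expected to match the procedure's output.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank on a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        # Every node passes an equal share of its rank along each outgoing link.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical toy network: three airports fly to HUB, and HUB flies back to A.
edges = [("A", "HUB"), ("B", "HUB"), ("C", "HUB"), ("HUB", "A")]
scores = pagerank(edges)
for iata, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(iata, round(score, 4))
```

The final loop mirrors the ORDER BY pageRank DESC, iata ASC ordering of the query above.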
Community (cluster) detection via Louvain Modularity
CALL gds.louvain.stream('routes')
YIELD nodeId, communityId
WITH gds.util.asNode(nodeId) AS n, communityId
RETURN
communityId,
SIZE(COLLECT(n)) AS numberOfAirports,
COLLECT(DISTINCT n.city) AS cities
ORDER BY numberOfAirports DESC, communityId;
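Louvain groups nodes into communities by greedily improving a quality score called modularity. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition for an undirected, unweighted graph, to show what the algorithm is trying to maximize. The graph, the partitions, and the function name modularity are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends inside the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}   # one community per triangle
merged = {n: 0 for n in range(1, 7)}           # everything in one community

print(modularity(edges, split))
print(modularity(edges, merged))
```

Splitting the graph along the bridge scores higher than putting every node into one community, which is the kind of improvement Louvain searches for.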
Node similarity
CALL gds.nodeSimilarity.stream('routes')
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
RETURN
n1.iata AS iata,
n1.city AS city,
COLLECT({iata:n2.iata, city:n2.city, similarityScore: similarity}) AS similarAirports
ORDER BY city LIMIT 20
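By default, node similarity compares two nodes by the overlap of their outgoing neighbors using the Jaccard coefficient (size of the intersection divided by size of the union). A minimal Python sketch of that idea follows; the routes and the function names are hypothetical, and the sketch ignores GDS defaults such as the per-node result limit.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard coefficient of two neighbor sets."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def node_similarity(edges):
    """Return (node1, node2, similarity) for ordered pairs with similarity > 0."""
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)

    results = []
    for n1 in sorted(targets):
        for n2 in sorted(targets):
            if n1 != n2:
                score = jaccard(targets[n1], targets[n2])
                if score > 0:
                    results.append((n1, n2, score))
    return results


# Hypothetical routes: DEN and SLC share two destinations, MLE shares none.
edges = [
    ("DEN", "LHR"), ("DEN", "JFK"), ("DEN", "ORD"),
    ("SLC", "LHR"), ("SLC", "JFK"),
    ("MLE", "DXB"),
]
pairs = node_similarity(edges)
for n1, n2, score in pairs:
    print(n1, n2, round(score, 3))
```

Here DEN and SLC share two of three distinct destinations, so their similarity is 2/3, while MLE does not appear because it has no destination in common with the others.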
Node similarity: topK and topN
CALL gds.nodeSimilarity.stream(
'routes',
{
topK: 1,
topN: 10
}
)
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity AS similarityScore
RETURN
n1.iata AS iata,
n1.city AS city,
{iata:n2.iata, city:n2.city} AS similarAirport,
similarityScore
ORDER BY city
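The two limits act at different levels: topK keeps the best matches per node, and topN then keeps the best pairs overall. The Python sketch below applies both to a list of already computed similarity pairs. The pairs, the function name limit_results, and the convention that top_n=0 means "no global limit" are assumptions of this sketch.

```python
def limit_results(pairs, top_k, top_n=0):
    """Keep the top_k best pairs per node1, then the top_n best pairs overall."""
    per_node = {}
    # Visit pairs from highest to lowest score (the sort is stable).
    for n1, n2, score in sorted(pairs, key=lambda p: -p[2]):
        kept = per_node.setdefault(n1, [])
        if len(kept) < top_k:
            kept.append((n1, n2, score))

    survivors = [pair for kept in per_node.values() for pair in kept]
    survivors.sort(key=lambda p: (-p[2], p[0], p[1]))
    # In this sketch, top_n == 0 means "no global limit".
    return survivors[:top_n] if top_n > 0 else survivors


# Hypothetical similarity results as (node1, node2, similarity).
pairs = [
    ("A", "B", 0.9), ("A", "C", 0.8),
    ("B", "A", 0.9), ("B", "C", 0.5),
    ("C", "A", 0.8), ("C", "B", 0.5),
]
print(limit_results(pairs, top_k=1))
print(limit_results(pairs, top_k=1, top_n=2))
```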
Node similarity: degree and similarity cutoff
CALL gds.nodeSimilarity.stream(
'routes',
{
degreeCutoff: 100
}
)
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
RETURN
n1.iata AS iata,
n1.city AS city,
COLLECT({iata:n2.iata, city:n2.city, similarityScore: similarity}) AS similarAirports
ORDER BY city LIMIT 20
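Path finding: Dijkstra's algorithm, calculating the shortest path from a given source node
As with every other algorithm category we have explored, there are several possible approaches to path finding. In general, path finding aims to find the shortest path between two or more nodes. In our airport route graph, this helps us determine which airport connections are needed to minimize the overall flight distance.
In the previous examples we did not take the route distance between airports into account. In this example, however, we use the route distance as the weight in Dijkstra, so that the resulting shortest path is the one with the shortest physical distance. To do this, we must first include the route distance as a relationship property in the graph projection, as follows: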
CALL gds.graph.project(
'routes-weighted',
'Airport',
'HAS_ROUTE',
{
relationshipProperties: 'distance'
}
) YIELD
graphName, nodeProjection, nodeCount, relationshipProjection, relationshipCount
Find the shortest distance between airports DEN and MLE:
MATCH (source:Airport {iata: 'DEN'}), (target:Airport {iata: 'MLE'})
CALL gds.shortestPath.dijkstra.stream('routes-weighted', {
sourceNode: source,
targetNode: target,
relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN
index,
gds.util.asNode(sourceNode).iata AS sourceNodeName,
gds.util.asNode(targetNode).iata AS targetNodeName,
totalCost,
[nodeId IN nodeIds | gds.util.asNode(nodeId).iata] AS nodeNames,
costs,
nodes(path) as path
ORDER BY index
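To show what the query computes, here is a minimal Python sketch of Dijkstra's algorithm using a priority queue. It returns the same kind of information as the totalCost, nodeNames, and costs columns above (costs being the cumulative distance at each node on the path). The graph, its distances, and the intermediate airport DOH are hypothetical illustration data, not values from the dataset.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, node_names, costs) for the cheapest path, or None."""
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if target not in best:
        return None  # target is unreachable from source

    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path, [best[n] for n in path]


# Hypothetical weighted routes: {source: {target: distance}}.
graph = {
    "DEN": {"LHR": 4600, "DOH": 8000},
    "LHR": {"MLE": 5300, "DOH": 3200},
    "DOH": {"MLE": 2000},
}
result = dijkstra(graph, "DEN", "MLE")
print(result)
```

In this toy graph the cheapest path uses three hops rather than two, which is the difference between minimizing distance and minimizing the number of flights.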
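A final note: not every GDS algorithm can run on every type of graph projection. Some algorithms prefer homogeneous graphs over heterogeneous ones. Others only work properly on undirected graphs. Some cannot handle relationship weights. For whichever algorithm you choose, always consult the API documentation to verify what your graph needs.
3 Cleaning up the demo environment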
CALL gds.graph.drop('routes');
CALL gds.graph.drop('routes-weighted');