使用Neo4j分析《权力的游戏》

时间 2019-12-06

标签使用 neo4j neo 分析栏目游戏繁體版

原文原文链接

几个月前，数学家 Andrew Beveridge和Jie Shan在数学杂志上发表《权力的网络》，主要分析畅销小说《冰与火之歌》第三部《冰雨的风暴》中人物关系，其已经拍成电视剧《权力的游戏》系列。他们在论文中介绍了如何经过文本分析和实体提取构建人物关系的网络。紧接着，使用社交网络分析算法对人物关系网络分析找出最重要的角色；应用社区发现算法来找到人物聚类。node

#! pip install py2neo
from py2neo import Graph
graph = Graph()

安装py2neopython

!pip install py2neo --upgrade

Collecting py2neo
  Downloading py2neo-3.1.0-py2.py3-none-any.whl (140kB)
[K    100% |████████████████████████████████| 143kB 3.0MB/s 
[?25hInstalling collected packages: py2neo
  Found existing installation: py2neo 2.0.8
    Uninstalling py2neo-2.0.8:
      Successfully uninstalled py2neo-2.0.8
Successfully installed py2neo-3.1.0

Import into Neo4j

首先建立节点c，并作惟一限制性约束，c.name惟一，保证schema的完整性：git

带有标签Character的节点表明小说中的角色，用单向关系类型INTERACTS表明小说中的角色有过接触。节点属性会存储角色的名字name，两角色间接触的次数做为关系的属性：权重（weight）。github

一旦约束建立即相应的建立索引，这将有助于经过角色的名字查询的性能。做者使用Neo4j的Cypher（Cypher是一种声明式图查询语言，能表达高效查询和更新图数据库）LOAD CSV语句导入数据：算法

# 建立节点，并惟一性约束
graph.run("CREATE CONSTRAINT ON (c:Character) ASSERT c.name IS UNIQUE;")

# 导入节点，关系和关系的属性
for record in graph.run('''
LOAD CSV WITH HEADERS FROM "https://www.macalester.edu/~abeverid/data/stormofswords.csv" AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS]->(tgt)
SET r.weight = toInt(row.Weight)
RETURN count(*) AS paths_written
'''):
    print(record)

('paths_written': 352)

# match
for r in graph.run('''
MATCH p=(:Character)-[:INTERACTS]-(:Character)
RETURN p limit 10
'''):
    print(r)

<Record p=(Aemon)-[:INTERACTS {weight: 31}]->(Samwell)>
<Record p=(Aemon)<-[:INTERACTS {weight: 4}]-(Stannis)>
<Record p=(Aemon)-[:INTERACTS {weight: 5}]->(Grenn)>
<Record p=(Aemon)<-[:INTERACTS {weight: 4}]-(Robert)>
<Record p=(Aemon)<-[:INTERACTS {weight: 30}]-(Jon)>
<Record p=(Aerys)-[:INTERACTS {weight: 8}]->(Tywin)>
<Record p=(Aerys)-[:INTERACTS {weight: 5}]->(Tyrion)>
<Record p=(Aerys)-[:INTERACTS {weight: 18}]->(Jaime)>
<Record p=(Aerys)-[:INTERACTS {weight: 6}]->(Robert)>
<Record p=(Alliser)<-[:INTERACTS {weight: 15}]-(Jon)>

Analyzing the network 分析网络

Number of characters 人物数量

万事以简单开始。先看看上图上由有多少人物：数据库

# count
for record in graph.run("MATCH (c:Character) RETURN count(c) AS num"):
    print(record)

<Record num=107>

Summary statistics 概要统计

统计每一个角色接触的其它角色的数目：网络

# with min max avg stdev
for record in graph.run('''
MATCH (c:Character)-[:INTERACTS]->()
WITH c, count(*) AS num
RETURN min(num) AS min, max(num) AS max, avg(num) AS avg_characters, stdev(num) AS stdev
'''):
    print(record)

<Record min=1 max=24 avg_characters=4.957746478873241 stdev=6.2276723918750845>

Diameter of the network 网络直径

网络的直径或者测底线或者最长最短路径app

for r in graph.run('''
// Find maximum diameter of network
// maximum shortest path between two nodes
MATCH (a:Character), (b:Character) WHERE id(a) > id(b)
MATCH p=shortestPath((a)-[:INTERACTS*]-(b))
RETURN length(p) AS len, extract(x IN nodes(p) | x.name) AS path
ORDER BY len DESC LIMIT 4
'''):
    print(r)

('len': 6, 'path': ['Illyrio', 'Belwas', 'Daenerys', 'Robert', 'Tywin', 'Oberyn', 'Amory'])
('len': 6, 'path': ['Illyrio', 'Belwas', 'Daenerys', 'Robert', 'Sansa', 'Bran', 'Jojen'])
('len': 6, 'path': ['Illyrio', 'Belwas', 'Daenerys', 'Robert', 'Stannis', 'Davos', 'Shireen'])
('len': 6, 'path': ['Illyrio', 'Belwas', 'Daenerys', 'Robert', 'Sansa', 'Bran', 'Luwin'])

咱们能看到网络中有许多长度为6的路径。函数

Shortest path 最短路径

使用Cypher 的shortestPath函数找到图中任意两个角色之间的最短路径。让咱们找出凯特琳·史塔克（Catelyn Stark ）和卓戈·卡奥（Kahl Drogo）之间的最短路径：工具

for r in graph.run('''
// Shortest path from Catelyn Stark to Khal Drogo
MATCH (catelyn:Character {name: "Catelyn"}), (drogo:Character {name: "Drogo"})
MATCH p=shortestPath((catelyn)-[INTERACTS*]-(drogo))
RETURN p
'''):
    print(r)

<Record p=(Catelyn)-[:INTERACTS {weight: 8}]->(Sansa)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>

All shortest paths 全部的最短路径

联结凯特琳·史塔克（Catelyn Stark ）和卓戈·卡奥（Kahl Drogo）之间的最短路径可能还有其它路径，咱们可使用Cypher的allShortestPaths函数来查找：

for r in graph.run('''
// All shortest paths from Catelyn Stark to Khal Drogo
MATCH (catelyn:Character {name: "Catelyn"}), (drogo:Character {name: "Drogo"})
MATCH p=allShortestPaths((catelyn)-[INTERACTS*]-(drogo))
RETURN p
'''):
    print(r)

<Record p=(Catelyn)-[:INTERACTS {weight: 8}]->(Sansa)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 19}]->(Jaime)-[:INTERACTS {weight: 4}]->(Barristan)<-[:INTERACTS {weight: 11}]-(Jorah)-[:INTERACTS {weight: 6}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 19}]->(Jaime)-[:INTERACTS {weight: 4}]->(Barristan)<-[:INTERACTS {weight: 20}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 19}]->(Jaime)-[:INTERACTS {weight: 17}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 4}]->(Cersei)-[:INTERACTS {weight: 16}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 4}]->(Stannis)<-[:INTERACTS {weight: 5}]-(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 5}]->(Tyrion)<-[:INTERACTS {weight: 4}]-(Viserys)<-[:INTERACTS {weight: 8}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)-[:INTERACTS {weight: 5}]->(Tyrion)-[:INTERACTS {weight: 9}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>
<Record p=(Catelyn)<-[:INTERACTS {weight: 5}]-(Eddard)-[:INTERACTS {weight: 10}]->(Robert)<-[:INTERACTS {weight: 5}]-(Daenerys)-[:INTERACTS {weight: 18}]->(Drogo)>

Pivotal nodes 关键节点

在网络中，若是一个节点位于其它两个节点全部的最短路径上，即称为关键节点。下面咱们找出网络中全部的关键节点：

for r in graph.run('''
// Find all pivotal nodes in network
MATCH (a:Character), (b:Character)
MATCH p=allShortestPaths((a)-[:INTERACTS*]-(b)) 
WITH collect(p) AS paths, a, b
MATCH (c:Character) WHERE all(x IN paths WHERE c IN nodes(x)) AND NOT c IN [a,b]
RETURN a.name, b.name, c.name AS PivotalNode SKIP 490 LIMIT 10
'''):
    print(r)

<Record a.name='Balon' b.name='Lothar' PivotalNode='Robb'>
<Record a.name='Balon' b.name='Luwin' PivotalNode='Bran'>
<Record a.name='Balon' b.name='Luwin' PivotalNode='Robb'>
<Record a.name='Balon' b.name='Melisandre' PivotalNode='Stannis'>
<Record a.name='Balon' b.name='Missandei' PivotalNode='Daenerys'>
<Record a.name='Balon' b.name='Myrcella' PivotalNode='Tyrion'>
<Record a.name='Balon' b.name='Rattleshirt' PivotalNode='Jon'>
<Record a.name='Balon' b.name='Rickard' PivotalNode='Robb'>
<Record a.name='Balon' b.name='Rickon' PivotalNode='Robb'>
<Record a.name='Balon' b.name='Roose' PivotalNode='Robb'>

从结果表格中咱们能够看出有趣的结果：罗柏·史塔克（Robb）是卓戈·卡奥（Drogo）和拉姆塞·波顿（Ramsay）的关键节点。这意味着，全部联结卓戈·卡奥（Drogo）和拉姆塞·波顿（Ramsay）的最短路径都要通过罗柏·史塔克（Robb）。咱们能够经过可视化卓戈·卡奥（Drogo）和拉姆塞·波顿（Ramsay）之间的全部最短路径来验证：

for r in graph.run('''
MATCH (a:Character {name: "Drogo"}), (b:Character {name: "Ramsay"})
MATCH p=allShortestPaths((a)-[:INTERACTS*]-(b))
RETURN p
'''):
    print(r)

<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 5}]-(Sansa)<-[:INTERACTS {weight: 15}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 6}]-(Jorah)-[:INTERACTS {weight: 11}]->(Barristan)<-[:INTERACTS {weight: 4}]-(Jaime)<-[:INTERACTS {weight: 15}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 20}]->(Barristan)<-[:INTERACTS {weight: 4}]-(Jaime)<-[:INTERACTS {weight: 15}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 17}]-(Jaime)<-[:INTERACTS {weight: 15}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)-[:INTERACTS {weight: 5}]->(Stannis)<-[:INTERACTS {weight: 4}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 8}]->(Viserys)-[:INTERACTS {weight: 4}]->(Tyrion)<-[:INTERACTS {weight: 12}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 9}]-(Tyrion)<-[:INTERACTS {weight: 12}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 5}]-(Jon)<-[:INTERACTS {weight: 14}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 11}]-(Tywin)<-[:INTERACTS {weight: 12}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 10}]-(Eddard)-[:INTERACTS {weight: 13}]->(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>
<Record p=(Drogo)<-[:INTERACTS {weight: 18}]-(Daenerys)-[:INTERACTS {weight: 5}]->(Robert)<-[:INTERACTS {weight: 4}]-(Arya)<-[:INTERACTS {weight: 15}]-(Robb)-[:INTERACTS {weight: 4}]->(Ramsay)>

Centrality measures 中心度度量

给出网络中节点的重要性的相对度量。有许多不一样的方式来度量中心度，每种方式都表明不一样类型的“重要性”。

Degree centrality 度中心性

度中心性是最简单度量，即为某个节点在网络中的联结数。在《权力的游戏》的图中，某个角色的度中心性是指该角色接触的其余角色数。做者使用Cypher计算度中心性：

for r in graph.run('''
MATCH (c:Character)-[:INTERACTS]-()
RETURN c.name AS character, count(*) AS degree ORDER BY degree DESC LIMIT 10
'''):
    print(r)

<Record character='Tyrion' degree=36>
<Record character='Sansa' degree=26>
<Record character='Jon' degree=26>
<Record character='Robb' degree=25>
<Record character='Jaime' degree=24>
<Record character='Tywin' degree=22>
<Record character='Cersei' degree=20>
<Record character='Arya' degree=19>
<Record character='Catelyn' degree=18>
<Record character='Joffrey' degree=18>

从上面能够发现，在《权力的游戏》网络中提利昂·兰尼斯特（Tyrion）和最多的角色有接触。鉴于他的心计，咱们以为这是有道理的。

Weighted degree centrality 加权度中心性

做者存储一对角色接触的次数做为INTERACTS关系的weight属性。对该角色的INTERACTS关系的全部weight相加获得加权度中心性。做者使用Cypher计算全部角色的这个度量：

for r in graph.run('''
MATCH (c:Character)-[r:INTERACTS]-()
RETURN c.name AS character, sum(r.weight) AS weightedDegree ORDER BY weightedDegree DESC LIMIT 10
'''):
    print(r)

<Record character='Tyrion' weightedDegree=551>
<Record character='Jon' weightedDegree=442>
<Record character='Sansa' weightedDegree=383>
<Record character='Jaime' weightedDegree=372>
<Record character='Bran' weightedDegree=344>
<Record character='Robb' weightedDegree=342>
<Record character='Samwell' weightedDegree=282>
<Record character='Arya' weightedDegree=269>
<Record character='Joffrey' weightedDegree=255>
<Record character='Daenerys' weightedDegree=232>

Betweenness centrality 介数中心性

介数中心性：在网络中，一个节点的介数中心性是指其它两个节点的全部最短路径都通过这个节点，则这些全部最短路径数即为此节点的介数中心性。介数中心性是一种重要的度量，由于它能够鉴别出网络中的“信息中间人”或者网络聚类后的联结点。

for r in graph.run('''
CALL algo.betweenness.stream('Character')
YIELD nodeId, centrality
MATCH (c:Character) WHERE id(c) = nodeId
RETURN c.name AS c,centrality
ORDER BY centrality DESC limit 10;
'''):
    print(r)

<Record c='Tyrion' centrality=332.97460317460315>
<Record c='Samwell' centrality=244.6357142857143>
<Record c='Stannis' centrality=226.20476190476188>
<Record c='Robert' centrality=208.62301587301593>
<Record c='Mance' centrality=138.66666666666669>
<Record c='Jaime' centrality=119.99563492063493>
<Record c='Sandor' centrality=114.33333333333333>
<Record c='Jon' centrality=111.26666666666667>
<Record c='Janos' centrality=90.65>
<Record c='Aemon' centrality=64.59761904761905>

Closeness centrality 紧度中心性

紧度中心性是指到网络中全部其余角色的平均距离的倒数。在图中，具备高紧度中心性的节点在聚类社区之间被高度联结，但在社区以外不必定是高度联结的。

cql = '''CALL algo.closeness.stream('Character', 'INTERACTS')
YIELD nodeId, centrality
MATCH (c:Character) WHERE id(c) = nodeId
RETURN c.name AS c,centrality
ORDER BY centrality DESC limit 10;'''
for r in graph.run(cql):
    print(r)

<Record c='Tyrion' centrality=0.5120772946859904>
<Record c='Sansa' centrality=0.5096153846153846>
<Record c='Robert' centrality=0.5>
<Record c='Robb' centrality=0.48847926267281105>
<Record c='Arya' centrality=0.48623853211009177>
<Record c='Jon' centrality=0.4796380090497738>
<Record c='Jaime' centrality=0.4796380090497738>
<Record c='Stannis' centrality=0.4796380090497738>
<Record c='Tywin' centrality=0.4690265486725664>
<Record c='Eddard' centrality=0.4608695652173913>

Using python-igraph 使用python-igraph

Neo4j与其它工具（好比，R和Python数据科学工具）完美结合。咱们继续使用apoc运行 PageRank和社区发现（community detection）算法。这里接着使用python-igraph计算分析。Python-igraph移植自R的igraph图形分析库。使用pip install python-igraph安装它。

Building an igraph instance from Neo4j 构建一个igraph实例

为了在《权力的游戏》的数据的图分析中使用igraph，首先须要从Neo4j拉取数据，用Python创建igraph实例。做者使用 Neo4j 的Python驱动库py2neo。咱们能直接传入Py2neo查询结果对象到igraph的TupleList构造器，建立igraph实例：

#! pip install python-igraph
from igraph import Graph as IGraph

query = '''
MATCH (c1:Character)-[r:INTERACTS]->(c2:Character)
RETURN c1.name, c2.name, r.weight AS weight
'''
# 从元组列表表示形式构造一个图
ig = IGraph.TupleList(graph.run(query), weights=True)

ig

<igraph.Graph at 0x2089631d68>

如今有了igraph对象，能够运行igraph实现的各类图算法了。

PageRank

PageRank算法源自Google的网页排名。它是一种特征向量中心性(eigenvector centrality)算法。

在igraph实例中运行PageRank算法，而后把结果写回Neo4j，在角色节点建立一个pagerank属性存储igraph计算的值：

PageRank

# Calculates the Google PageRank values of a graph.
pg = ig.pagerank()

pgvs = []
# ig.vs:图的顶点序列
for p in zip(ig.vs, pg):
    pgvs.append({"name": p[0]["name"], "pg": p[1]})
print(pgvs[:5])

write_clusters_query = '''
UNWIND {nodes} AS n
MATCH (c:Character) WHERE c.name = n.name
SET c.pagerank = n.pg
'''

graph.run(write_clusters_query, nodes=pgvs)

[{'name': 'Stannis', 'pg': 0.018020131765195593}, {'name': 'Aemon', 'pg': 0.007328980991947571}, {'name': 'Robert', 'pg': 0.022292016521362857}, {'name': 'Jon', 'pg': 0.035828696691635555}, {'name': 'Alliser', 'pg': 0.005162125869510499}]





<py2neo.database.Cursor at 0x208962d7f0>

如今能够在Neo4j的图中查询最高PageRank值的节点：

for r in graph.run('''
MATCH (n:Character)
RETURN n.name AS name, n.pagerank AS pagerank ORDER BY pagerank DESC LIMIT 10
'''):
    print(r)

('name': 'Tyrion', 'pagerank': 0.042884981999963316)
('name': 'Jon', 'pagerank': 0.03582869669163558)
('name': 'Robb', 'pagerank': 0.03017114665594764)
('name': 'Sansa', 'pagerank': 0.030009716660108578)
('name': 'Daenerys', 'pagerank': 0.02881425425830273)
('name': 'Jaime', 'pagerank': 0.028727587587471206)
('name': 'Tywin', 'pagerank': 0.02570016262642541)
('name': 'Robert', 'pagerank': 0.022292016521362864)
('name': 'Cersei', 'pagerank': 0.022287327589773507)
('name': 'Arya', 'pagerank': 0.022050209663844467)

Community detection

社区发现算法用来找出图中的社区聚类。做者使用igraph实现的随机游走算法（ walktrap）来找到在社区中频繁有接触的角色社区，在社区以外角色不怎么接触。

在igraph中运行随机游走的社区发现算法，而后把社区发现的结果导入Neo4j，其中每一个角色所属的社区用一个整数来表示：

clusters = IGraph.community_walktrap(ig, weights="weight").as_clustering()

nodes = [{"name": node["name"]} for node in ig.vs]
for node in nodes:
    idx = ig.vs.find(name=node["name"]).index
    node["community"] = clusters.membership[idx]

print(nodes[:5])

write_clusters_query = '''
UNWIND {nodes} AS n
MATCH (c:Character) WHERE c.name = n.name
SET c.community = toInt(n.community)
'''

graph.run(write_clusters_query, nodes=nodes)

[{'name': 'Stannis', 'community': 0}, {'name': 'Aemon', 'community': 1}, {'name': 'Robert', 'community': 2}, {'name': 'Jon', 'community': 1}, {'name': 'Alliser', 'community': 1}]





<py2neo.database.Cursor at 0x208962ae48>

咱们能在Neo4j中查询有多少个社区以及每一个社区的成员数：

for r in graph.run('''
MATCH (c:Character)
WITH c.community AS cluster, collect(c.name) AS  members
RETURN cluster, members ORDER BY cluster ASC
'''):
    print(r)

<Record cluster=0 members=['Davos', 'Melisandre', 'Shireen', 'Stannis', 'Cressen', 'Salladhor']>
<Record cluster=1 members=['Aemon', 'Alliser', 'Craster', 'Eddison', 'Gilly', 'Janos', 'Jon', 'Mance', 'Rattleshirt', 'Samwell', 'Val', 'Ygritte', 'Grenn', 'Karl', 'Bowen', 'Dalla', 'Orell', 'Qhorin', 'Styr']>
<Record cluster=2 members=['Aerys', 'Amory', 'Balon', 'Brienne', 'Bronn', 'Cersei', 'Gregor', 'Jaime', 'Joffrey', 'Jon Arryn', 'Kevan', 'Loras', 'Lysa', 'Meryn', 'Myrcella', 'Oberyn', 'Podrick', 'Renly', 'Robert', 'Robert Arryn', 'Sansa', 'Shae', 'Tommen', 'Tyrion', 'Tywin', 'Varys', 'Walton', 'Petyr', 'Elia', 'Ilyn', 'Pycelle', 'Qyburn', 'Margaery', 'Olenna', 'Marillion', 'Ellaria', 'Mace', 'Chataya', 'Doran']>
<Record cluster=3 members=['Arya', 'Beric', 'Eddard', 'Gendry', 'Sandor', 'Anguy', 'Thoros']>
<Record cluster=4 members=['Brynden', 'Catelyn', 'Edmure', 'Hoster', 'Lothar', 'Rickard', 'Robb', 'Roose', 'Walder', 'Jeyne', 'Roslin', 'Ramsay']>
<Record cluster=5 members=['Belwas', 'Daario', 'Daenerys', 'Irri', 'Jorah', 'Missandei', 'Rhaegar', 'Viserys', 'Barristan', 'Illyrio', 'Drogo', 'Aegon', 'Kraznys', 'Rakharo', 'Worm']>
<Record cluster=6 members=['Bran', 'Hodor', 'Jojen', 'Luwin', 'Meera', 'Rickon', 'Nan', 'Theon']>
<Record cluster=7 members=['Lancel']>

Visualization

See neovis.js