How to create a graph engine from a NetworkX graph
First, import NetworkX and the other modules we need:

.. code-block:: python

    >>> import random
    >>> import json
    >>> import tempfile
    >>> from pathlib import Path
    >>> import networkx as nx
    >>> import numpy as np
    >>> import deepgnn.graph_engine.snark.convert as convert
    >>> from deepgnn.graph_engine.snark.decoders import JsonDecoder
    >>> import deepgnn.graph_engine.snark.client as client

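We are going to generate a connected caveman graph with 30 clusters of exactly 12 nodes each. Each cluster is a clique, and one edge per clique is rewired to an adjacent clique so that the whole graph is connected.
Nodes are grouped by id: the first cluster contains nodes 0-11, the second 12-23, and so on. The graph structure itself is deterministic; the seed only makes the random feature values assigned below reproducible:
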
>>> random.seed(246)
>>> g = nx.connected_caveman_graph(30, 12)
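
The caveman layout can be reproduced without NetworkX, which makes the id-to-cluster mapping easy to check. The sketch below is a pure-Python approximation, not the NetworkX implementation: it assumes that in each clique starting at node ``start`` the edge ``(start, start + 1)`` is replaced by an edge from ``start`` to the last node of the previous clique (wrapping around). The exact rewired pair is an assumption; the helper names ``connected_caveman_edges``, ``cluster_of`` and ``is_connected`` are hypothetical.

```python
from __future__ import annotations

from collections import deque


def connected_caveman_edges(cliques: int, size: int) -> set[tuple[int, int]]:
    """Hypothetical sketch: `cliques` fully connected groups of `size` nodes,
    with one edge per group rewired to the previous group (assumed rewiring)."""
    n = cliques * size
    edges: set[tuple[int, int]] = set()
    for start in range(0, n, size):
        # Fully connect nodes start .. start + size - 1 (one "cave").
        for u in range(start, start + size):
            for v in range(u + 1, start + size):
                edges.add((u, v))
    for start in range(0, n, size):
        # Assumed rewiring: drop (start, start + 1), link start to the previous cave.
        edges.discard((start, start + 1))
        prev = (start - 1) % n
        edges.add((min(start, prev), max(start, prev)))
    return edges


def cluster_of(node_id: int, size: int = 12) -> int:
    """Ids are laid out cluster by cluster, so integer division gives the cluster."""
    return node_id // size


def is_connected(n: int, edges: set[tuple[int, int]]) -> bool:
    """Breadth-first search from node 0; True if every node is reached."""
    adj: dict[int, list[int]] = {u: [] for u in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n


edges = connected_caveman_edges(30, 12)
print(len(edges))  # 1980: 30 cliques * 66 edges, rewiring keeps the count
print(cluster_of(0), cluster_of(11), cluster_of(12), cluster_of(359))  # 0 0 1 29
print(is_connected(30 * 12, edges))  # True
```

Rewiring removes and adds exactly one edge per clique, so the edge count stays at 30 times 66, while the added links chain the cliques into a single connected component.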
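
To train a model, every node needs features. To keep things simple, each node gets a two-dimensional float feature vector (feature id ``0``): the first value is the node id and the second is a random float in [0.0, 1.0) from ``random.random()``. Each node record also lists its outgoing edges, one per neighbor, all with type 0 and weight 1:
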
>>> nodes = []
>>> data = ""
>>> for node_id in g:
...     # Set unit weights for all neighbors
...     nbs = {nb: 1.0 for nb in nx.neighbors(g, node_id)}
...
...     # Fill detailed data for the node
...     node = {
...         "node_weight": 1,
...         "node_id": node_id,
...         "node_type": 0,
...         "float_feature": {"0": [node_id, random.random()]},
...         "edge": [
...             {"src_id": node_id, "dst_id": nb, "edge_type": 0, "weight": weight}
...             for nb, weight in nbs.items()
...         ],
...     }
...     data += json.dumps(node) + "\n"
...     nodes.append(node)

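
Each line of ``data`` is one self-contained JSON object. The hypothetical ``make_record`` helper below rebuilds the same record layout for a toy triangle graph using only the standard library, so the schema can be checked without NetworkX or DeepGNN. It also shows that every undirected edge ends up stored twice, once in each endpoint's ``edge`` list, because each node lists all of its neighbors.

```python
from __future__ import annotations

import json
import random


def make_record(node_id: int, neighbors: list[int], rng: random.Random) -> dict:
    """Hypothetical helper: one node record in the JSON schema used above."""
    return {
        "node_weight": 1,
        "node_id": node_id,
        "node_type": 0,
        # Feature "0" holds [node id, random float in [0.0, 1.0)].
        "float_feature": {"0": [node_id, rng.random()]},
        # One outgoing edge per neighbor, type 0, weight 1.0.
        "edge": [
            {"src_id": node_id, "dst_id": nb, "edge_type": 0, "weight": 1.0}
            for nb in neighbors
        ],
    }


# Toy triangle 0-1-2 as an adjacency dict (stand-in for the NetworkX graph).
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(246)
toy_data = "".join(
    json.dumps(make_record(n, nbs, rng)) + "\n" for n, nbs in adjacency.items()
)

# Each line parses on its own, and every edge starts at the node that owns it.
for line in toy_data.splitlines():
    record = json.loads(line)
    assert all(e["src_id"] == record["node_id"] for e in record["edge"])

# The undirected triangle has 3 edges, stored as 6 directed edge entries.
print(sum(len(json.loads(line)["edge"]) for line in toy_data.splitlines()))  # 6
```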
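We can inspect the feature vector of a node:

>>> nodes[1]["float_feature"]["0"]
[1, 0.516676816253458]

Next, write the records to a JSON lines file in a temporary directory. ``f.write`` returns the number of characters written:

>>> working_dir = tempfile.TemporaryDirectory()
>>> raw_file = str(Path(working_dir.name) / "data.json")
>>> with open(raw_file, "w") as f:
...     f.write(data)
287274
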
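Now we can convert the graph into DeepGNN's binary format. With ``partition_count=1`` the whole graph goes into a single partition, written to the same temporary directory:
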
>>> convert.MultiWorkersConverter(
... graph_path=raw_file,
... partition_count=1,
... output_dir=working_dir.name,
... decoder=JsonDecoder,
... ).convert()
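
Load the converted graph from the temporary directory into an in-memory client and fetch the features of node 1. Here ``features=[[0, 2]]`` requests feature ``0`` with dimension 2; the result matches the raw values shown earlier, rounded to ``float32``:
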
>>> cl = client.MemoryGraph(working_dir.name)
>>> cl.node_features(nodes=[1], features=[[0, 2]], dtype=np.float32)
array([[1. , 0.51667684]], dtype=float32)
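
The shortened value ``0.51667684`` is simply float32 rounding of ``0.516676816253458``. The sketch below is a hypothetical pure-Python stand-in (``lookup_features`` and ``to_float32`` are illustrative names, not DeepGNN APIs) that reads the same feature from the raw record, keeps ``dim`` values, and rounds them to 32-bit floats with the standard ``struct`` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def lookup_features(records, node_ids, feature_id, dim):
    """Hypothetical stand-in for a feature lookup (not a DeepGNN API):
    for each node id take the first `dim` values of feature `feature_id`
    from its raw record and round them to float32."""
    by_id = {r["node_id"]: r for r in records}
    return [
        [to_float32(v) for v in by_id[nid]["float_feature"][str(feature_id)][:dim]]
        for nid in node_ids
    ]


# Raw record for node 1, using the feature values printed earlier in this guide.
records = [{"node_id": 1, "float_feature": {"0": [1, 0.516676816253458]}}]
row = lookup_features(records, node_ids=[1], feature_id=0, dim=2)[0]
print(row[0])  # 1.0
print(abs(row[1] - 0.516676816253458) < 1e-7)  # True: float32 keeps about 7 significant digits
```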
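
For large graphs, models are typically trained on samples rather than on the whole graph, so the client provides samplers. A node sampler draws node ids of the requested types:
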
>>> ns = client.NodeSampler(cl, types=[0])
>>> ns.sample(size=2, seed=1)
(array([ 68, 242]), array([0, 0], dtype=int32))

The first item of the returned tuple, ``[68, 242]``, holds the sampled node ids, and the second holds their types (all zeros, because every node has type 0). Edge samplers work the same way:

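
Conceptually, a node sampler returns node ids together with their types. The sketch below illustrates that contract under stated assumptions only: it draws with replacement, weighted by ``node_weight``, using Python's ``random.Random``. The ``sample_nodes`` helper is hypothetical; DeepGNN's ``NodeSampler`` may use a different strategy and random generator, so this will not reproduce ``[68, 242]`` for ``seed=1``.

```python
import random


def sample_nodes(records, types, size, seed):
    """Hypothetical stand-in for a node sampler (not DeepGNN's implementation):
    draw `size` nodes with replacement among records whose type is in `types`,
    weighted by "node_weight". Returns (node ids, node types)."""
    candidates = [r for r in records if r["node_type"] in types]
    rng = random.Random(seed)
    picked = rng.choices(
        candidates, weights=[r["node_weight"] for r in candidates], k=size
    )
    return [r["node_id"] for r in picked], [r["node_type"] for r in picked]


# 360 toy records shaped like the ones built above: type 0, weight 1.
records = [{"node_id": i, "node_type": 0, "node_weight": 1} for i in range(360)]
ids, node_types = sample_nodes(records, types=[0], size=2, seed=1)
print(len(ids), node_types)  # 2 [0, 0]
print(sample_nodes(records, [0], 2, 1) == (ids, node_types))  # True: same seed, same draw
```

Since every toy record has weight 1, the weighted draw here is uniform; giving some nodes larger ``node_weight`` values would make them proportionally more likely to be picked.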
>>> es = client.EdgeSampler(cl, types=[0])
>>> es.sample(size=2, seed=2)
(array([292, 53]), array([298, 54]), array([0, 0], dtype=int32))

The result is a triple of arrays holding source node ids, destination node ids and edge types, so the first sampled edge goes from node 292 to node 298.
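
The triple can be read as three parallel sequences: the i-th source, destination and type together describe one sampled edge. The hypothetical ``sample_edges`` helper below flattens the ``edge`` lists of raw records and samples with replacement, weighted by ``weight``; this sampling strategy is an assumption for illustration, not DeepGNN's ``EdgeSampler`` implementation.

```python
import random


def sample_edges(records, types, size, seed):
    """Hypothetical stand-in for an edge sampler (not DeepGNN's implementation):
    flatten the "edge" lists of all node records, keep edges whose type is in
    `types`, draw `size` of them with replacement weighted by "weight", and
    return three parallel lists (sources, destinations, types)."""
    edges = [e for r in records for e in r["edge"] if e["edge_type"] in types]
    rng = random.Random(seed)
    picked = rng.choices(edges, weights=[e["weight"] for e in edges], k=size)
    if not picked:
        return [], [], []
    # Transpose a list of (src, dst, type) tuples into three parallel lists.
    src, dst, etype = zip(*((e["src_id"], e["dst_id"], e["edge_type"]) for e in picked))
    return list(src), list(dst), list(etype)


# Toy records for a triangle 0-1-2, each undirected edge stored in both directions.
records = [
    {
        "node_id": n,
        "edge": [
            {"src_id": n, "dst_id": m, "edge_type": 0, "weight": 1.0}
            for m in range(3)
            if m != n
        ],
    }
    for n in range(3)
]
src, dst, etype = sample_edges(records, types=[0], size=2, seed=2)
for s, d, t in zip(src, dst, etype):
    print(f"{s} -> {d} (type {t})")
```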