.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/models/1_gnn/6_line_graph.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_models_1_gnn_6_line_graph.py:


.. _model-line-graph:

Line Graph Neural Network
=========================

**Author**: `Qi Huang `_, Yu Gai,
`Minjie Wang `_, Zheng Zhang

.. warning::

    The tutorial aims at gaining insights into the paper, with code as a means
    of explanation. The implementation is thus NOT optimized for running
    efficiency. For the recommended implementation, please refer to the
    `official examples `_.

.. GENERATED FROM PYTHON SOURCE LINES 20-87

In this tutorial, you learn how to solve community detection tasks by
implementing a line graph neural network (LGNN). Community detection, or
graph clustering, consists of partitioning the vertices in a graph into
clusters in which nodes are more similar to one another.

In the :doc:`Graph convolutional network tutorial <1_gcn>`, you learned how
to classify the nodes of an input graph in a semi-supervised setting. You
used a graph convolutional neural network (GCN) as an embedding mechanism
for graph features.

To generalize a graph neural network (GNN) to supervised community
detection, a line-graph based variation of GNN is introduced in the research
paper `Supervised Community Detection with Line Graph Neural Networks `__.
One of the highlights of the model is to augment the straightforward GNN
architecture so that it operates on a line graph of edge adjacencies,
defined with a non-backtracking operator.

A line graph neural network (LGNN) shows how DGL can implement an advanced
graph algorithm by mixing basic tensor operations, sparse-matrix
multiplication, and message-passing APIs.
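The line graph and its non-backtracking variant can be stated concretely.
Each directed edge u->v of the original graph becomes a node of the line
graph, and u->v is linked to v->w whenever the second edge starts where the
first one ends. The non-backtracking variant additionally drops the pair when
w = u, so a walk never immediately returns along the edge it just used. The
pure-Python sketch below builds that edge list for a tiny graph. The helper
name ``line_graph_edges`` and its ``backtracking`` parameter are hypothetical
illustrations, not DGL API; in practice you would likely use DGL's built-in
``dgl.line_graph``, which also accepts a ``backtracking`` argument.

```python
from collections import defaultdict


def line_graph_edges(edges, backtracking=False):
    """Hypothetical helper: edge list of the (non-)backtracking line graph.

    ``edges`` is a list of directed edges (u, v). Edge i becomes line-graph
    node i, and the line graph contains (i, j) when edge i = (u, v) and
    edge j = (v, w). With ``backtracking=False`` the pair is dropped when
    w == u, i.e. a walk may not step straight back to where it came from.
    """
    # Index the outgoing edges of every node: node -> [(edge_id, target)].
    out_edges = defaultdict(list)
    for j, (src, dst) in enumerate(edges):
        out_edges[src].append((j, dst))

    lg = []
    for i, (u, v) in enumerate(edges):
        # Every edge leaving v continues the walk u -> v -> w.
        for j, w in out_edges[v]:
            if not backtracking and w == u:
                continue  # skip the step that goes straight back to u
            lg.append((i, j))
    return lg


# Usage: a triangle 0-1-2 with each undirected edge stored in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)]
print(line_graph_edges(edges))
# -> [(0, 2), (1, 5), (2, 4), (3, 1), (4, 0), (5, 3)]
print(len(line_graph_edges(edges, backtracking=True)))
# -> 12
```

In the triangle every directed edge keeps exactly one non-backtracking
continuation, whereas the backtracking line graph gives each edge two, one of
which just undoes the previous step.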
In the following sections, you learn about community detection, line
graphs, LGNN, and its implementation.

Supervised community detection task with the Cora dataset
------------------------------------------------------------

Community detection
~~~~~~~~~~~~~~~~~~~~

In a community detection task, you cluster similar nodes instead of
labeling them. Node similarity is typically characterized by a higher
density of edges within each cluster than between clusters.

What's the difference between community detection and node classification?
Compared to node classification, community detection focuses on retrieving
cluster information in the graph, rather than assigning a specific label to
a node. For example, as long as a node is clustered with its community
members, it doesn't matter whether the node is assigned to "community A" or
"community B", while assigning all "great movies" to the label "bad movies"
would be a disaster in a movie network classification task.

What's the difference, then, between a community detection algorithm and
other clustering algorithms such as k-means? A community detection algorithm
operates on graph-structured data. Compared to k-means, community detection
leverages the graph structure, instead of simply clustering nodes based on
their features.

Cora dataset
~~~~~~~~~~~~~~

To be consistent with the GCN tutorial, you use the `Cora dataset `__ to
illustrate a simple community detection task. Cora is a scientific
publication dataset, with 2708 papers belonging to seven different machine
learning fields. Here, you formulate Cora as a directed graph, with each
node being a paper, and each edge being a citation link (A->B means A
cites B). Here is a visualization of the whole Cora dataset.

.. figure:: https://i.imgur.com/X404Byc.png
    :alt: cora
    :height: 400px
    :width: 500px
    :align: center

Cora naturally contains seven classes, and the statistics below show that
each class does satisfy our assumption of community, i.e.
nodes of the same class have a higher connection probability among
themselves than with nodes of other classes. The following code snippet
checks this for class 0: it computes the fraction of edges pointing into
class-0 nodes whose source node is also in class 0.

.. GENERATED FROM PYTHON SOURCE LINES 88-115

.. code-block:: Python


    import os

    os.environ["DGLBACKEND"] = "pytorch"
    import dgl
    import torch
    import torch as th
    import torch.nn as nn
    import torch.nn.functional as F
    from dgl.data import citation_graph as citegrh

    data = citegrh.load_cora()

    G = data[0]
    # ndata["label"] is already a tensor, so no copy via th.tensor() is needed
    labels = G.ndata["label"]

    # find all the nodes labeled with class 0
    label0_nodes = th.nonzero(labels == 0, as_tuple=False).squeeze()
    # find all the edges pointing to class 0 nodes
    src, _ = G.in_edges(label0_nodes)
    src_labels = labels[src]
    # find all the edges whose endpoints are both in class 0
    intra_src = th.nonzero(src_labels == 0, as_tuple=False)
    print("Intra-class edges percent: %.4f" % (len(intra_src) / len(src_labels)))

    import matplotlib.pyplot as plt


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    NumNodes: 2708
    NumEdges: 10556
    NumFeats: 1433
    NumClasses: 7
    NumTrainingSamples: 140
    NumValidationSamples: 500
    NumTestSamples: 1000
    Done loading data from cached files.
    Intra-class edges percent: 0.6994


.. GENERATED FROM PYTHON SOURCE LINES 116-130

Binary community subgraph from Cora with a test dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Without loss of generality, in this tutorial you limit the scope of the
task to binary community detection.

.. note::

   To create a practice binary-community dataset from Cora, first extract
   all two-class pairs from the original seven Cora classes.
   For each pair, you treat each class as one community, and find the
   largest subgraph that contains at least one cross-community edge as the
   training example. As a result, there are a total of 21 training samples
   in this small dataset.

With the following code, you can visualize one of the training samples and
its community structure.

.. GENERATED FROM PYTHON SOURCE LINES 130-158

.. code-block:: Python


    import networkx as nx

    train_set = dgl.data.CoraBinary()
    G1, pmpd1, label1 = train_set[1]
    nx_G1 = G1.to_networkx()


    def visualize(labels, g):
        pos = nx.spring_layout(g, seed=1)
        plt.figure(figsize=(8, 8))
        plt.axis("off")
        nx.draw_networkx(
            g,
            pos=pos,
            node_size=50,
            cmap=plt.get_cmap("coolwarm"),
            node_color=labels,
            edge_color="k",
            arrows=False,
            width=0.5,
            style="dotted",
            with_labels=False,
        )


    visualize(label1, nx_G1)



.. image-sg:: /tutorials/models/1_gnn/images/sphx_glr_6_line_graph_001.png
   :alt: 6 line graph
   :srcset: /tutorials/models/1_gnn/images/sphx_glr_6_line_graph_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading /root/.dgl/cora_binary.zip from https://data.dgl.ai/dataset/cora_binary.zip...


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 22.600 seconds)


.. _sphx_glr_download_tutorials_models_1_gnn_6_line_graph.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 6_line_graph.ipynb <6_line_graph.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 6_line_graph.py <6_line_graph.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 6_line_graph.zip <6_line_graph.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_