First, make sure you have the following packages installed:


Let us load the package igraph which is particularly useful to manipulate graph data.


Graph manipulation

Let us load the Zachary’s Karate Club dataset.

It is composed of links between 34 members of a karate club. Given the club’s small size, each club member knew everyone else. Sociologist Wayne Zachary documented pairwise links between members who regularly interacted outside the club.

Conflit and split:

  • A conflict between the club’s president and the instructor split the club into two.

  • About half of the members followed the instructor ad the other half the president, a breakup that unveiled the ground truth, representing club’s underlying community structure.

  • Today community finding algorithms are often tested based on their ability to infer these two communities from the structure of the network before the split.

g <- igraph::read.graph(file="data/karate.gml", format="gml")

  1. Give a quick summary of the graph (plot, number of vertices, edges, id associated to each vertex).

  1. Who are the most ‘important’ nodes?

  1. Study the empirical degree distribution of the nodes. Use a Gaussian kernel for the non-parametric estimation of the degree distribution. Comment.

  1. Remove the node with \(id=1\) and illustrate the effect on the resulting network.

Graph visualisation

We now load the package sna which is particularly useful to visualise networks.


  1. Test the different visualisation algorithms of sna (look at the mode parameter). Rely on the function gplot. Note that most functions of sna work with adjacency matrices.

Graph models: the simulation part

Let us now simulate various random graph models.

  1. Simulate an undirected graph (without self-loops) with \(n=50\) nodes from an ER model with \(p=0.1\).

  1. Simulate an undirected graph (without self-loops) with \(n=50\) nodes from a SBM model with \(K=3\), the same cluster proportions, and a matrix \(\pi\) such that \(\pi_{kk}=0.9,\forall k\) and \(\pi_{kl}=0.1,\forall k \neq l\).

Graph models: analysis of real data

We now load the package blockmodels to perform inference in the SBM model.


  1. Analyse the karate data set with the spectral clustering algorithms that you implemented in last homework.

  1. Analyse the karate data set with blockmodels. Plot the ICL criterion w.r.t. the number of clusters ranging from \(1\) to \(6\).

  1. Using the cluster_edge_betweenness() and the dendPlot(, mode=“hclust”) functions from the igraph package, detect communities based on edge betweenness. Plot the results.

  1. Compare the obtained classification. Comment