This practical aims to provide a quick overview of sparse Gaussian Graphical Models (GGM) and their use in the context of network reconstruction for gene interaction networks.

To this end, we rely on the R package huge, which implements some of the most popular sparse GGM methods and provides a set of basic tools for handling and analyzing them.

The first part focuses on an empirical analysis of the statistical models used for network reconstruction. The objective is to quickly assess the range of applicability of these methods. It should also give you some insight into their limitations, especially regarding the biological interpretability of the inferred network.

The second part applies these methods to two data sets. The first consists of transcriptomic data associated with a small regulatory network (tens of genes) already known to biologists. The second is a large breast cancer cohort with transcriptomic measurements on 44,000 transcripts. The objective is to unravel the most striking interactions between differentially expressed genes.

Note: you may work in small, balanced groups. Some functions required during the session are available in the file external_functions.R (ask the teachers).


First part: empirical study of sparse GGM

Load the huge package. Have a quick glance at the help.
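A minimal way to do this from the R console, assuming huge is already installed (otherwise uncomment the install.packages line):

```r
# install.packages("huge")  # uncomment if the package is not installed yet
library(huge)                # attach the package

help(package = "huge")       # index of the package documentation
?huge.generator              # help page of the data generator used below
```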

Synthetic data generation and network representation

The function huge.generator generates a random network together with (expression) data sampled from the multivariate normal distribution associated with this network.

  • Use this function to generate a simple random network with the size of your choice. Have a look at the structure of the R object produced by the function.
n <- 50; d <- 100
random.net  <- huge.generator(n, d, graph="random")
## Generating data from the multivariate normal distribution with the random graph structure....done.
str(random.net, max.level = 1)
## List of 7
##  $ data      : num [1:50, 1:100] 0.65 1.446 -1.028 1.074 0.131 ...
##   ..- attr(*, "dimnames")=List of 2
##  $ sigma     : num [1:100, 1:100] 1 0.01545 -0.03478 -0.00433 0.01129 ...
##  $ sigmahat  : num [1:100, 1:100] 1 0.00731 0.08636 -0.00712 0.01259 ...
##  $ omega     : num [1:100, 1:100] 1.23 4.54e-18 -6.48e-18 -1.12e-18 5.67e-18 ...
##  $ theta     :Formal class 'dsCMatrix' [package "Matrix"] with 7 slots
##  $ sparsity  : num 0.0309
##  $ graph.type: chr "random"
##  - attr(*, "class")= chr "sim"
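To relate these fields to one another, the sketch below checks two properties we expect from the simulation (our reading of the object, not a statement from the package documentation): theta is the adjacency matrix of the simulated graph, so its edge density should match sparsity; and the off-diagonal non-zeros of the precision matrix omega should coincide with the edges in theta, which is the defining property of a GGM. The seed and the tolerance 1e-10 (used to absorb the round-off noise visible in omega, e.g. 4.54e-18) are arbitrary choices.

```r
library(huge)
library(Matrix)   # provides as.matrix() for the sparse dsCMatrix stored in theta

set.seed(1)       # arbitrary seed, for reproducibility only
n <- 50; d <- 100
random.net <- huge.generator(n, d, graph = "random", verbose = FALSE)

# theta: sparse symmetric adjacency matrix, converted to a dense matrix
adj <- as.matrix(random.net$theta)

# number of edges = non-zero entries in the upper triangle
n.edges <- sum(adj[upper.tri(adj)] != 0)
edge.density <- n.edges / choose(d, 2)
print(c(edges = n.edges, density = edge.density, sparsity = random.net$sparsity))

# In a GGM, edge (i, j) <=> non-zero partial correlation <=> omega[i, j] != 0.
# omega is obtained by numerical inversion, hence a tolerance instead of == 0.
omega.support <- abs(random.net$omega) > 1e-10
diag(omega.support) <- FALSE
print(all(omega.support == (adj != 0)))
```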
  • Try different network topologies and plot the outputs with the dedicated plot function. Comment on and explain what the different graphical outputs represent.
plot(random.net)

cluster.net <- huge.generator(n, d, g = 3, graph="cluster", vis=TRUE, verbose=FALSE)
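To compare topologies side by side, one possible sketch simulates one network for each graph type accepted by huge.generator ("random", "hub", "cluster", "band", "scale-free"), then summarizes each with its sparsity and its maximum node degree. The seed and the choice of maximum degree as a summary are illustrative assumptions; hub and scale-free graphs are expected to contain a few highly connected nodes, unlike random or band graphs.

```r
library(huge)
library(Matrix)   # as.matrix() for the sparse theta slot

set.seed(2)       # arbitrary seed
n <- 50; d <- 100
types <- c("random", "hub", "cluster", "band", "scale-free")

# one simulated network per graph type
nets <- lapply(types, function(type)
  huge.generator(n, d, graph = type, verbose = FALSE))
names(nets) <- types

# proportion of possible edges present in each graph
print(round(sapply(nets, function(net) net$sparsity), 4))

# maximum node degree (number of neighbours of the best-connected node)
print(sapply(nets, function(net) max(rowSums(as.matrix(net$theta) != 0))))

# one multi-panel figure per topology, produced by the plot method for "sim" objects
for (type in types) plot(nets[[type]])
```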