Gene regulatory network inference using fused LASSO on multiple data sets: application to Escherichia coli
Authors: Nooshin Omranian, Jeanne M. O. Eloundou-Mbebi, Bernd Mueller-Roeber and Zoran Nikoloski
Submitted to: Scientific Reports
Implementation Notes
To implement the modified fused LASSO approach for reconstructing gene regulatory networks over different data sets, we defined a penalty matrix D composed of two blocks. The first block, D1, is a diagonal kP × kP matrix for the LASSO penalty, with W on its diagonal. The second block, D2, corresponds to the fusion penalty and is expressed as:
\[
D_2 = \left[\, I_{(k-1)P \times (k-1)P} \;\middle|\; O_{(k-1)P \times P} \,\right] - \left[\, O_{(k-1)P \times P} \;\middle|\; I_{(k-1)P \times (k-1)P} \,\right] \qquad (1)
\]
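The block structure of the two penalty matrices can be sketched in a few lines of Python. This is only an illustration and not the authors' R code: it assumes that k is the number of data sets, that P is the number of regressors per data set, and that the coefficient vector is stacked data set by data set. The helper names lasso_block, fusion_block and matvec are hypothetical.

```python
def lasso_block(weights):
    """D1: diagonal kP x kP matrix carrying the LASSO weights W."""
    n = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def fusion_block(k, P):
    """D2 = [I | O] - [O | I], a (k-1)P x kP matrix.

    Assumes coefficients are ordered data set by data set, so columns
    r and r + P refer to the same regressor in consecutive data sets.
    """
    rows, cols = (k - 1) * P, k * P
    D2 = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        D2[r][r] = 1.0        # entry from the [I | O] part
        D2[r][r + P] = -1.0   # entry from the [O | I] part
    return D2


def matvec(M, v):
    """Plain matrix-vector product for list-of-lists matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Example: k = 3 data sets, P = 2 regressors each.
beta = [1, 2, 3, 4, 5, 6]
differences = matvec(fusion_block(3, 2), beta)  # consecutive-data-set differences
```

Multiplying D2 by the stacked coefficient vector yields the differences between the coefficients of the same regressor in consecutive data sets, which is the quantity the fusion penalty shrinks.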
To solve the proposed extension of the fused LASSO, we used the lqa package in R [1]. Its computational performance benefits from the Cholesky decomposition [2], which requires $\tilde{p}^3 + n\tilde{p}^2/2$ operations, where n is the number of observations, $\tilde{p} = p + 1$, and p is the number of regressors. We also slightly modified the function fused.lasso in the lqa package to allow inclusion of the penalty matrices defined in Eq. 1. The regression coefficients were robustly estimated by 10-fold cross-validation, with the optimal values of λ1 and λ2 selected from the sets {0.05, 0.1, 0.5, 1, 1.5} and {0.1, 0.5, 1, 1.5, 2}, respectively. To further speed up the algorithm, we used the mclapply function from the R package parallel [3].
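The tuning step can be illustrated with a generic grid search over the two penalty grids using k-fold cross-validation. The sketch below is written in Python for illustration only. The callback fit_and_score is a hypothetical stand-in for refitting the penalized model on the training folds and returning a prediction error on the held-out fold; make_toy_scorer is a toy placeholder and has nothing to do with the fused LASSO itself.

```python
import itertools
import random

LAMBDA1_GRID = [0.05, 0.1, 0.5, 1, 1.5]
LAMBDA2_GRID = [0.1, 0.5, 1, 1.5, 2]


def kfold_indices(n_obs, n_folds=10, seed=0):
    """Shuffle the observation indices and deal them into n_folds folds."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_grid_search(n_obs, fit_and_score, n_folds=10, seed=0):
    """Return the (lambda1, lambda2) pair with the lowest mean CV error."""
    folds = kfold_indices(n_obs, n_folds, seed)
    best_pair, best_err = None, float("inf")
    for lam1, lam2 in itertools.product(LAMBDA1_GRID, LAMBDA2_GRID):
        errs = []
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            errs.append(fit_and_score(train_idx, test_idx, lam1, lam2))
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best_pair, best_err = (lam1, lam2), mean_err
    return best_pair, best_err


def make_toy_scorer(y):
    """Toy stand-in model: predicts a training mean shrunk by both penalties."""
    def fit_and_score(train_idx, test_idx, lam1, lam2):
        mean = sum(y[i] for i in train_idx) / len(train_idx)
        pred = mean / (1.0 + lam1 + lam2)
        return sum((y[i] - pred) ** 2 for i in test_idx) / len(test_idx)
    return fit_and_score


y = [10 + (i % 3) for i in range(30)]
best_pair, best_err = cv_grid_search(len(y), make_toy_scorer(y))
```

The 25 grid points are independent of each other, so the outer loop is the natural place to parallelize.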
In addition, we used the minet [4] and pROC [5] packages in R to draw ROC curves and compute their summary statistics.
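For readers unfamiliar with how a ROC curve is obtained from a ranked list of predicted edges, the following Python sketch computes the curve and its area by sweeping a threshold over the edge scores. It is a minimal illustration and does not reproduce the minet or pROC implementations. The function names are hypothetical, and the labels are assumed to be 1 for edges present in the gold standard and 0 otherwise, with at least one edge of each kind.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Every edge scoring at least thr is predicted as present.
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Example: four candidate edges, two of them true.
edge_scores = [0.9, 0.8, 0.7, 0.6]
edge_labels = [1, 0, 1, 0]
area = auc(roc_points(edge_scores, edge_labels))
```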
[1] Ulbricht, J. (2012). lqa: Penalized Likelihood Inference for GLMs. R package version 1.0-3.
[2] Lawson, C. L. and Hanson, R. J. (1974). Solving Least Squares Problems. Series in Automatic Computation. Prentice-Hall, Englewood Cliffs, NJ 07632, USA.
[3] R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[4] Meyer, P. E., Kontos, K., Lafitte, F., and Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol, page 79879.
[5] Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., and Müller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77.
Code for the gene regulatory network inference methods used in the paper:
- Proposed approach
The code for the regression-based approach, which simultaneously infers gene regulatory networks from multiple data
sets, can be downloaded from the following link.
Download
- Global Silencing
The code is provided by Barzel et al. (2013)
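function [S] = SMatrix(G)
% Global silencing of indirect correlations.
% G: N x N input matrix; S: silenced matrix, scaled to [0, 1].
clear S;
[N cols] = size(G);
D = (G - eye(N)) * G;
% Keep only the diagonal of D by zeroing every off-diagonal entry.
for i = 1:N
    for j = i + 1:N
        D(i,j) = 0;
        D(j,i) = 0;
    end
end
S = (G - eye(N) + D) * inv(G);
% Remove self-links.
for i = 1:N
    S(i,i) = 0;
end
% Normalize by the largest absolute entry.
S = abs(S)/max(max(abs(S)));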
Barzel B, Barabási AL (2013) Network link prediction by global silencing of indirect correlations. Nat Biotechnology 31: 720–725.
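The MATLAB function above computes S = (G − I + D) G⁻¹, where D is the diagonal part of (G − I) G, and then removes self-links and rescales. The same steps can be sketched in plain Python as a cross-check. This is an illustrative port, not code from Barzel et al. The helper names are hypothetical, and G is assumed to be square, invertible, and to yield at least one nonzero off-diagonal entry.

```python
def matmul(A, B):
    """Product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def silence(G):
    """Silenced matrix: |(G - I + D) G^-1|, zero diagonal, scaled to [0, 1]."""
    n = len(G)
    g_minus_i = [[G[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
    prod = matmul(g_minus_i, G)
    # D keeps only the diagonal of (G - I) G.
    numer = [[g_minus_i[i][j] + (prod[i][j] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    S = matmul(numer, inverse(G))
    S = [[0.0 if i == j else abs(S[i][j]) for j in range(n)] for i in range(n)]
    peak = max(max(row) for row in S)
    return [[v / peak for v in row] for row in S]


S = silence([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
```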
Network Deconvolution
The Network Deconvolution code is available at http://compbio.mit.edu/nd/code/ND_regulatory.m.
Feizi S, Marbach D, Médard M, Kellis M (2013) Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnology 31: 726–733.
Graphical Gaussian Models (GGM)
We used the package GeneNet to infer gene regulatory networks based on the GGM proposed in Schäfer et al.
library(GeneNet)
inferred.pcor <- ggm.estimate.pcor(data)  # estimate the partial correlation matrix
net <- ggm.test.edges(inferred.pcor)      # assess the significance of each edge
Schäfer J, Strimmer K (2005) An empirical bayes approach to inferring large-scale gene association networks. Bioinformatics 21: 754–764.
GENIE3
We used the online tool available at GP-DREAM to infer gene regulatory networks.
Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., and Mesirov, J. P. (2006). GenePattern 2.0. Nature Genetics, 38(5), 500–501.
Marbach, D., Costello, J. C., Küffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., Allison, K. R., The DREAM5 Consortium, Kellis, M., Collins, J. J., and Stolovitzky, G. (2012). Wisdom of crowds for robust gene network inference. Nat Methods, 9(8), 796–804.
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE)
We used the package minet in order to infer gene regulatory networks based on mutual information.
library(minet)
mim <- build.mim(data, estimator = "spearman")  # mutual information matrix
net <- aracne(mim)                              # prune indirect edges
Margolin AA, Wang K, Lim WK, Kustagi M, Nemenman I, et al. (2006) Reverse engineering cellular networks. Nat Protoc 1: 662–671.
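ARACNE prunes a mutual information matrix with the data processing inequality: within every triplet of genes, the weakest of the three edges is treated as indirect and removed. The Python sketch below illustrates that rule on a small matrix. It is not the minet implementation. The function name aracne_dpi and the tolerance parameter eps are chosen here for illustration, and every comparison is made against the original, unpruned matrix.

```python
def aracne_dpi(mim, eps=0.0):
    """Zero out edge (i, j) if some third gene k explains it better.

    An edge is dropped when mim[i][j] < min(mim[i][k], mim[j][k]) - eps
    for at least one k, i.e. when it is the weakest edge of a triplet.
    """
    n = len(mim)
    keep = [[True] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k == i or k == j:
                    continue
                if mim[i][j] < min(mim[i][k], mim[j][k]) - eps:
                    keep[i][j] = keep[j][i] = False
                    break
    return [[mim[i][j] if i != j and keep[i][j] else 0.0 for j in range(n)]
            for i in range(n)]


# Example: the 0-2 edge is the weakest edge of the only triplet.
mim = [[0.0, 0.8, 0.3],
       [0.8, 0.0, 0.6],
       [0.3, 0.6, 0.0]]
pruned = aracne_dpi(mim)
```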
Context Likelihood of Relatedness (CLR)
We used the package minet in order to infer gene regulatory networks based on mutual information.
library(minet)
mim <- build.mim(data, estimator = "spearman")  # mutual information matrix
net <- clr(mim)                                 # background-corrected scores
Margolin AA, Wang K, Lim WK, Kustagi M, Nemenman I, et al. (2006) Reverse engineering cellular networks. Nat Protoc 1: 662–671.
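CLR rescales each mutual information value against the background distribution of values for the two genes involved. The Python sketch below shows one common form of that score: each value is converted to a z-score within its row, negative z-scores are clipped to zero, and the two z-scores of a pair are combined. The details are assumptions of this sketch and may differ from the minet implementation. In particular, the mean and the population standard deviation are taken over the off-diagonal entries of each row, and clr_scores is a hypothetical name.

```python
import math


def clr_scores(mim):
    """Combine row-wise z-scores: sqrt(z_i^2 + z_j^2) with z clipped at 0."""
    n = len(mim)
    means, sds = [], []
    for i in range(n):
        vals = [mim[i][j] for j in range(n) if j != i]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        means.append(mu)
        sds.append(math.sqrt(var))

    def z(i, j):
        # z-score of mim[i][j] against the background of gene i.
        if sds[i] == 0.0:
            return 0.0
        return max(0.0, (mim[i][j] - means[i]) / sds[i])

    return [[0.0 if i == j else math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
             for j in range(n)] for i in range(n)]


mim = [[0.0, 0.8, 0.2],
       [0.8, 0.0, 0.5],
       [0.2, 0.5, 0.0]]
scores = clr_scores(mim)
```

Because the score of a pair combines the z-scores of both genes, the resulting matrix is symmetric even though the row-wise backgrounds differ.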
Categorical Bayesian Networks
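We used the package catnet to infer gene regulatory networks based on Bayesian networks.
library(catnet)
# Use the same four regulators as the parent pool and the fixed parents of every node.
parentlist <- vector(dim(data)[1], mode = "list")
parentlist <- lapply(parentlist, function(parentlist) c("rpod", "rpoe", "rpoh", "rpos"))
# Simulated annealing search over network structures, scored with BIC.
net <- cnSearchSA(data, parentsPool = parentlist, maxParentSet = 4, fixedParents = parentlist, selectMode = "BIC")
best <- cnFindBIC(net, data)  # network with the best BIC
cnMatEdges(best)              # edges of the selected network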
Balov N (2013) A categorical network approach for discovering differentially expressed regulations in cancer. BMC Medical Genomics 6: S1.