Gene regulatory network inference using fused LASSO on multiple data sets: application to Escherichia coli

Authors: Nooshin Omranian, Jeanne M. O. Eloundou-Mbebi, Bernd Mueller-Roeber and Zoran Nikoloski

Submitted to: Scientific Reports



Implementation Notes

To implement the modified fused LASSO approach for reconstructing gene regulatory networks from several data sets, we defined a penalty matrix $D$ composed of two blocks. The first block, $D_1$, is a diagonal $kP \times kP$ matrix for the LASSO penalty that carries the weights $W$ on its diagonal. The second block, $D_2$, corresponds to the fusion penalty. The blocks and $D$ are defined as:

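$$
\begin{aligned}
D_1 &= W_{kP \times kP}, \\
D_2 &= \bigl[\, I_{(k-1)P \times (k-1)P} \mid O_{(k-1)P \times P} \,\bigr] - \bigl[\, O_{(k-1)P \times P} \mid I_{(k-1)P \times (k-1)P} \,\bigr], \\
D &= \bigl[\, D_1 \mid D_2^{\top} \,\bigr].
\end{aligned}
\tag{1}
$$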
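The construction in Eq. 1 can be checked numerically. The NumPy sketch below is not the authors' R code; it assumes that $k$ is the number of data sets, $P$ the number of regressors per data set, and that the coefficient vector $\beta$ stacks the per-data-set vectors $\beta^{(1)}, \dots, \beta^{(k)}$. The function names `penalty_matrices` and `penalty` are hypothetical, and weighting the LASSO block by $\lambda_1$ and the fusion block by $\lambda_2$ is an assumption about how the two tuning parameters enter the objective. Row $r$ of $D_2$ returns $\beta_r - \beta_{r+P}$, i.e. the change in the coefficient of one regressor between consecutive data sets.

```python
import numpy as np


def penalty_matrices(k, P, w):
    """Return D1, D2 and D = [D1 | D2^T] for k data sets with P regressors each.

    w holds the k*P LASSO weights placed on the diagonal of D1.
    """
    m = (k - 1) * P
    D1 = np.diag(np.asarray(w, dtype=float))               # kP x kP
    left = np.hstack([np.eye(m), np.zeros((m, P))])        # [I | O]
    right = np.hstack([np.zeros((m, P)), np.eye(m)])       # [O | I]
    D2 = left - right                                      # (k-1)P x kP
    D = np.hstack([D1, D2.T])                              # kP x (2k-1)P
    return D1, D2, D


def penalty(beta, D1, D2, lam1, lam2):
    """lam1 * ||D1 beta||_1 + lam2 * ||D2 beta||_1 (assumed weighting)."""
    return float(lam1 * np.abs(D1 @ beta).sum() + lam2 * np.abs(D2 @ beta).sum())


# k = 3 data sets, P = 2 regressors, unit weights.
D1, D2, D = penalty_matrices(3, 2, np.ones(6))
beta = np.array([1.0, 0.0, 1.5, -0.5, 1.5, 0.0])   # (beta^(1), beta^(2), beta^(3))
print(D.shape)                           # (6, 10)
print((D2 @ beta).tolist())              # differences between consecutive data sets
print(penalty(beta, D1, D2, 0.5, 1.0))   # 3.75
```

Because $D$ places the blocks side by side, each column $d_j$ of $D$ defines one penalized term $|d_j^{\top}\beta|$, so `np.abs(D.T @ beta)` weighted by $(\lambda_1, \dots, \lambda_1, \lambda_2, \dots, \lambda_2)$ gives the same value.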

To solve the proposed extension of the fused LASSO, we used the R package lqa [1], which is computationally efficient because it relies on the Cholesky decomposition [2]; this requires on the order of $\tilde{p}^3$ plus $n\tilde{p}^2/2$ operations, where $n$ is the number of observations, $\tilde{p} = p + 1$, and $p$ is the number of regressors. We slightly modified the function fused.lasso in lqa so that it accepts the penalty matrices defined in Eq. 1. The tuning parameters $\lambda_1$ and $\lambda_2$ were selected by 10-fold cross-validation from the sets {0.05, 0.1, 0.5, 1, 1.5} and {0.1, 0.5, 1, 1.5, 2}, respectively, and the regression coefficients were estimated at the optimal values. To further speed up the computation, we used the function mclapply from the R package parallel [3].
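The fitting and tuning procedure described above can be sketched in NumPy. This is not the lqa code and does not call fused.lasso; it reimplements the general idea under stated assumptions: the objective is taken to be $\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \sum_j \lambda_j |d_j^{\top}\beta|$ over the columns $d_j$ of $D$, each absolute value is replaced by a local quadratic approximation (the idea the lqa package is named after) so that every iteration reduces to one symmetric positive-definite system solved through a Cholesky factorization, and held-out squared error is the cross-validation criterion. `lqa_fit`, `cv_select`, and their parameters are hypothetical names.

```python
import numpy as np


def lqa_fit(X, y, D, lam, n_iter=100, eps=1e-8, tol=1e-6):
    """Approximately minimise 0.5*||y - X b||^2 + sum_j lam[j] * |D[:, j] @ b|.

    Each |d_j' b| is replaced by a quadratic around the current estimate
    (local quadratic approximation), so every iteration solves one
    symmetric positive-definite system via a Cholesky factorisation.
    """
    XtX, Xty = X.T @ X, X.T @ y
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # least-squares start
    for _ in range(n_iter):
        u = D.T @ b                                  # d_j' b for every column j of D
        w = lam / np.sqrt(u ** 2 + eps)              # smoothed lam_j / |d_j' b|
        A = (D * w) @ D.T                            # sum_j w_j d_j d_j'
        L = np.linalg.cholesky(XtX + A)              # XtX + A = L L'
        # Solve L z = X'y, then L' b = z.
        b_new = np.linalg.solve(L.T, np.linalg.solve(L, Xty))
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b


def cv_select(X, y, D, n_lasso, grid1, grid2, n_folds=10, seed=0):
    """Choose (lam1, lam2) by n_folds-fold CV, then refit on all data.

    lam1 weights the first n_lasso columns of D (LASSO block),
    lam2 the remaining columns (fusion block).
    """
    n = X.shape[0]
    fold = np.random.default_rng(seed).permutation(np.arange(n) % n_folds)

    def lam_vec(l1, l2):
        return np.concatenate([np.full(n_lasso, l1, dtype=float),
                               np.full(D.shape[1] - n_lasso, l2, dtype=float)])

    best_err, best = np.inf, None
    for l1 in grid1:                  # this double loop over the grid is the
        for l2 in grid2:              # part one would distribute across cores
            err = 0.0
            for f in range(n_folds):
                train, test = fold != f, fold == f
                b = lqa_fit(X[train], y[train], D, lam_vec(l1, l2))
                err += np.sum((y[test] - X[test] @ b) ** 2)
            if err < best_err:
                best_err, best = err, (l1, l2)
    l1, l2 = best
    return l1, l2, lqa_fit(X, y, D, lam_vec(l1, l2))
```

The quadratic approximation shrinks coefficients toward zero without setting them exactly to zero, so a practical implementation would also threshold very small estimates. The grid loop is independent across $(\lambda_1, \lambda_2)$ pairs, which is what makes parallelizing it with a tool such as mclapply straightforward.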

In addition, we used the R packages minet [4] and pROC [5] to draw ROC curves and to compute statistics for them.
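The ROC computation that these packages perform can be illustrated with a short NumPy sketch. It is not the API of minet or pROC; `roc_curve` and `auc` are hypothetical names, and it assumes each candidate regulator-target edge has a score (for example, the magnitude of its estimated coefficient) and a binary label from a gold-standard network. Sweeping a threshold from the highest score down yields one (FPR, TPR) point per distinct score, and the trapezoidal area under these points is the empirical AUC, in which a tie between an edge inside and an edge outside the gold standard counts as one half.

```python
import numpy as np


def roc_curve(scores, labels):
    """Empirical ROC points (FPR, TPR), sweeping a threshold from high to low.

    Tied scores are merged into a single step of the curve.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="mergesort")
    s, lab = scores[order], labels[order]
    tp = np.cumsum(lab)                  # true positives above each threshold
    fp = np.cumsum(~lab)                 # false positives above each threshold
    last = np.ones(len(s), dtype=bool)   # keep the last index of each tied block
    last[:-1] = s[1:] != s[:-1]
    tpr = np.concatenate([[0.0], tp[last] / lab.sum()])
    fpr = np.concatenate([[0.0], fp[last] / (~lab).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Toy example: six candidate edges, three of them in the gold standard.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(round(auc(fpr, tpr), 4))  # 0.8889
```

In the toy example, 8 of the 9 (gold-standard, non-gold-standard) edge pairs are ranked correctly, which is why the AUC equals 8/9.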


[1] Ulbricht, J. (2012). lqa: Penalized Likelihood Inference for GLMs. R package version 1.0-3.

[2] Lawson, C. L. and Hanson, R. J. (1974). Solving Least Squares Problems. Series in Automatic Computation. Prentice-Hall, Englewood Cliffs, NJ 07632, USA.

[3] R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

[4] Meyer, P. E., Kontos, K., Lafitte, F., and Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinform Syst Biol, page 79879.

[5] Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., and Müller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(1), 77.


R code for the gene regulatory network inference methods used in the paper: