Long non-coding RNAs (lncRNAs) play crucial roles in the diagnosis, prognosis, prevention, and treatment of complex diseases, but only a small portion of lncRNA-disease associations have been experimentally verified. Various computational models have been proposed to identify lncRNA-disease associations by integrating heterogeneous data sources. However, existing models generally ignore the intrinsic structure of the data sources or treat all sources as equally relevant, even though they may not be.
To accurately identify lncRNA-disease associations, researchers at Southwest University propose a Matrix Factorization based LncRNA-Disease Association prediction model (MFLDA for short). MFLDA decomposes the data matrices of heterogeneous data sources into low-rank matrices via matrix tri-factorization to explore and exploit their intrinsic and shared structure. MFLDA selects and integrates the data sources by assigning different weights to them. An iterative solution is introduced to optimize the weights and the low-rank matrices simultaneously. MFLDA then uses the optimized low-rank matrices to reconstruct the lncRNA-disease association matrix and thereby identify potential associations. In five-fold cross-validation experiments on verified lncRNA-disease associations, MFLDA achieves an Area Under the receiver operating characteristic Curve (AUC) of 0.7408, at least 3% higher than the AUCs of state-of-the-art data-fusion-based computational models. An empirical study on identifying masked lncRNA-disease associations again shows that MFLDA identifies potential associations more accurately than competing models. A case study on lncRNAs associated with breast, lung, and stomach cancers shows that 38 out of 45 (84%) associations predicted by MFLDA are supported by recent biomedical literature, which further demonstrates the capability of MFLDA to identify novel lncRNA-disease associations. MFLDA is a general data fusion framework, and as such it can be adopted to predict associations between other biological entities.
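For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted association scores and known labels. It uses the pairwise-ranking definition (the probability that a true association is scored above a non-association). The function name `auc` and the toy scores are illustrative assumptions, not taken from the MFLDA code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted association scores (higher = more likely associated)
    labels: 1 for a known association, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): 3 of the 4 positive/negative pairs
# are ranked correctly, so the AUC is 0.75.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))
```

In a five-fold cross-validation setting, a score like this would be computed on the held-out associations of each fold; the quadratic pairwise loop is kept for clarity and would be replaced by a rank-based computation on large matrices.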
The operating principle of MFLDA. MFLDA iteratively optimizes the low-rank matrices (Gi) obtained from multiple relational data matrices via matrix tri-factorization, together with the weights (Wij) assigned to those data matrices, so that the data sources are fused selectively. It finally reconstructs the target association matrix from the optimized low-rank matrices and weights.
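The idea of weighted matrix tri-factorization can be illustrated with a small sketch. It approximates each relational matrix Rij by Gi Sij Gj^T, shares the low-rank matrices Gi across data sources, and minimizes a weighted sum of squared reconstruction errors using standard nonnegative multiplicative updates. This is not the authors' implementation: the weights are held fixed here, whereas MFLDA also optimizes them, and the function name `fuse_tri_factorize`, the object-type indices (1 = lncRNA, 2 = disease, 3 = gene), the ranks, and the random data are all assumptions made for illustration. NumPy is assumed to be installed.

```python
import numpy as np


def fuse_tri_factorize(relations, weights, sizes, ranks,
                       n_iter=200, seed=0, eps=1e-9):
    """Weighted nonnegative tri-factorization R_ij ~ G_i S_ij G_j^T.

    relations: {(i, j): R_ij} nonnegative matrices, with i != j
    weights:   {(i, j): w_ij} fixed nonnegative source weights
    sizes:     {i: number of objects of type i}
    ranks:     {i: low-rank dimension k_i}
    """
    rng = np.random.default_rng(seed)
    G = {i: rng.random((sizes[i], ranks[i])) for i in sizes}
    S = {(i, j): rng.random((ranks[i], ranks[j])) for (i, j) in relations}

    def loss():
        return sum(
            weights[i, j] * np.linalg.norm(R - G[i] @ S[i, j] @ G[j].T) ** 2
            for (i, j), R in relations.items()
        )

    history = [loss()]
    for _ in range(n_iter):
        # Update each shared G_t using every relation that involves type t.
        for t in G:
            num = np.zeros_like(G[t])
            den = np.zeros_like(G[t])
            for (i, j), R in relations.items():
                w = weights[i, j]
                if i == t:            # R ~ G_t H with H = S_ij G_j^T
                    H = S[i, j] @ G[j].T
                    num += w * R @ H.T
                    den += w * G[t] @ (H @ H.T)
                if j == t:            # R^T ~ G_t H^T with H = G_i S_ij
                    H = G[i] @ S[i, j]
                    num += w * R.T @ H
                    den += w * G[t] @ (H.T @ H)
            G[t] *= num / (den + eps)
        # Update the relation-specific core matrices S_ij.
        for (i, j), R in relations.items():
            num = G[i].T @ R @ G[j]
            den = G[i].T @ G[i] @ S[i, j] @ G[j].T @ G[j]
            S[i, j] *= num / (den + eps)
        history.append(loss())
    return G, S, history


# Toy data (random, for illustration only).
rng = np.random.default_rng(1)
R12 = (rng.random((30, 12)) < 0.1).astype(float)  # lncRNA x disease
R13 = (rng.random((30, 20)) < 0.1).astype(float)  # lncRNA x gene
relations = {(1, 2): R12, (1, 3): R13}
weights = {(1, 2): 1.0, (1, 3): 0.5}              # fixed in this sketch

G, S, history = fuse_tri_factorize(
    relations, weights,
    sizes={1: 30, 2: 12, 3: 20},
    ranks={1: 5, 2: 4, 3: 5},
)

# Reconstructed lncRNA-disease scores; high-scoring zero entries of R12
# would be the candidate associations.
scores = G[1] @ S[1, 2] @ G[2].T
print(history[0], history[-1], scores.shape)
```

Because G1 appears in both factorizations, the lncRNA-gene matrix influences the reconstructed lncRNA-disease scores, and the weight on each source controls how strongly it does so. Learning those weights jointly with the factors, as the caption describes, is the part this sketch leaves out.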
Availability – The source code for MFLDA is available at: http://mlda.swu.edu.cn/codes.php?name=MFLDA