University of Hawaii Cancer Center researchers use “big data” for lncRNA discoveries

HONOLULU – University of Hawai’i Cancer Center researchers used “Big Data” to discover potential cancer biomarkers, a panel of six long intergenic non-coding RNAs (lincRNAs), for the diagnosis of all types of cancers including lung, breast, prostate, liver and ovarian cancers.

“These biomarkers are highly accurate and robust, up to 97 percent, and could be developed into an early screening test for all types of cancers. We find that early detection and prevention is key for the survival and quality of life of cancer patients. This could benefit patients in Hawai’i and around the world,” said Lana Garmire, PhD, an assistant professor in the Cancer Epidemiology Program at the UH Cancer Center.

Garmire, a translational bioinformatics expert who is successful at obtaining competitive NIH grants, used a powerful data mining approach to search through thousands of tumor samples and large data sets to find the panel of lincRNAs.

LncRNAs as Cancer Biomarkers
Garmire’s findings, published in EBioMedicine, highlight lincRNAs as the most recently discovered class of RNA molecules. Advances in technology have enabled the identification of tens of thousands of new lincRNAs. Researchers found the molecules to be excellent candidates for cancer biomarkers. Compared to protein-coding genes, lincRNA expression patterns are more specific to particular tissues and developmental stages, and thus could be better biomarkers for cancers.

“We have worked on this study for the last two years, and are at the verge of discovering something very useful. Thanks to the High Performance Computing (HPC) facility at UH Manoa, we have been using a Big Data analytics approach to start our hypothesis with massive data evidence before validating it in the lab,” said Garmire.

The pan-cancer diagnostic model for the lincRNA panel


(a) The classification of the lincRNA panel was based on a computational RNA-Seq pipeline. The TCGA data were split into 80% training and 20% testing subsets. Five out of the six lincRNAs were selected as predictive features using Correlation Feature Selection (CFS). Pan-cancer diagnostic models were constructed using four standard machine learning classification methods: Random Forest (RF), Linear Support Vector Machines (LSVM), Gaussian Support Vector Machines (GSVM) and L2-regularized Logistic Regression (L2-LR). The best model was chosen based on several performance metrics, including the area under the receiver operating characteristic (ROC) curve (AUC), F-score, Matthews correlation coefficient (MCC) and accuracy. (b) The performance of the classifier was analysed with the ROC curves on the TCGA hold-out testing data, based on the four classification methods mentioned above. (c) ROC curves of the top Random Forest model on four independent RNA-Seq validation datasets. (d) AUCs were calculated on the TCGA hold-out testing data and the four validation datasets.
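For readers curious how the evaluation metrics named in the caption are computed, the following is a minimal sketch in plain Python. It is not the researchers' pipeline: the function names are hypothetical, the labels and scores are invented toy values rather than TCGA data, and the 0.5 decision threshold is an assumption made only for illustration.

```python
import math
import random


def split_80_20(samples, seed=0):
    """Shuffle a list of samples and split it into 80% training / 20% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, F-score and MCC for binary labels (1 = tumor, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_score": f_score, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive sample outscores
    a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): four tumor and four normal samples,
# each with a classifier score between 0 and 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
train, test = split_80_20(range(10))

print(metrics)  # accuracy 0.75, f_score 0.75, mcc 0.5
print(auc)      # 0.9375
```

In this toy case one tumor sample and one normal sample fall on the wrong side of the threshold, so accuracy and F-score are both 0.75 while MCC is 0.5. The AUC is higher (0.9375) because it measures how well the scores rank the samples, independent of any single threshold.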

The panel of biomarkers has been approved as a provisional patent, and Garmire is working on securing licensing.

The findings were published in EBioMedicine.

About the University of Hawai’i Cancer Center
The University of Hawai’i Cancer Center, through its various activities, cancer trial patients and their guests, and other visitors, adds more than $54 million to the O’ahu economy. This is equivalent to supporting 776 jobs. It is one of only 69 research institutions designated by the National Cancer Institute. Affiliated with the University of Hawai’i at Manoa, the center is dedicated to eliminating cancer through research, education, and improved patient care. Follow us on Twitter @UHCancerCenter.

Source – The University of Hawai’i Cancer Center

Ching T et al. (2016) Pan-Cancer Analyses Reveal Long Intergenic Non-Coding RNAs Relevant to Tumor Diagnosis, Subtyping and Prognosis. EBioMedicine [Epub ahead of print].
