Breast cancer is intrinsically heterogeneous and is commonly classified into four main subtypes with distinct biological features and clinical outcomes. However, currently available data resources and methods are largely limited to molecular subtyping based on protein-coding genes, and little is known about the roles of long non-coding RNAs (lncRNAs), even though non-coding sequences make up about 98% of the human genome. lncRNAs may also play important roles in subgrouping cancer patients and may be associated with clinical phenotypes.
The purpose of this project was to identify lncRNA gene signatures associated with breast cancer subtypes and clinical outcomes. Researchers from the University of Mississippi used an optimized 1-norm SVM feature selection algorithm to identify lncRNA gene signatures associated with breast cancer subtypes from The Cancer Genome Atlas (TCGA) RNAseq data. They then evaluated the prognostic performance of these gene signatures with a semi-supervised principal component (superPC) method.
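The summary does not describe how the 1-norm SVM was optimized, so the following is only a minimal sketch of a plain linear 1-norm SVM, assuming a two-class problem with labels +1/-1 (several subtypes could be handled, for example, one subtype against the rest). Penalizing the L1 norm of the weight vector drives most weights to exactly zero, and the genes that keep non-zero weights form the selected signature. The function names `one_norm_svm` and `select_features`, the penalty `C`, the zero tolerance and the toy data are hypothetical; the linear program is solved with SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(X, y, C=1.0):
    """Fit a linear 1-norm SVM by linear programming.

    Solves  min ||w||_1 + C * sum(xi)
            s.t. y_i * (w . x_i + b) >= 1 - xi_i,  xi_i >= 0,
    writing w = u - v and b = b_pos - b_neg with all variables >= 0.
    Labels y must be +1 / -1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Variable vector z = [u (p), v (p), b_pos, b_neg, xi (n)]
    cost = np.concatenate([np.ones(2 * p), [0.0, 0.0], C * np.ones(n)])
    # Margin constraints rewritten as A_ub @ z <= b_ub
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:p] - z[p:2 * p]
    b = z[2 * p] - z[2 * p + 1]
    return w, b


def select_features(w, tol=1e-6):
    """Indices of features with non-zero weight (the selected signature)."""
    return np.flatnonzero(np.abs(w) > tol)


# Hypothetical toy data: 60 samples, 10 "genes", only gene 0 carries the label
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=60)
X = rng.normal(size=(60, 10))
X[:, 0] = 2.0 * y + 0.2 * rng.normal(size=60)

w, b = one_norm_svm(X, y, C=10.0)
print("selected genes:", select_features(w))
```

Solving the problem as a linear program, rather than with a smooth approximation, is what yields exact zeros in `w`; a larger `C` trades sparsity for fewer margin violations.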
Although lncRNAs alone can predict breast cancer subtypes with satisfactory accuracy, a combined gene signature including both coding and non-coding genes gives the best clinically relevant prediction performance. The researchers highlighted eight potential biomarkers (three coding and five non-coding genes) that are significantly associated with survival outcomes.
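The superPC evaluation of survival association can be illustrated with a simplified sketch of supervised principal components, assuming survival data given as observed times and event indicators: score each gene with a univariate Cox score statistic, keep the genes whose absolute score exceeds a threshold, and summarize them by their first principal component. The helper names, the threshold and the simulated cohort are hypothetical; a full analysis would also choose the threshold by cross-validation and fit a Cox model on the component, as tools such as the R package `superpc` do.

```python
import numpy as np


def cox_scores(X, time, event):
    """Univariate Cox score statistic (at beta = 0) for each column of X.

    For each event i, the risk set is every sample with time >= time[i]
    (tied events share a risk set, as in Breslow's approximation).
    U sums x_i minus the risk-set mean; I sums the risk-set variance.
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    U = np.zeros(X.shape[1])
    I = np.zeros(X.shape[1])
    for i in np.flatnonzero(event):
        risk = X[time >= time[i]]
        U += X[i] - risk.mean(axis=0)
        I += risk.var(axis=0)
    return U / np.sqrt(I + 1e-12)


def superpc(X, time, event, threshold):
    """Keep genes with |Cox score| > threshold and return their first PC."""
    X = np.asarray(X, dtype=float)
    keep = np.flatnonzero(np.abs(cox_scores(X, time, event)) > threshold)
    if keep.size == 0:
        raise ValueError("no gene passes the threshold")
    Xk = X[:, keep] - X[:, keep].mean(axis=0)
    # First right singular vector of the centered matrix = first principal axis
    _, _, vt = np.linalg.svd(Xk, full_matrices=False)
    pc1 = Xk @ vt[0]
    # The sign of a PC is arbitrary; orient it so larger values mean higher hazard
    if cox_scores(pc1[:, None], time, event)[0] < 0:
        pc1 = -pc1
    return keep, pc1


# Hypothetical simulated cohort: 200 samples, 20 genes, hazard driven by gene 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
true_time = rng.exponential(scale=np.exp(-1.5 * X[:, 0]))
censor_time = rng.exponential(scale=2.0, size=200)
time = np.minimum(true_time, censor_time)
event = true_time <= censor_time

keep, pc1 = superpc(X, time, event, threshold=3.0)
high_risk = pc1 > np.median(pc1)
print("genes kept:", keep, "high-risk samples:", high_risk.sum())
```

Splitting samples at the median of `pc1` gives high- and low-risk groups that are typically compared with Kaplan-Meier curves and a log-rank test.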
Visualization of breast cancer subtypes using the 29 selected non-coding (a) and 36 "all" (b) gene features for the 839-sample TCGA RNAseq training set; sensitivity and specificity of the subtype predictions, shown as ROC curves, for the 29 non-coding (c) and 36 "all" (d) gene features.
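The ROC panels summarize the trade-off between sensitivity and specificity as the decision threshold varies. A minimal pure-Python sketch, assuming one subtype is treated as the positive class and all others as negative (one-vs-rest) and that each sample has a real-valued classifier score; the function names and toy scores are hypothetical:

```python
def roc_points(scores, labels):
    """ROC curve as (false-positive rate, true-positive rate) pairs.

    labels: 1 for the positive class (e.g. one subtype), 0 otherwise.
    The threshold sweeps from the highest score down; sensitivity is the
    TPR and specificity is 1 - FPR at each step.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for one subtype versus the rest
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
for fpr, tpr in pts:
    print(f"sensitivity={tpr:.2f} specificity={1 - fpr:.2f}")
print(f"AUC = {auc(pts):.3f}")  # prints AUC = 0.889 (exactly 8/9)
```

The AUC equals the fraction of positive/negative pairs that the score ranks correctly: here 8 of the 9 pairs, since only the negative at 0.7 outranks a positive (0.6).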
The proposed methods offer a novel means of identifying subtype-specific coding and non-coding potential biomarkers that are both clinically relevant and biologically significant.