Long non-coding RNAs (lncRNAs) that drive tumorigenesis are a growing focus of cancer genomics studies. To facilitate further discovery, researchers at the University of Bern have created the “Cancer LncRNA Census” (CLC), a manually curated and strictly defined compilation of lncRNAs with causative roles in cancer. CLC has two principal applications: first, as a resource for training and benchmarking de novo identification methods; and second, as a dataset for studying the fundamental properties of these genes. CLC Version 1 comprises 122 lncRNAs implicated in 31 distinct cancers. LncRNAs are included based on functional or genetic evidence for causative roles in cancer progression. All belong to the GENCODE reference annotation, to facilitate integration across projects and datasets. For each entry, the evidence type, biological activity (oncogene or tumour suppressor), source reference and cancer type are recorded. CLC genes are significantly enriched amongst de novo predicted driver genes from PCAWG. They are also distinguished from other lncRNAs by a series of features consistent with biological function, including gene length, expression and sequence conservation of both exons and promoters. The researchers identify a trend for CLC genes to be co-localised with known protein-coding cancer genes along the human genome. Finally, by integrating data from transposon-mutagenesis functional screens, the researchers show that mouse orthologues of CLC genes also tend to be cancer driver genes. Thus CLC represents a valuable resource for research into long non-coding RNAs in cancer. The evolutionary and genomic properties of CLC genes have implications for understanding disease mechanisms and point to conserved functions across ~80 million years of evolution.
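As an illustration of what an enrichment claim of this kind means, the overlap between a curated gene set and a set of predicted drivers can be tested against a hypergeometric null. The sketch below is not the study's method, which is not described here. The function name and all counts are hypothetical placeholders chosen only to show the calculation.

```python
from math import comb


def hypergeom_enrichment_pvalue(total, annotated, drawn, overlap):
    """One-sided P(X >= overlap) under a hypergeometric null.

    total     -- size of the background gene set (hypothetical)
    annotated -- background genes that are in the curated set
    drawn     -- number of predicted driver genes
    overlap   -- predicted drivers that are also in the curated set
    """
    upper = min(annotated, drawn)
    denominator = comb(total, drawn)
    # Sum the probability of every overlap at least as large as the one observed.
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Toy numbers for illustration only; they are NOT taken from CLC or PCAWG.
p_value = hypergeom_enrichment_pvalue(total=1000, annotated=20, drawn=50, overlap=5)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

With these toy numbers the expected overlap under the null is 1 gene (50 × 20 / 1000), so an observed overlap of 5 yields a small p-value. This is equivalent to a one-sided Fisher's exact test on the corresponding 2×2 table.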
Overview of the Cancer LncRNA Census
Rows represent the 122 CLC genes and columns represent the 31 cancer types. Blue cells indicate evidence for the involvement of a given lncRNA in that cancer type. The left column indicates the functional classification: tumour suppressor (TSG), oncogene (OG) or both (OG/TSG). Above and to the right, barplots indicate the count totals of each column and row. The pie chart shows the fraction of GENCODE v24 lncRNAs that belong to CLC. Note that 8 CLC genes are classified as “pseudogenes” by GENCODE. “nonCLC” refers to all other GENCODE-annotated lncRNAs, which are used as background in comparative analyses.