Long noncoding RNAs (lncRNAs) represent a vast unexplored genetic space that may hold missing drivers of tumourigenesis, but few such “driver lncRNAs” are known. Until now, they have been discovered through changes in expression, which makes it difficult to distinguish causative roles from passenger effects.
Researchers at the CRG – Barcelona Institute of Science and Technology have developed a different approach for driver lncRNA discovery using mutational patterns in tumour DNA. Their pipeline, ExInAtor, identifies genes with an excess load of somatic single nucleotide variants (SNVs) across panels of tumour genomes. Heterogeneity in mutational signatures between cancer types and individuals is accounted for using a simple local trinucleotide background model, which yields high precision at low computational cost.
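To make the idea of an "excess load" against a local trinucleotide background more concrete, here is a minimal Python sketch. It is not ExInAtor's code: the function names, the one-sided Poisson test, and the omission of strand-symmetric context collapsing are all assumptions made for illustration. The sketch estimates a per-trinucleotide mutation rate from the region flanking a gene, uses it to compute how many SNVs the exons would be expected to carry, and asks how surprising the observed exonic count is.

```python
from collections import Counter
from math import exp


def trinucleotide_counts(seq):
    """Count every overlapping trinucleotide in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def expected_exon_snvs(exon_seq, background_seq, background_snv_contexts):
    """Expected exonic SNV count under the local background rate.

    background_snv_contexts is a list with one trinucleotide per background
    SNV, centred on the mutated base (a hypothetical input format).
    Note: reverse-complement contexts are NOT merged in this sketch.
    """
    exon_tri = trinucleotide_counts(exon_seq)
    bg_tri = trinucleotide_counts(background_seq)
    bg_mut = Counter(background_snv_contexts)
    expected = 0.0
    for tri, n_exon in exon_tri.items():
        if bg_tri[tri]:
            # Background rate for this context times its exonic abundance.
            expected += n_exon * bg_mut[tri] / bg_tri[tri]
    return expected


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the lower tail."""
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def excess_load_pvalue(observed_exon_snvs, exon_seq, background_seq,
                       background_snv_contexts):
    """One-sided p-value for an excess of exonic SNVs (assumed Poisson test)."""
    lam = expected_exon_snvs(exon_seq, background_seq, background_snv_contexts)
    return poisson_sf(observed_exon_snvs, lam)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    p = excess_load_pvalue(3, "ACGACG", "ACGACGACGACG", ["ACG", "ACG"])
    print(f"p = {p:.4f}")
```

In this toy input the background implies one expected exonic SNV, so observing three gives a p-value of roughly 0.08. Because the background is drawn from the gene's own neighbourhood, regional and cancer-specific differences in mutation rate are absorbed into the expectation rather than modelled explicitly.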
The researchers use ExInAtor to predict drivers from the GENCODE annotation across 1112 entire genomes from 23 cancer types. Using a stratified approach, they identify 15 high-confidence candidates: 9 novel and 6 known cancer-related genes, including MALAT1, NEAT1 and SAMMSON. Both known and novel driver lncRNAs are distinguished by greater gene length, evolutionary conservation and expression. The researchers present a first catalogue of mutated lncRNA genes driving cancer, which will grow and improve as ExInAtor is applied to future tumour genome projects.
LncRNA cancer driver genes predicted by ExInAtor across cancer genomes
(A) All driver lncRNAs (Q ≤ 0.1) and the tumour type in which they are identified. Gene names in blue indicate those belonging to CRL. (B) A mutation density plot for NEAT1 in all cancers, plotting the SNVs per kilobase as a function of gene regions. Grey represents background regions, while colours represent the mutational contribution of each cancer type to the single exon. The x-axis represents position, in bp, with respect to the start of the background region, defined here to be 10 kb upstream of the gene’s annotated TSS. (C) The Breast mutation profile of RP11-1101K5.1, a gene with mutations in four exons. Rectangles depict the mutational density of exons (blue) and introns (grey). The gene structure is indicated below, where wider portions represent exons, separated by narrower introns. (D) The Breast mutation frequency in BCAR4. (E) Percentage of genes and candidates in CNV regions and proximal to cancer-related germline SNPs. Numbers above bars indicate the absolute numbers of genes represented by each percentage. Statistical significance in each case was estimated using Fisher’s Exact test. (F) An example of an ExInAtor-predicted novel candidate gene, RP11-820L6.1. Note the presence of promoter-like histone marks (red, ChromHMM track), evolutionary conservation (PhastCons Primate conservation), and cancer SNVs around the gene TSS, as well as a proximal P53 binding site (“P53_merged”).
Availability – The latest ExInAtor version is freely available for download here: https://github.com/alanzos/ExInAtor/