Long non-coding RNAs (lncRNAs) form a substantial component of the transcriptome and are involved in a wide variety of regulatory mechanisms. Compared to protein-coding genes, they are often expressed at low levels and are restricted to a narrow range of cell types or developmental stages. As a consequence, the diversity of their isoforms is still far from being recorded and catalogued in its entirety, and the debate is ongoing about what fraction of non-coding RNAs truly conveys biological function rather than being “junk”.
Here, using a collection of more than 100 transcriptomes from related B cell lymphomas, Leipzig University researchers show that lncRNA loci produce a well-defined set of splice variants. Some of these variants are so rare that they become recognizable only when dozens or hundreds of transcriptome datasets are superimposed, and they not infrequently include introns or exons that are missing from the available genome annotation. Even so, the number of processing products for any given locus remains very limited. The combined depth of the sequencing data is large enough to effectively exhaust the isoform diversity: the overwhelming majority of splice junctions that are observed at all are represented by multiple junction-spanning reads. The researchers conclude that the human transcriptome produces virtually no background of RNAs that are processed at effectively random positions, but is, under normal circumstances, confined to a well-defined set of splice variants.
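The pooling argument can be illustrated with a minimal sketch. It assumes each transcriptome has already been reduced to a table of splice junctions with junction-spanning read counts. The function names, the junction key layout and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical junction key: (chromosome, intron_start, intron_end, strand)


def pool_junction_counts(samples):
    """Sum junction-spanning read counts over all samples."""
    pooled = Counter()
    for sample in samples:
        # Counter.update adds counts for keys that are already present.
        pooled.update(sample)
    return pooled


def fraction_multiply_supported(pooled, min_reads=2):
    """Fraction of observed junctions backed by at least min_reads reads."""
    if not pooled:
        return 0.0
    supported = sum(1 for n in pooled.values() if n >= min_reads)
    return supported / len(pooled)


# Toy data (illustrative only): three samples, three distinct junctions.
samples = [
    {("chr1", 100, 200, "+"): 1, ("chr1", 300, 400, "+"): 5},
    {("chr1", 100, 200, "+"): 1},
    {("chr1", 500, 900, "+"): 1},
]

pooled = pool_junction_counts(samples)
frac = fraction_multiply_supported(pooled)
print(f"{frac:.2f} of observed junctions have multiple spanning reads")
```

In this toy case the junction at 100-200 is seen only once per sample but twice after pooling, which mirrors how rare variants become recognizable only in the superposition of many datasets. A pooled fraction close to 1 would be consistent with a largely exhausted isoform repertoire, whereas a large share of single-read junctions would point to random processing background.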
Two examples with previously unannotated splice junctions and introns
(Top) In ENSG00000267939, we find six introns and two additional exons, compared to a single intron described in GENCODE v19. (Bottom) For ENSG00000263470, we find eight introns plus a likely false positive, compared to two introns in GENCODE.