Identification of long non-coding RNA in the horse transcriptome

Efforts to resolve the transcribed sequences in the equine genome have focused on protein-coding RNA. Transcription of the intergenic regions, although detected via total RNA sequencing (RNA-seq), has yet to be characterized in the horse. The most recent equine transcriptome, based on RNA-seq from several tissues, offered a prime opportunity to build a concurrent long non-coding RNA (lncRNA) database.

This lncRNA database spans eight tissues, with a depth of over 20 million reads for select tissues, making it the deepest and most expansive equine lncRNA database. Using the intergenic reads and three categories of novel genes from a previously published equine transcriptome pipeline, researchers from the University of California, Davis better describe these groups by annotating the lncRNA candidates. The candidates were filtered using an approach adapted from human lncRNA annotation, which removes transcripts based on size, expression, protein-coding capability and distance to the start or stop of annotated protein-coding transcripts.
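The four filtering criteria can be sketched as a simple predicate. This is only an illustration, not the authors' pipeline: the `Transcript` fields and the expression and distance cutoffs below are hypothetical placeholders, and the 200 nt minimum is the conventional lncRNA length cutoff rather than a value stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    length: int                 # transcript length in nucleotides
    tpm: float                  # expression in transcripts per million
    has_coding_potential: bool  # result of some coding-potential check
    dist_to_coding: int         # bp to nearest annotated protein-coding start/stop


# Assumed cutoffs for illustration only; the post does not state the real values.
MIN_LENGTH = 200      # conventional minimum length for a lncRNA
MIN_TPM = 0.1         # hypothetical expression floor
MIN_DISTANCE = 1000   # hypothetical buffer around protein-coding transcripts


def is_lncrna_candidate(t: Transcript) -> bool:
    """Keep a transcript only if it passes all four filters."""
    return (
        t.length >= MIN_LENGTH                 # size
        and t.tpm >= MIN_TPM                   # expression
        and not t.has_coding_potential         # protein-coding capability
        and t.dist_to_coding >= MIN_DISTANCE   # distance to coding start/stop
    )


def filter_candidates(transcripts: List[Transcript]) -> List[Transcript]:
    """Return the transcripts that survive every filter."""
    return [t for t in transcripts if is_lncrna_candidate(t)]
```

Each condition corresponds to one criterion, so a transcript that fails any single filter is dropped regardless of how it scores on the others.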

Tissue and RNA-seq library preparation effects on lncRNA detection and expression


(a) There is a positive relationship between the number of annotated genes and the number of candidate lncRNA detected in each tissue. The pie charts represent the cumulative TPM of each tissue, with turquoise corresponding to the expression of protein-coding transcripts and red to the expression of candidate lncRNA. Pies outlined in yellow are rRNA-depleted RNA-seq libraries, pies outlined in black are Ovation RNA-seq libraries and pies outlined in blue are polyA-captured RNA-seq libraries. (b) The hierarchically clustered heatmap also shows clustering at the tissue and RNA-seq library level. (c) The number of lncRNA that appear to be unique to a given tissue differs noticeably between tissues, with skin having the largest number of unique lncRNA and the highest cumulative expression associated with them. The green line represents the cumulative TPM of all the uniquely present lncRNA, divided by 5 for scaling.
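As a rough illustration of the quantities summarized in panel c, the sketch below counts the lncRNA detected in only one tissue and sums their TPM. The input layout, the `min_tpm` detection threshold and the function name are assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, Tuple


def tissue_unique_lncrna(
    tpm_by_tissue: Dict[str, Dict[str, float]],
    min_tpm: float = 0.1,  # hypothetical threshold for calling a transcript "present"
) -> Dict[str, Tuple[int, float]]:
    """For each tissue, return (number of unique lncRNA, their cumulative TPM)."""
    # Set of transcripts considered present in each tissue.
    present = {
        tissue: {tid for tid, tpm in expr.items() if tpm >= min_tpm}
        for tissue, expr in tpm_by_tissue.items()
    }
    summary = {}
    for tissue, ids in present.items():
        # Union of everything present in any other tissue.
        elsewhere = set()
        for other, other_ids in present.items():
            if other != tissue:
                elsewhere |= other_ids
        unique = ids - elsewhere
        total_tpm = sum(tpm_by_tissue[tissue][tid] for tid in unique)
        summary[tissue] = (len(unique), total_tpm)
    return summary
```

With a table of per-tissue TPM values, the first element of each tuple corresponds to the unique-lncRNA count and the second to the unscaled cumulative TPM behind the green line.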

This equine lncRNA database has 20,800 transcripts that demonstrate characteristics unique to lncRNA including low expression, low exon diversity and low levels of sequence conservation. These candidate lncRNA will serve as a baseline lncRNA annotation and begin to describe the RNA-seq reads assigned to the intergenic space in the horse.

Availability – The input data, along with the scripts used to generate them, can be found on the original equine transcriptome GitHub page.

Scott EY, Mansour T, Bellone RR, Brown CT, Mienaltowski MJ, Penedo MC, Ross PJ, Valberg SJ, Murray JD, Finno CJ. (2017) Identification of long non-coding RNA in the horse transcriptome. BMC Genomics 18(1):511.
