Long non-coding RNAs (lncRNAs) have emerged in recent years as major players in a multitude of pathways across species, but it remains challenging to understand which of them are important and how their functions are performed. Comparative sequence analysis has been instrumental for studying proteins and small RNAs, but the rapid evolution of lncRNAs poses new challenges that demand new approaches. Here, the author reviews the lessons learned so far from genome-wide mapping and comparisons of lncRNAs across different species. He also discusses how comparative analyses can help us to understand lncRNA function, and provides practical considerations for examining functional conservation of lncRNA genes.
A generic pipeline for the identification of lncRNAs from RNA-seq data
Long non-coding RNAs (lncRNAs) are identified separately in each species and in each tissue or sample. RNA sequencing (RNA-seq) reads are either first mapped to the genome and then assembled into transcripts (genome-guided assembly, such as that performed by Cufflinks), or first assembled into transcripts (de novo assembly, such as that performed by Trinity) and then mapped to the genome. Transcripts from all samples are then merged, multiple filtering steps remove various artefacts and protein-coding genes, and the remaining transcripts are classified into one of the lncRNA classes. lincRNAs, long intergenic non-coding RNAs.
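The merge, filter and classify stages described in the caption can be sketched in a few lines of Python. This is a toy illustration, not the pipeline used in the review. The record types, the `coding_score` field (standing in for the output of some coding-potential tool), the 200-nt length cutoff, the 0.5 score threshold and the "antisense" class label are all assumptions introduced here. Real mergers such as Cuffmerge also compare intron structure rather than simple coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # exclusive
    strand: str          # "+" or "-"
    length: int          # spliced length in nt
    coding_score: float  # hypothetical score; higher = more likely coding


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a, b):
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def merge_samples(samples):
    """Pool transcripts from all samples, keeping one per coordinate set."""
    merged = {}
    for transcripts in samples:
        for t in transcripts:
            merged.setdefault((t.chrom, t.start, t.end, t.strand), t)
    return list(merged.values())


def filter_candidates(transcripts, coding_genes,
                      min_length=200, max_coding_score=0.5):
    """Drop short transcripts, likely-coding transcripts, and transcripts
    that overlap a protein-coding gene on the same strand."""
    kept = []
    for t in transcripts:
        if t.length < min_length:
            continue
        if t.coding_score > max_coding_score:
            continue
        if any(overlaps(t, g) and g.strand == t.strand for g in coding_genes):
            continue
        kept.append(t)
    return kept


def classify(t, coding_genes):
    """Assign a simplified class: any remaining overlap with a coding gene
    is on the opposite strand, everything else is intergenic."""
    if any(overlaps(t, g) for g in coding_genes):
        return "antisense"
    return "lincRNA"


def run_pipeline(samples, coding_genes):
    candidates = filter_candidates(merge_samples(samples), coding_genes)
    return {t.name: classify(t, coding_genes) for t in candidates}


if __name__ == "__main__":
    genes = [Gene("G1", "chr1", 1000, 5000, "+")]
    sample_a = [
        Transcript("t1", "chr1", 10000, 12000, "+", 1500, 0.1),
        Transcript("t2", "chr1", 2000, 3000, "+", 900, 0.9),
    ]
    sample_b = [
        Transcript("t1_dup", "chr1", 10000, 12000, "+", 1500, 0.1),
        Transcript("t3", "chr1", 4000, 6000, "-", 800, 0.2),
        Transcript("t4", "chr1", 20000, 20150, "+", 150, 0.05),
    ]
    print(run_pipeline([sample_a, sample_b], genes))
```

The filters run before classification so that the class labels only need to distinguish transcripts that have already passed the artefact and coding-gene checks, which mirrors the order given in the caption.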