Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions that have been discovered in recent years. Many more are expected to be revealed as RNA-seq data accumulate from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known ones from massive RNA-seq data is not trivial.
Researchers from the Mayo Clinic have developed UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline, to tackle this challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment, yet it also allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of predicted lncRNAs from two test datasets demonstrated UClncR’s accuracy and the relevance of the predicted lncRNAs to the samples’ clinical phenotypes.
UClncR workflow diagram
The workflow starts from an aligned BAM file (the parameters must be set correctly for stranded or un-stranded RNA-seq), which is passed to StringTie for transcript assembly. For un-stranded RNA-seq, the workflow handles only lincRNAs (long intergenic non-coding RNAs): known lincRNAs are simply quantified, while novel lincRNAs are predicted and then quantified. For stranded RNA-seq, transcripts that overlap other genes on the opposite strand are also predicted and quantified.
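The strand-dependent branching described above can be illustrated with a minimal Python sketch. This is not UClncR's code; the names `Interval`, `overlaps`, and `classify_candidate`, the half-open coordinate convention, and the three labels are all assumptions made for illustration. It only shows why strand information is needed before a transcript that overlaps a known gene can be considered as a candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A genomic feature (hypothetical type, not part of UClncR)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+", "-", or "." when the strand is unknown


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_candidate(transcript: Interval,
                       known_genes: List[Interval],
                       stranded: bool) -> str:
    """Label an assembled transcript under the assumptions of this sketch.

    "intergenic": overlaps no known gene (usable with any library type).
    "antisense":  overlaps known genes only on the opposite strand
                  (decidable only with stranded data).
    "excluded":   everything else, e.g. an overlap whose strand cannot be
                  resolved, or an overlap on the same strand as a known gene.
    """
    hits = [g for g in known_genes if overlaps(transcript, g)]
    if not hits:
        return "intergenic"
    if not stranded or transcript.strand not in ("+", "-"):
        # Without strand information an overlapping transcript cannot be
        # told apart from the gene it overlaps.
        return "excluded"
    if all(g.strand != transcript.strand for g in hits):
        return "antisense"
    return "excluded"


genes = [Interval("chr1", 100, 200, "+")]
far_away = Interval("chr1", 300, 400, "+")
opposite = Interval("chr1", 150, 250, "-")

print(classify_candidate(far_away, genes, stranded=False))
print(classify_candidate(opposite, genes, stranded=False))
print(classify_candidate(opposite, genes, stranded=True))
```

In this sketch the same opposite-strand transcript is excluded when the library is treated as un-stranded and labeled antisense when it is treated as stranded, which mirrors the distinction drawn in the workflow description.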
Availability – UClncR is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR.