Long non-coding RNAs (lncRNAs) have emerged as a class of factors that are important for regulating development and cancer. Computational prediction of lncRNAs from ultra-deep RNA sequencing has been successful in identifying candidate lncRNAs. However, the complexity of handling and integrating different types of genomics data poses significant challenges to experimental laboratories that lack extensive genomics expertise.
To address this issue, researchers at the New York University School of Medicine have developed lncRNA-screen, a comprehensive pipeline for computationally screening putative lncRNA transcripts over large multimodal datasets. The main objective of this work is to facilitate the computational discovery of lncRNA candidates to be further examined by functional experiments. lncRNA-screen provides a fully automated, easy-to-run pipeline which performs data download, RNA-seq alignment, assembly, quality assessment, transcript filtration, novel lncRNA identification, coding potential estimation, expression level quantification, histone mark enrichment profile integration, differential expression analysis, annotation with other types of segmented data (CNVs, SNPs, Hi-C, etc.) and visualization. Importantly, lncRNA-screen generates an interactive report summarizing all interesting lncRNA features, including genome browser snapshots and lncRNA-mRNA interactions based on Hi-C data. In summary, this pipeline provides a comprehensive solution for lncRNA discovery and an intuitive interactive report for identifying promising lncRNA candidates.
The workflow of lncRNA-screen
Phase I conducts RNA-seq alignment and transcriptome assembly, performs transcript filtration and generates a list of putative lncRNAs. Phase II uses RNA-seq expression quantification, ChIP-seq histone marks and user-defined annotations (CNVs, Hi-C, etc.) to classify the lncRNAs into different groups.
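The two phases can be illustrated with a toy Python sketch. This is not the lncRNA-screen code: the `Transcript` fields, the coding-score cutoff, the group labels and the function names are all hypothetical, chosen only to show the shape of a filter-then-classify workflow. The 200-nucleotide threshold reflects the conventional length definition of a lncRNA, and the use of H3K4me3 and H3K4me1 as promoter-like and enhancer-like marks is a common convention, not a rule confirmed by the pipeline description.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 200      # conventional lower length bound for a "long" non-coding RNA
MAX_CODING_SCORE = 0.5   # hypothetical cutoff on a hypothetical coding-potential score


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    length_nt: int
    overlaps_coding_gene: bool
    coding_score: float            # assumed to lie in [0, 1]; higher = more likely coding
    marks: frozenset = frozenset() # histone marks found near the transcript start


def phase_one(transcripts):
    """Keep transcripts that pass every filter (illustrative Phase I)."""
    return [
        t for t in transcripts
        if t.length_nt >= MIN_LENGTH_NT
        and not t.overlaps_coding_gene
        and t.coding_score < MAX_CODING_SCORE
    ]


def phase_two(candidates):
    """Group candidates by histone mark (illustrative Phase II)."""
    groups = {}
    for t in candidates:
        if "H3K4me3" in t.marks:
            label = "promoter-like"
        elif "H3K4me1" in t.marks:
            label = "enhancer-like"
        else:
            label = "unmarked"
        groups.setdefault(label, []).append(t.transcript_id)
    return groups


# Made-up example transcripts, for illustration only.
assembled = [
    Transcript("T1", 1500, False, 0.10, frozenset({"H3K4me3"})),
    Transcript("T2", 150, False, 0.05),                            # too short
    Transcript("T3", 2200, True, 0.20),                            # overlaps a coding gene
    Transcript("T4", 900, False, 0.90),                            # likely coding
    Transcript("T5", 3100, False, 0.30, frozenset({"H3K4me1"})),
    Transcript("T6", 800, False, 0.20),
]

candidates = phase_one(assembled)
groups = phase_two(candidates)
print(groups)
```

In the real pipeline each of these steps operates on alignment, assembly and ChIP-seq output files; the sketch only mirrors the order of operations described above.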
Availability – lncRNA-screen is available as free open-source software on GitHub: https://github.com/NYU-BFX/lncRNA-screen