Identification and function annotation of long intervening noncoding RNAs

RNA-seq technology offers the promise of rapid, comprehensive discovery of long intervening noncoding RNAs (lincRNAs). Basic tools such as TopHat and Cufflinks are widely used for RNA-seq alignment and assembly, but more advanced bioinformatics methods for in-depth analysis of lincRNAs are lacking. Here, researchers from the Chinese Academy of Sciences describe a computational protocol designed specifically for identifying novel lincRNAs and predicting their functions. The protocol is built mainly on two open-access tools, CNCI and ncFANs. CNCI allows users to distinguish noncoding from protein-coding transcripts and to retrieve novel lincRNAs. ncFANs integrates the expression profiles of protein-coding and lincRNA genes to construct coexpression networks, which are then used to predict the functions of uncharacterized lincRNAs. The protocol is intended to let users apply these procedures without additional training.

An overview of the protocol


The raw RNA-seq reads from multiple samples are processed by the Cufflinks protocol (light blue background). The output of the Cufflinks protocol contains two types of files: a merged assembly file and an expression file. The merged assembly is first compared with the known gene annotations, and its transcripts are classified into four categories. The resulting catalog of potentially novel genes is provided as input to CNCI, which classifies each input transcript as coding or noncoding. A novel lincRNA gene catalog is then generated by filtering out transcripts that do not meet the selection criteria. The novel lincRNA genes are merged with the known genes into a unified annotation for further analysis. This merged annotation, together with its expression profile generated by the Cuffdiff utility, is used to construct the coexpression network. Based on the network, ncFANs uses three methods as well as genomic co-location information to predict lincRNA functions.
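The coexpression step can be illustrated with a minimal Python sketch. It is not the ncFANs implementation: the Pearson correlation measure, the 0.9 cutoff, the neighbor-vote scoring, and all function and gene names are assumptions chosen for illustration. It only shows the general idea of linking genes with similar expression profiles and then borrowing annotations from the protein-coding neighbors of a lincRNA.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold.

    `expression` maps a gene ID to its expression values across samples.
    The threshold is an illustrative assumption, not a protocol value.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def neighbor_vote(gene, edges, annotations):
    """Count functional terms among the annotated neighbors of `gene`."""
    votes = Counter()
    for g1, g2, _ in edges:
        if gene in (g1, g2):
            neighbor = g2 if g1 == gene else g1
            votes.update(annotations.get(neighbor, ()))
    return votes


# Toy data: one hypothetical lincRNA and three hypothetical coding genes.
expression = {
    "lincA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
annotations = {
    "geneB": {"cell cycle"},
    "geneC": {"cell cycle", "apoptosis"},
}

edges = coexpression_edges(expression)
votes = neighbor_vote("lincA", edges, annotations)
print(votes.most_common())
```

In this toy example the lincRNA is linked to the two coding genes whose profiles track it (positively or negatively), and the term shared by both neighbors receives the most votes. A real analysis would use many more samples and a statistically justified cutoff.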

