The advent of next-generation sequencing technologies, and in particular RNA sequencing (RNA-Seq), has expanded our knowledge of the transcriptional capacity of human and other animal genomes. Recent RNA-Seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts both within and between known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations that their expression is restricted to particular cell types and developmental time points. A variety of sequencing protocols are currently available for identifying lncRNAs, including RNA polymerase II occupancy, chromatin state maps and – the focus of this review – deep RNA sequencing. In addition, numerous analytical methods are available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-Seq data. Here, the authors review current methods for identifying lncRNAs using large-scale sequencing data from RNA-Seq experiments and highlight the analytical considerations required when undertaking such projects.
- Ilott NE, Ponting CP. (2013) Predicting long non-coding RNAs using RNA sequencing. Methods [Epub ahead of print].