TERIUS – accurate prediction of lncRNA via high-throughput sequencing data representing RNA-binding protein association

LncRNAs are long regulatory non-coding RNAs, although some of them have been predicted, debatably, to have coding potential. Although coding-potential classifiers that use ribosome profiling data have successfully detected actively translated regions, they are less sensitive at identifying lncRNAs. Furthermore, lncRNA annotation is susceptible to false positives arising from 3′ untranslated region (UTR) fragments of mRNAs.

To address these limitations in lncRNA annotation, researchers from Hanyang University, Korea have developed a novel tool, TERIUS, that provides a two-step filtration process to distinguish bona fide lncRNAs from false ones. The first step separates lncRNAs from protein-coding genes with higher sensitivity than other methods. To eliminate 3′UTR fragments, the second step takes advantage of the 3′UTR-specific association with regulator of nonsense transcripts 1 (UPF1), leading to a refined lncRNA annotation. Importantly, TERIUS enabled the detection of misclassified transcripts in published lncRNA annotations.
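The two-step flow described above can be sketched as a small decision function. This is only an illustration of the order of the two filters: the function name, the score arguments, and the hard cutoffs are hypothetical placeholders and are not taken from TERIUS itself.

```python
from typing import Optional


def classify_transcript(
    coding_score: Optional[float],
    upf1_score: Optional[float],
    *,
    coding_cutoff: float,
    upf1_cutoff: float,
) -> str:
    """Illustrative two-step filter (hypothetical names and cutoffs).

    coding_score: a ribosome-profiling-based coding score, or None when the
                  transcript has too little ribosome signal to score.
    upf1_score:   a UPF1 association score, or None when no UPF1
                  association was observed.
    """
    # Step 1: separate protein-coding transcripts from everything else.
    if coding_score is not None and coding_score >= coding_cutoff:
        return "mRNA"

    # Step 2: among the remaining transcripts, strong UPF1 association
    # suggests a 3'UTR fragment rather than a bona fide lncRNA.
    if upf1_score is not None and upf1_score >= upf1_cutoff:
        return "3'UTR fragment"
    return "lncRNA"


if __name__ == "__main__":
    # Assumed cutoffs, chosen only to make the example run.
    cutoffs = dict(coding_cutoff=0.5, upf1_cutoff=1.0)
    print(classify_transcript(0.9, 2.0, **cutoffs))
    print(classify_transcript(0.1, 2.0, **cutoffs))
    print(classify_transcript(None, None, **cutoffs))
```

The point of the ordering is that the UPF1 filter is only applied to transcripts that were not already called coding in the first step.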

Two-step schematic flow of TERIUS


In RPS, ribosome reads are mapped to transcripts and converted to sub-codon position signals. The signals are then shifted to find the most likely coding frame and adjusted before the weighted relative entropy (WRE) is calculated for the ncRNA (gold) and mRNA (purple) sets. The resulting distributions are estimated to generate a model (x axis: WRE; y axis: density). Transcripts predicted as coding are classified as mRNA, while ncRNAs and low ribosome transcripts (LRTs) are passed on to the second step, where they are further classified as bona fide lncRNAs or 3′UTR fragments depending on their association with UPF1. UAS classification is also based on a density model: the x axis represents UPF1 CLIP-seq RPM divided by RNA-seq RPM on a log scale, and the y axis is density. The bars colored yellow and purple on the left represent the fraction of transcripts without UPF1 association.
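To make the two quantities in the caption more concrete, here is a minimal sketch of how a sub-codon frame signal and a UPF1 association ratio might be computed. It makes several assumptions: the relative entropy below is a plain, unweighted Kullback-Leibler divergence against a uniform frame distribution, used as a stand-in because the exact weighting of the WRE is not described here; the log base (2) is an arbitrary choice; and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def frame_fractions(read_positions: Sequence[int]) -> Optional[List[float]]:
    """Fraction of ribosome read positions falling in each sub-codon frame (0, 1, 2)."""
    counts = [0, 0, 0]
    for pos in read_positions:
        counts[pos % 3] += 1
    total = sum(counts)
    if total == 0:
        return None  # no ribosome signal to score
    return [c / total for c in counts]


def best_frame(fractions: Sequence[float]) -> int:
    """Frame with the strongest signal, taken here as the most likely coding frame."""
    return max(range(3), key=lambda i: fractions[i])


def relative_entropy_vs_uniform(fractions: Sequence[float]) -> float:
    """Unweighted KL divergence (bits) of the frame distribution from uniform.

    Stand-in for the weighted relative entropy; the weighting is an open assumption.
    """
    uniform = 1.0 / 3.0
    return sum(p * math.log2(p / uniform) for p in fractions if p > 0)


def upf1_log_ratio(clip_rpm: float, rna_rpm: float) -> Optional[float]:
    """log2 of UPF1 CLIP-seq RPM over RNA-seq RPM; None when either value is non-positive."""
    if clip_rpm <= 0 or rna_rpm <= 0:
        return None
    return math.log2(clip_rpm / rna_rpm)


if __name__ == "__main__":
    periodic = frame_fractions([0, 3, 6, 9, 12])   # all reads in one frame
    flat = frame_fractions([0, 1, 2, 3, 4, 5])     # reads spread evenly
    print(best_frame(periodic), relative_entropy_vs_uniform(periodic))
    print(relative_entropy_vs_uniform(flat))
    print(upf1_log_ratio(8.0, 2.0))
```

A strongly three-nucleotide-periodic signal, as expected for translated regions, yields a high divergence from uniform, whereas an even spread across frames yields a value near zero. Transcripts with no detectable UPF1 signal return `None` from the ratio, which corresponds to the "without UPF1 association" fraction mentioned in the caption.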

TERIUS is a robust method for lncRNA annotation that provides an additional filtration step for 3′UTR fragments. TERIUS was able to successfully re-classify transcripts in the GENCODE and miTranscriptome lncRNA annotations. The developers believe that TERIUS can benefit the construction of extensive and accurate non-coding transcriptome maps in many genomes.

Choi SW, Nam JW. (2018) TERIUS: accurate prediction of lncRNA via high-throughput sequencing data representing RNA-binding protein association. BMC Bioinformatics 19(Suppl 1):41.

