In silico prediction of genomic long non-coding RNAs (lncRNAs) is a prerequisite for constructing and elucidating non-coding regulatory networks. Chromatin modifications deposited by chromatin regulators are important epigenetic features that can be captured by prevailing high-throughput approaches such as ChIP sequencing. Researchers at the Harbin Institute of Technology and Harbin Medical University, China, demonstrate that the accuracy of lncRNA prediction can be greatly improved by incorporating high-throughput chromatin modification data, collected over mouse embryonic stem cell differentiation toward adult cerebellum, into a logistic regression model with LASSO regularization. Importantly, chromatin information appears to be complementary to genomic sequence information, which highlights the value of an integrated model. Applying the integrated model, they obtain a list of putative lncRNAs from uncharacterized fragments of a transcriptome assembly. Using expression and Gene Ontology enrichment analysis, they show that the putative lncRNAs have regulatory roles in the vicinity of known gene loci. They also show that lncRNA expression specificity can be efficiently modeled by chromatin data from the same developmental stage.
The study not only supports the biological hypothesis that chromatin can regulate the expression of tissue-specific or developmental stage-specific lncRNAs, but also reveals features that discriminate lncRNAs from coding genes, which should guide further lncRNA identification and characterization.
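To make the modeling idea concrete, here is a minimal sketch of logistic regression with LASSO (L1) regularization applied to a mix of sequence-like and chromatin-like features. It is not the authors' pipeline: the data are synthetic, the feature names (`orf_length`, `conservation`, `H3K4me3`, `H3K36me3`, `noise`) and their class shifts are arbitrary assumptions chosen for illustration, and the optimizer (proximal gradient descent with soft-thresholding) is simply one standard way to fit an L1-penalized model.

```python
import math
import random

# Hypothetical feature names: two sequence-like, two chromatin-like, one pure noise.
FEATURES = ["orf_length", "conservation", "H3K4me3", "H3K36me3", "noise"]


def make_toy_data(n=400, seed=0):
    """Synthetic transcripts: label 1 = lncRNA, 0 = protein-coding (toy data only)."""
    rng = random.Random(seed)
    # Assumed class-mean shifts per feature; the last feature carries no signal.
    shifts = [-1.5, -1.0, 1.0, -1.0, 0.0]
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        sign = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(sign * s, 1.0) for s in shifts])
        y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        # Gradient step on the log-loss, then shrinkage from the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def accuracy(X, y, w, b):
    """Fraction of training examples classified correctly at a 0.5 threshold."""
    hits = sum(
        (b + sum(wj * xj for wj, xj in zip(w, xi)) > 0) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)


X, y = make_toy_data()
w, b = fit_lasso_logistic(X, y, lam=0.01)        # light penalty
w_heavy, _ = fit_lasso_logistic(X, y, lam=10.0)  # heavy penalty

for name, light, heavy in zip(FEATURES, w, w_heavy):
    print(f"{name:>12}: lam=0.01 -> {light:+.3f}   lam=10 -> {heavy:+.3f}")
print("training accuracy (lam=0.01):", accuracy(X, y, w, b))
```

The L1 penalty is what makes this kind of model useful for comparing feature groups: as `lam` grows, weights are shrunk and uninformative features tend to be driven to exactly zero, so the surviving coefficients indicate which features the model relies on. In practice one would choose `lam` by cross-validation and evaluate on held-out transcripts rather than on the training set, as this toy example does.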
- Lv J, Liu H, Huang Z, Su J, He H, Xiu Y, Zhang Y, Wu Q. (2013) Long non-coding RNA identification over mouse brain development by integrative modeling of chromatin and genomic features. Nucleic Acids Res [Epub ahead of print]. [article]