RNA-protein interactions play crucial roles in many biological processes and are strongly influenced by the structures and sequences of both the protein and the RNA molecules. A large class of long noncoding RNAs (lncRNAs) can bind chromatin proteins, modulate their activity, and contribute to chromatin modification. In this process, lncRNAs with specific structures, such as Xist, can localize chromatin-remodeling complexes, such as DNMT3a and possibly also EZH2, to specific target regions where stable epigenetic gene silencing can be initiated. Others, such as Hotair, act as scaffolds, binding more than two proteins through their modules and directing them to target loci. It is now apparent that many lncRNAs are key regulators of transcriptional and translational output, alongside other genetic and epigenetic regulators.
In this study, researchers at Wake Forest School of Medicine first analyzed RNA-protein complexes and identified sequence and structural properties of their interfaces, which reveal the diverse nature of the binding sites. Based on these observations, they built a three-step prediction model, RPI-Bind, that identifies RNA-protein binding regions using the sequences and structures of both proteins and RNAs. The three steps are 1) the prediction of RNA-binding regions on the protein, 2) the prediction of protein-binding regions on the RNA, and 3) the simultaneous prediction of interacting regions on both RNA and protein, using the results from steps 1) and 2). Compared with existing methods, most of which use only sequences, the model significantly improves prediction accuracy at each of the three steps. In particular, it outperforms catRAPID by more than 20% at the third step. Together, these results indicate the importance of structure in RNA-protein interactions and suggest that RPI-Bind is a powerful theoretical framework for studying them.
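The data flow of the third step, which reuses the outputs of the first two, can be illustrated with a minimal Python sketch. This is not the RPI-Bind implementation: in the study, step 3 is a trained model, whereas here the function `candidate_pairs`, the 0.5 threshold, and the product score are all hypothetical stand-ins chosen only to show how per-residue and per-nucleotide predictions might be combined into candidate interacting pairs.

```python
from typing import List, Tuple


def candidate_pairs(
    protein_scores: List[float],
    rna_scores: List[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the study
) -> List[Tuple[int, int, float]]:
    """Combine step 1 and step 2 outputs into candidate interacting pairs.

    protein_scores[i]: assumed step 1 score that residue i binds RNA.
    rna_scores[j]:     assumed step 2 score that nucleotide j binds protein.
    Returns (residue index, nucleotide index, joint score), best first.
    """
    pairs = []
    for i, p in enumerate(protein_scores):
        if p < threshold:
            continue  # residue not predicted as RNA-binding in step 1
        for j, r in enumerate(rna_scores):
            if r >= threshold:
                # Illustrative joint score: product of the two scores.
                pairs.append((i, j, p * r))
    return sorted(pairs, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    for residue, nucleotide, score in candidate_pairs([0.9, 0.2, 0.6], [0.1, 0.8]):
        print(residue, nucleotide, round(score, 2))
```

Filtering on the single-molecule predictions first keeps the pairwise search small, which is one plausible reason for running steps 1) and 2) before step 3).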
The stepwise workflow of the RPI-Bind prediction method
The whole workflow consists of two stages: training the classification models and applying them. Model training includes construction of the training dataset, extraction of features from the sequences and structures in that dataset, and development of the ‘RPI-Bind’ method, which consists of three models. The trained models were then applied to three problems: 1) the prediction of RNA-binding regions on the protein, 2) the prediction of protein-binding regions on the RNA, and 3) the simultaneous prediction of interacting regions on both RNA and protein.
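To make the feature-extraction stage concrete, the sketch below encodes a protein residue together with its neighbors, using both sequence and secondary structure. The actual features used by RPI-Bind are not described here, so everything in this example is an assumption: the one-hot encoding, the three-state structure alphabet (`H`, `E`, `C`), the window half-width of 2, and the helper names `one_hot` and `window_features`.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
SS_STATES = "HEC"  # assumed alphabet: helix, strand, coil


def one_hot(symbol: str, alphabet: str) -> List[float]:
    """Return a one-hot vector; unknown symbols give an all-zero vector."""
    return [1.0 if symbol == a else 0.0 for a in alphabet]


def window_features(
    sequence: str, structure: str, center: int, half_width: int = 2
) -> List[float]:
    """Encode the residue at `center` and its neighbors as one flat vector.

    Each window position contributes a sequence one-hot followed by a
    structure one-hot; positions beyond either end are zero-padded.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    per_position = len(AMINO_ACIDS) + len(SS_STATES)
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features += one_hot(sequence[pos], AMINO_ACIDS)
            features += one_hot(structure[pos], SS_STATES)
        else:
            features += [0.0] * per_position  # padding outside the chain
    return features


if __name__ == "__main__":
    vector = window_features("MKV", "CHH", center=0)
    print(len(vector))
```

Vectors of this kind, labeled as binding or non-binding from known complexes, could be passed to any standard classifier. An analogous encoding with a nucleotide alphabet would serve for the RNA side.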