Integrating Large-Scale RNA-Seq and CLIP-Seq Datasets Enables Study of lncRNA


Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein-lncRNA interactions.

In this study, researchers at Sun Yat-sen University analyzed millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies and identified 22,735 RBP-lncRNA regulatory relationships.

The researchers found that a single lncRNA is generally bound and regulated by one or more RBPs, whose combination may coordinately regulate gene expression. They also examined expression correlations within these interaction networks by mining the expression profiles of over 6000 normal and tumor samples from 14 cancer types. Their combined analysis of CLIP-Seq data and genome-wide association study data uncovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs.
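Both steps described above (assigning RBP binding sites to lncRNAs, then finding SNPs inside those sites) come down to interval overlap. The sketch below illustrates that idea only; it is not the authors' pipeline. It assumes BED-style 0-based, half-open coordinates, ignores strand, uses a brute-force scan, and all names and coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 0-based, half-open coordinates (BED-style)."""
    chrom: str
    start: int
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals on the same chromosome overlap if each starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def rbp_lncrna_pairs(sites, lncrnas):
    """(RBP, lncRNA) pairs supported by at least one overlapping binding site.

    Each binding site carries the name of the RBP whose CLIP-Seq peak produced it.
    Strand is ignored here for brevity.
    """
    return {(s.name, lnc.name) for s in sites for lnc in lncrnas if overlaps(s, lnc)}


def snps_in_lncrna_sites(snps, sites, lncrnas):
    """Names of SNPs that fall inside an RBP binding site lying on a lncRNA."""
    lnc_sites = [s for s in sites if any(overlaps(s, lnc) for lnc in lncrnas)]
    return {p.name for p in snps if any(overlaps(p, s) for s in lnc_sites)}


# Toy, made-up coordinates for illustration only.
lncrnas = [Interval("chr1", 1000, 5000, "lncX")]
sites = [Interval("chr1", 1200, 1250, "RBP_A"), Interval("chr1", 9000, 9050, "RBP_B")]
snps = [Interval("chr1", 1220, 1221, "snp1"), Interval("chr1", 9010, 9011, "snp2")]

print(rbp_lncrna_pairs(sites, lncrnas))            # {('RBP_A', 'lncX')}
print(snps_in_lncrna_sites(snps, sites, lncrnas))  # {'snp1'}
```

Here `snp2` is excluded because its binding site does not lie on a lncRNA. With millions of sites, a pairwise loop like this would be replaced by interval trees or a tool such as bedtools.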

Finally, the researchers developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets.

Availability – StarBase V2.0 is available at:

  • Li JH, Liu S, Zheng LL, Wu J, Sun WJ, Wang ZL, Zhou H, Qu LH, Yang JH. (2015) Discovery of Protein-lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets. Front Bioeng Biotechnol 2:88. [article]

C-It-Loci – a knowledge database for tissue-enriched loci

Increasing evidence suggests that most of the genome is transcribed into RNA, but much of it is not translated into protein. RNAs that do not become proteins are called ‘non-coding RNAs (ncRNAs)’, and they outnumber protein-coding genes. Interestingly, ncRNAs have been shown to be expressed in a more tissue-specific manner than protein-coding genes. Because tissue-specific expression suggests that a transcript is important in the tissue where it is expressed, researchers are conducting biological experiments to elucidate the functions of such ncRNAs. Owing largely to advances in next-generation sequencing, especially RNA-seq, the amount of high-throughput data is increasing rapidly. However, because of the complexity and sheer volume of these data, it is not easy to re-analyze published datasets to extract the tissue-specific expression of ncRNAs.

Here, researchers from Goethe University Frankfurt introduce a new knowledge database called ‘C-It-Loci’, which allows users to screen for tissue-specific transcripts across three organisms: human, mouse and zebrafish. C-It-Loci is designed to be intuitive and can be used to identify not only protein-coding genes but also ncRNAs from various tissues. C-It-Loci defines homology through sequence and positional conservation, allowing users to extract loci conserved across species. It can serve as a starting point for further biological experiments.
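The summary above does not say how C-It-Loci scores tissue specificity. To illustrate what "tissue-enriched" can mean numerically, here is a sketch of the widely used tau index. This is a swapped-in, commonly used metric and is not claimed to be C-It-Loci's method. Values near 0 indicate uniform expression across tissues, and values near 1 indicate expression in a single tissue.

```python
def tau(expression):
    """Tissue-specificity index tau for one transcript.

    expression: non-negative expression values, one per tissue (often log-scaled first).
    Returns 0.0 for uniform expression and 1.0 for expression in a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need values for at least two tissues")
    if min(expression) < 0:
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        return 0.0  # not expressed anywhere: treat as non-specific
    return sum(1 - x / peak for x in expression) / (n - 1)


print(tau([0, 0, 10, 0]))  # 1.0
print(tau([5, 5, 5, 5]))   # 0.0
print(tau([10, 5, 0]))     # 0.75
```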


Scheme of C-It-Loci. (a) Flowchart of how C-It-Loci was built. All analyzed results were imported into C-It-Loci as MySQL data tables. (b) Definition of CGP. The genomic coordinates from one protein-coding gene (‘Gene A’) to the immediately downstream protein-coding gene (‘Gene B’) are defined as one locus unit. When homologous protein-coding genes are found in another species for both protein-coding genes in the locus, the locus is defined as a ‘conserved locus’, termed ‘C-It-Loci Genomic Positions (CGP)’.
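The locus definition in panel (b) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not C-It-Loci's code. It handles one chromosome at a time and treats "downstream" as increasing genome coordinate, ignoring strand. It follows the caption literally: a locus counts as conserved when both flanking genes have a homolog, and it does not additionally check that the homologs are adjacent in the other species. Gene IDs and the `homologs` mapping are hypothetical.

```python
from typing import Dict, List, Tuple

# (gene_id, start, end) for protein-coding genes on one chromosome of one species
Gene = Tuple[str, int, int]
# (gene_A, gene_B, locus_start, locus_end)
Locus = Tuple[str, str, int, int]


def locus_units(genes: List[Gene]) -> List[Locus]:
    """Pair each protein-coding gene with the next gene downstream in genome coordinates."""
    ordered = sorted(genes, key=lambda g: g[1])
    return [(a[0], b[0], a[1], max(a[2], b[2])) for a, b in zip(ordered, ordered[1:])]


def conserved_loci(genes: List[Gene], homologs: Dict[str, str]) -> List[Locus]:
    """Keep loci whose two flanking genes both have a homolog in the other species.

    homologs maps a gene_id in this species to its homolog in the other species.
    """
    return [u for u in locus_units(genes) if u[0] in homologs and u[1] in homologs]


# Hypothetical gene IDs and coordinates.
genes = [("GENE_A", 100, 200), ("GENE_B", 500, 600), ("GENE_C", 900, 1000)]
homologs = {"GENE_A": "gene_a_zf", "GENE_B": "gene_b_zf"}

print(conserved_loci(genes, homologs))  # [('GENE_A', 'GENE_B', 100, 600)]
```

Any ncRNA that falls inside a conserved locus can then be treated as positionally conserved, even when its own sequence is poorly conserved.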

Availability – C-It-Loci is freely available online without registration at

  • Weirick T, John D, Dimmeler S, Uchida S. (2015) C-It-Loci: a knowledge database for tissue-enriched loci. Bioinformatics [Epub ahead of print]. [abstract]

Co-LncRNA – investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions.

To facilitate such efforts, researchers at Harbin Medical University, China, have developed Co-LncRNA, a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by the co-expressed protein-coding genes of a single lncRNA or multiple lncRNAs. Co-expressed protein-coding genes of lncRNAs were first identified in publicly available human RNA-Seq data, comprising 241 datasets across 6560 individuals and representing 28 tissue types/cell lines. The combinatorial effects of lncRNAs on a given GO annotation or KEGG pathway are then assessed by enrichment analysis, in which multiple lncRNAs are analyzed simultaneously in one or more user-selected datasets.
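The two-step workflow above (co-expression, then enrichment) can be sketched as follows. The specific choices here are assumptions for illustration and are not taken from Co-LncRNA: Pearson correlation with a hypothetical cutoff `r_min=0.7`, pooling the co-expressed genes of several lncRNAs to model their combined effect, and a one-sided hypergeometric test for a single GO term or KEGG pathway. No multiple-testing correction is applied.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def coexpressed(lnc_profile, gene_profiles, r_min=0.7):
    """Protein-coding genes whose profile correlates with the lncRNA at r >= r_min."""
    return {g for g, prof in gene_profiles.items() if pearson(lnc_profile, prof) >= r_min}


def hypergeom_sf(k, term_size, list_size, universe):
    """P(X >= k) for X ~ Hypergeometric(universe, term_size, list_size)."""
    total = math.comb(universe, list_size)
    hi = min(term_size, list_size)
    return sum(
        math.comb(term_size, i) * math.comb(universe - term_size, list_size - i)
        for i in range(k, hi + 1)
    ) / total


def term_enrichment(lnc_ids, lnc_profiles, gene_profiles, term_genes, r_min=0.7):
    """Pool co-expressed genes of several lncRNAs, then test one GO term or KEGG pathway."""
    pooled = set().union(*(coexpressed(lnc_profiles[lnc], gene_profiles, r_min) for lnc in lnc_ids))
    universe = set(gene_profiles)
    term = set(term_genes) & universe
    hits = len(pooled & term)
    return hits, hypergeom_sf(hits, len(term), len(pooled), len(universe))


# Toy profiles across five samples (made-up values).
lnc_profiles = {"lncX": [1, 2, 3, 4, 5]}
gene_profiles = {
    "G1": [2, 4, 6, 8, 10],  # r = 1.0
    "G2": [5, 4, 3, 2, 1],   # r = -1.0
    "G3": [1, 3, 2, 5, 4],   # r = 0.8
    "G4": [3, 3, 3, 3, 3],   # constant
}
print(coexpressed(lnc_profiles["lncX"], gene_profiles))  # G1 and G3
print(term_enrichment(["lncX"], lnc_profiles, gene_profiles, {"G1", "G3"}))
```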

In addition, the software provides a graphical overview of pathways modulated by lncRNAs, as well as a dedicated tool to display the networks linking lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also lets users upload their own lncRNA and protein-coding gene expression profiles to investigate lncRNA combinatorial effects. The tool is to be updated annually with additional human RNA-Seq datasets. Taken together, Co-LncRNA is a web-based application for investigating lncRNA combinatorial effects that could shed light on their biological roles and serve as a valuable resource for the community.


Flowchart used in Co-LncRNA for investigating the combinatorial effects of lncRNAs in GO annotations and KEGG pathways.


  • Zhao Z, Bai J, Wu A, Wang Y, Zhang J, Wang Z, Li Y, Xu J, Li X. (2015) Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data. Database (Oxford). 2015 Sep 10. [article]

LncReg – a reference resource for lncRNA-associated regulatory networks


Long non-coding RNAs (lncRNAs) are critical in the regulation of various biological processes. In recent years, a plethora of lncRNAs have been identified in mammalian genomes through different approaches, and researchers are constantly reporting regulatory roles for them, making the literature on any particular lncRNA increasingly complex. For the convenience of researchers, the authors therefore collected the regulatory relationships of lncRNAs and built a database called ‘LncReg’. The database contains 1081 validated lncRNA-associated regulatory entries, covering 258 non-redundant lncRNAs and 571 non-redundant genes. With this regulatory information, LncReg provides an overall view of lncRNA regulatory networks and comprehensive data for bioinformatics research, which is useful for understanding the functional roles of lncRNAs.
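To show how a list of regulatory entries turns into the network view described above, here is a minimal sketch. The entry layout is an assumption: each entry is a `(lncRNA, relation, gene)` triple, since LncReg's actual schema is not described here. The lncRNA names, gene names and relations are placeholders, not LncReg data. The `summarize` helper mirrors the kind of counts quoted above: total entries, non-redundant lncRNAs and non-redundant genes.

```python
from collections import defaultdict


def build_network(entries):
    """Map each lncRNA to its target genes and the regulatory relations reported for each.

    entries: iterable of (lncrna, relation, gene) triples, e.g. from a curated table.
    """
    net = defaultdict(lambda: defaultdict(set))
    for lnc, relation, gene in entries:
        net[lnc][gene].add(relation)
    return net


def summarize(entries):
    """(number of entries, non-redundant lncRNAs, non-redundant genes)."""
    entries = list(entries)
    return len(entries), len({e[0] for e in entries}), len({e[2] for e in entries})


def shared_targets(net, lnc_a, lnc_b):
    """Genes regulated by both lncRNAs, i.e. where their sub-networks meet."""
    return set(net.get(lnc_a, {})) & set(net.get(lnc_b, {}))


# Hypothetical entries; names and relations are placeholders, not LncReg data.
entries = [
    ("lncA", "activates", "GENE1"),
    ("lncA", "represses", "GENE2"),
    ("lncB", "represses", "GENE2"),
    ("lncB", "activates", "GENE3"),
    ("lncA", "activates", "GENE1"),  # same relation reported again, e.g. by a second study
]
net = build_network(entries)
print(summarize(entries))                   # (5, 2, 3)
print(shared_targets(net, "lncA", "lncB"))  # {'GENE2'}
```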


  • Zhou Z, Shen Y, Khan MR, Li A. (2015) LncReg: a reference resource for lncRNA-associated regulatory networks. Database (Oxford). 2015 Sep 10. [article]

zflncRNApedia – A Comprehensive Online Resource for Zebrafish Long Non-Coding RNAs

Recent transcriptome annotation studies using deep sequencing have identified a large number of long non-coding RNAs in zebrafish, a popular model organism for human diseases. These studies characterized lncRNAs at critical developmental stages as well as in adult tissues. Each study uncovered a distinct set of lncRNAs, with only minor overlaps between them. The public availability of the raw RNA-Seq datasets, which cover critical developmental time points and adult tissues, provides a unique opportunity to understand the spatiotemporal expression patterns of lncRNAs.

Now, researchers from the CSIR-Institute of Genomics and Integrative Biology have created a catalog of zebrafish lncRNAs, derived largely from the three published annotation sets together with manual curation of the literature, comprising a total of 2,267 lncRNA transcripts. The lncRNAs were further classified into four categories based on their genomic context and their relationship with neighboring protein-coding genes. Analysis revealed a total of 86 intronic, 309 promoter-associated, 485 overlapping and 1,386 lincRNAs. The resource houses the lncRNA annotations along with associated information, including expression levels, promoter epigenetic marks, genomic variants and retroviral insertion mutants. It also hosts a genome browser in which the datasets can be browsed in their genomic context.
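The four-way classification above can be sketched as a rule cascade. The exact rules used for zflncRNApedia are not given here, so the following are all assumptions: the precedence order (intronic, then overlapping, then promoter-associated, then lincRNA), a hypothetical 2 kb upstream promoter window, strand-aware TSS placement for the coding gene, and ignoring the lncRNA's own strand. Exons are assumed to be non-overlapping, taken from one transcript model, in half-open coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CodingGene:
    chrom: str
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # non-overlapping, half-open [start, end)

    @property
    def span(self):
        return min(s for s, _ in self.exons), max(e for _, e in self.exons)

    def introns(self):
        # Gaps between consecutive exons of the (assumed single) transcript model.
        ex = sorted(self.exons)
        return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(ex, ex[1:]) if b_start > a_end]


def _overlap(a0, a1, b0, b1):
    return a0 < b1 and b0 < a1


def classify(chrom, start, end, genes, window=2000):
    """Assign a lncRNA at [start, end) to one of four genomic-context classes."""
    same = [g for g in genes if g.chrom == chrom]
    for g in same:  # entirely inside one intron of a coding gene
        if any(i0 <= start and end <= i1 for i0, i1 in g.introns()):
            return "intronic"
    for g in same:  # otherwise touching a coding gene's span
        if _overlap(start, end, *g.span):
            return "overlapping"
    for g in same:  # within `window` bp upstream of a coding gene's TSS
        g0, g1 = g.span
        promoter = (g0 - window, g0) if g.strand == "+" else (g1, g1 + window)
        if _overlap(start, end, *promoter):
            return "promoter-associated"
    return "lincRNA"


# Hypothetical genes and coordinates.
genes = [
    CodingGene("chr1", "+", [(1000, 1200), (2000, 2200), (3000, 3200)]),
    CodingGene("chr2", "-", [(1000, 1200)]),
]
print(classify("chr1", 1300, 1500, genes))    # intronic
print(classify("chr1", 1100, 1300, genes))    # overlapping
print(classify("chr1", 500, 900, genes))      # promoter-associated
print(classify("chr2", 1500, 1600, genes))    # promoter-associated (minus-strand TSS at 1200)
print(classify("chr1", 10000, 11000, genes))  # lincRNA
```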


Availability – The resource is freely available at:

  • Dhiman H, Kapoor S, Sivadas A, Sivasubbu S, Scaria V. (2015) zflncRNApedia: A Comprehensive Online Resource for Zebrafish Long Non-Coding RNAs. PLoS One 10(6):e0129997. [article]