C-It-Loci – a knowledge database for tissue-enriched loci

Increasing evidence suggests that most of the genome is transcribed into RNAs, but many of them are not translated into proteins. RNAs that are not translated into proteins are called ‘non-coding RNAs (ncRNAs)’, and they outnumber protein-coding genes. Interestingly, these ncRNAs have been shown to be expressed in a more tissue-specific manner than protein-coding genes. Given that tissue-specific expression of a transcript suggests its importance in the tissue where it is expressed, researchers are conducting biological experiments to elucidate the function of such ncRNAs. Owing greatly to the advancement of next-generation sequencing techniques, especially RNA-seq, the amount of high-throughput data is increasing rapidly. However, due to the complexity of the data as well as its high volume, it is not easy to re-analyze published datasets to extract tissue-specific expression of ncRNAs.

Here, researchers from Goethe University Frankfurt introduce a new knowledge database called ‘C-It-Loci’, which allows a user to screen for tissue-specific transcripts across three organisms: human, mouse and zebrafish. C-It-Loci is intuitive and easy to use, and it identifies not only protein-coding genes but also ncRNAs from various tissues. C-It-Loci defines homology through sequence and positional conservation to allow for the extraction of species-conserved loci. C-It-Loci can be used as a starting point for further biological experiments.


Scheme of C-It-Loci. (a) Flowchart for building C-It-Loci. All analyzed results were imported as MySQL data tables into C-It-Loci. (b) Definition of CGP. The genomic coordinates from one protein-coding gene (‘Gene A’) to the immediately downstream protein-coding gene (‘Gene B’) are defined as one locus unit. When homologous protein-coding genes are found in another species for both protein-coding genes in the locus, the locus is defined as a ‘conserved locus’, which is called ‘C-It-Loci Genomic Positions (CGP)’.
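The CGP definition in panel (b) can be illustrated with a minimal Python sketch. This is not the C-It-Loci implementation: the `Gene` record, the function names, and the toy coordinates are all hypothetical, "immediately downstream" is approximated by sorting genes by start coordinate on each chromosome (strand is ignored), and homology is reduced to a simple lookup table.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical record for one protein-coding gene."""
    name: str
    chrom: str
    start: int
    end: int


def locus_units(genes):
    """Pair each gene with the next gene on the same chromosome.

    Assumption: 'immediately downstream' is approximated by start
    coordinate order; strand is not considered in this sketch.
    """
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a.chrom == b.chrom]


def conserved_loci(genes, homologs):
    """Keep locus units whose two flanking genes both have a homolog.

    `homologs` maps a gene name to its homolog in the other species
    (a made-up stand-in for the sequence-based homology step).
    Returns (gene A, gene B, chromosome, locus start, locus end).
    """
    return [
        (a.name, b.name, a.chrom, a.start, b.end)
        for a, b in locus_units(genes)
        if a.name in homologs and b.name in homologs
    ]


# Toy data, invented for illustration only.
genes = [
    Gene("A", "chr1", 100, 200),
    Gene("B", "chr1", 500, 700),
    Gene("C", "chr1", 900, 1000),
    Gene("D", "chr2", 50, 80),
]
homologs = {"A": "a", "B": "b", "D": "d"}
print(conserved_loci(genes, homologs))
```

In this toy example only the A-to-B locus qualifies: gene C has no homolog, and D sits on a different chromosome, so it does not form a locus unit with C.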

Availability – C-It-Loci is freely available online without registration at http://c-it-loci.uni-frankfurt.de


  • Weirick T, John D, Dimmeler S, Uchida S. (2015) C-It-Loci: a knowledge database for tissue-enriched loci. Bioinformatics [Epub ahead of print]. [abstract]

Co-LncRNA – investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions.

To facilitate such an effort, researchers at Harbin Medical University, China have developed Co-LncRNA, a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by the co-expressed protein-coding genes of one or more lncRNAs. Protein-coding genes co-expressed with lncRNAs were first identified in publicly available human RNA-Seq datasets, comprising 241 datasets across 6560 total individuals and representing 28 tissue types/cell lines. The combinatorial effects of lncRNAs on a given GO annotation or KEGG pathway are then assessed by analyzing multiple lncRNAs simultaneously in one or more user-selected datasets, which is realized by enrichment analysis.
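To make the enrichment step concrete, here is a small Python sketch under stated assumptions. The text only says "enrichment analysis"; the one-sided hypergeometric test used here is a common choice for this kind of analysis and is an assumption, not a confirmed detail of Co-LncRNA. Pooling the co-expressed genes of several lncRNAs by set union is likewise an illustrative guess, and all names and gene identifiers are made up.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(lncrna_to_genes, selected, pathway_genes, background):
    """Test pathway enrichment for the pooled co-expressed genes of several lncRNAs.

    Assumption: the 'combinatorial' gene list is the union of the
    co-expressed genes of every selected lncRNA, restricted to the background.
    """
    pooled = set().union(*(lncrna_to_genes[l] for l in selected)) & background
    pathway = pathway_genes & background
    k = len(pooled & pathway)
    return k, hypergeom_pvalue(k, len(pooled), len(pathway), len(background))


# Toy data, invented for illustration only.
background = {f"g{i}" for i in range(1, 11)}
pathway_genes = {"g1", "g2", "g3", "g4", "g5"}
lncrna_to_genes = {
    "lnc1": {"g1", "g2", "g3"},
    "lnc2": {"g3", "g4", "g5"},
}
overlap, p = enrichment(lncrna_to_genes, ["lnc1", "lnc2"], pathway_genes, background)
print(overlap, p)
```

Neither lncRNA alone covers the toy pathway, but the pooled list does, which is the kind of joint effect a multi-lncRNA analysis is meant to surface. A real analysis would also correct for multiple testing across pathways.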

In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community.


Flowchart used in Co-LncRNA for investigating the combinatorial effects of lncRNAs in GO annotations and KEGG pathways.


  • Zhao Z, Bai J, Wu A, Wang Y, Zhang J, Wang Z, Li Y, Xu J, Li X. (2015) Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data. Database (Oxford). 2015 Sep 10. [article]

LncReg – a reference resource for lncRNA-associated regulatory networks


Long non-coding RNAs (lncRNAs) are critical in the regulation of various biological processes. In recent years, a plethora of lncRNAs have been identified in mammalian genomes through different approaches, and researchers are constantly reporting regulatory roles for these lncRNAs, which makes the literature on any particular lncRNA increasingly complex. For the convenience of researchers, the authors therefore collected regulatory relationships of lncRNAs and built a database called ‘LncReg’. The database was developed by collecting 1081 validated lncRNA-associated regulatory entries, including 258 non-redundant lncRNAs and 571 non-redundant genes. With this regulatory relationship information, LncReg can provide an overall perspective on the regulatory networks of lncRNAs and comprehensive data for bioinformatics research, which is useful for understanding the functional roles of lncRNAs.

Availability: http://bioinformatics.ustc.edu.cn/lncreg/

  • Zhou Z, Shen Y, Khan MR, Li A. (2015) LncReg: a reference resource for lncRNA-associated regulatory networks. Database (Oxford). 2015 Sep 10. [article]

zflncRNApedia – A Comprehensive Online Resource for Zebrafish Long Non-Coding RNAs

Recent transcriptome annotation studies using deep sequencing approaches have annotated a large number of long non-coding RNAs in zebrafish, a popular model organism for human diseases. These studies characterized lncRNAs in critical developmental stages as well as adult tissues. Each of the studies has uncovered a distinct set of lncRNAs, with minor overlaps. The availability in the public domain of raw RNA-Seq datasets encompassing critical developmental time-points and adult tissues provides a unique opportunity to understand the spatiotemporal expression patterns of lncRNAs.

Now, researchers from the CSIR-Institute of Genomics and Integrative Biology have created a catalog of lncRNAs in zebrafish, derived largely from the three annotation sets and supplemented by manual curation of the literature, compiling a total of 2,267 lncRNA transcripts. The lncRNAs were further classified into 4 categories based on their genomic context and relationship with neighboring protein-coding genes. Analysis revealed a total of 86 intronic, 309 promoter-associated, 485 overlapping and 1,386 lincRNAs. The researchers have created a comprehensive resource that houses the annotation of lncRNAs as well as associated information including expression levels, promoter epigenetic marks, genomic variants and retroviral insertion mutants. The resource also hosts a genome browser where the datasets can be browsed in their genomic context.


Availability – The resource is freely available at URL: http://genome.igib.res.in/zflncRNApedia

  • Dhiman H, Kapoor S, Sivadas A, Sivasubbu S, Scaria V. (2015) zflncRNApedia: A Comprehensive Online Resource for Zebrafish Long Non-Coding RNAs. PLoS One 10(6):e0129997. [article]

An update on LNCipedia – a database for annotated human lncRNA sequences


LNCipedia collects long non-coding RNA sequences and annotation from different sources. In version 3.0, over 90,000 new transcripts were added to the database. Of these, 6917 transcripts were obtained from RefSeq by filtering for accession prefix (NR_) and size (200 bp). This filtering strategy, however, is not restricted to long non-coding RNAs and also yields transcripts associated with protein-coding genes. For instance, transcripts with incomplete open reading frames that are subject to nonsense-mediated mRNA decay are also annotated with the accession prefix NR_. These transcripts are generally not considered true lncRNAs and typically exhibit a high coding potential score when assessed by PhyloCSF. The authors therefore chose to exclude these transcripts from the database and to confine their analysis to the RefSeq subset with the keyword biomol_ncrna_lncrna, as suggested by RefSeq’s Dr. Kim D. Pruitt. This change is reflected in LNCipedia.org update 3.1, and this corrigendum serves to clarify the discrepancies in the article caused by this update.
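The difference between the two filtering strategies can be sketched in a few lines of Python. This is an illustration only, not the LNCipedia pipeline: the record layout, the accession numbers, and the keyword `some_other_keyword` are made up, and "size (200 bp)" is assumed to mean a minimum length of 200 nucleotides.

```python
def prefix_and_size_filter(records, min_length=200):
    """Original strategy: keep NR_ accessions of at least min_length nt."""
    return [
        r["accession"]
        for r in records
        if r["accession"].startswith("NR_") and r["length"] >= min_length
    ]


def keyword_filter(records, keyword="biomol_ncrna_lncrna"):
    """Revised strategy: keep only records carrying the lncRNA keyword."""
    return [r["accession"] for r in records if keyword in r["keywords"]]


# Made-up records for illustration; none of these are real RefSeq entries.
records = [
    {"accession": "NR_000001", "length": 1500, "keywords": ["biomol_ncrna_lncrna"]},
    # Hypothetical NMD-targeted transcript of a protein-coding gene.
    {"accession": "NR_000002", "length": 2100, "keywords": ["some_other_keyword"]},
    {"accession": "NR_000003", "length": 120, "keywords": ["some_other_keyword"]},
    {"accession": "NM_000004", "length": 3000, "keywords": []},
]

print(prefix_and_size_filter(records))
print(keyword_filter(records))
```

The prefix-and-size filter keeps the hypothetical NMD-targeted transcript because it looks like any other long NR_ record, while the keyword filter drops it.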

