
RNA-Seq reveals thousands of novel long non-coding RNAs in B cell lymphoma

Gene profiling of diffuse large B cell lymphoma (DLBCL) has revealed broad gene expression deregulation compared to normal B cells. While many studies have interrogated well-known, annotated genes in DLBCL, none had yet performed a systematic analysis to uncover novel, unannotated long non-coding RNAs (lncRNAs) in DLBCL. In this study, researchers from Weill Cornell Medical College sought to uncover these lncRNAs by examining RNA-seq data from primary DLBCL tumors, and performed supporting analyses to identify potential roles of these lncRNAs in DLBCL.

The researchers performed a systematic analysis of novel lncRNAs from the poly-adenylated transcriptome of 116 primary DLBCL samples. RNA-seq data were processed using a de novo transcript assembly pipeline to discover novel lncRNAs in DLBCL. Systematic functional, mutational, cross-species, and co-expression analyses, using numerous bioinformatics tools and statistical methods, were performed to characterize these novel lncRNAs.

The researchers identified 2,632 novel, multi-exonic lncRNAs expressed in more than one tumor, two-thirds of which are not expressed in normal B cells. Long-read single-molecule sequencing supports the splicing structure of many of these lncRNAs. More than one-third of the novel lncRNAs are differentially expressed between the two major DLBCL subtypes, ABC and GCB. Novel lncRNAs are enriched at DLBCL super-enhancers, and a fraction of them are conserved between human and dog lymphomas. The researchers observed transposable element (TE) overlap in the exonic regions, particularly significant in the last exon of the novel lncRNAs, which suggests potential usage of cryptic TE polyadenylation signals. They identified highly co-expressed protein-coding genes for at least 88% of the novel lncRNAs. Functional enrichment analysis of the co-expressed genes predicts a potential function for about half of the novel lncRNAs. Finally, systematic structural analysis of candidate point mutations (SNVs) suggests that such mutations frequently stabilize lncRNA structures instead of destabilizing them.


Discovery of these 2,632 novel lncRNAs in DLBCL significantly expands the lymphoma transcriptome, and the analysis identifies potential roles of these lncRNAs in lymphomagenesis and/or tumor maintenance. For further studies, these novel lncRNAs also provide an abundant source of new targets for antisense oligonucleotide pharmacology, including shared targets between human and dog lymphomas.

  • Verma A, Jiang Y, Du W, Fairchild L, Melnick A, Elemento O. (2015) Transcriptome sequencing reveals thousands of novel long non-coding RNAs in B cell lymphoma. Genome Med 7(1):110. [article]

Featured lncRNA – RZE1

In the fungal pathogen Cryptococcus neoformans, the switch from yeast to hypha is an important morphological process preceding the meiotic events during sexual development. Morphotype is also known to be associated with cryptococcal virulence potential. Previous studies identified the regulator Znf2 as a key decision maker for hypha formation and as an anti-virulence factor.

By a forward genetic screen, researchers at Texas A&M University discovered that a long non-coding RNA (lncRNA) RZE1 functions upstream of ZNF2 in regulating yeast-to-hypha transition. They demonstrate that RZE1 functions primarily in cis and less effectively in trans. Interestingly, RZE1’s function is restricted to its native nucleus. Accordingly, RZE1 does not appear to directly affect Znf2 translation or the subcellular localization of Znf2 protein. Transcriptome analysis indicates that the loss of RZE1 reduces the transcript level of ZNF2 and Znf2’s prominent downstream targets. In addition, microscopic examination using single molecule fluorescent in situ hybridization (smFISH) indicates that the loss of RZE1 increases the ratio of ZNF2 transcripts in the nucleus versus those in the cytoplasm.


Working model for RZE1’s nuclear function in Cryptococcus.

Taken together, this lncRNA controls Cryptococcus yeast-to-hypha transition through regulating the key morphogenesis regulator Znf2. This is the first functional characterization of a lncRNA in a human fungal pathogen. Given the potential large number of lncRNAs in the genomes of Cryptococcus and other fungal pathogens, the findings implicate lncRNAs as an additional layer of genetic regulation during fungal development that may well contribute to the complexity in these “simple” eukaryotes.

  • Chacko N, Zhao Y, Yang E, Wang L, Cai JJ, Lin X. (2015) The lncRNA RZE1 Controls Cryptococcal Morphological Transition. PLoS Genet 11(11):e1005692. [article]

The role of lncRNA in maintaining genome stability

Long non-coding RNAs (lncRNAs) are important players in diverse biological processes. Upon DNA damage, cells activate a complex signaling cascade referred to as the DNA damage response (DDR). Using a microarray screen, researchers from the National Cancer Institute identified a novel lncRNA, DDSR1 (DNA damage-sensitive RNA1), which is induced upon DNA damage. DDSR1 induction is triggered in an ATM-NF-κB pathway-dependent manner by several DNA double-strand break (DSB) agents. Loss of DDSR1 impairs cell proliferation and DDR signaling and reduces DNA repair capacity by homologous recombination (HR). The HR defect in the absence of DDSR1 is marked by aberrant accumulation of BRCA1 and RAP80 at DSB sites. In line with a role in regulating HR, DDSR1 interacts with BRCA1 and hnRNPUL1, an RNA-binding protein involved in DNA end resection.


This study establishes a role for the lncRNA DDSR1 in maintaining genome stability. DDSR1 promotes homologous recombination by regulating recruitment of DNA repair factors to DSB after DNA damage.

  • The lncRNA DDSR1 is induced upon DNA damage and interacts with BRCA1 and the RNA-binding repair protein hnRNPUL1.
  • DDSR1 and hnRNPUL1 interact to form a complex which prevents BRCA1 from promiscuous DNA binding and fine-tunes the recruitment of BRCA1 to DSBs upon DNA damage.
  • Absence of DDSR1 or hnRNPUL1 during DNA damage leads to increased recruitment of RAP80 and BRCA1 to DSBs to limit HR.

  • Sharma V, Khurana S, Kubben N, Abdelmohsen K, Oberdoerffer P, Gorospe M, Misteli T. (2015) A BRCA1-interacting lncRNA regulates homologous recombination. EMBO Rep [Epub ahead of print]. [abstract]

Integrating Large-Scale RNA-Seq and CLIP-Seq Datasets Enables Study of lncRNA


Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein-lncRNA interactions.

In this study, researchers at Sun Yat-sen University analyzed millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies and identified 22,735 RBP-lncRNA regulatory relationships.

The researchers found that a single lncRNA is generally bound and regulated by one or more RBPs, the combination of which may coordinately regulate gene expression. They also revealed the expression correlation of these interaction networks by mining expression profiles of over 6,000 normal and tumor samples from 14 cancer types. Their combined analysis of CLIP-Seq data and genome-wide association study data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs.
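The last step, locating disease-related SNPs inside RBP binding sites on lncRNAs, is at heart an interval-overlap query. A minimal Python sketch of that idea follows. It is not the authors' pipeline: the function name, the tuple layouts, the 0-based half-open coordinate convention, and the placeholder identifiers in the example are all assumptions made for illustration.

```python
from collections import defaultdict


def snps_in_binding_sites(sites, snps):
    """Report every (snp_id, rbp, lncrna) where the SNP lies inside a binding site.

    sites: iterable of (chrom, start, end, rbp, lncrna), 0-based half-open
           intervals (an assumed convention, not taken from the study).
    snps:  iterable of (snp_id, chrom, pos) with 0-based positions.
    """
    # Group binding sites by chromosome and sort them by start coordinate.
    by_chrom = defaultdict(list)
    for chrom, start, end, rbp, lncrna in sites:
        by_chrom[chrom].append((start, end, rbp, lncrna))
    for chrom in by_chrom:
        by_chrom[chrom].sort()

    hits = []
    for snp_id, chrom, pos in snps:
        for start, end, rbp, lncrna in by_chrom.get(chrom, []):
            if start > pos:
                break  # sites are sorted, so no later site can contain pos
            if pos < end:
                hits.append((snp_id, rbp, lncrna))
    return hits


# Hypothetical placeholder records, not data from the study.
example_sites = [
    ("chr1", 100, 120, "RBP_A", "lncRNA_1"),
    ("chr1", 110, 130, "RBP_B", "lncRNA_1"),
]
example_snps = [("snp1", "chr1", 115), ("snp2", "chr1", 125)]
print(snps_in_binding_sites(example_sites, example_snps))
```

The linear scan per SNP keeps the sketch short; with millions of binding sites, an interval tree or a dedicated tool for genomic interval intersection would be the more realistic choice.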

Finally, the researchers developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets.

Availability – StarBase V2.0 is available at:

  • Li JH, Liu S, Zheng LL, Wu J, Sun WJ, Wang ZL, Zhou H, Qu LH, Yang JH. (2015) Discovery of Protein-lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets. Front Bioeng Biotechnol 2:88. [article]

C-It-Loci – a knowledge database for tissue-enriched loci

Increasing evidence suggests that most of the genome is transcribed into RNAs, but many of these are not translated into proteins. RNAs that do not become proteins are called ‘non-coding RNAs (ncRNAs)’, and they outnumber protein-coding genes. Interestingly, these ncRNAs have been shown to be expressed more tissue-specifically than protein-coding genes. Because tissue-specific expression of a transcript suggests its importance in that tissue, researchers are conducting biological experiments to elucidate the function of such ncRNAs. Owing greatly to the advancement of next-generation sequencing techniques, especially RNA-seq, the amount of high-throughput data is increasing rapidly. However, due to the complexity and sheer volume of the data, it is not easy to re-analyze published datasets to extract tissue-specific expression of ncRNAs.

Here, researchers from Goethe University Frankfurt introduce a new knowledge database called ‘C-It-Loci’, which allows a user to screen for tissue-specific transcripts across three organisms: human, mouse and zebrafish. C-It-Loci is intuitive and easy to use to identify not only protein-coding genes but also ncRNAs from various tissues. C-It-Loci defines homology through sequence and positional conservation to allow for the extraction of species-conserved loci. C-It-Loci can be used as a starting point for further biological experiments.


Scheme of C-It-Loci. (a) Flowchart of the construction of C-It-Loci. All the analyzed results were imported as MySQL data tables into C-It-Loci. (b) Definition of CGP. The genomic coordinates from one protein-coding gene (‘Gene A’) to the immediately downstream protein-coding gene (‘Gene B’) are defined as one locus unit. When homologous protein-coding genes are found in another species for both protein-coding genes in the locus, this locus is defined as a ‘conserved locus’, which the authors call ‘C-It-Loci Genomic Positions (CGP)’.
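The locus definition in panel (b) can be illustrated with a short Python sketch. This is a simplified reading of the caption, not the C-It-Loci implementation: the function name and input layout are hypothetical, homology is taken as a precomputed set of gene names, and "immediately downstream" is approximated as the next protein-coding gene in coordinate order on the same chromosome, ignoring strand.

```python
def conserved_loci(genes, homologs):
    """Return loci whose two flanking protein-coding genes both have homologs.

    genes:    list of (chrom, start, end, name) for protein-coding genes of
              one species (hypothetical input layout).
    homologs: set of gene names with a homolog in the other species.
    Each result is (chrom, locus_start, locus_end, gene_a, gene_b).
    """
    # Group genes by chromosome.
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    loci = []
    for chrom, items in sorted(by_chrom.items()):
        items.sort()  # order genes by start coordinate
        # Pair every gene with its next neighbour: Gene A -> Gene B.
        for (a_start, a_end, a), (b_start, b_end, b) in zip(items, items[1:]):
            if a in homologs and b in homologs:
                # The locus spans from the start of Gene A to the end of Gene B.
                loci.append((chrom, a_start, max(a_end, b_end), a, b))
    return loci
```

Any transcript, coding or non-coding, that falls within a returned span could then be assigned to that locus, which is one way to compare ncRNAs across species by position even when their sequences have diverged.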

Availability – C-It-Loci is freely available online without registration at

  • Weirick T, John D, Dimmeler S, Uchida S. (2015) C-It-Loci: a knowledge database for tissue-enriched loci. Bioinformatics [Epub ahead of print]. [abstract]