Pancreatic ductal adenocarcinoma (PDA) is a highly metastatic disease with limited therapeutic options. Genome and transcriptome analyses have identified signalling pathways and cancer driver genes with implications for patient stratification and targeted therapy. However, these analyses were performed in bulk samples and focused on coding genes, which represent a small fraction of the genome.
Researchers at the Columbia University Medical Center developed a computational framework to reconstruct the non-coding transcriptome from cross-sectional RNA-Seq, integrating somatic copy number alterations (SCNA), common germline variants associated with PDA risk, and clinical outcome. The researchers validated the results in an independent cohort of paired epithelial and stromal RNA-Seq derived from laser capture microdissected human pancreatic tumours, which allowed them to annotate the compartment specificity of lncRNA expression. They employed systems and experimental biology approaches to interrogate the function of epithelial long non-coding RNAs (lncRNAs) associated with genetic traits and clinical outcome in PDA.
The researchers generated a catalogue of PDA-associated lncRNAs. They showed that lncRNAs define molecular subtypes with biological and clinical significance. Using genomic and clinical data in PDA, they identified lncRNAs located in genomic regions with SCNA or with single nucleotide polymorphisms (SNPs) associated with lifetime risk of PDA, as well as lncRNAs associated with clinical outcome. Systems biology and experimental functional analysis of two epithelial lncRNAs (LINC00673 and FAM83H-AS1) suggest they regulate the transcriptional profile of pancreatic tumour samples and PDA cell lines.
These findings indicate that lncRNAs are associated with genetic marks of pancreatic cancer risk, contribute to the transcriptional regulation of neoplastic cells and provide an important resource to design functional studies of lncRNAs in PDA.
Identification of lncRNAs and molecular subtyping of pancreatic ductal adenocarcinoma (PDA)
(A) Schematic representation of the computational analysis. NORI identified 3433 lncRNAs expressed in PDA using RNA-Seq from a cohort of 109 tumours from TCGA. The output of NORI was subset into abundant lncRNAs (RPKM>1), prioritised for experimental validation, and lncRNAs whose expression correlates (q<0.001) with the allele frequency of PDA driver genes, used for the identification of molecular subtypes in PDA by non-negative matrix factorisation (NMF). Abundant lncRNAs were annotated with the genomic distance to recurrent SCNA and/or single nucleotide polymorphisms (SNP) associated with PDA risk, and with the correlation between their expression and clinical outcome. In addition, an independent cohort of LCM PDA samples (n=66 epithelium, 65 stroma) was analysed to validate expression of lncRNAs in PDA and to select epithelial lncRNAs for functional analysis. (B) NMF using the expression of lncRNAs identified three molecular subtypes in the TCGA cohort (n=147). (C) Kaplan-Meier disease-free survival estimations for the individual subtypes. (D) Differential gene expression analysis between molecular subtypes. Each TCGA sample is colour coded according to the molecular subtype. KRASmutAF is depicted as an independent estimation of tumour cellularity of each sample. AF, allele frequency; CPAT, Coding Potential Assessment Tool; GTF, gene transfer format; GWAS, genome-wide association studies; LCM, laser-captured microdissected; lncRNAs, long non-coding RNAs; NORI, Non-coding RNA Identification; RPKM, reads per kilobase of transcript per million mapped reads; SCNA, somatic copy number alterations; TCGA, The Cancer Genome Atlas; UCSC, University of California Santa Cruz.
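To make the abundance filter in panel A concrete, the following is a minimal sketch of how an RPKM>1 cut-off could be applied to read counts from a single sample. It is not the NORI implementation: the function names, transcript identifiers and numbers are hypothetical, and the summary above does not say how RPKM was aggregated across tumours, so the per-sample treatment here is an assumption.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # reads / (length in kb * library size in millions)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def abundant_transcripts(counts, lengths_bp, total_mapped_reads, threshold=1.0):
    """Return {transcript: RPKM} for transcripts with RPKM above the threshold."""
    values = {
        name: rpkm(count, lengths_bp[name], total_mapped_reads)
        for name, count in counts.items()
    }
    return {name: value for name, value in values.items() if value > threshold}


# Hypothetical toy data for one sample (not taken from the study).
counts = {"lnc_A": 500, "lnc_B": 10, "lnc_C": 40}
lengths_bp = {"lnc_A": 2000, "lnc_B": 5000, "lnc_C": 1000}
library_size = 20_000_000  # total mapped reads in the sample

kept = abundant_transcripts(counts, lengths_bp, library_size)
print(sorted(kept))  # transcripts passing the RPKM>1 filter
```

In this toy example, lnc_B falls below the threshold and is dropped, while the other two transcripts would be carried forward for annotation.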