Advances in RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in highly heterozygous species such as grapevine (Vitis vinifera L.). This work uses a de novo-assembled transcriptome of the V. vinifera cultivar ‘Riesling’ to improve annotation of the grapevine reference genome sequence.
Here, researchers from Missouri State University and the USDA show that the transcriptome assembly of a single V. vinifera cultivar is insufficient for complete annotation of the grapevine reference genome, which was constructed from V. vinifera PN40024. They also provide evidence that the gene models they identified cannot be fully anchored to the previously published PN40024 gene models. In addition, the researchers present a computational pipeline for the de novo identification of long non-coding RNAs (lncRNAs). Their results demonstrate that, in grapevine, lncRNAs differ significantly from protein-coding transcripts in length, GC content, minimum free energy (MFE), and length-corrected MFE.
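The per-transcript metrics compared in the study can be computed with a few lines of code. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the MFE (in kcal/mol) has already been obtained from a folding tool such as RNAfold from the ViennaRNA package, and it assumes length-corrected MFE is defined as MFE divided by transcript length (some studies instead scale it per 100 nt). The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def length_corrected_mfe(mfe: float, length: int) -> float:
    """MFE normalized per nucleotide.

    Assumption: length-corrected MFE = MFE / length. Other normalizations
    (e.g. per 100 nt) exist; this is not necessarily the study's definition.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return mfe / length


def transcript_metrics(seq: str, mfe: float) -> dict:
    """Collect length, GC content, MFE and length-corrected MFE for one transcript.

    `mfe` is supplied by the caller (hypothetically from an external folding tool);
    this function does not fold the RNA itself.
    """
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "mfe": mfe,
        "lc_mfe": length_corrected_mfe(mfe, len(seq)),
    }


# Usage with a toy 10-nt sequence and a made-up MFE value.
m = transcript_metrics("AUGGCCAUAG", -2.5)
print(m)
```

Normalizing by length matters because longer transcripts tend to have lower (more negative) raw MFE simply because they have more bases that can pair, so comparing raw MFE between lncRNAs and protein-coding transcripts of different lengths would confound structure with size.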
Diagram of the long non-coding RNA identification pipeline
The numbers on the left of the figure indicate the number of transcripts remaining after each filtering step shown on the right. For the ‘Compare sequences across data sets’ step, numbers are shown for accessions Ventosa and 588,673, respectively. The final number, framed in green, is the number of pseudo-validated lncRNAs confirmed with the Coding Potential Calculator.
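The diagram's structure, a chain of filters with a transcript count recorded after each step, can be sketched as follows. The thresholds here (at least 200 nt, longest ORF under 100 amino acids) are common lncRNA heuristics assumed for illustration, not the values used in this study, and the simple ORF check is a crude stand-in for the Coding Potential Calculator, which is not reproduced. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transcript record: {"id": ..., "seq": ...}
Transcript = Dict[str, str]
Filter = Tuple[str, Callable[[Transcript], bool]]


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids of the longest ATG...stop ORF on the forward strand.

    Hypothetical helper: only the three forward frames are scanned, and ORFs
    that run off the end without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    stops = {"TAA", "TAG", "TGA"}
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


def run_pipeline(transcripts: List[Transcript], filters: List[Filter]):
    """Apply filters in order, recording how many transcripts survive each step."""
    counts = [("input", len(transcripts))]
    remaining = list(transcripts)
    for name, keep in filters:
        remaining = [t for t in remaining if keep(t)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Assumed thresholds for illustration only; not necessarily the authors' cutoffs.
FILTERS: List[Filter] = [
    ("length >= 200 nt", lambda t: len(t["seq"]) >= 200),
    ("longest ORF < 100 aa", lambda t: longest_orf_aa(t["seq"]) < 100),
]

toy = [
    {"id": "noncoding", "seq": "A" * 250},                    # long, no ORF
    {"id": "coding", "seq": "ATG" + "GCC" * 120 + "TAA"},     # 121-aa ORF
    {"id": "short", "seq": "A" * 50},                         # too short
]
kept, counts = run_pipeline(toy, FILTERS)
for step, n in counts:
    print(f"{n:>5}  {step}")
```

Keeping the per-step counts alongside the surviving transcripts mirrors the figure's left-hand column, which makes it easy to see which step removes the most candidates.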
In grapevine, high heterozygosity means that transcriptome characterization must be based on cultivar-specific reference genome sequences. These results strengthen the hypothesis that lncRNAs have different thermodynamic properties from protein-coding RNAs. Analyses of both coding and non-coding RNAs will be instrumental in uncovering inter-cultivar variation in wild and cultivated grapevine species.