Abstract
The importance of yeasts of the subphylum Saccharomycotina is widely documented. They matter to humans not only as pathogens and model organisms but also because they take part in numerous essential ecosystem services. These yeasts have been exploited for decades as cell factories and have become crucial components of biotechnology-based industries.
In recent years, the study and understanding of these microorganisms have advanced considerably, mainly thanks to next-generation sequencing (NGS) techniques, which are now accessible to almost every research group. The resulting increase in genetic data has led to a better understanding of how these genomes evolved and how they regulate cell behaviour in response to the surrounding environment. Nevertheless, a genome sequence alone is of limited use, because the raw nucleotide sequence says little about the genes of the microorganism, how they work, or how they are regulated in response to environmental stimuli. For these reasons, many genome annotation pipelines have been implemented to overcome the challenges and exploit the opportunities presented by NGS. In particular, during my master's thesis internship, I used Maker2, a genome annotation and data management tool.
I worked with the hybrid yeast Zygosaccharomyces rouxii strain ATCC42981, which is reported to be highly halotolerant and can grow in media with very low water activity and at pH values down to 1.8. For these reasons, Z. rouxii is used in many industrial food processes and is also economically important as a causative agent of food spoilage.
Starting from the raw nucleotide sequence, I applied an existing Maker2 protocol, adapting it to the challenging annotation of a hybrid genome. This protocol relies on both de novo and evidence-based gene prediction and combines several external programs, such as RepeatMasker, SNAP, Augustus, BLAST+ and Exonerate. The output was then manually curated to correct the recurring errors that frequently arise with these annotators. The de novo annotated genome was then validated by comparison with the haploid Zygosaccharomyces rouxii type strain CBS732.
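As a rough illustration of one check that can support such a comparison, the sketch below tallies feature types in a GFF3 annotation, the format in which Maker2 reports its gene models. It is only a minimal sketch under stated assumptions: the function name, the contig names and the example records are hypothetical and are not taken from the annotation described here.

```python
from collections import Counter


def count_features(gff3_text):
    """Count GFF3 features by type (column 3).

    Comment lines, blank lines and malformed records are skipped.
    Parsing stops at an embedded ##FASTA section, if present.
    """
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature records
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # a GFF3 feature record has exactly nine columns
        counts[fields[2]] += 1
    return counts


# Hypothetical records, for illustration only.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "contig_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "contig_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "contig_1\tmaker\tCDS\t100\t900\t.\t+\t0\tID=cds1;Parent=mrna1",
    "contig_2\tmaker\tgene\t50\t650\t.\t-\t.\tID=gene2",
    "contig_2\tmaker\tmRNA\t50\t650\t.\t-\t.\tID=mrna2;Parent=gene2",
])

feature_counts = count_features(EXAMPLE_GFF3)
print(feature_counts["gene"], feature_counts["mRNA"], feature_counts["CDS"])
```

Running the same tally on the annotation of the hybrid strain and on that of a reference strain gives a quick first impression of whether the number of predicted genes is plausible, before any gene-by-gene comparison.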
The next step was the functional annotation of the genes predicted by Maker2. To this end, we used InterProScan to classify the proteins into families and to predict domains and important sites. InterProScan draws on a wide range of databases, including the Gene Ontology resource, so the next steps were a GO (Gene Ontology) analysis and a KofamKOALA (KEGG Orthology) analysis, carried out to examine the distribution of the proteins across cellular components, molecular functions, biological processes and pathways. Because of the halotolerance of this strain, my work focused on the genes reported to be responsible for resistance to high salt concentrations. We investigated their presence in several pathways, their copy number, and whether there were any sequence identity differences between the homologs in the two subgenomes.
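The copy-number and identity comparisons mentioned above can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the gene identifiers, the orthology labels and the aligned sequences are hypothetical, and percent identity is computed here under one common convention (matches divided by aligned columns in which neither sequence has a gap).

```python
from collections import Counter


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored
    (an assumed convention; other definitions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def copy_number(orthology_by_gene):
    """Count how many predicted genes were assigned to each orthology label."""
    return Counter(label for label in orthology_by_gene.values() if label)


# Hypothetical assignments: two subgenome copies of one gene, one single-copy
# gene, and one gene without an assignment.
assignments = {
    "geneA_subgenome1": "K_EXAMPLE_1",
    "geneA_subgenome2": "K_EXAMPLE_1",
    "geneB_subgenome1": "K_EXAMPLE_2",
    "geneC_subgenome2": None,
}

copies = copy_number(assignments)
identity = percent_identity("MKT-LV", "MKTALI")
print(copies["K_EXAMPLE_1"], copies["K_EXAMPLE_2"], identity)
```

In practice the aligned sequences would come from a pairwise alignment of the two homologs, and the orthology labels from the functional annotation step.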