Publication Type: | Thesis |
Year of Publication: | 2023 |
Authors: | M. Mikail Bala |
Academic Department: | Bioinformatics |
Degree: | Master of Science |
Number of Pages: | 51 pp |
Date Published: | March 2013 |
University: | Virginia Commonwealth University |
City: | Richmond, Virginia |
Keywords: | Columbicola eowilsoni, Columbicola koopae, Columbicola masoni |
Abstract: | Many insects are known to harbour intracellular and heritable bacteria (endosymbionts), which provide their hosts with adaptive traits. Whole insect gDNA shotgun sequencing projects often sequence the endosymbiont's genome in addition to the insect's genome. There are approximately 600 whole genome shotgun libraries from insects available in a public repository (NCBI), which can be mined to obtain endosymbiont genomes. The assembly and annotation of endosymbiont genomes can contribute towards the exploration of their role as obligate symbiotic partners. However, de novo assembly of an endosymbiont genome continues to be challenging when host and/or enteric bacterial gDNA is also present in the library. So far, whole genome sequence data have been mined by investigators who manually interrogate the data at multiple steps, a process that is time-consuming and difficult to replicate. Here I developed and evaluated a novel strategy that reduces the intervention required by the researcher. The strategy consists of two steps: 1) filtering of de novo assembled endosymbiont contigs using a Blastn search against a custom reference database and 2) reconstruction of the genome through de novo assembly of the reads associated with the filtered contigs. Illumina HiSeq libraries were simulated in silico, and the pipeline was run on the simulated data to test the efficacy of the method. The mean endosymbiont genome recovery from simulated data was 91.27%, with a range of 76% to 100%. When the method was tested with “real” whole louse shotgun sequencing libraries obtained from a public repository, the results were mixed. The strategy selected contigs accurately when the louse contained an endosymbiont with a small genome enriched for AT bases, giving a mean genome recovery of 98.38% with a range of 95.88% to 100%. However, in other cases involving symbionts with larger genomes, the resulting genomes appeared to be incomplete and require further evaluation. |
URL: | https://scholarscompass.vcu.edu/etd/7213/ |
Title: | A standardized pipeline for isolation and assembly of genomes from symbiotic bacteria in whole louse genomic sequence data |
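
The first step described in the abstract, filtering de novo assembled contigs by a Blastn search against a reference database, can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the Blastn results are available in the standard 12-column tabular format (`-outfmt 6`), and the function name `select_contigs` and the thresholds for identity, alignment length, and e-value are hypothetical values chosen only for illustration.

```python
from typing import Iterable, Set


def select_contigs(
    blast_lines: Iterable[str],
    min_identity: float = 90.0,   # assumed threshold, not from the thesis
    min_length: int = 200,        # assumed threshold, not from the thesis
    max_evalue: float = 1e-10,    # assumed threshold, not from the thesis
) -> Set[str]:
    """Return IDs of contigs with at least one hit passing all thresholds.

    Expects BLAST tabular output (outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    keep: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        contig_id = fields[0]
        identity = float(fields[2])
        aln_length = int(fields[3])
        evalue = float(fields[10])
        if (
            identity >= min_identity
            and aln_length >= min_length
            and evalue <= max_evalue
        ):
            keep.add(contig_id)
    return keep


# Hypothetical hits: only contig_1 passes every threshold.
hits = [
    "contig_1\tsymbiont_ref\t95.0\t500\t25\t0\t1\t500\t100\t599\t1e-50\t800",
    "contig_2\tsymbiont_ref\t80.0\t500\t100\t0\t1\t500\t100\t599\t1e-20\t300",
    "contig_3\tsymbiont_ref\t99.0\t50\t1\t0\t1\t50\t100\t149\t1e-12\t90",
]
selected = select_contigs(hits)
```

In a workflow of this kind, the reads that map to the selected contigs would then be extracted and reassembled, which corresponds to the second step named in the abstract.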