A standardized pipeline for isolation and assembly of genomes from symbiotic bacteria in whole louse genomic sequence data

Publication Type:Thesis
Year of Publication:2023
Authors:M. Mikail Bala
Academic Department:Bioinformatics
Degree:Master of Science
Number of Pages:51 pp
Date Published:March 2023
University:Virginia Commonwealth University
City:Richmond, Virginia
Keywords:Columbicola eowilsoni, Columbicola koopae, Columbicola masoni
Abstract:

Many insects harbour intracellular, heritable bacteria (endosymbionts) that provide their hosts with adaptive traits. Whole-insect gDNA shotgun sequencing projects often sequence the genome of the endosymbiont in addition to the insect’s genome. Approximately 600 whole genome shotgun libraries from insects are available in the public repository (NCBI) and can be mined to obtain endosymbiont genomes. The assembly and annotation of endosymbiont genomes can contribute to the exploration of their role as obligate symbiotic partners. However, de novo assembly of an endosymbiont genome remains challenging when host and/or enteric bacterial gDNA is also present in the library. So far, investigators have mined whole genome sequence data by manually interrogating the data at multiple steps, a process that is time-consuming and difficult to replicate. Here I developed and evaluated a novel strategy that reduces the intervention required from the researcher. The strategy consists of two steps: 1) filtering of de novo assembled endosymbiont contigs using a Blastn search against a custom reference database and 2) reconstruction of the genome through de novo assembly of the reads associated with the filtered contigs. Illumina HiSeq libraries were simulated in silico, and the pipeline was run on the simulated data to test the efficacy of the method. The mean endosymbiont genome recovery from simulated data was 91.27%, with a range of 76% to 100%. When the method was tested with “real” whole louse shotgun sequencing libraries obtained from a public repository, the results were mixed. The strategy selected contigs accurately when the louse contained an endosymbiont with a small, AT-rich genome, with a mean genome recovery of 98.38% and a range of 95.88% to 100%. However, in other cases involving symbionts with larger genomes, the resulting genomes appeared to be incomplete and require further evaluation.
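
The first step of the strategy, contig filtering by Blastn search, can be illustrated with a minimal sketch. It assumes the Blastn hits against the custom reference database were written in the standard BLAST+ tabular format (`-outfmt 6`). The function name `select_contigs`, the identity, alignment-length, and e-value thresholds, and the example hits are hypothetical and are not taken from the thesis.

```python
import csv
from io import StringIO

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def select_contigs(blast_tsv, min_identity=90.0, min_length=200, max_evalue=1e-10):
    """Return the IDs of contigs with at least one hit passing all thresholds.

    The default thresholds are illustrative assumptions, not values
    reported in the thesis.
    """
    selected = set()
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        if (float(row["pident"]) >= min_identity
                and int(row["length"]) >= min_length
                and float(row["evalue"]) <= max_evalue):
            selected.add(row["qseqid"])
    return selected


# Made-up hits for three contigs against a hypothetical reference sequence.
example = StringIO(
    "contig_1\tsymbiont_ref\t97.5\t1500\t30\t2\t1\t1500\t100\t1599\t0.0\t2500\n"
    "contig_2\tsymbiont_ref\t82.0\t150\t25\t2\t1\t150\t10\t159\t1e-20\t120\n"
    "contig_3\tsymbiont_ref\t93.1\t800\t50\t5\t1\t800\t5000\t5799\t1e-150\t1100\n"
)
SELECTED = select_contigs(example)
print(sorted(SELECTED))  # ['contig_1', 'contig_3']
```

In a workflow of this kind, the selected contig IDs would then determine which reads are carried into the second step, the de novo reassembly. How reads are associated with contigs is not described in the abstract, so it is left out of the sketch.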

URL:https://scholarscompass.vcu.edu/etd/7213/