R-053. “Pan-Genomes” and the Microbial Species/Strain Boundary: How Close is “Close”?

J. H. Badger1, R. T. DeBoy1, D. R. Lovley2, B. A. Methé1;
1J. Craig Venter Inst., Rockville, MD, 2Univ. of Massachusetts, Amherst, MA.

A definition of a bacterial species remains elusive in microbiology. Operational definitions based on phenotypic characterization, DNA/DNA hybridization and 16S rRNA gene-based phylogeny are commonly used, but their validity is being challenged with the advent of genome sequencing. An unexpected discovery in bacterial genomics is the extensive variation in gene content between different sequenced isolates, even those classified as members of the same species. Pan-genome analyses use mathematical extrapolation to the global gene set of a group of genomes to estimate how many genomes are needed to fully describe a bacterial species. In one study of six Streptococcus agalactiae strains, pan-genome analyses predicted that approximately 33 new strain-specific genes would be added with every additional strain sequenced, a finding termed an “open” pan-genome (effectively an infinite gene set). To apply these analyses correctly and define a bacterial species accurately, it is necessary to determine how similar the genomes under comparison must be. To address this question, we applied pan-genome analyses to representatives of the Geobacteraceae, Fe(III)-reducing prokaryotes found in subsurface environments, because such analyses have yet to be undertaken in environmentally relevant organisms. We compared the results with those from other genome sets, including strains of the pathogens Neisseria meningitidis and Neisseria gonorrhoeae. Simulated genome sets are also being used to identify the effects of variables such as GC content, synteny and genomic distance. Preliminary results predict “closed” pan-genomes when species of Geobacter or Neisseria are analyzed, whereas “open” pan-genomes are produced when strains are analyzed. The results obtained with species are counter-intuitive: one would expect a pan-genome including all bacterial life to be “open”, since new genes are discovered on a regular basis. This finding appears to be an effect of the greater sequence distance between genomes. Exploring pan-genomes based on sequences of differing distance and composition will elucidate why this is the case and lead to more biologically meaningful extrapolation methods.
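
The quantity behind the “open” versus “closed” distinction, the number of new genes contributed by each additional genome, can be illustrated with a minimal sketch. This is not the method used in this work: the function name, the toy genomes and the gene-family labels below are hypothetical, and a real analysis would first cluster genes into families by sequence similarity and then fit an extrapolation curve to the counts.

```python
from itertools import permutations
from statistics import mean


def new_genes_per_addition(genomes):
    """Mean number of previously unseen gene families contributed by the
    N-th genome, averaged over every ordering of the input genomes.

    `genomes` maps a genome name to the set of gene-family identifiers it
    carries (hypothetical input format chosen for this illustration).
    """
    names = list(genomes)
    counts = {n: [] for n in range(2, len(names) + 1)}
    for order in permutations(names):
        seen = set(genomes[order[0]])
        for n, name in enumerate(order[1:], start=2):
            new = genomes[name] - seen   # families not seen in earlier genomes
            counts[n].append(len(new))
            seen |= new
    return {n: mean(values) for n, values in counts.items()}


# Toy data (invented): two shared "core" families plus one unique family each.
toy_strains = {
    "strain_A": {"core1", "core2", "a_only"},
    "strain_B": {"core1", "core2", "b_only"},
    "strain_C": {"core1", "core2", "c_only"},
}

for n, avg in new_genes_per_addition(toy_strains).items():
    print(f"genome {n}: {avg} new gene families on average")
```

In this toy set the count of new families does not decline as genomes are added, which is the pattern an extrapolation would report as “open”; a count that falls toward zero would instead suggest a “closed” pan-genome. Enumerating all orderings is only practical for a handful of genomes, so larger sets would need random sampling of orderings.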