N-203. A Protocol for Rapid and Efficient Bacterial Community Analysis Using Pyrosequencing

Q. Wang, B. Chai, W. Sul, D. M. Tourlousse, R. C. Penton, A. S. Kulam-Syed-Mohideen, D. M. McGarrell, J. M. Tiedje, J. R. Cole;
Michigan State Univ., East Lansing, MI.

Sequencing SSU rRNA genes from environmental samples is a standard method for determining bacterial community composition. Newer sequencing technologies such as pyrosequencing have been used successfully as rapid, efficient tools for in-depth analysis of bacterial community composition. We designed a set of primers targeting the hypervariable V4 region of the 16S rRNA gene and developed an analysis pipeline; together they allow up to 80 samples to be sequenced and analyzed simultaneously on the 454 Genome Sequencer FLX System. The V4 region is of a suitable length for FLX sequencing. In addition, V4 is among the variable regions that give the most accurate taxonomic classification, and its conserved secondary structure aids alignment. The primers target highly conserved regions flanking V4; when tested for coverage against sequences from RDP release 9.53 and from the GOS (Global Ocean Sampling) database of marine bacterial sequences, they perfectly matched 94.6% and 94.7% of the sequences, respectively. Each forward primer was synthesized with one of a set of short tag sequences, so that samples pooled in a common sequencing reaction can be separated afterward. The analysis pipeline automates data processing and simplifies the computationally intensive analysis of such large sequencing libraries. Raw reads are sorted by tag sequence, and potentially poor-quality sequences are discarded. Reads are then assigned to taxa using the RDP Classifier, and both summary and detailed classification results are provided. Next, sequences are aligned using a fast aligner that incorporates rRNA secondary structure information. Reads are then clustered into Operational Taxonomic Units (OTUs) at multiple distance cutoffs, and several common ecological metrics are calculated, including the Chao1 richness estimator, the Shannon index, and rarefaction curves. The processed data are available in formats suitable for common ecological and statistical packages, including SPADE, EstimateS, and R.
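The tag-sorting and screening step can be sketched in Python as follows. This is a minimal illustration, not the pipeline's own code: the function names `primer_matches` and `sort_reads` are hypothetical, the screening thresholds (minimum insert length of 150 nt, no ambiguous `N` calls) are assumptions chosen for the example, and the tags are assumed to be of equal length so that no tag is a prefix of another. Primer matching honors IUPAC degenerate codes, which is also the idea behind counting perfect primer matches against a reference database.

```python
from collections import defaultdict

# IUPAC degenerate nucleotide codes -> bases each code matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, seq):
    """True if seq starts with primer, allowing degenerate primer positions."""
    if len(seq) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, seq))


def sort_reads(reads, tags, primer, min_len=150, max_ambig=0):
    """Assign (read_id, sequence) pairs to samples by their leading tag.

    tags maps tag sequence -> sample name (hypothetical, equal-length tags).
    Returns (dict of sample -> list of (read_id, insert), number discarded).
    """
    samples = defaultdict(list)
    discarded = 0
    for read_id, seq in reads:
        seq = seq.upper()
        sample = None
        for tag, name in tags.items():
            if seq.startswith(tag):
                sample = name
                body = seq[len(tag):]
                break
        # Reject reads with no recognizable tag or no forward primer after it
        if sample is None or not primer_matches(primer, body):
            discarded += 1
            continue
        insert = body[len(primer):]
        # Assumed quality screen: too short, or too many ambiguous base calls
        if len(insert) < min_len or insert.count("N") > max_ambig:
            discarded += 1
            continue
        samples[sample].append((read_id, insert))
    return dict(samples), discarded
```

For example, with two hypothetical 8-nt tags and an illustrative degenerate forward primer such as `AYTGGGYDTAAAGNG`, a read carrying a known tag, the primer, and a 200-nt insert is kept, while a read with an unknown tag or a 100-nt insert is discarded. Real 454 pipelines typically also screen on per-base quality scores, which this sketch leaves out.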
Other options are available to cluster data from multiple samples, to extract specific sequences from the dataset, and to produce comparative metrics between samples.
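Once reads from a sample have been clustered into OTUs at a given distance cutoff, the per-sample metrics named above (Chao1, the Shannon index, and rarefaction) depend only on the list of reads per OTU. The Python sketch below assumes clustering has already been done. It uses the classic Chao1 estimator S_obs + F1²/(2F2), falling back to F1(F1 − 1)/2 when there are no doubletons (one common convention), the Shannon index with natural logarithms, and Hurlbert's expected-richness formula for rarefaction. The abstract does not state which estimator variants the pipeline uses, so treat these choices and the function names as assumptions.

```python
import math
from collections import Counter


def chao1(counts):
    """Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singleton and doubleton OTUs
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Fallback when no doubletons are present (assumed convention)
    return s_obs + f1 * (f1 - 1) / 2


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTU proportions."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def rarefaction(counts, depths):
    """Expected OTUs in random subsamples of m reads (Hurlbert 1971)."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    curve = []
    for m in depths:
        if m > n:
            raise ValueError(f"depth {m} exceeds total reads {n}")
        total = math.comb(n, m)
        # An OTU is missed only if all m reads come from the other n - c reads;
        # math.comb returns 0 when m > n - c, so such OTUs always count.
        expected = sum(1 - math.comb(n - c, m) / total for c in counts)
        curve.append((m, expected))
    return curve
```

Because OTUs are defined at several distance cutoffs, these functions would simply be called once per cutoff on that cutoff's OTU counts. Exact binomial coefficients keep the rarefaction sketch simple; for very large libraries a log-gamma formulation would be faster.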