The Microbiome Quality Control project

Welcome to the MicroBiome Quality Control (MBQC) project. The human microbiome has the potential to become one of the most important new tools for personalized health and precision medicine. To move from a basic research environment to the clinic, technologies and computational methods for assessing human-associated microbial communities must be standardized and quality controlled. Inspired by progress in related areas such as the MicroArray Quality Control (MAQC) project for gene expression microarrays, the MBQC is a collaborative effort to comprehensively evaluate methods for measuring the human microbiome. This includes tools for sampling human-associated microbes at different body sites, techniques and protocols for handling human microbiome samples, and computational pipelines for microbiome data processing. We hope to improve the state of the art in each of these areas and promote open sharing of standard operating procedures and best practices throughout the field. Everyone is welcome to participate in the MBQC.

The MBQC Baseline study (MBQC-base) has performed a first evaluation of two of the several steps typically used to obtain and analyze the human microbiome. The baseline assessment included contributions from 16 sample handling laboratories and 9 bioinformatics laboratories, as well as several other groups participating in data analysis and manuscript preparation, all on a much-appreciated volunteer basis! The resulting baseline data include raw sequences, sequence data re-blinded prior to bioinformatics processing, raw OTU tables, and the final integrated data products. For information on the preprint manuscript currently in review, please contact us.

Variables evaluated during the MBQC-base comprised:
  • Samples. For the baseline phase of the MBQC, all samples were provided by a central repository for convenience and efficiency, and we focused only on the human gut as represented by stool samples. Additional body sites, sample sources, and sample collection methods will be incorporated in future phases of the project. Samples were provided to participants in sets of 96 tubes and comprised three originating biospecimen formats: frozen stool, freeze-dried stool, and pre-extracted DNA.
  • Handling. For the baseline phase of the MBQC, labs registering for the sample handling module received one or more sample sets as specified above and carried out three sub-modules to produce raw sequencing files as output for downstream processing:
    • Extraction. DNA extraction is known to have a large impact on the apparent abundance of some members of microbial communities, and many distinct commercial kits are available. The extraction sub-module began with the provided frozen and freeze-dried stool samples as input and produced quantified isolated DNA as output.
    • 16S amplification. The baseline phase included only 16S amplicon-based surveys of the human microbiome; other technologies such as metagenomics will be evaluated after this phase. The choice of 16S rRNA gene amplification primers and protocol can have a profound effect on a study's results. The 16S amplification sub-module began with isolated DNA (as provided or from lab-specific extraction) as input and generated pooled barcoded libraries suitable for sequencing as output.
    • Sequencing. A variety of sequencing platforms are now appropriate for human microbiome studies. The baseline included only assessments based on the Illumina MiSeq and HiSeq platforms, although several kits are appropriate for these platforms and were accommodated by the prototype. The sequencing sub-module began with 16S amplicons as input and provided raw sequence reads as output.
  • Bioinformatics. In some cases, differences in computational data handling can affect the final measurement as much as, or more than, differences in physical sample handling. The MBQC baseline accommodated any bioinformatic protocol that resulted in an Operational Taxonomic Unit (OTU) table with standardized annotations. Bioinformatics modules received blinded raw sequence reads as input and provided annotated OTU tables as output.
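To make the bioinformatics output concrete, the following is a minimal sketch of an annotated OTU table and of one common way to make tables from different labs comparable, namely converting read counts to per-sample relative abundances. It is illustrative only: the sample IDs, OTU IDs, counts, taxonomy strings, and the `relative_abundance` helper are all hypothetical, and the actual MBQC table format and normalization are not specified here.

```python
# Hypothetical OTU table: keys are OTU IDs, inner keys are blinded sample IDs,
# values are read counts. All IDs, counts, and annotations are invented.
otu_counts = {
    "OTU_1": {"sample_A": 120, "sample_B": 30},
    "OTU_2": {"sample_A": 60, "sample_B": 90},
    "OTU_3": {"sample_A": 20, "sample_B": 0},
}

# Hypothetical taxonomic annotation for each OTU.
taxonomy = {
    "OTU_1": "k__Bacteria;p__Bacteroidetes",
    "OTU_2": "k__Bacteria;p__Firmicutes",
    "OTU_3": "k__Bacteria;p__Proteobacteria",
}


def relative_abundance(counts):
    """Convert read counts to fractions that sum to 1 within each sample."""
    # First pass: total reads per sample across all OTUs.
    totals = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    # Second pass: divide each count by its sample total (0.0 if the sample is empty).
    return {
        otu: {
            sample: (n / totals[sample] if totals[sample] else 0.0)
            for sample, n in per_sample.items()
        }
        for otu, per_sample in counts.items()
    }


rel = relative_abundance(otu_counts)
for otu, per_sample in rel.items():
    print(otu, taxonomy[otu], per_sample)
```

Working with relative abundances rather than raw counts removes differences in sequencing depth between samples, which is one prerequisite for comparing results across handling and bioinformatics labs.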