Last data update: Dec 02, 2024. (Total: 48272 publications since 2009)
Records 1-3 (of 3 Records)
Query Trace: Huang AD [original query]
Rapid identification of enteric bacteria from whole genome sequences using average nucleotide identity metrics
Lindsey RL, Gladney LM, Huang AD, Griswold T, Katz LS, Dinsmore BA, Im MS, Kucerova Z, Smith PA, Lane C, Carleton HA. Front Microbiol 2023 14 1225207 Identification of enteric bacteria species by whole genome sequence (WGS) analysis requires a rapid and easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percent bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for determining species consistent with the conventional identification methods of known species. The performance of species identification was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies and designed to represent all species needed to query for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, given a standard minimum of 70% of genome bases aligned, the ANI threshold values determined for these species were ≥95% for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, which is consistent with the classification based on previous gold standard methods.
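The thresholds reported in this abstract can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' implementation: the function names, the input tuple layout, and the choice to pick the passing reference with the highest ANI are all assumptions made for illustration. Genera for which the abstract states no threshold (for example Grimontia and Photobacterium) are treated here as "no call".

```python
# Genus-level ANI thresholds (percent), taken from the abstract above.
ANI_THRESHOLDS = {
    "Escherichia": 95.0,
    "Shigella": 95.0,
    "Vibrio": 95.0,
    "Salmonella": 93.0,
    "Campylobacter": 92.0,
    "Listeria": 92.0,
}

# Minimum percent of genome bases aligned, taken from the abstract above.
MIN_ALIGNED_PERCENT = 70.0


def passes_species_threshold(reference_species, ani_percent, aligned_percent):
    """Return True if one query-vs-reference comparison meets both cutoffs.

    Hypothetical helper: the genus is assumed to be the first word of the
    reference species name.
    """
    genus = reference_species.split()[0]
    threshold = ANI_THRESHOLDS.get(genus)
    if threshold is None:
        # No threshold is stated in the abstract for this genus.
        return False
    return aligned_percent >= MIN_ALIGNED_PERCENT and ani_percent >= threshold


def call_species(hits):
    """Pick a species from (reference_species, ani_percent, aligned_percent) tuples.

    Assumption: among references that pass both cutoffs, the one with the
    highest ANI wins. Returns None when nothing passes.
    """
    passing = [hit for hit in hits if passes_species_threshold(*hit)]
    if not passing:
        return None
    return max(passing, key=lambda hit: hit[1])[0]
```

For example, `call_species([("Salmonella enterica", 98.2, 88.0), ("Salmonella bongori", 90.1, 75.0)])` returns `"Salmonella enterica"`, while a Vibrio hit with 96% ANI but only 65% of bases aligned yields `None` because it fails the alignment minimum.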
Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.
Xiaoli L, Hagey JV, Park DJ, Gulvik CA, Young EL, Alikhan NF, Lawsin A, Hassell N, Knipe K, Oakeson KF, Retchless AC, Shakya M, Lo CC, Chain P, Page AJ, Metcalf BJ, Su M, Rowell J, Vidyaprakash E, Paden CR, Huang AD, Roellig D, Patel K, Winglee K, Weigand MR, Katz LS. PeerJ 2022 10 e13821 BACKGROUND: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. METHODS: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study.
RESULTS: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. DISCUSSION: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.
Metagenomics of two severe foodborne outbreaks provides diagnostic signatures and signs of co-infection not attainable by traditional methods.
Huang AD, Luo C, Pena-Gonzalez A, Weigand MR, Tarr C, Konstantinidis KT. Appl Environ Microbiol 2016 83 (3) Diagnostic testing for foodborne pathogens relies on culture-based techniques that are not rapid enough for real-time disease surveillance and do not give a quantitative picture of pathogen abundance or the response of the natural microbiome. Powerful sequence-based, culture-independent approaches such as shotgun metagenomics could sidestep these limitations, and potentially reveal a pathogen-specific signature on the microbiome that would have implications not only for diagnostics but also for better understanding disease progression and pathogen ecology. However, metagenomics has not yet been validated for foodborne pathogen detection. Toward closing these gaps, we applied shotgun metagenomics to stool samples collected from two geographically isolated (Alabama and Colorado) foodborne outbreaks, where the etiologic agents were identified as distinct strains of Salmonella enterica serovar Heidelberg by culture-dependent methods. Metagenomic investigations were consistent with the culture-based findings and revealed, in addition, the in situ abundance and level of intra-population diversity of the pathogen, the possibility for co-infections with Staphylococcus aureus, and significant shifts in the gut microbiome during infection relative to reference healthy samples. Additionally, we designed our bioinformatics pipeline to deal with several challenges associated with analysis of clinical samples such as the high frequency of co-eluting human DNA sequences and assessment of the virulence potential of pathogens. Comparisons of these results to those of other studies revealed that in several cases of diarrheal outbreaks (but not all) the disease and healthy states of the gut microbial community might be distinguishable, opening new possibilities for diagnostics.
IMPORTANCE STATEMENT: Diagnostic testing for enteric pathogens has relied for decades on culture-based techniques, but a total of 38.4 million cases of foodborne illness per year cannot be attributed to specific causes. This study describes new culture-independent metagenomic approaches and the associated bioinformatics approaches to detect and type the causative agents of microbial disease with unprecedented accuracy, opening new possibilities for future development of health technologies and diagnostics. Our tools and approaches should be applicable to other microbial diseases in addition to foodborne diarrhea.
- Page last reviewed: Feb 1, 2024
- Page last updated: Dec 02, 2024