Last data update: Aug 15, 2025. (Total: 49733 publications since 2009)
| Records 1-4 (of 4 Records) |
| Query Trace: Im SB[original query] |
|---|
| Validation of Core and Whole-Genome Multi-Locus Sequence Typing Schemes for Shiga-Toxin-Producing E. coli (STEC) Outbreak Detection in a National Surveillance Network, PulseNet 2.0, USA
Leeper MM , Schroeder MN , Griswold T , Thakur M , Krishnan K , Katz LS , Hise KB , Williams GM , Stroika SG , Im SB , Lindsey RL , Smith PA , Huffman J , Kelley A , Cleland S , Collins AJ , Gautam S , Tyagi E , Park S , Carriço JA , Machado MP , Pouseele H , Michielsen D , Carleton HA . Microorganisms 2025 13 (6)
Shiga-toxin-producing E. coli (STEC) is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Whole-genome sequencing (WGS) is a powerful tool used in public health and microbiology for the detection, surveillance, and outbreak investigation of STEC. In this study, we applied three WGS-based subtyping methods, high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multi-locus sequence typing using chromosome-associated loci [wgMLST (chrom)], and core genome multi-locus sequence typing (cgMLST), to isolate sequences from 11 STEC outbreaks. For each outbreak, we evaluated the concordance between subtyping methods using pairwise genomic differences (number of SNPs or alleles), linear regression models, and tanglegrams. Pairwise genomic differences were highly concordant between methods for all but one outbreak, which was associated with international travel. The slopes of the regressions for hqSNP vs. allele differences were 0.432 (cgMLST) and 0.966 [wgMLST (chrom)]; the slope was 1.914 for cgMLST vs. wgMLST (chrom) differences. Tanglegrams composed of outbreak and sporadic sequences showed moderate clustering concordance between methods, where Baker's Gamma Indices (BGIs) ranged between 0.35 and 0.99 and Cophenetic Correlation Coefficients (CCCs) were ≥0.88 across all outbreaks. The K-means analysis using the Silhouette method showed clear separation of outbreak groups, with average silhouette widths ≥0.87 across all methods. This study validates the use of cgMLST for the national surveillance of STEC illness clusters using the PulseNet 2.0 system and demonstrates that hqSNP or wgMLST can be used for further resolution. |
| Evaluation of whole and core genome multilocus sequence typing allele schemes for Salmonella enterica outbreak detection in a national surveillance network, PulseNet USA
Leeper MM , Tolar BM , Griswold T , Vidyaprakash E , Hise KB , Williams GM , Im SB , Chen JC , Pouseele H , Carleton HA . Front Microbiol 2023 14 1254777
Salmonella enterica is a leading cause of bacterial foodborne and zoonotic illnesses in the United States. For this study, we applied four different whole genome sequencing (WGS)-based subtyping methods: high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multilocus sequence typing using either all loci [wgMLST (all loci)] or only chromosome-associated loci [wgMLST (chrom)], and core genome multilocus sequence typing (cgMLST) to a dataset of isolate sequences from 9 well-characterized Salmonella outbreaks. For each outbreak, we evaluated the genomic and epidemiologic concordance between hqSNP and allele-based methods. We first compared pairwise genomic differences using all four methods. We observed discrepancies in allele difference ranges when using wgMLST (all loci), likely caused by inflated genetic variation due to loci found on plasmids and/or other mobile genetic elements in the accessory genome. Therefore, we excluded wgMLST (all loci) results from any further comparisons in the study. Then, we created linear regression models and phylogenetic tanglegrams using the remaining three methods. K-means analysis using the silhouette method was applied to compare the ability of the three methods to partition outbreak and sporadic isolate sequences. Our results showed that pairwise hqSNP differences had high concordance with cgMLST and wgMLST (chrom) allele differences. The slopes of the regressions for hqSNP vs. allele pairwise differences were 0.58 (cgMLST) and 0.74 [wgMLST (chrom)], and the slope of the regression was 0.77 for cgMLST vs. wgMLST (chrom) pairwise differences. Tanglegrams showed high clustering concordance between methods using two statistical measures, the Baker's gamma index (BGI) and cophenetic correlation coefficient (CCC), where 9/9 (100%) of outbreaks yielded BGI values ≥ 0.60 and CCCs were ≥ 0.97 across all nine outbreaks and all three methods.
K-means analysis showed separation of outbreak and sporadic isolate groups with average silhouette widths ≥ 0.87 for outbreak groups and ≥ 0.16 for sporadic groups. This study demonstrates that Salmonella isolates clustered in concordance with epidemiologic data using three WGS-based subtyping methods and supports using cgMLST as the primary method for national surveillance of Salmonella outbreak clusters. |
| Evaluation of core genome and whole genome multilocus sequence typing schemes for Campylobacter jejuni and Campylobacter coli outbreak detection in the USA
Joseph LA , Griswold T , Vidyaprakash E , Im SB , Williams GM , Pouseele HA , Hise KB , Carleton HA . Microb Genom 2023 9 (5)
Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have been historically used to differentiate sporadic from outbreak Campylobacter isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data when compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated epidemiological concordance for high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST to cluster or differentiate outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model R² and Pearson correlation coefficients were >0.90. The correlation was sometimes lower comparing hqSNP analysis to the MLST-based methods; the linear regression model R² and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86 for some outbreak isolates. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele and SNP-based approaches may reflect differences in how genomic variation (SNPs and indels) is captured by the two methods.
Since cgMLST examines allele differences in genes that are common to most isolates being compared, it is well suited to surveillance: searching large genomic databases for similar isolates is easily and efficiently done using allelic profiles. On the other hand, an hqSNP approach is much more computationally intensive and does not scale to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used. |
| Genome-Enabled Molecular Subtyping and Serotyping for Shiga Toxin-Producing Escherichia coli
Im SB , Gupta S , Jain M , Chande AT , Carleton HA , Jordan IK , Rishishwar L . Front Sustain Food Syst 2021 5
Foodborne pathogens are a major public health burden in the United States, leading to 9.4 million illnesses annually. Since 1996, a national laboratory-based surveillance program, PulseNet, has used molecular subtyping and serotyping methods with the aim to reduce the burden of foodborne illness through early detection of emerging outbreaks. PulseNet affiliated laboratories have used pulsed-field gel electrophoresis (PFGE) and immunoassays to subtype and serotype bacterial isolates. Widespread use of serotyping and PFGE for foodborne illness surveillance over the years has resulted in the accumulation of a wealth of routine surveillance and outbreak epidemiological data. This valuable source of data has been used to understand seasonal frequency, geographic distribution, demographic information, exposure information, disease severity, and source of foodborne isolates. In 2019, PulseNet adopted whole genome sequencing (WGS) at a national scale to replace PFGE with higher-resolution methods such as the core genome multilocus sequence typing. Consequently, PulseNet's recent shift to genome-based subtyping methods has rendered the vast collection of historic surveillance data associated with serogroups and PFGE patterns potentially unusable. The goal of this study was to develop a bioinformatics method to associate the WGS data that are currently used by PulseNet for bacterial pathogen subtyping to previously characterized serogroup and PFGE patterns. Previous efforts to associate WGS to PFGE patterns relied on predicting DNA molecular weight based on restriction site analysis. However, these approaches failed owing to the non-uniform usage of genomic restriction sites by PFGE restriction enzymes. We developed a machine learning approach to classify isolates to their most probable serogroup and PFGE pattern, based on comparisons of genomic k-mer signatures. 
We applied our WGS classification method to 5,970 Shiga toxin-producing Escherichia coli (STEC) isolates collected as part of PulseNet's routine foodborne surveillance activities between 2003 and 2018. Our machine learning classifier is able to associate STEC WGS to higher-level serogroups with very high accuracy and lower-level PFGE patterns with somewhat lower accuracy. Taken together, these classifications support the ability of public health investigators to associate currently generated WGS data with historical epidemiological knowledge linked to serogroups and PFGE patterns in support of outbreak surveillance for food safety and public health. © 2021 Im, Gupta, Jain, Chande, Carleton, Jordan and Rishishwar. |
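
The first three records above compare subtyping methods through pairwise genomic differences and regression slopes. As a rough illustration only, the sketch below counts pairwise allele differences between toy cgMLST-style profiles and fits an ordinary least-squares slope against toy SNP differences. The isolate names, loci, allele numbers, SNP counts, the rule of skipping loci with a missing call, and the choice of which variable goes on which axis are all assumptions made for this example; none of it is taken from the studies or from PulseNet tooling.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci where both isolates have a called allele and the calls differ.

    Assumption: loci with a missing call (None) in either isolate are ignored.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def pairwise_distances(profiles):
    """Return {(isolate_a, isolate_b): allele differences} for every pair."""
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical allele profiles: locus name -> allele number (None = no call).
profiles = {
    "iso1": {"locA": 1, "locB": 2, "locC": 3, "locD": 4},
    "iso2": {"locA": 1, "locB": 2, "locC": 5, "locD": 4},
    "iso3": {"locA": 2, "locB": None, "locC": 5, "locD": 7},
}

# Hypothetical hqSNP differences for the same isolate pairs.
snp_differences = {("iso1", "iso2"): 2, ("iso1", "iso3"): 6, ("iso2", "iso3"): 4}

allele_differences = pairwise_distances(profiles)
pairs = sorted(allele_differences)
slope = regression_slope(
    [snp_differences[p] for p in pairs],     # x: SNP differences
    [allele_differences[p] for p in pairs],  # y: allele differences
)
print(allele_differences)
print(slope)
```

With these toy values, a slope below 1 would mean that allele differences grow more slowly than SNP differences for the same isolate pairs. The slopes reported in the abstracts should be read against each study's own axis definitions.
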
- Page last reviewed: Feb 1, 2024
- Page last updated: Aug 15, 2025




