Last data update: Aug 15, 2025. (Total: 49733 publications since 2009)
| Records 1-4 (of 4 Records) |
| Query Trace: Williams GM[original query] |
|---|
| Validation of Core and Whole-Genome Multi-Locus Sequence Typing Schemes for Shiga-Toxin-Producing E. coli (STEC) Outbreak Detection in a National Surveillance Network, PulseNet 2.0, USA
Leeper MM , Schroeder MN , Griswold T , Thakur M , Krishnan K , Katz LS , Hise KB , Williams GM , Stroika SG , Im SB , Lindsey RL , Smith PA , Huffman J , Kelley A , Cleland S , Collins AJ , Gautam S , Tyagi E , Park S , Carriço JA , Machado MP , Pouseele H , Michielsen D , Carleton HA . Microorganisms 2025 13 (6)
Shiga-toxin-producing E. coli (STEC) is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Whole-genome sequencing (WGS) is a powerful tool used in public health and microbiology for the detection, surveillance, and outbreak investigation of STEC. In this study, we applied three WGS-based subtyping methods, high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multi-locus sequence typing using chromosome-associated loci [wgMLST (chrom)], and core genome multi-locus sequence typing (cgMLST), to isolate sequences from 11 STEC outbreaks. For each outbreak, we evaluated the concordance between subtyping methods using pairwise genomic differences (number of SNPs or alleles), linear regression models, and tanglegrams. Pairwise genomic differences were highly concordant between methods for all but one outbreak, which was associated with international travel. The slopes of the regressions for hqSNP vs. allele differences were 0.432 (cgMLST) and 0.966 [wgMLST (chrom)]; the slope was 1.914 for cgMLST vs. wgMLST (chrom) differences. Tanglegrams composed of outbreak and sporadic sequences showed moderate clustering concordance between methods, where Baker's Gamma Indices (BGIs) ranged between 0.35 and 0.99 and Cophenetic Correlation Coefficients (CCCs) were ≥0.88 across all outbreaks. The K-means analysis using the Silhouette method showed the clear separation of outbreak groups with average silhouette widths ≥0.87 across all methods. This study validates the use of cgMLST for the national surveillance of STEC illness clusters using the PulseNet 2.0 system and demonstrates that hqSNP or wgMLST can be used for further resolution. |
| primerForge: a Python program for identifying primer pairs capable of distinguishing groups of genomes from each other
Wirth JS , Katz LS , Williams GM , Chen JC . J Open Source Softw 2024 9 (101)
In both molecular epidemiology and microbial ecology, it is useful to be able to categorize specific strains of microorganisms in either an ingroup or an outgroup in a given population, e.g. to distinguish a pathogenic strain of interest from its non-virulent relatives. An "ingroup" refers to a group of microbes that are the primary focus of study or interest. Conversely, an "outgroup" consists of microbes that are closely related to, but have evolved separately from, the ingroup. While whole genome sequencing and downstream phylogenetic analyses can be employed to do this, these techniques are often slow and can be resource intensive. Additionally, the laboratory would have to sequence the whole genome to use these tools to determine whether or not a new sample is part of the ingroup or outgroup. Alternatively, polymerase chain reaction (PCR) can be used to amplify regions of genetic material that are specific to the strain(s) of interest. PCR is faster, less expensive, and more accessible than whole genome sequencing, so having a PCR-based approach can accelerate the detection of specific strain(s) of microbes and facilitate diagnoses and/or population studies. |
| Evaluation of whole and core genome multilocus sequence typing allele schemes for Salmonella enterica outbreak detection in a national surveillance network, PulseNet USA
Leeper MM , Tolar BM , Griswold T , Vidyaprakash E , Hise KB , Williams GM , Im SB , Chen JC , Pouseele H , Carleton HA . Front Microbiol 2023 14 1254777
Salmonella enterica is a leading cause of bacterial foodborne and zoonotic illnesses in the United States. For this study, we applied four different whole genome sequencing (WGS)-based subtyping methods: high quality single-nucleotide polymorphism (hqSNP) analysis, whole genome multilocus sequence typing using either all loci [wgMLST (all loci)] or only chromosome-associated loci [wgMLST (chrom)], and core genome multilocus sequence typing (cgMLST) to a dataset of isolate sequences from 9 well-characterized Salmonella outbreaks. For each outbreak, we evaluated the genomic and epidemiologic concordance between hqSNP and allele-based methods. We first compared pairwise genomic differences using all four methods. We observed discrepancies in allele difference ranges when using wgMLST (all loci), likely caused by inflated genetic variation due to loci found on plasmids and/or other mobile genetic elements in the accessory genome. Therefore, we excluded wgMLST (all loci) results from any further comparisons in the study. Then, we created linear regression models and phylogenetic tanglegrams using the remaining three methods. K-means analysis using the silhouette method was applied to compare the ability of the three methods to partition outbreak and sporadic isolate sequences. Our results showed that pairwise hqSNP differences had high concordance with cgMLST and wgMLST (chrom) allele differences. The slopes of the regressions for hqSNP vs. allele pairwise differences were 0.58 (cgMLST) and 0.74 [wgMLST (chrom)], and the slope of the regression was 0.77 for cgMLST vs. wgMLST (chrom) pairwise differences. Tanglegrams showed high clustering concordance between methods using two statistical measures, the Baker's gamma index (BGI) and cophenetic correlation coefficient (CCC), where 9/9 (100%) of outbreaks yielded BGI values ≥ 0.60 and CCCs were ≥ 0.97 across all nine outbreaks and all three methods. 
K-means analysis showed separation of outbreak and sporadic isolate groups with average silhouette widths ≥ 0.87 for outbreak groups and ≥ 0.16 for sporadic groups. This study demonstrates that Salmonella isolates clustered in concordance with epidemiologic data using three WGS-based subtyping methods and supports using cgMLST as the primary method for national surveillance of Salmonella outbreak clusters. |
| Evaluation of core genome and whole genome multilocus sequence typing schemes for Campylobacter jejuni and Campylobacter coli outbreak detection in the USA
Joseph LA , Griswold T , Vidyaprakash E , Im SB , Williams GM , Pouseele HA , Hise KB , Carleton HA . Microb Genom 2023 9 (5)
Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have been historically used to differentiate sporadic from outbreak Campylobacter isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data when compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated epidemiological concordance for high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST to cluster or differentiate outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model R² and Pearson correlation coefficients were >0.90. The correlation was sometimes lower comparing hqSNP analysis to the MLST-based methods; the linear regression model R² and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86 for some outbreak isolates. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele and SNP-based approaches may reflect differences in how genomic variation (SNPs and indels) is captured by the two methods. 
Since cgMLST examines allele differences in genes that are common in most isolates being compared, it is well suited to surveillance: searching large genomic databases for similar isolates is easily and efficiently done using allelic profiles. On the other hand, use of an hqSNP approach is much more computer intensive and not scalable to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used. |
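
The abstracts above compare methods through pairwise allele differences and regression slopes. As a rough illustration only (not the PulseNet pipeline or any published tool), the Python sketch below counts allele differences between cgMLST-style allelic profiles and fits an ordinary least-squares slope against SNP differences. The isolate names, allele calls, and SNP counts are hypothetical toy data, and treating SNP differences as x and allele differences as y is an assumption; the abstracts do not state the regression direction.

```python
from itertools import combinations


def allele_distance(a, b):
    """Count loci where both profiles have a call and the calls differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def pairwise_distances(profiles):
    """Return {(id1, id2): distance} for every unordered pair of isolates."""
    return {
        (i, j): allele_distance(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


def ols_slope(xs, ys):
    """Slope b of the ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy profiles: allele numbers at five loci; None = missing call.
profiles = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 2, 1, 1],
    "iso3": [1, 3, 2, None, 1],
    "iso4": [4, 3, 2, 5, 6],
}
allele_diffs = pairwise_distances(profiles)

# Hypothetical SNP differences for the same pairs (made up for illustration:
# each pair is assigned twice as many SNP differences as allele differences).
snp_diffs = {pair: 2 * d for pair, d in allele_diffs.items()}

pairs = sorted(allele_diffs)
slope = ols_slope(
    [snp_diffs[p] for p in pairs],     # x: SNP differences (assumed)
    [allele_diffs[p] for p in pairs],  # y: allele differences (assumed)
)
print(allele_diffs)
print(f"slope = {slope:.3f}")
```

Skipping loci with a missing call in either isolate is one common convention for allele distances; other schemes count missing calls differently, so the numbers here are not comparable to those reported in the abstracts.
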
- Page last reviewed: Feb 1, 2024
- Page last updated: Aug 15, 2025




