Last data update: Mar 17, 2025. (Total: 48910 publications since 2009)
Records 1-30 (of 31 Records)
Query Trace: Carleton HA [original query]
Kalamari: a representative set of genomes of public health concern
Katz LS , Griswold T , Lindsey RL , Lauer AC , Im MS , Williams G , Halpin JL , Gómez GA , Kucerova Z , Morrison S , Page A , Den Bakker HC , Carleton HA . Microbiol Resour Announc 2025 e0096324
Kalamari is a resource that supports genomic epidemiology and pathogen surveillance. It consists of representative genomes and common contaminants. Kalamari also contains a custom taxonomy and software for downloading and formatting the data.
Genetic diversity in Salmonella enterica in outbreaks of foodborne and zoonotic origin in the USA in 2006-2017
Trees E , Carleton HA , Folster JP , Gieraltowski L , Hise K , Leeper M , Nguyen TA , Poates A , Sabol A , Tagg KA , Tolar B , Vasser M , Webb HE , Wise M , Lindsey RL . Microorganisms 2024 12 (8)
Whole genome sequencing is replacing traditional laboratory surveillance methods as the primary tool to track and characterize clusters and outbreaks of the foodborne and zoonotic pathogen Salmonella enterica (S. enterica). In this study, 438 S. enterica isolates representing 35 serovars and 13 broad vehicle categories from one hundred epidemiologically confirmed outbreaks were evaluated for genetic variation to develop epidemiologically relevant interpretation guidelines for Salmonella disease cluster detection. The Illumina sequences were analyzed by core genome multi-locus sequence typing (cgMLST) and screened for antimicrobial resistance (AR) determinants and plasmids. Ninety-three of the one hundred outbreaks exhibited a close allele range (fewer than 10 allele differences, with a subset differing by fewer than 5). The remaining seven outbreaks showed increased variation, and three of these were considered polyclonal. Sixteen outbreaks showed variation in their AR profiles and 28 showed variation in their plasmid profiles. The serovars Newport and I 4,[5],12:i:-, as well as the zoonotic and poultry product vehicles, were overrepresented among the outbreaks showing increased variation. A close allele range in cgMLST profiles can be considered a reliable proxy for epidemiological relatedness for the vast majority of S. enterica outbreak investigations. Variations associated with mobile elements occur relatively frequently during outbreaks and could reflect changing selective pressures.
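Cluster interpretation of this kind rests on counting allele differences between cgMLST profiles. The sketch below is a minimal illustration, not the study's pipeline: it assumes each profile is a Python dict mapping a locus name to an integer allele call, skips loci missing from either isolate (a common convention, assumed here), and uses the abstract's cutoff of 10 allele differences. All function names are hypothetical.

```python
from itertools import combinations


def allele_differences(a: dict[str, int], b: dict[str, int]) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = a.keys() & b.keys()  # loci missing from either isolate are skipped
    return sum(1 for locus in shared if a[locus] != b[locus])


def max_pairwise_difference(profiles: list[dict[str, int]]) -> int:
    """Largest allele difference between any two isolates in a cluster."""
    return max(
        (allele_differences(x, y) for x, y in combinations(profiles, 2)),
        default=0,  # a single isolate has no pairs
    )


def within_close_allele_range(profiles: list[dict[str, int]], cutoff: int = 10) -> bool:
    """True if every pair of isolates differs at fewer than `cutoff` loci."""
    return max_pairwise_difference(profiles) < cutoff
```

Using the maximum pairwise difference is a deliberately strict reading of "close allele range"; a real workflow might instead use single-linkage clustering or a per-pair threshold.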
Rapid identification of enteric bacteria from whole genome sequences using average nucleotide identity metrics
Lindsey RL , Gladney LM , Huang AD , Griswold T , Katz LS , Dinsmore BA , Im MS , Kucerova Z , Smith PA , Lane C , Carleton HA . Front Microbiol 2023 14 1225207
Identification of enteric bacterial species by whole genome sequence (WGS) analysis requires a rapid and easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percentage of bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for species determination that are consistent with conventional identification methods for known species. Species identification performance was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies designed to represent all species that need to be queried for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, given the standard minimum of 70% of genome bases aligned, the ANI threshold values determined for these species were ≥95% for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, consistent with classification by previous gold standard methods.
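The threshold rule described above can be expressed as a small lookup, assuming the ANI and percent-aligned values have already been computed (for example with ANIm); the alignment step itself is not shown. The function names, the tuple layout of `hits`, and the rule of picking the highest passing ANI among references are illustrative assumptions, not the published workflow.

```python
from typing import Optional

# Species-level ANI cutoffs (%) reported in the abstract, keyed by genus group.
ANI_THRESHOLDS = {
    "Escherichia/Shigella": 95.0,
    "Vibrio": 95.0,
    "Salmonella": 93.0,
    "Campylobacter": 92.0,
    "Listeria": 92.0,
}
MIN_ALIGNED_PCT = 70.0  # minimum percentage of genome bases aligned


def passes_cutoffs(group: str, ani: float, aligned_pct: float) -> bool:
    """True if one query-versus-reference comparison meets both cutoffs."""
    return aligned_pct >= MIN_ALIGNED_PCT and ani >= ANI_THRESHOLDS[group]


def assign_species(hits: list[tuple[str, str, float, float]]) -> Optional[str]:
    """hits: (reference species, genus group, ANI %, aligned %) per reference genome.

    Returns the species of the highest-ANI reference that passes the cutoffs,
    or None if no reference does (hypothetical tie-breaking rule).
    """
    passing = [
        (species, ani)
        for species, group, ani, aligned in hits
        if passes_cutoffs(group, ani, aligned)
    ]
    if not passing:
        return None
    return max(passing, key=lambda p: p[1])[0]
```

Returning None rather than the nearest species makes "no confident call" explicit, which matters for the rare and closely related species the test dataset was built to include.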
Antimicrobial resistance in multistate outbreaks of nontyphoidal Salmonella infections linked to animal contact-United States, 2015-2018
Frey E , Stapleton GS , Nichols MC , Gollarza LM , Birhane M , Chen JC , McCullough A , Carleton HA , Trees E , Hise KB , Tolar B , Francois Watkins L . J Clin Microbiol 2023 e0098123
Animal contact is an established risk factor for nontyphoidal Salmonella infections and outbreaks. During 2015-2018, the U.S. Centers for Disease Control and Prevention (CDC) and other U.S. public health laboratories began implementing whole-genome sequencing (WGS) of Salmonella isolates. WGS was used to supplement the traditional methods: pulsed-field gel electrophoresis for isolate subtyping and outbreak detection, and antimicrobial susceptibility testing (AST) for the detection of resistance. We characterized the epidemiology and antimicrobial resistance (AMR) of multistate salmonellosis outbreaks linked to animal contact during this period. An isolate was considered resistant if AST yielded a resistant (or, for ciprofloxacin, intermediate) interpretation to any antimicrobial tested by the CDC, or if WGS showed a resistance determinant in its genome for one of these agents. We identified 31 outbreaks linked to contact with poultry (n = 23), reptiles (n = 6), dairy calves (n = 1), and guinea pigs (n = 1). Of the 26 outbreaks with resistance data available, we identified antimicrobial resistance in at least one isolate from 20 outbreaks (77%). Of 1,309 isolates with resistance information, 247 (19%) were resistant to ≥1 antimicrobial, and 134 (10%) were multidrug-resistant (resistant to antimicrobials from ≥3 antimicrobial classes). The use of resistance data predicted from WGS increased the number of isolates with resistance information available fivefold compared with AST, and 28 of 43 total resistance patterns were identified exclusively by WGS; concordance between resistance determined by AST and by WGS was high (>99%). The use of predicted resistance from WGS enhanced the characterization of the resistance profiles of outbreaks linked to animal contact by providing resistance information for more isolates.
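The isolate-level resistance definition in this abstract is a union of phenotypic and genotypic evidence. A minimal sketch, assuming AST calls are stored as "S"/"I"/"R" strings per drug: the `Isolate` class, the helper names, and the drug-to-class mapping are hypothetical, and the abstract does not list the tested panel. Multidrug resistance follows the abstract's definition of resistance to antimicrobials from ≥3 classes.

```python
from dataclasses import dataclass, field

# Illustrative drug-to-class mapping (assumption; not the CDC testing panel).
DRUG_CLASS = {
    "ampicillin": "penicillins",
    "ciprofloxacin": "quinolones",
    "tetracycline": "tetracyclines",
    "streptomycin": "aminoglycosides",
    "gentamicin": "aminoglycosides",
}


@dataclass
class Isolate:
    # AST interpretation per drug: "S", "I", or "R"; empty if AST was not done.
    ast: dict[str, str] = field(default_factory=dict)
    # Drugs for which WGS detected a resistance determinant.
    wgs_determinants: set[str] = field(default_factory=set)


def resistant_drugs(iso: Isolate) -> set[str]:
    """Union of WGS-predicted drugs and AST-resistant drugs.

    An intermediate AST call counts as resistant only for ciprofloxacin.
    """
    drugs = set(iso.wgs_determinants)
    for drug, call in iso.ast.items():
        if call == "R" or (call == "I" and drug == "ciprofloxacin"):
            drugs.add(drug)
    return drugs


def is_multidrug_resistant(iso: Isolate, drug_class: dict[str, str] = DRUG_CLASS) -> bool:
    """Resistant to drugs from at least three antimicrobial classes."""
    classes = {drug_class[d] for d in resistant_drugs(iso)}
    return len(classes) >= 3
```

Counting classes rather than drugs matters: two aminoglycosides contribute only one class toward the multidrug-resistance threshold.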
Evaluation of whole and core genome multilocus sequence typing allele schemes for Salmonella enterica outbreak detection in a national surveillance network, PulseNet USA
Leeper MM , Tolar BM , Griswold T , Vidyaprakash E , Hise KB , Williams GM , Im SB , Chen JC , Pouseele H , Carleton HA . Front Microbiol 2023 14 1254777
Salmonella enterica is a leading cause of bacterial foodborne and zoonotic illnesses in the United States. For this study, we applied four different whole genome sequencing (WGS)-based subtyping methods to a dataset of isolate sequences from 9 well-characterized Salmonella outbreaks: high quality single-nucleotide polymorphism (hqSNP) analysis; whole genome multilocus sequence typing using either all loci [wgMLST (all loci)] or only chromosome-associated loci [wgMLST (chrom)]; and core genome multilocus sequence typing (cgMLST). For each outbreak, we evaluated the genomic and epidemiologic concordance between hqSNP and allele-based methods. We first compared pairwise genomic differences using all four methods. We observed discrepancies in allele difference ranges when using wgMLST (all loci), likely caused by genetic variation inflated by loci found on plasmids and/or other mobile genetic elements in the accessory genome. Therefore, we excluded wgMLST (all loci) results from all further comparisons in the study. We then created linear regression models and phylogenetic tanglegrams using the remaining three methods. K-means analysis using the silhouette method was applied to compare the ability of the three methods to partition outbreak and sporadic isolate sequences. Our results showed that pairwise hqSNP differences had high concordance with cgMLST and wgMLST (chrom) allele differences. The slopes of the regressions for hqSNP vs. allele pairwise differences were 0.58 (cgMLST) and 0.74 [wgMLST (chrom)], and the slope of the regression for cgMLST vs. wgMLST (chrom) pairwise differences was 0.77. Tanglegrams showed high clustering concordance between methods using two statistical measures, the Baker's gamma index (BGI) and the cophenetic correlation coefficient (CCC): all 9 outbreaks (100%) yielded BGI values ≥ 0.60, and CCCs were ≥ 0.97 across all nine outbreaks and all three methods. K-means analysis separated outbreak and sporadic isolate groups, with average silhouette widths ≥ 0.87 for outbreak groups and ≥ 0.16 for sporadic groups. This study demonstrates that Salmonella isolates clustered in concordance with epidemiologic data using three WGS-based subtyping methods and supports using cgMLST as the primary method for national surveillance of Salmonella outbreak clusters.
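Average silhouette width, the measure reported above for outbreak and sporadic groups, can be computed directly from a pairwise distance matrix (for example, allele or SNP differences) once each isolate has a group label. This sketch only scores a given labeling; it does not run k-means, and the function names and toy matrix are hypothetical. Singleton groups are assigned a width of 0, a common convention assumed here.

```python
def silhouette_widths(dist: list[list[float]], labels: list[str]) -> list[float]:
    """Silhouette width s(i) = (b - a) / max(a, b) for every isolate i.

    a: mean distance from i to the other members of its own group.
    b: smallest mean distance from i to the members of any other group.
    """
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette widths need at least two groups")
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)  # singleton group: width set to 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == g) / labels.count(g)
            for g in groups
            if g != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_width_by_group(dist: list[list[float]], labels: list[str]) -> dict[str, float]:
    """Average silhouette width per group label."""
    widths = silhouette_widths(dist, labels)
    return {
        g: sum(w for w, lab in zip(widths, labels) if lab == g) / labels.count(g)
        for g in set(labels)
    }
```

For a four-isolate toy matrix in which two outbreak isolates differ by 1 allele, two sporadic isolates differ by 40, and outbreak-to-sporadic pairs differ by 50, the outbreak group averages 0.98 and the sporadic group 0.2, the same qualitative pattern (tight outbreaks, loose sporadic groups) reported above.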
Reoccurring Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks, 2016-2019
Chen JC , Patel K , Smith PA , Vidyaprakash E , Snyder C , Tagg KA , Webb HE , Schroeder MN , Katz LS , Rowe LA , Howard D , Griswold T , Lindsey RL , Carleton HA . Emerg Infect Dis 2023 29 (9) 1895-1899
Genomic characterization of an Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks dates its emergence to late 2015. One clade has notable accessory genomic content and a previously described mutation putatively associated with increased arsenic tolerance. This strain is a reoccurring, emerging, or persistent strain causing illness over an extended period.
Predicting food sources of Listeria monocytogenes based on genomic profiling using random forest model
Gu W , Cui Z , Stroika S , Carleton HA , Conrad A , Katz LS , Richardson LC , Hunter J , Click ES , Bruce BB . Foodborne Pathog Dis 2023 20 (12) 579-586
Listeria monocytogenes can cause severe foodborne illness, including miscarriage during pregnancy or death in newborn infants. When outbreaks of L. monocytogenes illness occur, it may be possible to determine the food source of the outbreak. However, most reported L. monocytogenes illnesses do not occur as part of a recognized outbreak and most of the time the food source of sporadic L. monocytogenes illness in people cannot be determined. In the United States, L. monocytogenes isolates from patients, foods, and environments are routinely sequenced and analyzed by whole genome multilocus sequence typing (wgMLST) for outbreak detection by PulseNet, the national molecular surveillance system for foodborne illnesses. We investigated whether machine learning approaches applied to wgMLST allele call data could assist in attribution analysis of food source of L. monocytogenes isolates. We compiled isolates with a known source from five food categories (dairy, fruit, meat, seafood, and vegetable) using the metadata of L. monocytogenes isolates in PulseNet, deduplicated closely genetically related isolates, and developed random forest models to predict the food sources of isolates. Prediction accuracy of the final model varied across the food categories; it was highest for meat (65%), followed by fruit (45%), vegetable (45%), dairy (44%), and seafood (37%); overall accuracy was 49%, compared with the naive prediction accuracy of 28%. Our results show that random forest can be used to capture genetically complex features of high-resolution wgMLST for attribution of isolates to their sources.
Mashtree: a rapid comparison of whole genome sequence files
Katz LS , Griswold T , Morrison SS , Caravas JA , Zhang S , den Bakker HC , Deng X , Carleton HA . J Open Source Softw 2019 4 (44)
In the past decade, the number of publicly available bacterial genomes has increased dramatically. These genomes have been generated for impactful initiatives, especially in the field of genomic epidemiology (Brown, Dessai, McGarry, & Gerner-Smidt, 2019; Timme et al., 2017). Genomes are sequenced, shared publicly, and subsequently analyzed for phylogenetic relatedness. If two genomes of epidemiological interest are found to be related, further investigation might be prompted. However, comparing large numbers of genomes for phylogenetic relatedness is computationally expensive and laborious. Consequently, there are many strategies to reduce the complexity of the data for downstream analysis, especially using nucleotide stretches of length k (k-mers).
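Mashtree builds its trees from Mash distances, which are estimated from sampled (MinHash) k-mer sketches. To show only the underlying idea, this sketch computes the exact Jaccard index J of canonical k-mer sets and converts it with the Mash distance formula D = -(1/k) ln(2J / (1 + J)). It is not Mashtree's code, performs no sketching or tree building, and its function names are hypothetical; returning 1.0 when no k-mers are shared is a cap chosen here.

```python
import math


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Set of canonical k-mers: each k-mer or its reverse complement, whichever sorts first."""
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing ambiguous bases
            revcomp = kmer.translate(complement)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of distinct k-mers shared by two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard index into an estimated per-base mutation distance."""
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance (cap chosen here)
    return -math.log(2 * j / (1 + j)) / k
```

Canonical k-mers make the comparison independent of which strand each assembly contig happens to be written on, which is why a sequence and its reverse complement yield identical sets.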
Evaluation of core genome and whole genome multilocus sequence typing schemes for Campylobacter jejuni and Campylobacter coli outbreak detection in the USA
Joseph LA , Griswold T , Vidyaprakash E , Im SB , Williams GM , Pouseele HA , Hise KB , Carleton HA . Microb Genom 2023 9 (5)
Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have historically been used to differentiate sporadic from outbreak Campylobacter isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated the epidemiological concordance of high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST analyses in clustering or differentiating outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model R² and Pearson correlation coefficients were >0.90. The correlation was sometimes lower when comparing hqSNP analysis with the MLST-based methods; the linear regression model R² and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86 for some outbreak isolates. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele- and SNP-based approaches may reflect differences in how the two methods capture genomic variation (SNPs and indels). Since cgMLST examines allele differences in genes that are common to most isolates being compared, it is well suited to surveillance: searching large genomic databases for similar isolates is easily and efficiently done using allelic profiles. On the other hand, an hqSNP approach is much more computationally intensive and does not scale to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used.
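The regression comparisons in this abstract pair each method's pairwise distance for the same two isolates. A minimal sketch, assuming two square distance matrices with isolates in the same order: it flattens the upper triangles and fits ordinary least squares; for a simple linear fit with an intercept, R² equals the square of Pearson's r. Which method the study placed on each axis is not stated here, so the axis choice is an assumption, and the function names are hypothetical.

```python
import math


def upper_triangle(matrix: list[list[float]]) -> list[float]:
    """Flatten the pairwise entries above the diagonal (each isolate pair counted once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, pearson_r); R-squared is pearson_r ** 2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("distances must vary for a fit to be defined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Typical use would be `fit_line(upper_triangle(cgmlst_matrix), upper_triangle(hqsnp_matrix))`, where both matrix names are hypothetical; a slope below 1 then means the y-axis method reports fewer differences per pair than the x-axis method.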
Cronobacter sakazakii Infections in Two Infants Linked to Powdered Infant Formula and Breast Pump Equipment - United States, 2021 and 2022.
Haston JC , Miko S , Cope JR , McKeel H , Walters C , Joseph LA , Griswold T , Katz LS , Andújar AA , Tourdot L , Rounds J , Vagnone P , Medus C , Harris J , Geist R , Neises D , Wiggington A , Smith T , Im MS , Wheeler C , Smith P , Carleton HA , Lee CC . MMWR Morb Mortal Wkly Rep 2023 72 (9) 223-226
Cronobacter sakazakii, a species of gram-negative bacteria belonging to the Enterobacteriaceae family, is known to cause severe and often fatal meningitis and sepsis in young infants. C. sakazakii is ubiquitous in the environment, and most reported infant cases have been attributed to contaminated powdered infant formula (powdered formula) or breast milk that was expressed using contaminated breast pump equipment (1-3). Previous investigations of cases and outbreaks have identified C. sakazakii in opened powdered formula, breast pump parts, environmental surfaces in the home, and, rarely, in unopened powdered formula and formula manufacturing facilities (2,4-6). This report describes two infants with C. sakazakii meningitis reported to CDC in September 2021 and February 2022. CDC used whole genome sequencing (WGS) analysis to link one case to contaminated opened powdered formula from the patient's home and the other to contaminated breast pump equipment. These cases highlight the importance of expanding awareness about C. sakazakii infections in infants, safe preparation and storage of powdered formula, proper cleaning and sanitizing of breast pump equipment, and using WGS as a tool for C. sakazakii investigations.
Genome Sequences from a Reemergence of Vibrio cholerae in Haiti, 2022 Reveal Relatedness to Previously Circulating Strains.
Walters C , Chen J , Stroika S , Katz LS , Turnsek M , Compère V , Im MS , Gomez S , McCullough A , Landaverde C , Putney J , Caidi H , Folster J , Carleton HA , Boncy J , Lee CC . J Clin Microbiol 2023 61 (3) e0014223
After more than 3 years without a documented cholera case, the Republic of Haiti reported its first resurgent case on 30 September 2022 (1–3). As of 18 February 2023, more than 27,000 hospitalized cholera cases and 594 confirmed deaths had been reported from all 10 departments (4). Here, we describe Vibrio cholerae isolates first characterized by the Laboratoire National de Santé Publique (LNSP), including both their genotypic and phenotypic antimicrobial resistance profiles. Whole-genome sequencing (WGS) data from these isolates were compared with those of recently circulating cholera toxin-producing V. cholerae O1 in a maximum likelihood phylogeny.
Molecular characterization of circulating Salmonella Typhi strains in an urban informal settlement in Kenya.
Ochieng C , Chen JC , Osita MP , Katz LS , Griswold T , Omballa V , Ng'eno E , Ouma A , Wamola N , Opiyo C , Achieng L , Munywoki PK , Hendriksen RS , Freeman M , Mikoleit M , Juma B , Bigogo G , Mintz E , Verani JR , Hunsperger E , Carleton HA . PLoS Negl Trop Dis 2022 16 (8) e0010704
A high burden of Salmonella enterica subspecies enterica serovar Typhi (S. Typhi) bacteremia has been reported from urban informal settlements in sub-Saharan Africa, yet little is known about the introduction of these strains to the region. Understanding regional differences in the predominant strains of S. Typhi can provide insight into the genomic epidemiology. We genetically characterized 310 S. Typhi isolates from typhoid fever surveillance conducted over a 12-year period (2007-2019) in Kibera, an urban informal settlement in Nairobi, Kenya, to assess the circulating strains, their antimicrobial resistance attributes, and how they relate to global S. Typhi isolates. Whole genome multi-locus sequence typing (wgMLST) identified 4 clades, with up to 303 pairwise allelic differences. The identified genotypes correlated with wgMLST clades. The predominant clade contained 290 (93.5%) isolates with a median of 14 allele differences (range 0-52) and consisted entirely of genotypes 4.3.1.1 and 4.3.1.2. Resistance determinants were identified exclusively in the predominant clade. Determinants associated with resistance to aminoglycosides were observed in 245 isolates (79.0%), sulphonamides in 243 isolates (78.4%), trimethoprim in 247 isolates (79.7%), tetracycline in 224 isolates (72.3%), chloramphenicol in 247 isolates (79.6%), β-lactams in 239 isolates (77.1%) and quinolones in 62 isolates (20.0%). Multidrug resistance (MDR) determinants (defined as determinants conferring resistance to ampicillin, chloramphenicol and cotrimoxazole) were found in 235 (75.8%) isolates. The prevalence of MDR-associated genes was similar throughout the study period (2007-2012: 203, 76.3% vs 2013-2019: 32, 72.7%; Fisher's Exact Test: P = 0.5478), while the proportion of isolates harboring quinolone resistance determinants increased (2007-2012: 42, 15.8% and 2013-2019: 20, 45.5%; Fisher's Exact Test: P<0.0001) following a decline in S. Typhi in Kibera. Some isolates (49, 15.8%) harbored both MDR and quinolone resistance determinants. No determinants associated with resistance to cephalosporins or azithromycin were detected among the isolates sequenced in this study. Plasmid markers were identified only in the main clade, including IncHI1A and IncHI1B(R27) in 226 (72.9%) isolates and IncQ1 in 238 (76.8%) isolates. Molecular clock analysis of global typhoid isolates and isolates from Kibera suggests that genotype 4.3.1 has been introduced multiple times in Kibera. Several genomes from Kibera formed a clade with genomes from Kenya, Malawi, South Africa, and Tanzania. The most recent common ancestor (MRCA) for these isolates dated to around 1997. Another isolate from Kibera grouped with several isolates from Uganda, sharing a common ancestor from around 2009. In summary, S. Typhi in Kibera belong to four wgMLST clades, one of which is frequently associated with MDR genes, which poses a challenge for treatment and control.
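The two period comparisons above can be reproduced in outline with a two-sided Fisher's exact test on 2x2 tables. The abstract gives only counts and percentages, so the period denominators used here (266 isolates in 2007-2012 and 44 in 2013-2019) are back-calculated from them; they sum to the 310 isolates studied, but treat them as an assumption. The function below sums hypergeometric probabilities directly and is a sketch; in practice `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same row and column totals
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(low, high + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


# Rows: 2007-2012, 2013-2019; columns: with determinant, without.
# Denominators 266 and 44 are back-calculated from the reported percentages.
p_mdr = fisher_exact_two_sided(203, 266 - 203, 32, 44 - 32)
p_quinolone = fisher_exact_two_sided(42, 266 - 42, 20, 44 - 20)
```

With these assumed denominators, the MDR comparison gives a P value well above 0.05 and the quinolone comparison a P value below 0.001, matching the direction of the reported results.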
The power, potential, benefits, and challenges of implementing high-throughput sequencing in food safety systems.
Imanian B , Donaghy J , Jackson T , Gummalla S , Ganesan B , Baker RC , Henderson M , Butler EK , Hong Y , Ring B , Thorp C , Khaksar R , Samadpour M , Lawless KA , MacLaren-Lee I , Carleton HA , Tian R , Zhang W , Wan J . NPJ Sci Food 2022 6 (1) 35
The development and application of modern sequencing technologies have led to many improvements in food safety and public health. With unprecedented resolution and big data, high-throughput sequencing (HTS) has enabled food safety specialists to sequence marker genes, whole genomes, and transcriptomes of microorganisms almost in real time. These data reveal not only the identity of a pathogen or an organism of interest in the food supply but also its virulence potential and functional characteristics. HTS of amplicons allows better characterization of the microbial communities associated with food and the environment. New and powerful bioinformatics tools, algorithms, and machine learning allow the development of new models to predict and tackle important events such as foodborne disease outbreaks. Despite its potential, the integration of HTS into current food safety systems is far from complete. Government agencies have embraced this new technology and use it for disease diagnostics, food safety inspections, and outbreak investigations. However, adoption and application of HTS by the food industry have been comparatively slow, sporadic, and fragmented. Incorporation of HTS by food manufacturers into their food safety programs could reinforce the design and verification of the effectiveness of control measures by providing greater insight into the characteristics, origin, relatedness, and evolution of microorganisms in our foods and environment. Here, we discuss this new technology, its power, and its potential. A brief history of implementation by public health agencies is presented, as are the benefits and challenges for the food industry and the technology's future in the context of food safety.
The Use of Whole-Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States.
Stevens EL , Carleton HA , Beal J , Tillman GE , Lindsey RL , Lauer AC , Pightling A , Jarvis KG , Ottesen A , Ramachandran P , Hintz L , Katz LS , Folster JP , Whichard JM , Trees E , Timme RE , McDermott P , Wolpert B , Bazaco M , Zhao S , Lindley S , Bruce BB , Griffin PM , Brown E , Allard M , Tallent S , Irvin K , Hoffmann M , Wise M , Tauxe R , Gerner-Smidt P , Simmons M , Kissler B , Defibaugh-Chavez S , Klimke W , Agarwala R , Lindsay J , Cook K , Austerman SR , Goldman D , McGarry S , Hale KR , Dessai U , Musser SM , Braden C . J Food Prot 2022 85 (5) 755-772
This multi-agency report, developed under the Interagency Collaboration for Genomics for Food and Feed Safety (Gen-FS), provides an overview of the use of and transition to whole-genome sequencing (WGS) technology to detect and characterize pathogens commonly transmitted by food and to identify their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among federal agencies, including the National Institutes of Health (NIH); the Department of Health and Human Services' Centers for Disease Control and Prevention (CDC) and Food and Drug Administration (FDA); and the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS), Agricultural Research Service (ARS), and Animal and Plant Health Inspection Service (APHIS). We describe the single nucleotide polymorphism (SNP), core-genome multilocus sequence typing (cgMLST), and whole-genome multilocus sequence typing (wgMLST) data analysis methods used in CDC's PulseNet and FDA's GenomeTrakr networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Gen-FS agency partners. We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles), to source attribution efforts, and to increasing transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information (NCBI). We also highlight the impact of current trends in the use of culture-independent diagnostic tests (CIDT) for human diagnostic testing on analytical approaches related to food safety. Lastly, we highlight what is next for WGS in food safety.
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks.
Wagner DD , Carleton HA , Trees E , Katz LS . PeerJ 2021 9 e12446
Background. Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common WGS analysis procedures, single nucleotide polymorphism (SNP) analysis and genome assembly, are highly dependent upon WGS data quality. Methods. Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNP analysis results in E. coli and S. enterica clusters. Results. Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines for how well they improved these four quality metrics and, in turn, SNP analysis and genome assembly. The most effective read-healing pipelines for SNP analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNP pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads. Conclusions. PHRED scores will continue to be a crucial quality metric, albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNP analyses, different read-healing approaches are likely needed for genome assembly or other emerging WGS analysis methodologies. © 2021 PeerJ Inc. All rights reserved.
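Read quality here is expressed as PHRED scores decoded from FASTQ quality strings. The sketch below illustrates the two trimming styles named above on a single read; it assumes Phred+33 encoding (used by current Illumina instruments), and the threshold of 20 and the trim widths are illustrative values, not settings from the evaluated pipelines. Function names are hypothetical, and this is not code from Lyve-SET, the CFSAN SNP Pipeline, or any other tool mentioned.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into PHRED scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def mean_phred(quality: str) -> float:
    """Mean PHRED score of a read (0.0 for an empty read)."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


def fixed_width_trim(seq: str, qual: str, left: int, right: int) -> tuple[str, str]:
    """Drop a fixed number of bases from each end of the read."""
    end = max(len(seq) - right, 0)  # guard against trimming more than the read length
    return seq[left:end], qual[left:end]


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Quality-based trimming: remove trailing bases scoring below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

Trimming only the 3' tail reflects the usual pattern of Illumina quality decaying along the read; real trimmers often use sliding windows instead, which this sketch does not attempt.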
Genome-Enabled Molecular Subtyping and Serotyping for Shiga Toxin-Producing Escherichia coli
Im SB , Gupta S , Jain M , Chande AT , Carleton HA , Jordan IK , Rishishwar L . Front Sustain Food Syst 2021 5
Foodborne pathogens are a major public health burden in the United States, leading to 9.4 million illnesses annually. Since 1996, a national laboratory-based surveillance program, PulseNet, has used molecular subtyping and serotyping methods with the aim of reducing the burden of foodborne illness through early detection of emerging outbreaks. PulseNet-affiliated laboratories have used pulsed-field gel electrophoresis (PFGE) and immunoassays to subtype and serotype bacterial isolates. Widespread use of serotyping and PFGE for foodborne illness surveillance over the years has resulted in the accumulation of a wealth of routine surveillance and outbreak epidemiological data. This valuable source of data has been used to understand the seasonal frequency, geographic distribution, demographic information, exposure information, disease severity, and sources of foodborne isolates. In 2019, PulseNet adopted whole genome sequencing (WGS) at a national scale to replace PFGE with higher-resolution methods such as core genome multilocus sequence typing. Consequently, PulseNet's recent shift to genome-based subtyping methods has rendered the vast collection of historic surveillance data associated with serogroups and PFGE patterns potentially unusable. The goal of this study was to develop a bioinformatics method to associate the WGS data currently used by PulseNet for bacterial pathogen subtyping with previously characterized serogroups and PFGE patterns. Previous efforts to associate WGS with PFGE patterns relied on predicting DNA molecular weight based on restriction site analysis. However, these approaches failed owing to the non-uniform usage of genomic restriction sites by PFGE restriction enzymes. We developed a machine learning approach to classify isolates to their most probable serogroup and PFGE pattern, based on comparisons of genomic k-mer signatures. We applied our WGS classification method to 5,970 Shiga toxin-producing Escherichia coli (STEC) isolates collected as part of PulseNet's routine foodborne surveillance activities between 2003 and 2018. Our machine learning classifier is able to associate STEC WGS with higher-level serogroups with very high accuracy and with lower-level PFGE patterns with somewhat lower accuracy. Taken together, these classifications support the ability of public health investigators to associate currently generated WGS data with historical epidemiological knowledge linked to serogroups and PFGE patterns in support of outbreak surveillance for food safety and public health. © 2021 Im, Gupta, Jain, Chande, Carleton, Jordan and Rishishwar.
A multinational listeriosis outbreak and the importance of sharing genomic data
Pettengill JB , Markell A , Conrad A , Carleton HA , Beal J , Rand H , Musser S , Brown EW , Allard MW , Huffman J , Harris S , Wise M , Locas A . Lancet Microbe 2020 1 (6) e233-e234
Our globalised food supply presents immense challenges to ensuring food safety, as shown by outbreaks of foodborne illnesses associated with imported foods (1). The speed with which such outbreaks are resolved often depends on how rapidly public health scientists communicate and disseminate actionable data. One such data source is whole-genome sequencing, which is the newest method of molecular subtyping and has superior discriminatory power compared with previous methods (2). Consequently, whole-genome sequencing has been and continues to be adopted by countries across the world as a tool to combat foodborne pathogens (3). Sequence data can be made publicly available through numerous databases (eg, the European Nucleotide Archive, the National Center for Biotechnology Information [NCBI] Sequence Read Archive, and the DNA Data Bank of Japan Sequence Read Archive). Laboratories are encouraged to share the genomes they have sequenced (4) and, as new genomes are made public, isolates can be clustered into genetically similar groups to facilitate the detection of potential outbreaks and sources of contamination.
Comparison of Molecular Subtyping and Antimicrobial Resistance Detection Methods Used in a Large Multi-State Outbreak of Extensively Drug-Resistant Campylobacter jejuni Infections Linked to Pet Store Puppies.
Joseph LA , Francois Watkins LK , Chen J , Tagg KA , Bennett C , Caidi H , Folster JP , Laughlin ME , Koski L , Silver R , Stevenson L , Robertson S , Pruckler J , Nichols M , Pouseele H , Carleton HA , Basler C , Friedman CR , Geissler A , Hise KB , Aubert RD . J Clin Microbiol 2020 58 (10) Campylobacter jejuni is a leading cause of enteric bacterial illness in the United States. Traditional molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST), provided limited resolution to adequately identify C. jejuni outbreaks and separate out sporadic isolates during outbreak investigations. Whole genome sequencing (WGS) has emerged as a powerful tool for C. jejuni outbreak detection. In this investigation, 45 human and 11 puppy isolates obtained during a 2016-2018 outbreak linked to pet store puppies were sequenced. Core genome multilocus sequence typing (cgMLST) and high-quality single nucleotide polymorphism (hqSNP) analysis of the sequence data separated the isolates into the same two clades, with minor within-clade differences; however, cgMLST analysis does not require selection of an appropriate reference genome, making this method preferable to hqSNP analysis for Campylobacter surveillance and cluster detection. The isolates were classified as ST2109, a rarely seen MLST sequence type. PFGE was performed on 38 human and 10 puppy isolates; PFGE patterns did not reliably predict clustering by cgMLST analysis. Genetic detection of antimicrobial resistance determinants predicted that all outbreak-associated isolates would be resistant to six drug classes. Traditional antimicrobial susceptibility testing (AST) confirmed a high correlation between genotypic and phenotypic antimicrobial resistance determinations. WGS analysis linked C. jejuni isolates in humans and pet store puppies even when canine exposure information was unknown, aiding the epidemiological investigation during this outbreak. 
WGS data were also used to quickly identify the highly drug-resistant profile of these outbreak-associated C. jejuni isolates. |
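Counting allele differences between cgMLST profiles is the kind of comparison that separates isolates into clades. A minimal sketch follows. Several details are assumptions rather than part of the source: the profile layout (locus name mapped to an allele number, with None for a missing call) and the rule of skipping loci missing in either isolate. Schemes and tools handle missing calls differently. The isolate and locus names are hypothetical.

```python
from itertools import combinations
from typing import Optional

# Hypothetical profile layout: locus name -> allele number, None for a missing call.
Profile = dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci where both isolates have an allele call and the calls differ.

    Loci absent or uncalled in either profile are skipped (one common convention).
    """
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def pairwise_distances(profiles: dict[str, Profile]) -> dict[tuple[str, str], int]:
    """Return the allele distance for every unordered pair of isolate IDs (sorted pair keys)."""
    return {
        (x, y): allele_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }
```

Because missing loci are ignored rather than counted as differences, low-quality assemblies can look closer than they really are. A real workflow would also track how many loci each comparison actually used.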
Pathogen Genomics in Public Health.
Armstrong GL , MacCannell DR , Taylor J , Carleton HA , Neuhaus EB , Bradbury RS , Posey JE , Gwinn M . N Engl J Med 2019 381 (26) 2569-2580 Rapid advances in DNA sequencing technology ("next-generation sequencing") have inspired optimism about the potential of human genomics for "precision medicine." Meanwhile, pathogen genomics is already delivering "precision public health" through more effective investigations of outbreaks of foodborne illnesses, better-targeted tuberculosis control, and more timely and granular influenza surveillance to inform the selection of vaccine strains. In this article, we describe how public health agencies have been adopting pathogen genomics to improve their effectiveness in almost all domains of infectious disease. This momentum is likely to continue, given the ongoing development in sequencing and sequencing-related technologies. |
Implications of Mobile Genetic Elements for Salmonella enterica Single-Nucleotide Polymorphism Subtyping and Source Tracking Investigations.
Li S , Zhang S , Baert L , Jagadeesan B , Ngom-Bru C , Griswold T , Katz LS , Carleton HA , Deng X . Appl Environ Microbiol 2019 85 (24) Single-nucleotide polymorphisms (SNPs) are widely used for whole-genome sequencing (WGS)-based subtyping of foodborne pathogens in outbreak and source tracking investigations. Mobile genetic elements (MGEs) are commonly present in bacterial genomes and may affect SNP subtyping results if their evolutionary history and dynamics differ from those of the bacterial chromosomes. Using Salmonella enterica as a model organism, we surveyed major categories of MGEs, including plasmids, phages, insertion sequences, integrons, and integrative and conjugative elements (ICEs), in 990 genomes representing 21 major serotypes of S. enterica. We evaluated whether plasmids and chromosomal MGEs affect SNP subtyping with 9 outbreak clusters of different serotypes found in the United States in 2018. The median total length of chromosomal MGEs accounted for 2.5% of a typical S. enterica chromosome. Of the 990 analyzed S. enterica isolates, 68.9% contained at least one assembled plasmid sequence. The median total length of assembled plasmids in these isolates was 93,671 bp. Plasmids that carry high densities of SNPs were found to substantially affect both SNP phylogenies and SNP distances among closely related isolates if they were present in the reference genome for SNP subtyping. In comparison, chromosomal MGEs were found to have limited impact on SNP subtyping. We recommend the identification of plasmid sequences in the reference genome and the exclusion of plasmid-borne SNPs from SNP subtyping analysis. IMPORTANCE: Despite increasingly routine use of WGS and SNP subtyping in outbreak and source tracking investigations, whether and how MGEs affect SNP subtyping has not been thoroughly investigated. Besides chromosomal MGEs, plasmids are frequently entangled in draft genome assemblies and have yet to be assessed for their impact on SNP subtyping. 
This study provides evidence-based guidance on the treatment of MGEs in SNP analysis for Salmonella to infer phylogenetic relationship and SNP distance between isolates. |
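The recommendation to exclude plasmid-borne SNPs can be illustrated with a small sketch. It counts SNP differences between two isolates while skipping sites on reference contigs flagged as plasmid. The sketch assumes that variant calls are already available as a mapping from (reference contig, position) to the called base. It also assumes the plasmid contigs in the reference have already been identified by some other means. The contig names are hypothetical, and this is not the study's own pipeline.

```python
# Hypothetical layout of one isolate's calls: (reference contig, position) -> called base.
Calls = dict[tuple[str, int], str]


def snp_distance(a: Calls, b: Calls, excluded_contigs: frozenset[str] = frozenset()) -> int:
    """Count sites called in both isolates whose bases differ.

    Sites on any contig in excluded_contigs (e.g. contigs flagged as plasmid
    in the reference genome) are ignored.
    """
    shared = a.keys() & b.keys()
    return sum(
        1
        for site in shared
        if site[0] not in excluded_contigs and a[site] != b[site]
    )
```

For example, if two isolates differ at one chromosomal site and at two sites on a contig named "plasmid_1", the distance drops from 3 to 1 once that contig is excluded.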
PulseNet and the Changing Paradigm of Laboratory-Based Surveillance for Foodborne Diseases.
Kubota KA , Wolfgang WJ , Baker DJ , Boxrud D , Turner L , Trees E , Carleton HA , Gerner-Smidt P . Public Health Rep 2019 134 22s-28s PulseNet, the National Molecular Subtyping Network for Foodborne Disease Surveillance, was established in 1996 through a collaboration with the Centers for Disease Control and Prevention; the US Department of Agriculture, Food Safety and Inspection Service; the US Food and Drug Administration; 4 state public health laboratories; and the Association of Public Health Laboratories. The network has since expanded to include 83 state, local, and food regulatory public health laboratories. In 2016, PulseNet was estimated to help prevent 270 000 foodborne illnesses annually. PulseNet is undergoing a transformation toward whole-genome sequencing (WGS), which provides better discriminatory power and precision than pulsed-field gel electrophoresis (PFGE). WGS improves the detection of outbreak clusters and could replace many traditional reference identification and characterization methods. This article highlights the contributions made by public health laboratories in transforming PulseNet's surveillance and describes how the transformation is changing local and national surveillance practices. Our data show that WGS is better at identifying clusters than PFGE, especially for clonal organisms such as Salmonella Enteritidis. Developing prioritization schemes for cluster follow-up and securing additional resources for both public health laboratory and epidemiology departments will be critical as PulseNet implements WGS for foodborne disease surveillance in the United States. |
Interpretation of Whole-Genome Sequencing for Enteric Disease Surveillance and Outbreak Investigation.
Besser JM , Carleton HA , Trees E , Stroika SG , Hise K , Wise M , Gerner-Smidt P . Foodborne Pathog Dis 2019 16 (7) 504-512 The routine use of whole-genome sequencing (WGS) as part of enteric disease surveillance is substantially enhancing our ability to detect and investigate outbreaks and to monitor disease trends. At the same time, it is revealing as never before the vast complexity of microbial and human interactions that contribute to outbreak ecology. Since WGS analysis is primarily used to characterize and compare microbial genomes with the goal of addressing epidemiological questions, it must be interpreted in an epidemiological context. In this article, we identify common challenges and pitfalls encountered when interpreting sequence data in an enteric disease surveillance and investigation context, and explain how to address them. |
Metagenomic Approaches for Public Health Surveillance of Foodborne Infections: Opportunities and Challenges.
Carleton HA , Besser J , Williams-Newkirk AJ , Huang A , Trees E , Gerner-Smidt P . Foodborne Pathog Dis 2019 16 (7) 474-479 Foodborne disease surveillance in the United States is at a critical point. Clinical and diagnostic laboratories are using culture-independent diagnostic tests (CIDTs) to identify the pathogen causing foodborne illness from patient specimens. CIDTs are molecular tests that allow doctors to rapidly identify the bacteria causing illness within hours. CIDTs, unlike previous gold standard methods such as bacterial culture, do not produce an isolate that can be subtyped as part of the national molecular subtyping network for foodborne disease surveillance, PulseNet. Without subtype information, cases can no longer be linked using molecular data to identify potentially related cases that are part of an outbreak. In this review, we discuss the public health needs for a molecular subtyping approach directly from patient specimens and highlight different approaches, including amplicon and shotgun metagenomic sequencing. |
Multistate outbreak of Salmonella Paratyphi B variant L(+) tartrate(+) and Salmonella Weltevreden infections linked to imported frozen raw tuna: USA, March-July 2015.
Hassan R , Tecle S , Adcock B , Kellis M , Weiss J , Saupe A , Sorenson A , Klos R , Blankenship J , Blessington T , Whitlock L , Carleton HA , Concepcion-Acevedo J , Tolar B , Wise M , Neil KP . Epidemiol Infect 2018 146 (11) 1-7 Foodborne non-typhoidal salmonellosis causes approximately 1 million illnesses annually in the USA. In April 2015, we investigated a multistate outbreak of 65 Salmonella Paratyphi B variant L(+) tartrate(+) infections associated with frozen raw tuna imported from Indonesia, which was consumed raw in sushi. Forty-six (92%) of 50 case-patients interviewed ate sushi during the week before illness onset, and 44 (98%) of 45 who specified ate sushi containing raw tuna. Two outbreak strains were isolated from the samples of frozen raw tuna. Traceback identified a single importer as a common source of tuna consumed by case-patients; this importer issued three voluntary recalls of tuna sourced from one Indonesian processor. Four Salmonella Weltevreden infections were also linked to this outbreak. Whole-genome sequencing was useful in establishing a link between Salmonella isolated from ill people and tuna. This outbreak highlights the continuing foodborne illness risk associated with raw seafood consumption, the importance of processing seafood in a manner that minimises contamination with pathogenic microorganisms and the continuing need to ensure imported foods are safe to eat. People at higher risk for foodborne illness should not consume undercooked animal products, such as raw seafood. |
An Assessment of Different Genomic Approaches for Inferring Phylogeny of Listeria monocytogenes.
Henri C , Leekitcharoenphon P , Carleton HA , Radomski N , Kaas RS , Mariet JF , Felten A , Aarestrup FM , Gerner Smidt P , Roussel S , Guillier L , Mistou MY , Hendriksen RS . Front Microbiol 2017 8 2351 Background/objectives: Whole genome sequencing (WGS) has proven to be a powerful subtyping tool for foodborne pathogenic bacteria like L. monocytogenes. The value of genome-scale analysis for national surveillance, outbreak detection, or source tracking has been widely documented. However, the genomic data can be analyzed with many different bioinformatics methods, such as single nucleotide polymorphism (SNP) analysis, core-genome multi locus sequence typing (cgMLST), whole-genome multi locus sequence typing (wgMLST), or multi locus predicted protein sequence typing (MLPPST) on either the core genome (cgMLPPST) or the pan-genome (wgMLPPST). Currently, few studies have compared these different analytical approaches. Our objective was to assess and compare different genomic methods that can be implemented to cluster isolates of L. monocytogenes. Methods: The clustering methods were evaluated on a collection of 207 L. monocytogenes genomes of food origin representative of the genetic diversity of the Anses collection. The trees were then compared using robust statistical analyses. Results: Backward comparability between conventional typing methods and genomic methods showed near-perfect concordance. The importance of selecting a proper reference when calling SNPs was highlighted, although distances between strains remained identical. The analysis also revealed that the topologies of the wgMLST and cgMLST phylogenetic trees were remarkably similar. The comparison between SNP and cgMLST or SNP and wgMLST approaches showed that the topologies of the phylogenetic trees were statistically similar, with almost equivalent clustering. 
Conclusion: Our study revealed high concordance between wgMLST, cgMLST, and SNP approaches which are all suitable for typing of L. monocytogenes. The comparable clustering is an important observation considering that the two approaches have been variously implemented among reference laboratories. |
Unusually high illness severity and short incubation periods in two foodborne outbreaks of Salmonella Heidelberg infections with potential coincident Staphylococcus aureus intoxication
Nakao JH , Talkington D , Bopp CA , Besser J , Sanchez ML , Guarisco J , Davidson SL , Warner C , Mc Intyre Mg , Group JP , Comstock N , Xavier K , Pinsent TS , Brown J , Douglas JM , Gomez GA , Garrett NM , Carleton HA , Tolar B , Wise ME . Epidemiol Infect 2017 146 (1) 1-9 We describe the investigation of two temporally coincident illness clusters involving salmonella and Staphylococcus aureus in two states. Cases were defined as gastrointestinal illness following two meal events. Investigators interviewed ill persons. Stool, food and environmental samples underwent pathogen testing. Alabama: Eighty cases were identified. Median time from meal to illness was 5.8 h. Salmonella Heidelberg was identified from 27 of 28 stool specimens tested, and coagulase-positive S. aureus was isolated from three of 16 ill persons. Environmental investigation indicated that food handling deficiencies occurred. Colorado: Seven cases were identified. Median time from meal to illness was 4.5 h. Five persons were hospitalised, four of whom were admitted to the intensive care unit. Salmonella Heidelberg was identified in six of seven stool specimens and coagulase-positive S. aureus in three of six tested. No single food item was implicated in either outbreak. These two outbreaks were linked to infection with Salmonella Heidelberg, but additional factors, such as dual aetiology that included S. aureus or the dose of salmonella ingested may have contributed to the short incubation periods and high illness severity. The outbreaks underscore the importance of measures to prevent foodborne illness through appropriate washing, handling, preparation and storage of food. |
Next-Generation Sequencing Technologies and their Application to the Study and Control of Bacterial Infections.
Besser J , Carleton HA , Gerner-Smidt P , Lindsey RL , Trees E . Clin Microbiol Infect 2017 24 (4) 335-341 BACKGROUND: With the decreasing cost and increasing efficiency of next-generation sequencing, the technology is rapidly being introduced into clinical and public health laboratory practice. AIMS: In this review, the historical background and principles of first-, second- and third-generation sequencing are described, as are the characteristics of the most commonly used sequencing instruments. SOURCES: Peer-reviewed literature, white papers and meeting reports. CONTENT & IMPLICATIONS: Next-generation sequencing is a technology that could potentially replace many traditional microbiological workflows, providing clinicians and public health specialists with more actionable information than previously achievable. Examples of the clinical and public health uses of the technology are provided. The challenge of comparability of different sequencing platforms is discussed. Finally, future directions for the technology, namely integrating it with laboratory management and public health surveillance systems and moving it towards sequencing directly from the clinical specimen (metagenomics), could lead to yet another fundamental transformation of clinical diagnostics and public health surveillance. |
Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.
Timme RE , Rand H , Shumway M , Trees EK , Simmons M , Agarwala R , Davis S , Tillman GE , Defibaugh-Chavez S , Carleton HA , Klimke WA , Katz LS . PeerJ 2017 2017 (10) e3893 Background. As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". 
The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. |
Comparison of classical multi-locus sequence typing software for next-generation sequencing data
Page AJ , Alikhan NF , Carleton HA , Seemann T , Keane JA , Katz LS . Microb Genom 2017 3 (8) e000124 Multi-locus sequence typing (MLST) is a widely used method for categorizing bacteria. Increasingly, MLST is being performed using next-generation sequencing (NGS) data by reference laboratories and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date on these methods. We have compared eight of these applications against real and simulated data, and present results on: (1) the accuracy of each method against traditional typing methods, (2) the performance on real outbreak datasets, (3) the impact of contamination and varying depth of coverage, and (4) the computational resource requirements. |
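Classical 7-gene MLST assigns a sequence type (ST) by matching the allele number at each locus against a table of known profiles. The sketch below shows only that final lookup step. It assumes allele numbers have already been called from NGS data, and it is not the logic of any of the eight compared applications. The locus names, allele numbers and ST numbers are hypothetical toy values, and a three-locus scheme stands in for the usual seven loci.

```python
from typing import Optional


def call_sequence_type(
    alleles: dict[str, Optional[int]],
    loci: tuple[str, ...],
    st_table: dict[tuple[int, ...], int],
) -> Optional[int]:
    """Return the ST whose allele profile matches exactly, or None.

    None is returned when any scheme locus is uncalled (for example, because of
    low depth of coverage or contamination) or when the complete profile is not
    in the table (possibly a novel ST).
    """
    # Build the profile in the scheme's fixed locus order.
    profile = tuple(alleles.get(locus) for locus in loci)
    if any(allele is None for allele in profile):
        return None
    return st_table.get(profile)
```

Returning None for both "incomplete" and "not in table" keeps the sketch short. A practical tool would usually report these two cases separately, because they call for different follow-up.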
A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens.
Katz LS , Griswold T , Williams-Newkirk AJ , Wagner D , Petkau A , Sieffert C , Van Domselaar G , Deng X , Carleton HA . Front Microbiol 2017 8 375 Modern epidemiology of foodborne bacterial pathogens in industrialized countries relies increasingly on whole genome sequencing (WGS) techniques. As opposed to profiling techniques such as pulsed-field gel electrophoresis, WGS requires a variety of computational methods. Since 2013, United States agencies responsible for food safety, including the CDC, FDA, and USDA, have been performing WGS on all Listeria monocytogenes found in clinical, food, and environmental samples. Each year, more genomes of other foodborne pathogens such as Escherichia coli, Campylobacter jejuni, and Salmonella enterica are being sequenced. Comparing thousands of genomes across an entire species requires a fast method with coarse resolution; however, capturing the fine details of highly related isolates requires a computationally heavy and sophisticated algorithm. Most L. monocytogenes investigations employing WGS depend on being able to identify an outbreak clade whose inter-genomic distances are less than an empirically determined threshold. When a difference of only a few single nucleotide polymorphisms (SNPs) can help distinguish between genomes that are likely outbreak-associated and those that are less likely to be associated, we require a fine-resolution method. To achieve this level of resolution, we have developed Lyve-SET, a high-quality SNP pipeline. We evaluated Lyve-SET, along with four other SNP pipelines that have been used in outbreak investigation or similar scenarios, by retrospectively investigating 12 outbreak data sets. To compare these pipelines, several distance-based and phylogeny-based comparison methods were applied, which collectively showed that multiple pipelines were able to identify most outbreak clusters and strains. 
Currently in the US PulseNet system, whole genome multi-locus sequence typing (wgMLST) is the preferred primary method for foodborne WGS cluster detection and outbreak investigation due to its ability to name standardized genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET. |
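One way to identify an outbreak clade whose inter-genomic distances fall below a threshold is single-linkage clustering over pairwise SNP distances, sketched here with a small union-find. This is an illustration under stated assumptions and is not part of Lyve-SET. The pairwise distances are assumed to come from an upstream SNP pipeline, and single linkage is just one simple clustering rule among several. The threshold and isolate IDs are hypothetical.

```python
def single_linkage_clusters(
    ids: list[str],
    distances: dict[tuple[str, str], int],
    threshold: int,
) -> list[set[str]]:
    """Group isolates connected by any chain of pairwise distances strictly below threshold."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two isolates' groups whenever their distance is below the threshold.
    for (a, b), d in distances.items():
        if d < threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    # Sort for deterministic output: by each cluster's sorted member list.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Single linkage can chain isolates that are far apart through intermediate genomes. This is one reason threshold-based clusters are typically reviewed alongside epidemiological data rather than treated as definitive.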
- Page last reviewed: Feb 1, 2024
- Page last updated: Mar 17, 2025
- Powered by CDC PHGKB Infrastructure