Last data update: Nov 04, 2024. (Total: 48056 publications since 2009)
Records 1-30 (of 42 Records) |
Query Trace: Katz LS[original query] |
---|
Fasten: a toolkit for streaming operations on FASTQ files
Katz LS , Phan J , den Bakker HC . J Open Source Softw 2024 9 (94) 6030 |
Rapid identification of enteric bacteria from whole genome sequences using average nucleotide identity metrics
Lindsey RL , Gladney LM , Huang AD , Griswold T , Katz LS , Dinsmore BA , Im MS , Kucerova Z , Smith PA , Lane C , Carleton HA . Front Microbiol 2023 14 1225207 Identification of enteric bacteria species by whole genome sequence (WGS) analysis requires a rapid and easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percent bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for determining species consistent with the conventional identification methods of known species. The performance of species identification was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies, which is designed to represent all species needed to query for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, when a standard minimum of 70% genome bases alignment existed, the ANI threshold values determined for these species were ≥95% for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, which is consistent with the classification based on previous gold standard methods. |
Reoccurring Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks, 2016-2019
Chen JC , Patel K , Smith PA , Vidyaprakash E , Snyder C , Tagg KA , Webb HE , Schroeder MN , Katz LS , Rowe LA , Howard D , Griswold T , Lindsey RL , Carleton HA . Emerg Infect Dis 2023 29 (9) 1895-1899 Genomic characterization of an Escherichia coli O157:H7 strain linked to leafy greens-associated outbreaks dates its emergence to late 2015. One clade has notable accessory genomic content and a previously described mutation putatively associated with increased arsenic tolerance. This strain is a reoccurring, emerging, or persistent strain causing illness over an extended period. |
Predicting food sources of Listeria monocytogenes based on genomic profiling using random forest model
Gu W , Cui Z , Stroika S , Carleton HA , Conrad A , Katz LS , Richardson LC , Hunter J , Click ES , Bruce BB . Foodborne Pathog Dis 2023 20 (12) 579-586 Listeria monocytogenes can cause severe foodborne illness, including miscarriage during pregnancy or death in newborn infants. When outbreaks of L. monocytogenes illness occur, it may be possible to determine the food source of the outbreak. However, most reported L. monocytogenes illnesses do not occur as part of a recognized outbreak and most of the time the food source of sporadic L. monocytogenes illness in people cannot be determined. In the United States, L. monocytogenes isolates from patients, foods, and environments are routinely sequenced and analyzed by whole genome multilocus sequence typing (wgMLST) for outbreak detection by PulseNet, the national molecular surveillance system for foodborne illnesses. We investigated whether machine learning approaches applied to wgMLST allele call data could assist in attribution analysis of food source of L. monocytogenes isolates. We compiled isolates with a known source from five food categories (dairy, fruit, meat, seafood, and vegetable) using the metadata of L. monocytogenes isolates in PulseNet, deduplicated closely genetically related isolates, and developed random forest models to predict the food sources of isolates. Prediction accuracy of the final model varied across the food categories; it was highest for meat (65%), followed by fruit (45%), vegetable (45%), dairy (44%), and seafood (37%); overall accuracy was 49%, compared with the naive prediction accuracy of 28%. Our results show that random forest can be used to capture genetically complex features of high-resolution wgMLST for attribution of isolates to their sources. |
Characterization of a nonagglutinating toxigenic Vibrio cholerae isolate
Gladney LM , Griswold T , Turnsek M , Im MS , Parsons MMB , Katz LS , Tarr CL , Lee CC . Microbiol Spectr 2023 11 (3) e0018223 Toxigenic Vibrio cholerae serogroup O1 is the etiologic agent of the disease cholera, and strains of this serogroup are responsible for pandemics. A few other serogroups have been found to carry cholera toxin genes (most notably O139, O75, and O141), and public health surveillance in the United States is focused on these four serogroups. A toxigenic isolate was recovered from a case of vibriosis from Texas in 2008. This isolate did not agglutinate with any of the four different serogroups' antisera (O1, O139, O75, or O141) routinely used in phenotypic testing and did not display a rough phenotype. We investigated several hypotheses that might explain the recovery of this potential nonagglutinating (NAG) strain using whole-genome sequencing analysis and phylogenetic methods. The NAG strain formed a monophyletic cluster with O141 strains in a whole-genome phylogeny. Furthermore, a phylogeny of ctxAB and tcpA sequences revealed that the sequences from the NAG strain also formed a monophyletic cluster with toxigenic U.S. Gulf Coast (USGC) strains (O1, O75, and O141) that were recovered from vibriosis cases associated with exposures to Gulf Coast waters. A comparison of the NAG whole-genome sequence showed that the O-antigen-determining region of the NAG strain was closely related to those of O141 strains, and specific mutations were likely responsible for the inability to agglutinate. This work shows the utility of whole-genome sequence analysis tools for characterization of an atypical clinical isolate of V. cholerae originating from a USGC state. IMPORTANCE Clinical cases of vibriosis are on the rise due to climate events and ocean warming (1, 2), and increased surveillance of toxigenic Vibrio cholerae strains is now more crucial than ever. 
While traditional phenotyping using antisera against O1 and O139 is useful for monitoring currently circulating strains with pandemic or epidemic potential, reagents are limited for non-O1/non-O139 strains. With the increased use of next-generation sequencing technologies, analysis of less well-characterized strains and O-antigen regions is possible. The framework for advanced molecular analysis of O-antigen-determining regions presented herein will be useful in the absence of reagents for serotyping. Furthermore, molecular analyses based on whole-genome sequence data and using phylogenetic methods will help characterize both historical and novel strains of clinical importance. Closely monitoring emerging mutations and trends will improve our understanding of the epidemic potential of Vibrio cholerae to anticipate and rapidly respond to future public health emergencies. |
SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data.
Griswold T , Kapsak C , Chen JC , den Bakker HC , Williams G , Kelley A , Vidyaprakash E , Katz LS . J Open Source Softw 2021 6 (60) Laboratories that run Whole Genome Sequencing (WGS) produce a tremendous amount of data, up to 10 gigabytes for some common instruments. There is a need to standardize the quality assurance and quality control process (QA/QC). Therefore we have created SneakerNet to automate the QA/QC for WGS. |
Mashtree: a rapid comparison of whole genome sequence files
Katz LS , Griswold T , Morrison SS , Caravas JA , Zhang S , den Bakker HC , Deng X , Carleton HA . J Open Source Softw 2019 4 (44) In the past decade, the number of publicly available bacterial genomes has increased dramatically. These genomes have been generated for impactful initiatives, especially in the field of genomic epidemiology (Brown, Dessai, McGarry, & Gerner-Smidt, 2019; Timme et al., 2017). Genomes are sequenced, shared publicly, and subsequently analyzed for phylogenetic relatedness. If two genomes of epidemiological interest are found to be related, further investigation might be prompted. However, comparing the multitudes of genomes for phylogenetic relatedness is computationally expensive and, with large numbers, laborious. Consequently, there are many strategies to reduce the complexity of the data for downstream analysis, especially using nucleotide stretches of length k (kmers). |
Cronobacter sakazakii Infections in Two Infants Linked to Powdered Infant Formula and Breast Pump Equipment - United States, 2021 and 2022.
Haston JC , Miko S , Cope JR , McKeel H , Walters C , Joseph LA , Griswold T , Katz LS , Andújar AA , Tourdot L , Rounds J , Vagnone P , Medus C , Harris J , Geist R , Neises D , Wiggington A , Smith T , Im MS , Wheeler C , Smith P , Carleton HA , Lee CC . MMWR Morb Mortal Wkly Rep 2023 72 (9) 223-226 Cronobacter sakazakii, a species of gram-negative bacteria belonging to the Enterobacteriaceae family, is known to cause severe and often fatal meningitis and sepsis in young infants. C. sakazakii is ubiquitous in the environment, and most reported infant cases have been attributed to contaminated powdered infant formula (powdered formula) or breast milk that was expressed using contaminated breast pump equipment (1-3). Previous investigations of cases and outbreaks have identified C. sakazakii in opened powdered formula, breast pump parts, environmental surfaces in the home, and, rarely, in unopened powdered formula and formula manufacturing facilities (2,4-6). This report describes two infants with C. sakazakii meningitis reported to CDC in September 2021 and February 2022. CDC used whole genome sequencing (WGS) analysis to link one case to contaminated opened powdered formula from the patient's home and the other to contaminated breast pump equipment. These cases highlight the importance of expanding awareness about C. sakazakii infections in infants, safe preparation and storage of powdered formula, proper cleaning and sanitizing of breast pump equipment, and using WGS as a tool for C. sakazakii investigations. |
Genome Sequences from a Reemergence of Vibrio cholerae in Haiti, 2022 Reveal Relatedness to Previously Circulating Strains.
Walters C , Chen J , Stroika S , Katz LS , Turnsek M , Compère V , Im MS , Gomez S , McCullough A , Landaverde C , Putney J , Caidi H , Folster J , Carleton HA , Boncy J , Lee CC . J Clin Microbiol 2023 61 (3) e0014223 After more than 3 years without a documented cholera case, the Republic of Haiti reported its first resurgent case on 30 September 2022 (1–3). As of 18 February 2023, more than 27,000 cholera cases have been hospitalized and 594 deaths confirmed from all 10 departments (4). Here, we describe Vibrio cholerae isolates first characterized by the Laboratoire National de Santé Publique (LNSP) and include both genotypic and phenotypic antimicrobial resistance profiles. Whole-genome sequencing (WGS) analysis was compared with recently circulating cholera toxin-producing V. cholerae O1 in a maximum likelihood phylogeny. |
Genome Sequences of Hemolytic and Nonhemolytic Listeria innocua Strains from Human, Food, and Environmental Sources.
McIntosh T , Kucerova Z , Katz LS , Lilley CM , Rowe LA , Unoarumhi Y , Batra D , Burnett E , Smikle M , Lee C . Microbiol Resour Announc 2022 11 (12) e0072322 This report describes genome sequences for nine Listeria innocua strains that varied in hemolytic phenotypes on sheep blood agar. All strains were sequenced using Pacific Biosciences (PacBio) single-molecule real-time (SMRT) chemistry; overall, the average read length of these sequences was 2,869,880 bp, with an average GC content of 37%. |
Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.
Xiaoli L , Hagey JV , Park DJ , Gulvik CA , Young EL , Alikhan NF , Lawsin A , Hassell N , Knipe K , Oakeson KF , Retchless AC , Shakya M , Lo CC , Chain P , Page AJ , Metcalf BJ , Su M , Rowell J , Vidyaprakash E , Paden CR , Huang AD , Roellig D , Patel K , Winglee K , Weigand MR , Katz LS . PeerJ 2022 10 e13821 BACKGROUND: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. METHODS: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. 
RESULTS: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. DISCUSSION: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines. |
Molecular characterization of circulating Salmonella Typhi strains in an urban informal settlement in Kenya.
Ochieng C , Chen JC , Osita MP , Katz LS , Griswold T , Omballa V , Ng'eno E , Ouma A , Wamola N , Opiyo C , Achieng L , Munywoki PK , Hendriksen RS , Freeman M , Mikoleit M , Juma B , Bigogo G , Mintz E , Verani JR , Hunsperger E , Carleton HA . PLoS Negl Trop Dis 2022 16 (8) e0010704 A high burden of Salmonella enterica subspecies enterica serovar Typhi (S. Typhi) bacteremia has been reported from urban informal settlements in sub-Saharan Africa, yet little is known about the introduction of these strains to the region. Understanding regional differences in the predominant strains of S. Typhi can provide insight into the genomic epidemiology. We genetically characterized 310 S. Typhi isolates from typhoid fever surveillance conducted over a 12-year period (2007-2019) in Kibera, an urban informal settlement in Nairobi, Kenya, to assess the circulating strains, their antimicrobial resistance attributes, and how they relate to global S. Typhi isolates. Whole genome multi-locus sequence typing (wgMLST) identified 4 clades, with up to 303 pairwise allelic differences. The identified genotypes correlated with wgMLST clades. The predominant clade contained 290 (93.5%) isolates with a median of 14 allele differences (range 0-52) and consisted entirely of genotypes 4.3.1.1 and 4.3.1.2. Resistance determinants were identified exclusively in the predominant clade. Determinants associated with resistance to aminoglycosides were observed in 245 isolates (79.0%), sulphonamide in 243 isolates (78.4%), trimethoprim in 247 isolates (79.7%), tetracycline in 224 isolates (72.3%), chloramphenicol in 247 isolates (79.6%), β-lactams in 239 isolates (77.1%) and quinolones in 62 isolates (20.0%). Multidrug resistance (MDR) determinants (defined as determinants conferring resistance to ampicillin, chloramphenicol and cotrimoxazole) were found in 235 (75.8%) isolates. 
The prevalence of MDR associated genes was similar throughout the study period (2007-2012: 203, 76.3% vs 2013-2019: 32, 72.7%; Fisher's Exact Test: P = 0.5478), while the proportion of isolates harboring quinolone resistance determinants increased (2007-2012: 42, 15.8% and 2013-2019: 20, 45.5%; Fisher's Exact Test: P<0.0001) following a decline in S. Typhi in Kibera. Some isolates (49, 15.8%) harbored both MDR and quinolone resistance determinants. There were no determinants associated with resistance to cephalosporins or azithromycin detected among the isolates sequenced in this study. Plasmid markers were only identified in the main clade including IncHI1A and IncHI1B(R27) in 226 (72.9%) isolates, and IncQ1 in 238 (76.8%) isolates. Molecular clock analysis of global typhoid isolates and isolates from Kibera suggests that genotype 4.3.1 has been introduced multiple times in Kibera. Several genomes from Kibera formed a clade with genomes from Kenya, Malawi, South Africa, and Tanzania. The most recent common ancestor (MRCA) for these isolates was from around 1997. Another isolate from Kibera grouped with several isolates from Uganda, sharing a common ancestor from around 2009. In summary, S. Typhi in Kibera belong to four wgMLST clades, one of which is frequently associated with MDR genes, and this poses a challenge in treatment and control. |
The Use of Whole-Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States.
Stevens EL , Carleton HA , Beal J , Tillman GE , Lindsey RL , Lauer AC , Pightling A , Jarvis KG , Ottesen A , Ramachandran P , Hintz L , Katz LS , Folster JP , Whichard JM , Trees E , Timme RE , McDermott P , Wolpert B , Bazaco M , Zhao S , Lindley S , Bruce BB , Griffin PM , Brown E , Allard M , Tallent S , Irvin K , Hoffmann M , Wise M , Tauxe R , Gerner-Smidt P , Simmons M , Kissler B , Defibaugh-Chavez S , Klimke W , Agarwala R , Lindsay J , Cook K , Austerman SR , Goldman D , McGarry S , Hale KR , Dessai U , Musser SM , Braden C . J Food Prot 2022 85 (5) 755-772 This multi-agency report developed under the Interagency Collaboration for Genomics for Food and Feed Safety (Gen-FS) provides an overview of the use of and transition to Whole-Genome Sequencing (WGS) technology to detect and characterize pathogens transmitted commonly by food and identify their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among federal agencies, including the National Institutes of Health (NIH); the Department of Health and Human Services' Centers for Disease Control and Prevention (CDC) and the Food and Drug Administration (FDA); and the U.S. Department of Agriculture's Food Safety and Inspection Service (FSIS), Agricultural Research Service (ARS), and Animal and Plant Health Inspection Service (APHIS). We describe single nucleotide polymorphism (SNP), core-genome (cg) and whole-genome multi-locus sequence typing (wgMLST) data analysis methods as used in CDC's PulseNet and FDA's GenomeTrakr networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Gen-FS agency partners. 
We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles), source attribution efforts, and increasing transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information (NCBI). Finally, we highlight the impact of current trends in the use of culture-independent diagnostics tests (CIDT) for human diagnostic testing on analytical approaches related to food safety. Lastly, we highlight what is next for WGS in food safety. |
Software testing in microbial bioinformatics: a call to action.
van der Putten BCL , Mendes CI , Talbot BM , de Korne-Elenbaas J , Mamede R , Vila-Cerqueira P , Coelho LP , Gulvik CA , Katz LS , The Asm Ngs Hackathon Participants . Microb Genom 2022 8 (3) Computational algorithms have become an essential component of research, with great efforts by the scientific community to raise standards on development and distribution of code. Despite these efforts, sustainability and reproducibility are major issues since continued validation through software testing is still not a widely adopted practice. Here, we report seven recommendations that help researchers implement software testing in microbial bioinformatics. We have developed these recommendations based on our experience from a collaborative hackathon organised prior to the American Society for Microbiology Next Generation Sequencing (ASM NGS) 2020 conference. We also present a repository hosting examples and guidelines for testing, available from https://github.com/microbinfie-hackathon2020/CSIS. |
Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package.
Griffiths EJ , Timme RE , Mendes CI , Page AJ , Alikhan NF , Fornika D , Maguire F , Campos J , Park D , Olawoye IB , Oluniyi PE , Anderson D , Christoffels A , da Silva AG , Cameron R , Dooley D , Katz LS , Black A , Karsch-Mizrachi I , Barrett T , Johnston A , Connor TR , Nicholls SM , Witney AA , Tyson GH , Tausch SH , Raphenya AR , Alcock B , Aanensen DM , Hodcroft E , Hsiao WWL , Vasconcelos ATR , MacCannell DR . Gigascience 2022 11 BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database. |
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks.
Wagner DD , Carleton HA , Trees E , Katz LS . PeerJ 2021 9 e12446 Background. Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality. Methods. Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNPs analysis results in E. coli and S. enterica clusters. Results. Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read healing pipelines for SNPs analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNPs pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads. Conclusions. PHRED scores will continue to be a crucial quality metric albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNPs analyses, different read healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies. © 2021 PeerJ Inc. All rights reserved. |
Clinical and Laboratory Findings in Patients with Potential SARS-CoV-2 Reinfection, May-July 2020.
Lee JT , Hesse EM , Paulin HN , Datta D , Katz LS , Talwar A , Chang G , Galang RR , Harcourt JL , Tamin A , Thornburg NJ , Wong KK , Stevens V , Kim K , Tong S , Zhou B , Queen K , Drobeniuc J , Folster JM , Sexton DJ , Ramachandran S , Browne H , Iskander J , Mitruka K . Clin Infect Dis 2021 73 (12) 2217-2225 BACKGROUND: We investigated patients with potential SARS-CoV-2 reinfection in the United States during May-July 2020. METHODS: We conducted case finding for patients with potential SARS-CoV-2 reinfection through the Emerging Infections Network. Cases reported were screened for laboratory and clinical findings of potential reinfection followed by requests for medical records and laboratory specimens. Available medical records were abstracted to characterize patient demographics, comorbidities, clinical course, and laboratory test results. Submitted specimens underwent further testing, including RT-PCR, viral culture, whole genome sequencing, subgenomic RNA PCR, and testing for anti-SARS-CoV-2 total antibody. RESULTS: Among 73 potential reinfection patients with available records, 30 patients had recurrent COVID-19 symptoms explained by alternative diagnoses with concurrent SARS-CoV-2 positive RT-PCR, 24 patients remained asymptomatic after recovery but had recurrent or persistent RT-PCR, and 19 patients had recurrent COVID-19 symptoms with concurrent SARS-CoV-2 positive RT-PCR but no alternative diagnoses. These 19 patients had symptom recurrence a median of 57 days after initial symptom onset (interquartile range: 47 - 76). Six of these patients had paired specimens available for further testing, but none had laboratory findings confirming reinfections. Testing of an additional three patients with recurrent symptoms and alternative diagnoses also did not confirm reinfection. CONCLUSIONS: We did not confirm SARS-CoV-2 reinfection within 90 days of the initial infection based on the clinical and laboratory characteristics of cases in this investigation. 
Our findings support current CDC guidance around quarantine and testing for patients who have recovered from COVID-19. |
Sequencing and characterization of five extensively drug-resistant Salmonella enterica serotype Typhi isolates implicated in human infections from Punjab, Pakistan
Tagg KA , Amir A , Ikram A , Chen JC , Kim JY , Meservey E , Joung YJ , Halpin JL , Batra D , Leeper MM , Katz LS , Saeed A , Freeman M , Watkins LF , Salman M , Folster JP . Microbiol Resour Announc 2020 9 (13) A large outbreak of extensively drug-resistant (XDR) Salmonella enterica serotype Typhi infections is ongoing in Pakistan, predominantly in Sindh Province. Here, we report the sequencing and characterization of five XDR Salmonella Typhi isolates from the Punjab province of Pakistan that are closely related to the outbreak strain and carry the same IncY plasmid. |
Implications of Mobile Genetic Elements for Salmonella enterica Single-Nucleotide Polymorphism Subtyping and Source Tracking Investigations.
Li S , Zhang S , Baert L , Jagadeesan B , Ngom-Bru C , Griswold T , Katz LS , Carleton HA , Deng X . Appl Environ Microbiol 2019 85 (24) Single-nucleotide polymorphisms (SNPs) are widely used for whole-genome sequencing (WGS)-based subtyping of foodborne pathogens in outbreak and source tracking investigations. Mobile genetic elements (MGEs) are commonly present in bacterial genomes and may affect SNP subtyping results if their evolutionary history and dynamics differ from that of the bacterial chromosomes. Using Salmonella enterica as a model organism, we surveyed major categories of MGEs, including plasmids, phages, insertion sequences, integrons, and integrative and conjugative elements (ICEs), in 990 genomes representing 21 major serotypes of S. enterica. We evaluated whether plasmids and chromosomal MGEs affect SNP subtyping with 9 outbreak clusters of different serotypes found in the United States in 2018. The median total length of chromosomal MGEs accounted for 2.5% of a typical S. enterica chromosome. Of the 990 analyzed S. enterica isolates, 68.9% contained at least one assembled plasmid sequence. The median total length of assembled plasmids in these isolates was 93,671 bp. Plasmids that carry high densities of SNPs were found to substantially affect both SNP phylogenies and SNP distances among closely related isolates if they were present in the reference genome for SNP subtyping. In comparison, chromosomal MGEs were found to have limited impact on SNP subtyping. We recommend the identification of plasmid sequences in the reference genome and the exclusion of plasmid-borne SNPs from SNP subtyping analysis. IMPORTANCE Despite increasingly routine use of WGS and SNP subtyping in outbreak and source tracking investigations, whether and how MGEs affect SNP subtyping has not been thoroughly investigated. Besides chromosomal MGEs, plasmids are frequently entangled in draft genome assemblies and yet to be assessed for their impact on SNP subtyping. 
This study provides evidence-based guidance on the treatment of MGEs in SNP analysis for Salmonella to infer phylogenetic relationship and SNP distance between isolates. |
Genome wide characterization of enterotoxigenic Escherichia coli serogroup O6 isolates from multiple outbreaks and sporadic infections from 1975-2016.
Pattabiraman V , Katz LS , Chen JC , McCullough AE , Trees E . PLoS One 2018 13 (12) e0208735 Enterotoxigenic Escherichia coli (ETEC) are an important cause of diarrhea globally, particularly among children under the age of five in developing countries. ETEC O6 is the most common ETEC serogroup, yet the genome wide population structure of isolates of this serogroup is yet to be determined. In this study, we have characterized 40 ETEC O6 isolates collected between 1975-2016 by whole genome sequencing (WGS) and by phenotypic antimicrobial susceptibility testing. To determine the relatedness of isolates, we evaluated two methods, whole genome high-quality single nucleotide polymorphism (whole genome-hqSNP) and core genome SNP analyses, using Lyve-SET and Parsnp, respectively. All isolates were tested for antimicrobial susceptibility using a panel of 14 antibiotics. ResFinder 2.1 and a custom quinolone resistance determinants workflow were used for resistance determinant detection. VirulenceFinder 1.5 was used for prediction of the virulence genes. Thirty-seven isolates clustered into three major clades (I, II, III) by whole genome-hqSNP and core genome SNP analyses, while three isolates included in the whole genome-hqSNP analysis only did not cluster with clades I-III by both analyses and formed a distantly related outgroup, designated clade IV. Median number of pairwise whole genome-hqSNPs in clonal ETEC O6 outbreaks ranged from 0 to 5. Of the 40 isolates tested for antimicrobial susceptibility, 18 isolates were pansusceptible. Twenty-two isolates were resistant to at least one antibiotic, nine of which were multidrug resistant. Phenotypic antimicrobial resistance (AR) correlated with AR determinants in 22 isolates. Thirty-two isolates harbored both enterotoxin virulence genes while the remaining 8 isolates had only one of the two virulence genes. 
In summary, whole genome-hqSNP and core genome SNP analyses from this study revealed similar evolutionary relationships and an overall diversity of ETEC O6 isolates independent of time of isolation. Fewer than 5 pairwise hqSNPs between ETEC O6 isolates is circumstantially indicative of an outbreak cluster. Findings from this study will be a basis for quicker outbreak detection and control by efficient subtyping by WGS. |
Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance.
Timme RE , Rand H , Shumway M , Trees EK , Simmons M , Agarwala R , Davis S , Tillman GE , Defibaugh-Chavez S , Carleton HA , Klimke WA , Katz LS . PeerJ 2017 2017 (10) e3893 Background. As next generation sequence technology has advanced, there have been parallel advances in genome-scale analysis programs for determining evolutionary relationships as proxies for epidemiological relationship in public health. Most new programs skip traditional steps of ortholog determination and multi-gene alignment, instead identifying variants across a set of genomes, then summarizing results in a matrix of single-nucleotide polymorphisms or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so they can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines. Methods. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, as the trees, WGS data, and epidemiological data for each are all in agreement. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To facilitate easy downloading of these benchmarks, we developed an automated script that uses the standard descriptive spreadsheet format. Results. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni) and one simulated dataset where the "known tree" can be accurately called the "true tree". 
The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets. Discussion. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines, and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools; we welcome additional benchmark datasets in our recommended format, and, if relevant, we will add these on our GitHub site. Together, these datasets, dataset format, and the underlying GitHub infrastructure present a recommended path for worldwide standardization of phylogenomic pipelines. |
Comparison of classical multi-locus sequence typing software for next-generation sequencing data
Page AJ , Alikhan NF , Carleton HA , Seemann T , Keane JA , Katz LS . Microb Genom 2017 3 (8) e000124 Multi-locus sequence typing (MLST) is a widely used method for categorizing bacteria. Increasingly, MLST is being performed using next-generation sequencing (NGS) data by reference laboratories and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date on these methods. We have compared eight of these applications against real and simulated data, and present results on: (1) the accuracy of each method against traditional typing methods, (2) the performance on real outbreak datasets, (3) the impact of contamination and varying depth of coverage, and (4) the computational resource requirements. |
SNVPhyl: a single nucleotide variant phylogenomics pipeline for microbial genomic epidemiology.
Petkau A , Mabon P , Sieffert C , Knox NC , Cabral J , Iskander M , Weedmark K , Zaheer R , Katz LS , Nadon C , Reimer A , Taboada E , Beiko RG , Hsiao W , Brinkman F , Graham M , Van Domselaar G . Microb Genom 2017 3 (6) e000116 The recent widespread application of whole-genome sequencing (WGS) for microbial disease investigations has spurred the development of new bioinformatics tools, including a notable proliferation of phylogenomics pipelines designed for infectious disease surveillance and outbreak investigation. Transitioning the use of WGS data out of the research laboratory and into the front lines of surveillance and outbreak response requires user-friendly, reproducible and scalable pipelines that have been well validated. Single Nucleotide Variant Phylogenomics (SNVPhyl) is a bioinformatics pipeline for identifying high-quality single-nucleotide variants (SNVs) and constructing a whole-genome phylogeny from a collection of WGS reads and a reference genome. Individual pipeline components are integrated into the Galaxy bioinformatics framework, enabling data analysis in a user-friendly, reproducible and scalable environment. We show that SNVPhyl can detect SNVs with high sensitivity and specificity, and identify and remove regions of high SNV density (indicative of recombination). SNVPhyl is able to correctly distinguish outbreak from non-outbreak isolates across a range of variant-calling settings, sequencing-coverage thresholds or in the presence of contamination. SNVPhyl is available as a Galaxy workflow, Docker and virtual machine images, and a Unix-based command-line application. SNVPhyl is released under the Apache 2.0 license and available at http://snvphyl.readthedocs.io/ or at https://github.com/phac-nml/snvphyl-galaxy. |
Whole genome and core genome multilocus sequence typing and single nucleotide polymorphism analyses of Listeria monocytogenes associated with an outbreak linked to cheese, United States, 2013.
Chen Y , Luo Y , Carleton H , Timme R , Melka D , Muruvanda T , Wang C , Kastanis G , Katz LS , Turner L , Fritzinger A , Moore T , Stones R , Blankenship J , Salter M , Parish M , Hammack TS , Evans PS , Tarr CL , Allard MW , Strain EA , Brown EW . Appl Environ Microbiol 2017 83 (15) Epidemiological findings of a listeriosis outbreak in 2013 implicated Hispanic-style cheese produced by Company A, and pulsed-field gel electrophoresis (PFGE) and whole genome sequencing (WGS) were performed on clinical isolates and representative isolates collected from Company A cheese and environmental samples during the investigation. The results strengthened the evidence for cheese as the vehicle. Surveillance sampling and WGS three months later revealed that the equipment purchased by Company B from Company A yielded an environmental isolate highly similar to all outbreak isolates. The whole genome and core genome multilocus sequence typing and single nucleotide polymorphism (SNP) analyses were compared to demonstrate the maximum discriminatory power obtained by using multiple analyses, which were needed to differentiate outbreak-associated isolates from a PFGE-indistinguishable isolate collected in a non-implicated food source in 2012. This unrelated isolate differed from the outbreak isolates by only 7 to 14 SNPs, and as a result, minimum spanning tree by the whole genome analyses and certain variant calling approach and phylogenetic algorithm for core genome-based analyses could not provide the differentiation between unrelated isolates. Our data also suggest that SNP/allele counts should always be combined with WGS clustering generated by phylogenetically meaningful algorithms on sufficient number of isolates, and SNP/allele threshold alone is not sufficient evidence to delineate an outbreak. The putative prophages were conserved across all the outbreak isolates. 
All outbreak isolates belonged to clonal complex 5 and serotype 1/2b, had an identical inlA sequence, which did not have premature stop codons. IMPORTANCE: In this outbreak, multiple analytical approaches were used for maximum discriminatory power. A PFGE-matched, epidemiologically unrelated isolate had high genetic similarity to the outbreak-associated isolates, with as few as only 7 SNP differences. Therefore, the SNP/allele threshold should not be used as the only evidence to define the scope of an outbreak. It is critical that the SNP/allele counts be complemented by WGS clustering generated by phylogenetically meaningful algorithms to distinguish outbreak-associated isolates from epidemiologically unrelated isolates. Careful selection of a variant calling approach and phylogenetic algorithm is critical for core genome-based analyses. The whole genome-based analyses were able to construct the highly resolved phylogeny needed to support the findings of the outbreak investigation. Ultimately, epidemiologic evidence and multiple WGS analyses should be combined to increase the confidence in outbreak investigations. |
A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens.
Katz LS , Griswold T , Williams-Newkirk AJ , Wagner D , Petkau A , Sieffert C , Van Domselaar G , Deng X , Carleton HA . Front Microbiol 2017 8 375 Modern epidemiology of foodborne bacterial pathogens in industrialized countries relies increasingly on whole genome sequencing (WGS) techniques. As opposed to profiling techniques such as pulsed-field gel electrophoresis, WGS requires a variety of computational methods. Since 2013, United States agencies responsible for food safety including the CDC, FDA, and USDA, have been performing whole-genome sequencing (WGS) on all Listeria monocytogenes found in clinical, food, and environmental samples. Each year, more genomes of other foodborne pathogens such as Escherichia coli, Campylobacter jejuni, and Salmonella enterica are being sequenced. Comparing thousands of genomes across an entire species requires a fast method with coarse resolution; however, capturing the fine details of highly related isolates requires a computationally heavy and sophisticated algorithm. Most L. monocytogenes investigations employing WGS depend on being able to identify an outbreak clade whose inter-genomic distances are less than an empirically determined threshold. When the difference between a few single nucleotide polymorphisms (SNPs) can help distinguish between genomes that are likely outbreak-associated and those that are less likely to be associated, we require a fine-resolution method. To achieve this level of resolution, we have developed Lyve-SET, a high-quality SNP pipeline. We evaluated Lyve-SET by retrospectively investigating 12 outbreak data sets along with four other SNP pipelines that have been used in outbreak investigation or similar scenarios. To compare these pipelines, several distance and phylogeny-based comparison methods were applied, which collectively showed that multiple pipelines were able to identify most outbreak clusters and strains. 
Currently in the US PulseNet system, whole genome multi-locus sequence typing (wgMLST) is the preferred primary method for foodborne WGS cluster detection and outbreak investigation due to its ability to name standardized genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET. |
Whole genome-based population biology and epidemiological surveillance of Listeria monocytogenes.
Moura A , Criscuolo A , Pouseele H , Maury MM , Leclercq A , Tarr C , Bjorkman JT , Dallman T , Reimer A , Enouf V , Larsonneur E , Carleton H , Bracq-Dieye H , Katz LS , Jones L , Touchon M , Tourdjman M , Walker M , Stroika S , Cantinelli T , Chenal-Francisque V , Kucerova Z , Rocha EP , Nadon C , Grant K , Nielsen EM , Pot B , Gerner-Smidt P , Lecuit M , Brisse S . Nat Microbiol 2016 2 16185 Listeria monocytogenes (Lm) is a major human foodborne pathogen. Numerous Lm outbreaks have been reported worldwide and associated with a high case fatality rate, reinforcing the need for strongly coordinated surveillance and outbreak control. We developed a universally applicable genome-wide strain genotyping approach and investigated the population diversity of Lm using 1,696 isolates from diverse sources and geographical locations. We define, with unprecedented precision, the population structure of Lm, demonstrate the occurrence of international circulation of strains and reveal the extent of heterogeneity in virulence and stress resistance genomic features among clinical and food isolates. Using historical isolates, we show that the evolutionary rate of Lm from lineage I and lineage II is low (approximately 2.5 × 10⁻⁷ substitutions per site per year, as inferred from the core genome) and that major sublineages (corresponding to so-called 'epidemic clones') are estimated to be at least 50-150 years old. This work demonstrates the urgent need to monitor Lm strains at the global level and provides the unified approach needed for global harmonization of Lm genome-based typing and population biology. |
Use of Whole Genome Sequencing and Patient Interviews To Link a Case of Sporadic Listeriosis to Consumption of Prepackaged Lettuce.
Jackson KA , Stroika S , Katz LS , Beal J , Brandt E , Nadon C , Reimer A , Major B , Conrad A , Tarr C , Jackson BR , Mody RK . J Food Prot 2016 79 (5) 806-809 We report on a case of listeriosis in a patient who probably consumed a prepackaged romaine lettuce-containing product recalled for Listeria monocytogenes contamination. Although definitive epidemiological information demonstrating exposure to the specific recalled product was lacking, the patient reported consumption of a prepackaged romaine lettuce-containing product of either the recalled brand or a different brand. A multinational investigation found that patient and food isolates from the recalled product were indistinguishable by pulsed-field gel electrophoresis and were highly related by whole genome sequencing, differing by four alleles by whole genome multilocus sequence typing and by five high-quality single nucleotide polymorphisms, suggesting a common source. To our knowledge, this is the first time prepackaged lettuce has been identified as a likely source for listeriosis. This investigation highlights the power of whole genome sequencing, as well as the continued need for timely and thorough epidemiological exposure data to identify sources of foodborne infections. |
Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation.
Jackson BR , Tarr C , Strain E , Jackson KA , Conrad A , Carleton H , Katz LS , Stroika S , Gould LH , Mody RK , Silk BJ , Beal J , Chen Y , Timme R , Doyle M , Fields A , Wise M , Tillman G , Defibaugh-Chavez S , Kucerova Z , Sabol A , Roache K , Trees E , Simmons M , Wasilenko J , Kubota K , Pouseele H , Klimke W , Besser J , Brown E , Allard M , Gerner-Smidt P . Clin Infect Dis 2016 63 (3) 380-6 Listeria monocytogenes (Lm) causes severe foodborne illness (listeriosis). Previous molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE), were critical in detecting outbreaks that led to food safety improvements and declining incidence, but PFGE provides limited genetic resolution. A multiagency collaboration began performing real-time, whole-genome sequencing (WGS) on all U.S. Lm isolates from patients, food, and the environment in September 2013, posting sequencing data into a public repository. Compared with the year before the project began, WGS, combined with epidemiologic and product trace-back data, detected more listeriosis clusters and solved more outbreaks (2 outbreaks in pre-WGS year, 5 in WGS year 1, and 9 in year 2). Whole-genome multilocus sequence typing and single nucleotide polymorphism analyses provided equivalent phylogenetic relationships relevant to investigations; results were most useful when interpreted in context of epidemiological data. WGS has transformed listeriosis outbreak surveillance and is being implemented for other foodborne pathogens. |
Two Listeria monocytogenes pseudo-outbreaks caused by contaminated laboratory culture media
Matanock A , Katz LS , Jackson KA , Kucerova Z , Conrad AR , Glover WA , Nguyen V , Mohr MC , Marsden-Haug N , Thompson D , Dunn JR , Stroika S , Melius B , Tarr C , Dietrich SE , Kao AS , Kornstein L , Li Z , Maroufi A , Marder EP , Meyer R , Perez-Osorio AC , Reddy V , Reporter R , Carleton H , Tweeten S , Waechter H , Yee LM , Wise ME , Davis K , Jackson B . J Clin Microbiol 2015 54 (3) 768-70 Listeriosis is a serious foodborne infection that disproportionately affects elderly adults, pregnant women, newborns, and immunocompromised individuals. Diagnosis is made by culturing Listeria monocytogenes from sterile body fluids or products of conception. This report describes investigations of two listeriosis pseudo-outbreaks caused by contaminated laboratory media made from sheep blood. |
Evolutionary Relationships of Outbreak-associated Listeria monocytogenes Strains of Serotypes 1/2a and 1/2b Determined by Whole Genome Sequencing.
Bergholz TM , den Bakker HC , Katz LS , Silk BJ , Jackson KA , Kucerova Z , Joseph LA , Turnsek M , Gladney LM , Halpin JL , Xavier K , Gossack J , Ward TJ , Frace M , Tarr CL . Appl Environ Microbiol 2015 82 (3) 928-38 We used whole genome sequencing to determine evolutionary relationships among 20 outbreak-associated clinical isolates of Listeria monocytogenes serotypes 1/2a and 1/2b. Isolates from six of eleven outbreaks fell outside of the clonal groups or 'epidemic clones' that have been previously associated with outbreaks, suggesting that epidemic potential may be widespread in L. monocytogenes and is not limited to the recognized epidemic clones. Pairwise comparisons between epidemiologically-related isolates within clonal complexes showed that genome-level variation differed by two orders of magnitude between different comparisons, and the distribution of point mutations (core versus accessory genome) also varied. In addition, genetic divergence between one closely related pair of isolates from a single outbreak was driven primarily by changes in phage regions. The evolutionary analysis showed the changes could be attributed to horizontal gene transfer; members of the diverse bacterial community found in the production facility could have served as the source of novel genetic material at some point in the production chain. The results raise the question of how to best utilize information contained within the accessory genome in outbreak investigations. The full magnitude and complexity of genetic changes revealed by genome sequencing could not be discerned from traditional subtyping methods and the results demonstrate the challenges of interpreting genetic variation among isolates recovered from a single outbreak. 
Epidemiological information remains critical for proper interpretation of nucleotide and structural diversity among isolates recovered during outbreaks, and will remain so until we understand more about how various population histories influence genetic variation. |
- Page last reviewed: Feb 1, 2024
- Page last updated: Nov 04, 2024