Last data update: May 16, 2025. (Total: 49,299 publications since 2009)
Records 1-4 (of 4 records)
Query trace: Wagner DD (original query)
Sequence-matching adapter trimmers generate consistent quality and assembly metrics for Illumina sequencing of RNA viruses
Nabakooza G, Wagner DD, Momin N, Marine RL, Weldon WC, Oberste MS. BMC Res Notes 2024 17 (1) 308. Trimming adapters and low-quality bases from next-generation sequencing (NGS) data is crucial for optimal analysis. We evaluated six trimming programs, implementing five different algorithms, for their effectiveness in trimming adapters and improving read quality, contig assembly, and single-nucleotide polymorphism (SNP) quality and concordance. The test data were poliovirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and norovirus paired-end data sequenced on the Illumina iSeq and MiSeq platforms. Trimmomatic and BBDuk effectively removed adapters from all datasets; FastP, AdapterRemoval, SeqPurge, and Skewer did not. All trimmers improved read quality (Q ≥ 30, 87.8-96.1%) compared with raw reads (83.6-93.2%). Trimmers implementing the traditional sequence-matching algorithm (Trimmomatic and AdapterRemoval) and the overlapping algorithm (FastP) retained the highest-quality reads. All trimmers improved the maximum contig length and genome coverage of the iSeq and MiSeq viral assemblies, but BBDuk-trimmed reads assembled the shortest contigs. SNP concordance was consistently high (>97.7-100%) across trimmers; however, BBDuk-trimmed reads had the lowest-quality SNPs. Overall, the two adapter trimmers that use the traditional sequence-matching algorithm performed consistently across the viral datasets analyzed. Our findings can guide software selection and inform the development of future versatile trimmers for viral genome analysis.
Genomics and metagenomics of Madurella mycetomatis, a causative agent of black grain mycetoma in Sudan
Litvintseva AP, Bakhiet S, Gade L, Wagner DD, Bagal UR, Batra D, Norris E, Rishishwar L, Beer KD, Siddig EE, Mhmoud NA, Chow NA, Fahal A. PLoS Negl Trop Dis 2022 16 (11) e0010787. Madurella mycetomatis is one of the main causative agents of mycetoma, a debilitating neglected tropical disease. Improved understanding of the genomic diversity of the fungal and bacterial causes of mycetoma is essential for advances in diagnosis and treatment. Here, we describe a high-quality genome assembly of M. mycetomatis and the results of whole-genome sequence analysis of 26 isolates from Sudan. We demonstrate evidence of at least seven genetically diverse lineages and extreme clonality among isolates within these lineages. We also performed shotgun metagenomic analysis of DNA extracted from mycetoma grains. M. mycetomatis reads were detected in all sequenced samples, with an average of 11,317 reads (SD ± 21,269) per sample. In addition, 10 (12%) of the 81 tested grain samples contained bacterial reads, including Streptococcus sp., Staphylococcus sp., and others.
VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data
Wagner DD, Marine RL, Ramos E, Ng TFF, Castro CJ, Okomo-Adhiambo M, Harvey K, Doho G, Kelly R, Jain Y, Tatusov RL, Silva H, Rota PA, Khan AN, Oberste MS. Microbiol Spectr 2022 10 (2) e0256421. Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens. However, analyzing and managing the enormous amounts of data these technologies generate remains a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data. This removes common bioinformatics bottlenecks for public health laboratories with limited on-site computational infrastructure. VPipe's performance was validated by analyzing publicly available NGS data sets for viral pathogens, and it generated high-quality assemblies for 12 data sets. VPipe also produced assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens. IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in applying NGS to viral pathogens. As of September 2021, the U.S. Centers for Disease Control and Prevention (CDC) had used VPipe to characterize >17,500 clinical specimens and isolates, and 12 state public health laboratories had used it to characterize 1,500. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
Wagner DD, Carleton HA, Trees E, Katz LS. PeerJ 2021 9 e12446. Background. Whole-genome sequencing (WGS) has become increasingly important in responses to enteric bacterial outbreaks. Two common WGS analysis procedures, single-nucleotide polymorphism (SNP) analysis and genome assembly, are highly dependent on WGS data quality. Methods. Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNP analysis results in E. coli and S. enterica clusters. Results. Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. We compared seven read-healing pipelines on two counts: how much they improved these four quality metrics, and how much they improved SNP analysis and genome assembly. The most effective read-healing pipelines for SNP analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNP pipeline showed a more marked improvement than the CFSAN SNP Pipeline, although the latter performed better on raw, unhealed reads. For genome assembly, SPAdes yielded significant improvements only for healed E. coli reads, while Skesa yielded no significant improvements on healed reads. Conclusions. PHRED scores will remain a crucial quality metric, albeit not of equal impact across all types of analyses for all enteric bacteria. Trimming-based read healing performed well for SNP analyses, but different read-healing approaches are likely needed for genome assembly and other emerging WGS analysis methods.
- Page last reviewed: Feb 1, 2024
- Page last updated: May 16, 2025
- Powered by CDC PHGKB Infrastructure