Last data update: Sep 30, 2024. (Total: 47785 publications since 2009)
Records 1-30 (of 108 Records) |
Query Trace: Khudyakov Y[original query] |
---|
Widespread hepatitis C virus transmission network among people who inject drugs in Kenya
Akiyama MJ , Khudyakov Y , Ramachandran S , Riback L , Ackerman M , Nyakowa M , Arthur L , Lizcano J , Walker J , Cherutich P , Kurth A . Int J Infect Dis 2024 107215 BACKGROUND: Hepatitis C virus (HCV) disproportionately affects people who inject drugs (PWID) globally. Despite the high HCV burden in this population, little is known about transmission dynamics in low- and middle-income countries. METHODS: We recruited PWID from Nairobi and Coastal cities of Mombasa, Kilifi and Malindi in Kenya at needle and syringe programs. Next-generation sequencing data from HCV hypervariable region 1 were analyzed using Global Hepatitis Outbreak and Surveillance Technology (GHOST) to identify transmission clusters. RESULTS: HCV strains belonged to genotype 1a (n=64, 46.0%), 4a (n=72, 51.8%), and were mixed HCV/1a/4a (n=3, 2.2%). HCV/1a was dominant (61.2%) in Nairobi while HCV/4a was dominant in Malindi (85.7%) and Kilifi (60.9%); whereas both genotypes were evenly identified in Mombasa (45.3%, for HCV/1a and 50.9% for HCV/4a). GHOST identified 11 transmission clusters involving 90 cases. Strains in the two largest clusters (n=38 predominantly HCV/4a, and n=32 HCV/1a) were sampled from all four cities. CONCLUSION: Transmission clusters involving 64.7% of cases indicate an effective sampling of major HCV strains circulating among PWID. Large clusters involving 77.8% of strains from Nairobi and Coast suggest successful introduction of two ancestral HCV/1a and HCV/4a strains to PWID, with widely spread progeny. Disruption of the country-wide transmission network is essential for HCV elimination. |
Coordinated evolution among hepatitis C virus genomic sites is coupled to host factors and resistance to interferon.
Lara J , Tavis JE , Donlin MJ , Lee WM , Yuan HJ , Pearlman BL , Vaughan G , Forbi JC , Xia GL , Khudyakov YE . In Silico Biol 2011 11 213-24 Machine-learning methods in the form of Bayesian networks (BN), linear projection (LP) and self-organizing tree (SOT) models were used to explore association among polymorphic sites within the HVR1 and NS5a regions of the HCV genome, host demographic factors (ethnicity, gender and age) and response to the combined interferon (IFN) and ribavirin (RBV) therapy. The BN models predicted therapy outcomes, gender and ethnicity with accuracy of 90%, 90% and 88.9%, respectively. The LP and SOT models strongly confirmed associations of the HVR1 and NS5A structures with response to therapy and demographic host factors identified by BN. The data indicate host specificity of HCV evolution and suggest the application of these models to predict outcomes of IFN/RBV therapy. |
Association of antigenic properties to structure of the hepatitis C virus NS3 protein.
Lara J , Khudyakov Y . In Silico Biol 2011 11 203-12 Sequence heterogeneity substantially affects antigenic properties of the major epitope in the hepatitis C virus (HCV) NS3 protein. To facilitate protein engineering of NS3 antigens immunologically reactive with antibody against the broad diversity of HCV variants we constructed a set of Bayesian Networks (BN) for predicting antigenicity based on structural parameters. Using homology modeling, tertiary (3D) structures of NS3 variants with known antigenic properties were predicted. Energy force field estimated using the 3D-models was found to be most strongly associated with the antigenic properties. The best BN-models showed 100% accuracy of prediction of immunological reactivity with tested serum specimens in 10-fold cross validation. Bootstrap analyses of BN's constructed using selected features showed that secondary structure and electrostatic potential assessed from 3D-models are the most robust attributes associated with immunological reactivity of NS3 antigens. The data suggest that the BN models may guide the development of NS3 antigens with improved diagnostically relevant properties. |
Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry.
Dimitrova Z , Campo DS , Ramachandran S , Vaughan G , Ganova-Raeva L , Lin Y , Forbi JC , Xia G , Skums P , Pearlman B , Khudyakov Y . In Silico Biol 2011 11 183-92 Hepatitis C Virus sequence studies mainly focus on the viral amplicon containing the Hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow for analyzing an unprecedented number of viral variants from infected patients and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) Next-Generation Sequencing; (ii) Clonal sequencing using End-point Limiting-dilution for isolation of individual sequence variants followed by Real-Time PCR and sequencing; and (iii) Mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, assessments of inter-host variability by all three technologies were equally accurate in identification of genetic relatedness among viral strains. These results support the application of all three technologies for molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and comparable results with sequence-based methods. |
Hepatitis C virus antigenic convergence.
Campo DS , Dimitrova Z , Yokosawa J , Hoang D , Perez NO , Ramachandran S , Khudyakov Y . Sci Rep 2012 2 267 Vaccine development against hepatitis C virus (HCV) is hindered by poor understanding of factors defining cross-immunoreactivity among heterogeneous epitopes. Using synthetic peptides and mouse immunization as a model, we conducted a quantitative analysis of cross-immunoreactivity among variants of the HCV hypervariable region 1 (HVR1). Analysis of 26,883 immunological reactions among pairs of peptides showed that the distribution of cross-immunoreactivity among HVR1 variants was skewed, with antibodies against a few variants reacting with all tested peptides. The HVR1 cross-immunoreactivity was accurately modeled based on amino acid sequence alone. The tested peptides were mapped in the HVR1 sequence space, which was visualized as a network of 11,319 sequences. The HVR1 variants with a greater network centrality showed a broader cross-immunoreactivity. The entire sequence space is explored by each HCV genotype and subtype. These findings indicate that HVR1 antigenic diversity is extensively convergent and effectively limited, suggesting significant implications for vaccine development. |
Primary case inference in viral outbreaks through analysis of intra-host variant population (preprint)
Gussler JW , Campo DS , Dimitrova Z , Skums P , Khudyakov Y . bioRxiv 2020 2020.09.18.303131 Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Genetic characterization of intra-host viral populations, although highly efficient in the identification of transmission clusters, is not as efficient in routing transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important ‘No primary case’ prediction. One or more samples, possibly including the primary case, may have not been sampled, and this designation is meant to account for these scenarios. These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. 
To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. A version of this software is publicly available at the following URL: https://www.github.com/walkergussler/PYCIVO |
Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections (preprint)
Icer Baykal PB , Lara J , Khudyakov Y , Zelikovsky A , Skums P . bioRxiv 2020 2020.06.17.157792 Background: Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. Methods: We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected parameters measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Findings: Distributions of the viral population parameters differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a Machine Learning classifier for the infection staging, which yielded a detection accuracy of 95.22%, thus providing a higher accuracy than other genomic-based models. Interpretation: The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. 
The proposed models may serve as a foundation of cyber-molecular assays for staging infection, that could potentially complement and/or substitute standard laboratory assays. Funding: AZ and PS were supported by NIH grant 1R01EB025022. PIB was supported by GSU MBD fellowship. Competing Interest Statement: The authors have declared no competing interest. |
Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants (preprint)
Tsyvina V , Campo DS , Sims S , Zelikovsky A , Khudyakov Y , Skums P . bioRxiv 2018 324418 Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals or studying of evolution of viruses or immune repertoires by analysis of network of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj
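The general idea of using k-mers to avoid comparing every pair of sequences can be illustrated with a minimal sketch. This is not the algorithm of the cited tool: the function names `kmers` and `candidate_pairs`, the k-mer length, and the `min_shared` cutoff are all hypothetical choices made for illustration. The sketch builds an inverted index from k-mers to sequences and reports only pairs that share enough k-mers, so unrelated sequences are never compared directly.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_pairs(seqs, k=4, min_shared=2):
    """Return sorted index pairs of sequences sharing >= min_shared distinct k-mers.

    k and min_shared are illustrative values, not parameters from the paper.
    """
    # Inverted index: k-mer -> indices of the sequences that contain it.
    index = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for km in kmers(seq, k):
            index[km].add(idx)

    # Count shared k-mers only for pairs that co-occur under some k-mer.
    shared = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            shared[(i, j)] += 1

    return sorted(pair for pair, count in shared.items() if count >= min_shared)


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
    print(candidate_pairs(reads))
```

Pairs returned by such a filter would still need an exact distance computation; the filter only narrows down which pairs are worth checking.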
SOPHIE: Viral Outbreak Investigation and Transmission History Reconstruction in a Joint Phylogenetic and Network Theory Framework
Skums Pavel , Mohebbi Fatemeh , Tsyvina Vyacheslav , Icer Pelin , Ramachandran Sumathi , Khudyakov Yury . Res Comput Mol Biol 2022 369-370 Reconstruction of transmission networks from viral genomes sampled from infected individuals is a major computational problem of genomic epidemiology. For this problem, we propose a maximum likelihood framework SOPHIE (SOcial and PHilogenetic Investigation of Epidemics) based on the integration of phylogenetic and random graph models. SOPHIE is scalable, accounts for intra-host diversity and accurately infers transmissions without case-specific epidemiological data. |
SOPHIE: viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework (preprint)
Skums P , Mohebbi F , Tsyvina V , Baykal PI , Nemira A , Ramachandran S , Khudyakov Y . bioRxiv 2022 05 (10) 844-856 e4 Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, maximum parsimony principle continues to be employed, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has a limited use. Finally, most methods implicitly assume that transmission events are independent, while common source outbreaks violate this assumption. We propose a maximum likelihood framework SOPHIE (SOcial and PHilogenetic Investigation of Epidemics) based on integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modelled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity and accurately infers transmissions without case-specific epidemiological data. SOPHIE code is freely available at https://github.com/compbel/SOPHIE/ Copyright The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. |
Polyvalent immunization elicits a synergistic broadly neutralizing immune response to hypervariable region 1 variants of hepatitis C virus
Mosa AI , Campo DS , Khudyakov Y , AbouHaidar MG , Gehring AJ , Zahoor A , Ball JK , Urbanowicz RA , Feld JJ . Proc Natl Acad Sci U S A 2023 120 (24) e2220294120 A hepatitis C virus (HCV) vaccine is urgently needed. Vaccine development has been hindered by HCV's genetic diversity, particularly within the immunodominant hypervariable region 1 (HVR1). Here, we developed a strategy to elicit broadly neutralizing antibodies to HVR1, which had previously been considered infeasible. We first applied a unique information theory-based measure of genetic distance to evaluate phenotypic relatedness between HVR1 variants. These distances were used to model the structure of HVR1's sequence space, which was found to have five major clusters. Variants from each cluster were used to immunize mice individually, and as a pentavalent mixture. Sera obtained following immunization neutralized every variant in a diverse HCVpp panel (n = 10), including those resistant to monovalent immunization, and at higher mean titers (1/ID(50) = 435) than a glycoprotein E2 (1/ID(50) = 205) vaccine. This synergistic immune response offers a unique approach to overcoming antigenic variability and may be applicable to other highly mutable viruses. |
A Novel Information-Theory-Based Genetic Distance That Approximates Phenotypic Differences.
Campo DS , Mosa A , Khudyakov Y . J Comput Biol 2023 30 (4) 420-431 Application of genetic distances to measure phenotypic relatedness is a challenging task, reflecting the complex relationship between genotype and phenotype. Accurate assessment of proximity among sequences with different phenotypic traits depends on how strongly the chosen distance is associated with structural and functional properties. In this study, we present a new distance measure Mutual Information and Entropy H (MIH) for categorical data such as nucleotide or amino acid sequences. MIH applies an information matrix (IM), which is calculated from the data and captures heterogeneity of individual positions as measured by Shannon entropy and coordinated substitutions among positions as measured by mutual information. In general, MIH assigns low weights to differences occurring at high entropy positions or at dependent positions. MIH distance was compared with other common distances on two experimental and two simulated data sets. MIH showed the best ability to distinguish cross-immunoreactive sequence pairs from non-cross-immunoreactive pairs of variants of the hepatitis C virus hypervariable region 1 (26,883 pairwise comparisons), and Major Histocompatibility Complex (MHC) binding peptides (n = 181) from non-binding peptides (n = 129). Analysis of 74 simulated RNA secondary structures also showed that the ratio between MIH distance of sequences from the same RNA structure and MIH of sequences from different structures is three orders of magnitude greater than for Hamming distances. These findings indicate that lower MIH between two sequences is associated with greater probability of the sequences to belong to the same phenotype. Examination of rule-based phenotypes generated in silico showed that (1) MIH is strongly associated with phenotypic differences, (2) IM of sequences under selection is very different from IM generated under random scenarios, and (3) IM is robust to sampling. 
In conclusion, MIH strongly approximates structural/functional distances and should have important applications to a wide range of biological problems, including evolution, artificial selection of biological functions and structures, and measuring phenotypic similarity. |
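The two quantities the abstract above names as inputs to the information matrix, Shannon entropy of a position and mutual information between two positions, can be computed from an alignment as in the sketch below. This is only an illustration of those two standard measures; it does not reproduce the MIH distance or its information matrix, whose exact definition is not given in the abstract. The function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column(alignment, pos):
    """Return the symbols found at position pos across all aligned sequences."""
    return [seq[pos] for seq in alignment]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two columns: H(X) + H(Y) - H(X, Y)."""
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


if __name__ == "__main__":
    # Toy alignment (hypothetical): positions 0 and 1 vary together.
    alignment = ["ACA", "ACA", "GTA", "GTA"]
    print(shannon_entropy(column(alignment, 0)))
    print(mutual_information(column(alignment, 0), column(alignment, 1)))
```

In this toy alignment the third position is invariant, so its entropy is zero, while the first two positions are fully coordinated and share one bit of information.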
SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework.
Skums P , Mohebbi F , Tsyvina V , Baykal PI , Nemira A , Ramachandran S , Khudyakov Y . Cell Syst 2022 13 (10) 844-856.e4 Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has a limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data. |
Hepatitis C virus transmission cluster among injection drug users in Pakistan
Sahibzada KI , Ganova-Raeva L , Dimitrova Z , Ramachandran S , Lin Y , Longmire G , Arthur L , Xia GL , Khudyakov Y , Khan I , Sadaf S . PLoS One 2022 17 (7) e0270910 Hepatitis C virus (HCV) infections are a public health problem across the globe, particularly in developing countries. Pakistan has the second highest prevalence of HCV infection worldwide. Limited data exist from Pakistan about persons who inject drugs (PWID) and are at significant risk of exposure to HCV infection and transmission. Serum specimens (n = 110) collected from PWID residing in four provinces were tested for molecular markers of HCV infection. Next generation sequencing (NGS) of the hypervariable region (HVR1) of HCV and Global Hepatitis Outbreak and Surveillance Technology (GHOST) were used to determine HCV genotype, genetic heterogeneity, and construct transmission networks. Among tested specimens, 47.3% were found anti-HCV positive and 34.6% were HCV RNA-positive and belonged to four genotypes, with 3a most prevalent followed by 1a, 1b and 4a. Variants sampled from five cases formed a phylogenetic cluster and a transmission network. One case harbored infection with two different genotypes. High prevalence of infections and presence of various genotypes indicate frequent introduction and transmission of HCV among PWID in Pakistan. Identification of a transmission cluster across three provinces, involving 20% of all cases, suggests the existence of a countrywide transmission network among PWID. Understanding the structure of this network should assist in devising effective public health strategies to eliminate HCV infection in Pakistan. |
Primary case inference in viral outbreaks through analysis of intra-host variant population.
Gussler JW , Campo DS , Dimitrova Z , Skums P , Khudyakov Y . BMC Bioinformatics 2022 23 (1) 62 BACKGROUND: Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Although the genetic characterization of intra-host viral populations can aid the identification of transmission clusters, it is not trivial to determine the directionality of transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also adding a custom heterogeneity index and identifying the scenario when the primary case may have not been sampled. RESULTS: These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. 
In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. CONCLUSIONS: PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important 'No primary case' prediction. One or more samples, possibly including the primary case, may have not been sampled, and this designation is meant to account for these scenarios. |
Changing Molecular Epidemiology of Hepatitis A Virus Infection, United States, 1996-2019.
Ramachandran S , Xia GL , Dimitrova Z , Lin Y , Montgomery M , Augustine R , Kamili S , Khudyakov Y . Emerg Infect Dis 2021 27 (6) 1742-1745 Hepatitis A virus (HAV) genotype IA was most common among strains tested in US outbreak investigations and surveillance during 1996-2015. However, HAV genotype IB gained prominence during 2016-2019 person-to-person multistate outbreaks. Detection of previously uncommon strains highlights the changing molecular epidemiology of HAV infection in the United States. |
Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections.
Icer Baykal PB , Lara J , Khudyakov Y , Zelikovsky A , Skums P . Virus Evol 2021 7 (1) veaa103 Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for the infection staging, which yielded a detection accuracy of 95.22 per cent, thus providing a higher accuracy than other genomic-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. 
The proposed models may serve as a foundation of cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays. |
Convex hulls in hamming space enable efficient search for similarity and clustering of genomic sequences.
Campo DS , Khudyakov Y . BMC Bioinformatics 2020 21 482 BACKGROUND: In molecular epidemiology, comparison of intra-host viral variants among infected persons is frequently used for tracing transmissions in human population and detecting viral infection outbreaks. Application of Ultra-Deep Sequencing (UDS) immensely increases the sensitivity of transmission detection but brings considerable computational challenges when comparing all pairs of sequences. We developed a new population comparison method based on convex hulls in hamming space. We applied this method to a large set of UDS samples obtained from unrelated cases infected with hepatitis C virus (HCV) and compared its performance with three previously published methods. RESULTS: The convex hull in hamming space is a data structure that provides information on: (1) average hamming distance within the set, (2) average hamming distance between two sets; (3) closeness centrality of each sequence; and (4) lower and upper bound of all the pairwise distances among the members of two sets. This filtering strategy rapidly and correctly removes 96.2% of all pairwise HCV sample comparisons, outperforming all previous methods. The convex hull distance (CHD) algorithm showed variable performance depending on sequence heterogeneity of the studied populations in real and simulated datasets, suggesting the possibility of using clustering methods to improve the performance. To address this issue, we developed a new clustering algorithm, k-hulls, that reduces heterogeneity of the convex hull. This efficient algorithm is an extension of the k-means algorithm and can be used with any type of categorical data. It is 6.8-times more accurate than k-mode, a previously developed clustering algorithm for categorical data. 
CONCLUSIONS: CHD is a fast and efficient filtering strategy for massively reducing the computational burden of pairwise comparison among large samples of sequences, and thus, aiding the calculation of transmission links among infected individuals using threshold-based methods. In addition, the convex hull efficiently obtains important summary metrics for intra-host viral populations. |
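For context, the brute-force pairwise comparison that a filter such as CHD is meant to reduce can be sketched as follows. This is not the convex hull method itself; it only shows a threshold-based link between two samples using the minimal Hamming distance between their sequences. The function names and the threshold value are hypothetical, and sequences are assumed to be aligned to equal length.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(sample_a, sample_b):
    """Smallest Hamming distance over all cross-sample sequence pairs."""
    return min(hamming(a, b) for a in sample_a for b in sample_b)


def linked(sample_a, sample_b, threshold):
    """Declare a link when the closest pair is within the threshold.

    The threshold is an illustrative parameter, not a value from the paper.
    """
    return min_distance(sample_a, sample_b) <= threshold


if __name__ == "__main__":
    case_1 = ["AAAA", "AAAT"]
    case_2 = ["AATT", "TTTT"]
    print(min_distance(case_1, case_2), linked(case_1, case_2, threshold=1))
```

The cost of `min_distance` grows with the product of the two sample sizes, which is why discarding most sample pairs before this step matters for deep-sequencing data.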
Accurate spatiotemporal mapping of drug overdose deaths by machine learning of drug-related web-searches.
Campo DS , Gussler JW , Sue A , Skums P , Khudyakov Y . PLoS One 2020 15 (12) e0243622 Persons who inject drugs (PWID) are at increased risk for overdose death (ODD), infections with HIV, hepatitis B (HBV) and hepatitis C virus (HCV), and noninfectious health conditions. Spatiotemporal identification of PWID communities is essential for developing efficient and cost-effective public health interventions for reducing morbidity and mortality associated with injection-drug use (IDU). Reported ODDs are a strong indicator of the extent of IDU in different geographic regions. However, ODD quantification can take time, with delays in ODD reporting occurring due to a range of factors including death investigation and drug testing. This delayed ODD reporting may affect efficient early interventions for infectious diseases. We present a novel model, Dynamic Overdose Vulnerability Estimator (DOVE), for assessment and spatiotemporal mapping of ODDs in different U.S. jurisdictions. Using Google® Web-search volumes (i.e., the fraction of all searches that include certain words), we identified a strong association between the reported ODD rates and drug-related search terms for 2004-2017. A machine learning model (Extremely Random Forest) was developed to produce yearly ODD estimates at state and county levels, as well as monthly estimates at state level. Regarding the total number of ODDs per year, DOVE's error was only 3.52% (Median Absolute Error, MAE) in the United States for 2005-2017. DOVE estimated 66,463 ODDs out of the reported 70,237 (94.48%) during 2017. For that year, the MAE of the individual ODD rates was 4.43%, 7.34%, and 12.75% among yearly estimates for states, yearly estimates for counties, and monthly estimates for states, respectively. These results indicate suitability of the DOVE ODD estimates for dynamic IDU assessment in most states, which may alert for possible increased morbidity and mortality associated with IDU. 
ODD estimates produced by DOVE offer an opportunity for spatiotemporal ODD mapping. Timely identification of potential mortality trends among PWID might assist in developing efficient ODD prevention and HBV, HCV, and HIV infection elimination programs by targeting public health interventions to the most vulnerable PWID communities. |
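The error figures quoted in this abstract are median absolute errors expressed as percentages. A minimal sketch of how such a metric might be computed is shown below. The function name and the sample counts are hypothetical and are not taken from the paper, whose exact computation is not given in the abstract.

```python
from statistics import median


def median_absolute_percent_error(reported, estimated):
    """Median of |estimated - reported| / reported, expressed in percent.

    Regions with zero reported events are skipped to avoid division by zero
    (an assumption made for this sketch).
    """
    if len(reported) != len(estimated):
        raise ValueError("inputs must have the same length")
    errors = [abs(e - r) * 100.0 / r for r, e in zip(reported, estimated) if r > 0]
    if not errors:
        raise ValueError("no nonzero reported values")
    return median(errors)


# Hypothetical yearly counts for five regions (illustrative only).
reported = [100, 200, 400, 50, 1000]
estimated = [110, 190, 400, 45, 1050]
print(median_absolute_percent_error(reported, estimated))  # 5.0
```

The median is less sensitive than the mean to a few regions with very large relative errors, which is one plausible reason to prefer it for small-area estimates.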
Complex genetic encoding of the hepatitis B virus on-drug persistence.
Thai H , Lara J , Xu X , Kitrinos K , Gaggar A , Chan HLY , Xia GL , Ganova-Raeva L , Khudyakov Y . Sci Rep 2020 10 (1) 15574 Tenofovir disoproxil fumarate (TDF) is one of the nucleotide analogs capable of inhibiting the reverse transcriptase (RT) activity of HIV and hepatitis B virus (HBV). There is no known HBV resistance to TDF. However, detectable variation in duration of HBV persistence in patients on TDF therapy suggests the existence of genetic mechanisms of on-drug persistence that reduce TDF efficacy for some HBV strains without affording actual resistance. Here, the whole genome of intra-host HBV variants (N = 1,288) was sequenced from patients with rapid (RR, N = 5) and slow response (SR, N = 5) to TDF. Association of HBV genomic and protein polymorphic sites to RR and SR was assessed using phylogenetic analysis and Bayesian network methods. We show that, in contrast to resistance to nucleotide analogs, which is mainly associated with a few specific mutations in RT, the HBV on-TDF persistence is defined by genetic variations across the entire HBV genome. Analysis of the inferred 3D-structures indicates no difference in affinity of TDF binding by RT encoded by intra-host HBV variants that rapidly decline or persist in the presence of TDF. This finding suggests that effectiveness of TDF recognition and binding does not contribute significantly to on-drug persistence. Differences in patterns of genetic associations to TDF response between HBV genotypes B and C and lack of a single pattern of mutations among intra-host variants sensitive to TDF indicate a complex genetic encoding of the trait. We hypothesize that there are many genetic mechanisms of on-drug persistence, which are differentially available to HBV strains. These pervasive mechanisms are insufficient to prevent viral inhibition completely but may contribute significantly to robustness of actual resistance.
On-drug persistence may reduce the overall effectiveness of therapy and should be considered for development of more potent drugs. |
Machine learning can accelerate discovery and application of cyber-molecular cancer diagnostics.
Campo DS , Khudyakov Y . J Med Artif Intell 2020 3 (7) Accurate and early cancer diagnosis is fundamental for clinical management and public health. Unfortunately, the biological complexity of cancer confounds the development of effective diagnostic approaches to its detection. Histological examination of tissue samples obtained by biopsy directly from solid tumors and imaging technologies remain as the mainstays of cancer diagnostics. The liquid biopsy concept aims to overcome the shortcomings of these onco-diagnostics by detecting tumor-derived biomarkers such as circulating tumor cells, extracellular vesicles, nucleosomes, proteins, antigens, and extracellular nucleic acids in blood (1). |
A Phylogenetic Analysis of HCV Transmission, Relapse, and Reinfection Among People Who Inject Drugs Receiving Opioid Agonist Therapy.
Akiyama MJ , Lipsey D , Ganova-Raeva L , Punkova L , Agyemang L , Sue A , Ramachandran S , Khudyakov Y , Litwin AH . J Infect Dis 2020 222 (3) 488-498 BACKGROUND: Understanding hepatitis C virus (HCV) transmission among people who inject drugs (PWID) is essential for HCV elimination. We aimed to differentiate reinfections from treatment failures and to identify transmission linkages and associated factors in a cohort of PWID receiving opioid agonist therapy (OAT). METHODS: We analyzed baseline and follow-up specimens from 150 PWID from three OAT clinics in the Bronx, NY. NGS data from the hypervariable region 1 of HCV were analyzed using Global Hepatitis Outbreak and Surveillance Technology. RESULTS: There were three transmission linkages between study participants. Nine participants did not achieve sustained virologic response (SVR): seven had follow-up specimens with similar sequences to baseline and two passed away. Four additional participants achieved SVR but became viremic at later follow-up: two were reinfected with different strains, one had a late treatment failure, and one was transiently viremic 17 months post-treatment. All transmission linkages were from the same OAT clinic and involved spousal or common-law partnerships. CONCLUSION: This study highlights the use of next generation sequencing (NGS) as an important tool for identifying viral transmission and to help distinguish relapse and reinfection among PWID. Results reinforce the need for harm reduction interventions among couples and those who report ongoing risk factors following SVR. |
Long-term virological and adherence outcomes to antiviral treatment in a 4-year cohort chronic HBV study
Abreu RM , Bassit LC , Tao S , Jiang Y , Ferreira AS , Hori PC , Ganova-Raeva LM , Khudyakov Y , Schinazi RF , Carrilho FJ , Ono SK . Antivir Ther 2019 24 (8) 567-579 BACKGROUND: Chronic hepatitis B (CHB) treatment adherence has been poorly studied worldwide. We evaluated long-term virological and adherence outcomes to antiviral treatment in CHB patients. METHODS: A prospective cohort of 183 Brazilian CHB patients treated with monotherapy or combination adefovir dipivoxil, entecavir, lamivudine and/or tenofovir disoproxil fumarate was studied in a reference tertiary center. Treatment adherence was evaluated by a validated questionnaire named "Assessment of Adherence to Antiviral Therapy Questionnaire" (CEAT-HBV) within three year-periods (2010/2011, 2013/2014 and 2014/2015). RESULTS: CEAT-HBV identified 43% (79/183) patients with non-adherence to antiviral treatment and among them, 67% (53/79) were viral load positive. The main causes associated with non-response to antiviral treatment were drug resistance variants followed by non-adherence, insufficient treatment duration and other causes. Single-dose pharmacokinetics demonstrated 35% (23/65) antiviral non-adherence. Two years after the first assessment, the CEAT-HBV indicated that 71% (101/143) subjects adhered to treatment (per-protocol population). However, 21% (40/183) of the patients could not be evaluated and were excluded. The main reasons for exclusion were death (20/183), with 11 out of 20 deaths due to hepatocellular carcinoma. A hepatitis B virus (HBV) booklet was used for medical education. The third CEAT-HBV assessment (2014/2015) showed that 83% (112/135) patients were compliant with treatment adherence (per-protocol population). Long-term evaluation showed that the adherence rate based on CEAT-HBV continued to increase after 4 years (p<0.001). CONCLUSIONS: The results highlight the importance of CHB therapy adherence assessment monitoring.
Long-term adherence outcomes were dynamic, and it is possible to increase the migration rate to the adherence/HBV DNA-negative group. |
Hepatitis B virus mutant infections in hemodialysis patients: A case series
Apata IW , Nguyen DB , Khudyakov Y , Mixson-Hayden T , Rosenberg J , Zahn M , Greenko J , Clement E , Portney AE , Kulkarni PA , Comer M , Adams E , Kamili S , Patel PR , Moorman AC . Kidney Med 2019 1 (6) 347-353 Rationale & Objective: Hepatitis B virus (HBV) transmission in hemodialysis units has become a rare event since implementation of hemodialysis-specific infection control guidelines: performing hemodialysis for hepatitis B surface antigen (HBsAg)-positive patients in an HBV isolation room, vaccinating HBV-susceptible (HBV surface antibody and HBsAg negative) patients, and monthly HBsAg testing in HBV-susceptible patients. Mutations in HBsAg can result in false-negative HBsAg results, leading to failure to identify HBsAg seroconversion from negative to positive. We describe 4 unique cases of HBsAg seroconversion caused by mutant HBV infection or reactivation in hemodialysis patients. Study Design: Following identification of a possible HBsAg seroconversion and mutant HBV infection, public health investigations were launched to conduct further HBV testing of case patients and potentially exposed patients. A case patient was defined as a hemodialysis patient with suspected mutant HBV infection because of false-negative HBsAg testing results. Confirmed case patients had HBV DNA sequences demonstrating S-gene mutations. Setting & Participants: Case patients and patients potentially exposed to the case patient in the respective hemodialysis units in multiple US states. Results: 4 cases of mutant HBV infection in hemodialysis patients were identified; 3 cases were confirmed using molecular sequencing. Failure of some HBsAg testing platforms to detect HBV mutations led to delays in applying HBV isolation procedures. Testing of potentially exposed patients did not identify secondary transmissions. 
Limitations: Lack of access to information on past HBsAg testing platforms and results led to challenges in ascertaining when HBsAg seroconversion occurred and identifying and testing all potentially exposed patients. Conclusions: Mutant HBV infections should be suspected in patients who test HBsAg negative and concurrently test positive for HBV DNA at high levels. Dialysis providers should consider using HBsAg assays that can also detect mutant HBV strains for routine HBV testing. |
Entropy of mitochondrial DNA circulating in blood is associated with hepatocellular carcinoma.
Campo DS , Nayak V , Srinivasamoorthy G , Khudyakov Y . BMC Med Genomics 2019 12 74 BACKGROUND: Ultra-Deep Sequencing (UDS) enabled identification of specific changes in human genome occurring in malignant tumors, with current approaches calling for the detection of specific mutations associated with certain cancers. However, such associations are frequently idiosyncratic and cannot be generalized for diagnostics. Mitochondrial DNA (mtDNA) has been shown to be functionally associated with several cancer types. Here, we study the association of intra-host mtDNA diversity with Hepatocellular Carcinoma (HCC). RESULTS: UDS mtDNA exome data from blood of patients with HCC (n = 293) and non-cancer controls (NC, n = 391) were used to: (i) measure the genetic heterogeneity of nucleotide sites from the entire population of intra-host mtDNA variants rather than to detect specific mutations, and (ii) apply machine learning algorithms to develop a classifier for HCC detection. Average total entropy of HCC mtDNA is 1.24-times lower than of NC mtDNA (p = 2.84E-47). Among all polymorphic sites, 2.09% had a significantly different mean entropy between HCC and NC, with 0.32% of the HCC mtDNA sites having greater (p < 0.05) and 1.77% of the sites having lower mean entropy (p < 0.05) as compared to NC. The entropy profile of each sample was used to further explore the association between mtDNA heterogeneity and HCC by means of a Random Forest (RF) classifier. The RF-classifier separated 232 HCC and 232 NC patients with accuracy of up to 99.78% and average accuracy of 92.23% in the 10-fold cross-validation. The classifier accurately separated 93.08% of HCC (n = 61) and NC (n = 159) patients in a validation dataset that was not used for the RF parameter optimization. CONCLUSIONS: Polymorphic sites contributing most to the mtDNA association with HCC are scattered along the mitochondrial genome, affecting all mitochondrial genes.
The findings suggest that application of heterogeneity profiles of intra-host mtDNA variants from blood may help overcome barriers associated with the complex association of specific mutations with cancer, enabling the development of accurate, rapid, inexpensive and minimally invasive diagnostic detection of cancer. |
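The per-site entropy profile described in this abstract can be illustrated with a short sketch. It assumes Shannon entropy in bits computed over the nucleotides observed at each aligned position. The abstract does not state the exact entropy definition or preprocessing used, so the function names and the toy reads below are illustrative assumptions only.

```python
from collections import Counter
from math import log2


def site_entropy(column):
    """Shannon entropy (bits) of the symbols observed at one aligned site."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_profile(aligned_reads):
    """Entropy at every site of equal-length, pre-aligned reads."""
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    return [site_entropy(col) for col in zip(*aligned_reads)]


# Toy intra-host population of four aligned variants (hypothetical).
reads = ["ACGT", "ACGA", "ACTA", "ACTA"]
profile = entropy_profile(reads)
total_entropy = sum(profile)  # one possible "total entropy" summary
print(profile, total_entropy)
```

A profile like this, one value per site, is the kind of feature vector that could be passed to a classifier such as a random forest.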
HCV transmission in high-risk communities in Bulgaria.
Ganova-Raeva L , Dimitrova Z , Alexiev I , Punkova L , Sue A , Xia GL , Gancheva A , Dimitrova R , Kostadinova A , Golkocheva-Markova E , Khudyakov Y . PLoS One 2019 14 (3) e0212350 BACKGROUND: The rate of HIV infection in Bulgaria is low. However, the rate of HCV-HIV-coinfection and HCV infection is high, especially among high-risk communities. The molecular epidemiology of those infections has not been studied before. METHODS: Consensus Sanger sequences of HVR1 and NS5B from 125 cases of HIV/HCV coinfections, collected during 2010-2014 in 15 different Bulgarian cities, were used for preliminary phylogenetic evaluation. Next-generation sequencing (NGS) data of the hypervariable region 1 (HVR1) analyzed via the Global Hepatitis Outbreak and Surveillance Technology (GHOST) were used to evaluate genetic heterogeneity and possible transmission linkages. Links between pairs that were below and above the established genetic distance threshold, indicative of transmission, were further examined by generating k-step networks. RESULTS: Preliminary genetic analyses showed predominance of HCV genotype 1a (54%), followed by 1b (20.8%), 2a (1.4%), 3a (22.3%) and 4a (1.4%), indicating ongoing transmission of many HCV strains of different genotypes. NGS of HVR1 from 72 cases showed significant genetic heterogeneity of intra-host HCV populations, with 5 cases being infected with 2 different genotypes or subtypes and 6 cases being infected with 2 strains of same subtype. GHOST revealed 8 transmission clusters involving 30 cases (41.7%), indicating a high rate of transmission. Four transmission clusters were found in Sofia, three in Plovdiv, and one in Peshtera. The main risk factor for the clusters was injection drug use. Close genetic proximity among HCV strains from the 3 Sofia clusters, and between HCV strains from Peshtera and one of the two Plovdiv clusters confirms a long and extensive transmission history of these strains in Bulgaria. 
CONCLUSIONS: Identification of several HCV genotypes and many HCV strains suggests a frequent introduction of HCV to the studied high-risk communities. GHOST detected a broad transmission network, which sustains circulation of several HCV strains since their early introduction in the 3 cities. This is the first report on the molecular epidemiology of HIV/HCV coinfections in Bulgaria. |
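Threshold-based linkage of the kind described in this abstract can be sketched as follows. The sketch assumes that two cases are linked when the minimal Hamming distance between any pair of their intra-host variants is at or below a threshold, and that clusters are the connected components of the resulting graph. The threshold value, function names, and sequences are hypothetical, and this is not GHOST's implementation.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_distance(sample_a, sample_b):
    """Smallest distance between any variant of one case and any of the other."""
    return min(hamming_fraction(a, b) for a in sample_a for b in sample_b)


def transmission_clusters(samples, threshold):
    """Connected components (size >= 2) of the graph linking close cases."""
    links = {name: set() for name in samples}
    for p, q in combinations(samples, 2):
        if min_distance(samples[p], samples[q]) <= threshold:
            links[p].add(q)
            links[q].add(p)
    clusters, seen = [], set()
    for name in samples:
        if name in seen or not links[name]:
            continue
        stack, component = [name], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(links[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical cases and a hypothetical threshold of 0.1.
samples = {
    "case1": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "case2": ["AAAAAAAATT"],
    "case3": ["CCCCCCCCCC"],
}
print(transmission_clusters(samples, threshold=0.1))  # [['case1', 'case2']]
```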
Recent and occult hepatitis B virus infections among blood donors in the United States
Ramachandran S , Groves JA , Xia GL , Saa P , Notari EP , Drobeniuc J , Poe A , Khudyakov N , Schillie SF , Murphy TV , Kamili S , Teo CG , Dodd RY , Khudyakov YE , Stramer SL . Transfusion 2018 59 (2) 601-611 BACKGROUND: Characteristics of US blood donors with recent (RBI) or occult (OBI) hepatitis B virus (HBV) infection are not well defined. METHODS: Donors with RBI and OBI were identified by nucleic acid and serologic testing among 34.4 million donations during 2009-2015. Consenting donors were interviewed and their HBV S-gene sequenced. RESULTS: The overall rate of HBV-infected donors was 7.95 per 100,000; of these, 0.35 per 100,000 and 1.70 per 100,000 were RBI and OBI, respectively. RBI (n = 120) and OBI (n = 583) donors constituted 26% of all HBV-infected (n = 2735) donors. Detection of HBV DNA in 92% of OBI donors required individual donation nucleic acid testing. Donors with OBI compared to RBI were older (mean age, 48 vs 39 years; p < 0.0001) with lower median viral loads (9 vs. 529 IU/mL; p < 0.0001). A higher proportion of OBI than RBI donors were born or resided in an endemic country (39% vs. 5%; p = 0.0078). Seventy-seven percent of all RBI and OBI donors had multiple sex partners, an HBV-risk factor. Of 40 RBI and 10 OBI donors whose S gene was sequenced, 33 (83%) and 6 (60%), respectively, carried HBV subgenotype A2; 18 (55%) and 2 (33%), respectively, shared an identical sequence. Infection with 1 or more putative HBV-immune-escape mutants was identified in 5 (50%) of OBI but no RBI donors. CONCLUSION: RBI and OBI continue to be identified at low rates, confirming the importance of comprehensive HBV DNA screening of US blood donations. HBV-infected donors require referral for care and evaluation and contact tracing; their HBV strains may provide important information on emergent genotypes. |
A large HCV transmission network enabled a fast-growing HIV outbreak in rural Indiana, 2015.
Ramachandran S , Thai H , Forbi JC , Galang RR , Dimitrova Z , Xia GL , Lin Y , Punkova LT , Pontones PR , Gentry J , Blosser SJ , Lovchik J , Switzer WM , Teshale E , Peters P , Ward J , Khudyakov Y . EBioMedicine 2018 37 374-381 BACKGROUND: A high prevalence (92.3%) of hepatitis C virus (HCV) co-infection among HIV patients identified during a large HIV outbreak associated with injection of oxymorphone in Indiana prompted genetic analysis of HCV strains. METHODS: Molecular epidemiological analysis of HCV-positive samples included genotyping, sampling intra-host HVR1 variants by next-generation sequencing (NGS) and constructing transmission networks using Global Hepatitis Outbreak and Surveillance Technology (GHOST). FINDINGS: Results from the 492 samples indicate predominance of HCV genotypes 1a (72.2%) and 3a (20.4%), and existence of 2 major endemic NS5B clusters involving 49.8% of the sequenced strains. Among 76 HIV co-infected patients, 60.5% segregated into 2 endemic clusters. NGS analyses of 281 cases identified 826,917 unique HVR1 sequences and 51 cases of mixed subtype/genotype infections. GHOST mapped 23 transmission clusters. One large cluster (n=130) included 50 cases infected with ≥2 subtypes/genotypes and 43 cases co-infected with HIV. Rapid strain replacement and superinfection with different strains were found among 7 of 12 cases who were followed up. INTERPRETATION: GHOST enabled mapping of HCV transmission networks among persons who inject drugs (PWID). Findings of numerous transmission clusters, mixed-genotype infections and rapid succession of infections with different HCV strains indicate a high rate of HCV spread. Co-localization of HIV co-infected patients in the major HCV clusters suggests that HIV dissemination was enabled by existing HCV transmission networks that likely perpetuated HCV in the community for years.
Identification of transmission networks is an important step to guiding efficient public health interventions for preventing and interrupting HCV and HIV transmission among PWID. FUND: US Centers for Disease Control and Prevention, and US state and local public health departments. |
Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants.
Tsyvina V , Campo DS , Sims S , Zelikovsky A , Khudyakov Y , Skums P . BMC Bioinformatics 2018 19 360 BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals, or studying the evolution of viruses or immune repertoires by analysis of networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance compared to other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj CONCLUSION: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data. |
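To illustrate why substring-based filtering avoids comparing every pair of sequences, the sketch below uses a classic pigeonhole seed filter. This technique is swapped in for illustration and is not the signature scheme of the tool described above. If two equal-length sequences differ at no more than t positions, then after cutting both into t + 1 disjoint segments at least one segment must match exactly, so only sequences sharing a segment need a full comparison. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(seqs, t):
    """Index pairs (i, j), i < j, whose Hamming distance is at most t."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    parts = t + 1
    bounds = [length * p // parts for p in range(parts + 1)]
    candidates = set()
    for p in range(parts):
        start, end = bounds[p], bounds[p + 1]
        buckets = defaultdict(list)
        for i, s in enumerate(seqs):
            buckets[s[start:end]].append(i)  # group by exact segment content
        for members in buckets.values():
            candidates.update(combinations(members, 2))
    # Verify each candidate with the full distance computation.
    return sorted(
        (i, j) for i, j in candidates if hamming(seqs[i], seqs[j]) <= t
    )


seqs = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "GGGGGGGG"]
print(close_pairs(seqs, t=1))  # [(0, 1)]
```

The filter never misses a true pair, because the verification step only removes false candidates. The saving comes from skipping pairs that share no segment at all.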
Automated quality control for a molecular surveillance system.
Sims S , Longmire AG , Campo DS , Ramachandran S , Medrzycki M , Ganova-Raeva L , Lin Y , Sue A , Thai H , Zelikovsky A , Khudyakov Y . BMC Bioinformatics 2018 19 358 BACKGROUND: Molecular surveillance and outbreak investigation are important for elimination of hepatitis C virus (HCV) infection in the United States. A web-based system, Global Hepatitis Outbreak and Surveillance Technology (GHOST), has been developed using Illumina MiSeq-based amplicon sequence data derived from the HCV E1/E2-junction genomic region to enable public health institutions to conduct cost-effective and accurate molecular surveillance, outbreak detection and strain characterization. However, as there are many factors that could impact input data quality to which the GHOST system is not completely immune, accuracy of epidemiological inferences generated by GHOST may be affected. Here, we analyze the data submitted to the GHOST system during its pilot phase to assess the nature of the data and to identify common quality concerns that can be detected and corrected automatically. RESULTS: The GHOST quality control filters were individually examined, and quality failure rates were measured for all samples, including negative controls. New filters were developed and introduced to detect primer dimers, loss of specimen-specific product, or short products. The genotyping tool was adjusted to improve the accuracy of subtype calls. The identification of "chordless" cycles in a transmission network from data generated with known laboratory-based quality concerns allowed for further improvement of transmission detection by GHOST in surveillance settings. Parameters derived to detect actionable common quality control anomalies were incorporated into the automatic quality control module that rejects data depending on the magnitude of a quality problem, and warns and guides users in performing correctional actions. 
The guiding responses generated by the system are tailored to the GHOST laboratory protocol. CONCLUSIONS: Several new quality control problems were identified in MiSeq data submitted to GHOST and used to improve protection of the system from erroneous data and users from erroneous inferences. The GHOST system was upgraded to include identification of causes of erroneous data and recommendation of corrective actions to laboratory users. |
- Page last reviewed: Feb 1, 2024
- Page last updated: Sep 30, 2024