Last data update: Oct 28, 2024. (Total: 48004 publications since 2009)
Interaction of HLA-DRB1*1501 and TNF-Alpha in a Population-based Case-control Study of Multiple Sclerosis.
Williamson DM , Marrie RA , Ashley-Koch A , Satten GA . Immunol Infect Dis 2013 1 (1) 10-17 This study was conducted to determine whether associations between multiple sclerosis (MS) and single nucleotide polymorphisms (SNPs) in nine genes (human leukocyte antigen (HLA), T cell receptor beta (TCR β), tumor necrosis factor α (TNF α), tumor necrosis factor β (TNF β), apolipoprotein E (APOE), interleukin 7 receptor alpha chain (IL7RA), interleukin 2 receptor alpha chain (IL2RA), myelin basic protein (MBP), and vitamin D receptor (VDR)) could be replicated in a population-based sample, and whether these associations are modified by the presence of HLA DRB1*1501. DNA was available from 722 individuals (223 with MS and 499 controls) who participated in a population-based case-control study. Cases and controls were matched on ancestry, age, gender and geographic area. The association of the HLA DRB1*1501 risk allele (T) with MS was confirmed in this population using a genotypic test, controlling for multiple comparisons. Examining the effect of each SNP in the presence or absence of the HLA DRB1*1501 risk allele identified significant associations with TNF α -1031 (rs1799964) among those without the HLA risk allele. No additional interactions were significant in a case-only analysis. Our results indicate that an interaction between SNPs in TNF α and HLA DRB1*1501 may influence the risk of developing MS.
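A case-only analysis estimates a gene-gene interaction as the odds ratio between the two loci among cases alone, which is valid only if the loci are independent in the source population. A minimal sketch, assuming a 2 × 2 table of cases cross-classified by HLA-DRB1*1501 carriage and carriage of a TNF α risk genotype; the function name and the counts are hypothetical, not taken from the study:

```python
import math


def case_only_interaction(n11, n10, n01, n00, z=1.96):
    """Case-only interaction odds ratio with a Woolf (log-scale) 95% CI.

    Counts are among cases only:
      n11: HLA risk allele carrier and SNP risk genotype
      n10: HLA risk allele carrier, no SNP risk genotype
      n01: no HLA risk allele, SNP risk genotype
      n00: neither
    Interpretable as a multiplicative interaction only if the two loci are
    independent in the source population (assumed, not checked here).
    """
    if min(n11, n10, n01, n00) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not data from the study.
or_, lo, hi = case_only_interaction(40, 60, 30, 90)
print(f"interaction OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```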
Heavy metals, organic solvents, and multiple sclerosis: An exploratory look at gene-environment interactions.
Napier MD , Poole C , Satten GA , Ashley-Koch A , Marrie RA , Williamson DM . Arch Environ Occup Health 2016 71 (1) 26-34 Exposures to heavy metals and organic solvents are potential etiologic factors for multiple sclerosis (MS), but their interactions with MS-associated genes are understudied. The authors explored the relationships among environmental exposure to lead, mercury, and solvents, 58 single-nucleotide polymorphisms (SNPs) in MS-associated genes, and MS. Data from a population-based case-control study of 217 prevalent MS cases and 496 age-, race-, gender-, and geographically matched controls were used to fit conditional logistic regression models of the associations among chemical exposure, genotype, and MS, adjusting for education and ancestry. MS cases were more likely than controls to report lead (odds ratio [OR] = 2.03; 95% confidence interval [CI]: 1.07, 3.86) and mercury exposure (OR = 2.06; 95% CI: 1.08, 3.91). Findings of potential gene-environment interactions between SNPs in TNF-α, TNF-β, TCR-β, VDR, MBP, and APOE and lead, mercury, or solvents should be interpreted cautiously because of the limited sample size.
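Conditional logistic regression conditions on each matched set, so set-specific intercepts drop out of the likelihood. A minimal sketch of the one-covariate case, assuming exactly one case per matched set and any number of controls; the study's models also adjusted for education and ancestry and included gene × exposure terms, which are omitted here. The function `clogit` and the example pairs are hypothetical:

```python
import math


def clogit(sets, max_iter=50, tol=1e-10):
    """Conditional logistic regression with a single covariate.

    `sets` holds one (x_case, [x_control, ...]) tuple per matched set, each set
    having exactly one case. Newton-Raphson maximizes
        prod over sets of exp(b * x_case) / sum_{j in set} exp(b * x_j)
    and the function returns (b, standard error); exp(b) is the odds ratio.
    """
    b = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for x_case, controls in sets:
            xs = [x_case, *controls]
            w = [math.exp(b * x) for x in xs]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            second = sum(wi * x * x for wi, x in zip(w, xs)) / total
            score += x_case - mean          # gradient of the log conditional likelihood
            info += second - mean ** 2      # observed information
        if info == 0:
            raise ValueError("exposure does not vary within any matched set")
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b, 1 / math.sqrt(info)


# Hypothetical 1:1 pairs: 30 with only the case exposed, 15 with only the control
# exposed, plus concordant pairs, which carry no information.
pairs = [(1, [0])] * 30 + [(0, [1])] * 15 + [(1, [1])] * 20 + [(0, [0])] * 35
beta, se = clogit(pairs)
print(f"OR = {math.exp(beta):.2f}, SE(log OR) = {se:.3f}")
```

For 1:1 pairs with a binary exposure the estimate reduces to the log of the ratio of discordant pairs, here log(30/15), which makes a convenient check.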
Ranked severe maternal morbidity index for population-level surveillance at delivery hospitalization based on hospital discharge data
Kuklina EV , Ewing AC , Satten GA , Callaghan WM , Goodman DA , Ferre CD , Ko JY , Womack LS , Galang RR , Kroelinger CD . PLoS One 2023 18 (11) e0294140 BACKGROUND: Severe maternal morbidity (SMM) is broadly defined as an unexpected and potentially life-threatening event associated with labor and delivery. The Centers for Disease Control and Prevention (CDC) produced 21 different indicators based on International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hospital diagnostic and procedure codes to identify cases of SMM. OBJECTIVES: To examine existing SMM indicators and determine which indicators identified the most in-hospital mortality at delivery hospitalization. METHODS: Data from the 1993-2015 and 2017-2019 Healthcare Cost and Utilization Project's National Inpatient Sample were used to report SMM indicator-specific prevalences, in-hospital mortality rates, and population attributable fractions (PAF) of mortality. We hierarchically ranked indicators by their overall PAF of in-hospital mortality. Predictive modeling determined whether SMM prevalence remained comparable after the transition to ICD-10-CM coding. RESULTS: The study population consisted of 18,198,934 hospitalizations representing 87,864,173 US delivery hospitalizations. The 15 top-ranked indicators identified 80% of in-hospital mortality; the proportion identified by the remaining indicators was negligible (2%). The top 15 indicators were: restoration of cardiac rhythm; cardiac arrest; mechanical ventilation; tracheostomy; amniotic fluid embolism; aneurysm; acute respiratory distress syndrome; acute myocardial infarction; shock; thromboembolism, pulmonary embolism; cerebrovascular disorders; sepsis; both disseminated intravascular coagulation (DIC) and blood transfusion; acute renal failure; and hysterectomy. The overall prevalence of the top 15 ranked SMM indicators (~22,000 SMM cases per year) was comparable after the transition to ICD-10-CM coding.
CONCLUSIONS: We determined the 15 indicators that identified the most in-hospital mortality at delivery hospitalization in the US. Continued testing of SMM indicators can improve measurement and surveillance of the most severe maternal complications at the population level.
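One common definition of the population attributable fraction compares the overall mortality rate with the rate among deliveries without the indicator. A minimal sketch under that definition, with hypothetical indicator names and counts; it ranks indicators by crude, separately computed PAFs, which is simpler than the study's hierarchical ranking. Because indicators overlap, such PAFs need not add up to the combined fraction identified by a set of indicators:

```python
def paf(deaths_with, n_with, deaths_without, n_without):
    """Crude population attributable fraction of deaths for one indicator:
    (overall mortality rate - rate without the indicator) / overall rate."""
    overall = (deaths_with + deaths_without) / (n_with + n_without)
    baseline = deaths_without / n_without
    return (overall - baseline) / overall


def rank_indicators(table):
    """table maps indicator name -> (deaths_with, n_with, deaths_without, n_without)."""
    scored = {name: paf(*counts) for name, counts in table.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts: 100 deaths among 1,000,000 delivery hospitalizations.
table = {
    "indicator_A": (40, 2_000, 60, 998_000),
    "indicator_B": (20, 10_000, 80, 990_000),
    "indicator_C": (5, 50_000, 95, 950_000),
}
for name, value in rank_indicators(table):
    print(f"{name}: PAF = {value:.3f}")
```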
Testing hypotheses about the microbiome using the linear decomposition model (LDM) (preprint)
Hu YJ , Satten GA . bioRxiv 2020 229831 MOTIVATION: Methods for analyzing microbiome data generally fall into one of two groups: tests of the global hypothesis of any microbiome effect, which do not provide any information on the contribution of individual operational taxonomic units (OTUs); and tests for individual OTUs, which do not typically provide a global test of microbiome effect. Without a unified approach, the findings of a global test may be hard to reconcile with the findings at the individual OTU level. Further, many tests of individual OTU effects do not preserve the false discovery rate (FDR). RESULTS: We introduce the linear decomposition model (LDM), which provides a single analysis path that includes global tests of any effect of the microbiome, tests of the effects of individual OTUs while accounting for multiple testing by controlling the FDR, and a connection to distance-based ordination. The LDM accommodates both continuous and discrete variables (e.g., clinical outcomes, environmental factors) as well as interaction terms to be tested either singly or in combination, allows for adjustment of confounding covariates, and uses permutation-based p-values that can control for correlation. The LDM can also be applied to transformed data, and an “omnibus” test can easily combine results from analyses conducted on different transformation scales. We also provide a new implementation of PERMANOVA based on our approach. For global testing, our simulations indicate the LDM provided correct type I error and can have comparable power to existing distance-based methods. For testing individual OTUs, our simulations indicate the LDM controlled the FDR well. In contrast, DESeq2 often had inflated FDR; MetagenomeSeq generally had the lowest sensitivity. The flexibility of the LDM for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies.
We also show that our implementation of PERMANOVA can outperform existing implementations.
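Controlling the FDR across OTU-level tests means choosing a rejection threshold from the whole list of p-values rather than testing each at a fixed level. The abstract does not spell out the LDM's FDR procedure, so as a generic illustration only, here is the Benjamini-Hochberg step-up rule applied to hypothetical OTU p-values:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:      # the largest rank meeting its threshold wins
            k_max = rank
    return sorted(order[:k_max])


otu_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(otu_p, q=0.05))    # hypothetical OTU-level p-values
```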
Fresh vs. frozen embryo transfer: new approach to minimize the limitations of using national surveillance data for clinical research
Weiss MS , Luo C , Zhang Y , Chen Y , Kissin DM , Satten GA , Barnhart KT . Fertil Steril 2022 119 (2) 186-194 OBJECTIVE: To assess the benefit of frozen vs. fresh elective single embryo transfer using traditional and novel methods of controlling for confounding. DESIGN: Retrospective cohort study using data from the National Assisted Reproductive Technology Surveillance System. SETTING: Not applicable. PATIENT(S): A total of 44,750 women aged 20-35 years undergoing their first lifetime oocyte retrieval and embryo transfer in 2016-2017, who had ≥4 embryos cryopreserved. INTERVENTION(S): Fresh elective single embryo transfer and frozen elective single embryo transfer. MAIN OUTCOME MEASURE(S): The primary outcome was a singleton live birth. Secondary outcomes included rates of total live birth (singleton plus multiple gestations), twin live birth, clinical intrauterine gestation, total pregnancy loss, biochemical pregnancy, and ectopic pregnancy. Outcomes for infants included gestational age at delivery, birth weight, and being small for gestational age. RESULT(S): The eligibility criteria were met by 6,324 fresh and 2,318 frozen cycles. Patients undergoing fresh and frozen transfer had comparable mean age (30.69 [standard deviation {SD} 0.08] years vs. 31.06 [SD 0.08] years) and body mass index (24.76 [SD 0.20] vs. 25.65 [SD 0.15]); however, women in the frozen cohort created more embryos (8.1 [SD 0.12] vs. 6.8 [SD 0.08]). Singleton live birth rates in the fresh vs. frozen groups were 51.4% vs. 48.8% (risk ratio 1.05; 95% confidence interval [CI], 1.00-1.10). After adjustment with a log-linear regression model and propensity score analysis, the difference in singleton live birth rates remained nonsignificant (adjusted risk ratio, 1.05; 95% CI, 0.97-1.14 and 1.02; 95% CI, 0.96-1.08, respectively). A novel dynamical model confirmed that inherent fertility (probability of ever achieving a pregnancy) was balanced between groups (odds ratio, 1.23; 95% CI, 0.78-1.95).
The per-cycle probability of singleton live birth was not different between groups (odds ratio 1.11 [95% CI 0.94-1.3]). CONCLUSION(S): In this retrospective cohort study of fresh vs. frozen elective single embryo transfer, there was no statistically significant difference in singleton live birth rate after adjustment using log-linear models and propensity score analysis. The successful application of a novel dynamical model, which incorporates multiple assisted reproductive technology cycles from the same woman as a surrogate for inherent fertility, offers a novel and complementary perspective for assessing interventions using national surveillance data.
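The crude risk ratio above can be reproduced approximately from the reported percentages. A minimal sketch of a risk ratio with a log-scale Wald interval; the event counts are back-calculated from 51.4% of 6,324 and 48.8% of 2,318 cycles, so they are approximate, and the resulting interval need not match the published one exactly. The adjusted estimates (log-linear model, propensity score) require covariates that are not reproduced here:

```python
import math


def risk_ratio(events1, n1, events2, n2, z=1.96):
    """Risk ratio of group 1 vs. group 2 with a log-scale Wald confidence interval."""
    p1, p2 = events1 / n1, events2 / n2
    rr = p1 / p2
    # Var(log RR) = (1 - p1) / events1 + (1 - p2) / events2
    se = math.sqrt((1 - p1) / events1 + (1 - p2) / events2)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Approximate counts back-calculated from the reported percentages (fresh vs. frozen).
rr, lo, hi = risk_ratio(3250, 6324, 1131, 2318)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```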
What Can We Learn about the Bias of Microbiome Studies from Analyzing Data from Mock Communities?
Li M , Tyx RE , Rivera AJ , Zhao N , Satten GA . Genes (Basel) 2022 13 (10) It is known that data from both 16S and shotgun metagenomics studies are subject to biases that cause the observed relative abundances of taxa to differ from their true values. Model community analyses, in which the relative abundances of all taxa in the sample are known by construction, seem to offer the hope that these biases can be measured. However, it is unclear whether the bias we measure in a mock community analysis is the same as the bias we measure in a sample in which taxa are spiked in at known relative abundance, or whether the biases we measure in spike-in samples are the same as the bias we would measure in a real (e.g., biological) sample. Here, we consider these questions in the context of 16S rRNA measurements on three sets of samples: the commercially available Zymo cells model community; the Zymo model community mixed with Swedish Snus, a smokeless tobacco product that is virtually bacteria-free; and a set of commercially available smokeless tobacco products. Each set of samples was subjected to four different extraction protocols. The goal of our analysis is to determine whether the patterns of bias observed in each set of samples are the same, i.e., whether we can learn about the bias in the commercially available smokeless tobacco products by studying the Zymo cells model community.
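One common way to formalize the question above is a multiplicative bias model, in which each taxon's observed proportion equals its true proportion times a taxon-specific efficiency, renormalized. The abstract does not say this is the model used; the sketch below assumes it, estimates efficiencies from mock-community samples whose true composition is known, and divides them out of a sample. Function names and numbers are hypothetical. Comparing efficiencies estimated separately from each sample set is one way to ask whether the bias transfers between sample types:

```python
import math


def estimate_efficiencies(mock_samples, truth):
    """Relative taxon efficiencies from mock-community samples.

    Assumes observed_i is proportional to truth_i * efficiency_i. Centering the
    log ratios within each sample removes the unknown normalizing constant, and
    efficiencies are returned scaled to a geometric mean of 1. All observed
    proportions must be nonzero.
    """
    n_taxa = len(truth)
    log_eff = [0.0] * n_taxa
    for sample in mock_samples:
        ratios = [math.log(o / t) for o, t in zip(sample, truth)]
        center = sum(ratios) / n_taxa
        for i, r in enumerate(ratios):
            log_eff[i] += (r - center) / len(mock_samples)
    return [math.exp(v) for v in log_eff]


def correct(sample, efficiencies):
    """Divide out the estimated efficiencies and renormalize to proportions."""
    adjusted = [o / e for o, e in zip(sample, efficiencies)]
    total = sum(adjusted)
    return [a / total for a in adjusted]


truth = [0.25, 0.25, 0.25, 0.25]                              # hypothetical even mock community
raw = [t * e for t, e in zip(truth, [2.0, 1.0, 0.5, 1.0])]    # simulated bias
observed = [r / sum(raw) for r in raw]
eff = estimate_efficiencies([observed], truth)
print([round(e, 3) for e in eff], [round(p, 3) for p in correct(observed, eff)])
```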
Associations between microbial communities and key chemical constituents in U.S. domestic moist snuff.
Tyx RE , Rivera AJ , Satten GA , Keong LM , Kuklenyik P , Lee GE , Lawler TS , Kimbrell JB , Stanfill SB , Valentin-Blasini L , Watson CH . PLoS One 2022 17 (5) e0267104 BACKGROUND: Smokeless tobacco (ST) products are widely used throughout the world and contribute to morbidity and mortality in users through an increased risk of cancers and oral diseases. Bacterial populations in ST contribute to taste, but their presence can also lead to the formation of carcinogenic tobacco-specific N-nitrosamines (TSNAs). Previous studies of microbial communities in tobacco products lacked chemistry data (e.g., nicotine, TSNAs) to characterize the products and identify associations between carcinogen levels and taxonomic groups. This study uses statistical analysis to identify potential associations between microbial and chemical constituents in moist snuff products. METHODS: We quantitatively analyzed 38 smokeless tobacco products for TSNAs using liquid chromatography with tandem mass spectrometry (LC-MS/MS), and for nicotine using gas chromatography with mass spectrometry (GC-MS). Moisture content (by weight loss on drying) and pH were also measured. We used 16S rRNA gene sequencing to characterize the microbial composition, and additionally measured total 16S bacterial counts using a quantitative PCR assay. RESULTS: Our findings link chemical constituents to their associated bacterial populations. We found that core taxonomic groups often varied between manufacturers. When manufacturer and flavor were controlled for as confounding variables, the genus Lactobacillus was found to be positively associated with TSNAs, while the genera Enteractinococcus and Brevibacterium were negatively associated. Three genera (Corynebacterium, Brachybacterium, and Xanthomonas) were found to be negatively associated with nicotine concentrations. Associations were also investigated separately for products from each manufacturer.
Products from one manufacturer had a positive association between TSNAs and bacteria in the genus Marinilactibacillus. Additionally, we found that TSNA levels in many products were lower compared with previously published chemical surveys. Finally, we observed consistent results when either relative or absolute abundance data were analyzed, while results from analyses of log-ratio-transformed abundances were divergent.
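The three scales compared above differ in what each value is measured against. A minimal sketch, assuming a vector of read counts per product and a qPCR estimate of total 16S copies as inputs; the function names are hypothetical and the pseudocount of 0.5 is an arbitrary choice. Because each centered log-ratio (CLR) value is taken relative to the geometric mean of all taxa in the same sample, and depends on the pseudocount, a taxon's CLR values can order samples differently than its relative or absolute abundances do, which is one possible reason for divergent results:

```python
import math


def relative(counts):
    """Relative abundances (proportions of the sample's total reads)."""
    total = sum(counts)
    return [c / total for c in counts]


def absolute(counts, total_16s):
    """Relative abundances scaled by a qPCR estimate of total 16S copies."""
    return [r * total_16s for r in relative(counts)]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount (an arbitrary choice) handles zeros."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


counts = [90, 9, 1, 0]          # hypothetical reads for four genera in one product
print(relative(counts))
print(absolute(counts, 1_000_000))
print([round(v, 2) for v in clr(counts)])
```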
Constraining PERMANOVA and LDM to within-set comparisons by projection improves the efficiency of analyses of matched sets of microbiome data.
Zhu Z , Satten GA , Mitchell C , Hu YJ . Microbiome 2021 9 (1) 133 BACKGROUND: Matched-set data arise frequently in microbiome studies. For example, we may collect pre- and post-treatment samples from a set of individuals, or use important confounding variables to match data from case participants to one or more control participants. Thus, there is a need for statistical methods for data composed of matched sets, to test hypotheses about traits of interest (e.g., clinical outcomes or environmental factors) at the community level and/or the operational taxonomic unit (OTU) level. Optimally, these methods should accommodate complex data such as those with unequal sample sizes across sets, confounders varying within sets, and continuous traits of interest. METHODS: PERMANOVA is a commonly used distance-based method for testing hypotheses at the community level. We have also developed the linear decomposition model (LDM), which unifies the community-level and OTU-level tests into one framework. Here we present a new strategy that can be used with both PERMANOVA and the LDM for analyzing matched-set data. We propose to include an indicator variable for each set as a covariate, so as to constrain comparisons to samples within the same set, and also to permute traits within each set, which can account for exchangeable sample correlations. The flexible nature of PERMANOVA and the LDM allows discrete or continuous traits or interactions to be tested, within-set confounders to be adjusted, and unbalanced data to be fully exploited. RESULTS: Our simulations indicate that our proposed strategy outperformed alternative strategies, including the commonly used one that utilizes restricted permutation only, in a wide range of scenarios. Using simulation, we also explored optimal designs for matched-set studies. The flexibility of PERMANOVA and the LDM for a variety of matched-set microbiome data is illustrated by the analysis of data from two real studies.
CONCLUSIONS: Including set indicator variables and permuting within sets when analyzing matched-set data with PERMANOVA or the LDM is a strategy that performs well and is capable of handling the complex data structures that frequently occur in microbiome studies.
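The strategy described above (set indicators as covariates plus within-set permutation of the trait) can be sketched directly on a distance matrix. A minimal sketch, assuming a precomputed distance matrix, one set label per sample, and a single trait; it is not the authors' implementation, and the statistic is the PERMANOVA pseudo-F without its degrees-of-freedom factor, which is constant across within-set permutations and therefore does not change the p-value. The data are simulated pre/post pairs:

```python
import numpy as np


def hat(m):
    """Orthogonal projection onto the column space of m."""
    return m @ np.linalg.pinv(m)


def gower(d):
    """Gower-centered matrix G = -1/2 J (D*D) J used by PERMANOVA."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * j @ (d ** 2) @ j


def pseudo_f(g, z, x):
    """Variation explained by trait x after the set indicators z, over residual variation."""
    h_z = hat(z)
    h_full = hat(np.column_stack([z, x]))
    return np.trace((h_full - h_z) @ g) / np.trace((np.eye(len(x)) - h_full) @ g)


def permanova_matched(d, sets, x, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    sets = np.asarray(sets)
    x = np.asarray(x, dtype=float)
    labels = np.unique(sets)
    z = (sets[:, None] == labels[None, :]).astype(float)   # one indicator column per set
    g = gower(d)
    observed = pseudo_f(g, z, x)
    hits = 0
    for _ in range(n_perm):
        xp = x.copy()
        for s in labels:                                    # permute only within each set
            idx = np.flatnonzero(sets == s)
            xp[idx] = rng.permutation(x[idx])
        hits += pseudo_f(g, z, xp) >= observed
    return observed, (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
sets = np.repeat(np.arange(10), 2)               # 10 hypothetical pre/post pairs
x = np.tile([0.0, 1.0], 10)                      # 0 = pre, 1 = post
y = 3 * rng.normal(size=10)[sets] + 2 * x + rng.normal(scale=0.1, size=20)
d = np.abs(y[:, None] - y[None, :])              # 1-D Euclidean distances
f, p = permanova_matched(d, sets, x, n_perm=199)
print(f"pseudo-F (unscaled) = {f:.2f}, p = {p:.3f}")
```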
A Bottom-up Approach to Testing Hypotheses That Have a Branching Tree Dependence Structure, with Error Rate Control.
Li Y , Hu YJ , Satten GA . J Am Stat Assoc 2020 117 (538) 664-677 Modern statistical analyses often involve testing large numbers of hypotheses. In many situations, these hypotheses may have an underlying tree structure that both helps determine the order in which tests should be conducted and imposes a dependency between tests that must be accounted for. Our motivating example comes from testing the association between a trait of interest and groups of microbes that have been organized into operational taxonomic units (OTUs) or amplicon sequence variants (ASVs). Given p-values from association tests for each individual OTU or ASV, we would like to know whether we can declare a certain species, genus, or higher taxonomic group to be associated with the trait. For this problem, a bottom-up testing algorithm that starts at the lowest level of the tree (OTUs or ASVs) and proceeds upward through successively higher taxonomic groupings (species, genus, family, etc.) is required. We develop such a bottom-up testing algorithm that controls a novel error rate that we call the false selection rate. By simulation, we also show that our approach is better at finding driver taxa, the highest-level taxa below which there are dense association signals. We illustrate our approach using data from a study of the microbiome among patients with ulcerative colitis and healthy controls. Supplementary materials for this article are available online. © 2020 American Statistical Association.
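The bottom-up order described above (OTUs first, then species, genus, and so on) can be illustrated by propagating p-values up a taxonomy with a combination test at each internal node. A minimal sketch using the Simes test as the combiner; it shows only the traversal order and does not implement the false selection rate control that is the paper's contribution. The tree, taxon names, and p-values are hypothetical:

```python
def simes(pvals):
    """Simes combination p-value for the global null over a set of p-values."""
    m = len(pvals)
    return min(1.0, min(m * p / rank for rank, p in enumerate(sorted(pvals), start=1)))


def bottom_up(leaf_p, parent):
    """Propagate p-values from leaves (OTUs/ASVs) toward the root.

    leaf_p: {leaf: p-value}; parent: {node: its parent} (the root has no entry).
    A node is combined only after all of its children have p-values, so the
    traversal runs strictly bottom-up.
    """
    pvals = dict(leaf_p)
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    pending = set(children)
    while pending:
        ready = [n for n in pending if all(c in pvals for c in children[n])]
        if not ready:
            raise ValueError("a leaf is missing a p-value, or the tree has a cycle")
        for node in ready:
            pvals[node] = simes([pvals[c] for c in children[node]])
            pending.remove(node)
    return pvals


parent = {"otu1": "sp1", "otu2": "sp1", "otu3": "sp2", "otu4": "sp2",
          "sp1": "genus1", "sp2": "genus1"}
leaf_p = {"otu1": 0.001, "otu2": 0.02, "otu3": 0.5, "otu4": 0.8}
print(bottom_up(leaf_p, parent))
```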
Testing hypotheses about the microbiome using the linear decomposition model (LDM).
Hu YJ , Satten GA . Bioinformatics 2020 36 (14) 4106-4115 MOTIVATION: Methods for analyzing microbiome data generally fall into one of two groups: tests of the global hypothesis of any microbiome effect, which do not provide any information on the contribution of individual operational taxonomic units (OTUs); and tests for individual OTUs, which do not typically provide a global test of microbiome effect. Without a unified approach, the findings of a global test may be hard to reconcile with the findings at the individual OTU level. Further, many tests of individual OTU effects do not preserve the false discovery rate (FDR). RESULTS: We introduce the linear decomposition model (LDM), which provides a single analysis path that includes global tests of any effect of the microbiome, tests of the effects of individual OTUs while accounting for multiple testing by controlling the FDR, and a connection to distance-based ordination. The LDM accommodates both continuous and discrete variables (e.g., clinical outcomes, environmental factors) as well as interaction terms to be tested either singly or in combination, allows for adjustment of confounding covariates, and uses permutation-based p-values that can control for correlation. The LDM can also be applied to transformed data, and an "omnibus" test can easily combine results from analyses conducted on different transformation scales. We also provide a new implementation of PERMANOVA based on our approach. For global testing, our simulations indicate the LDM provided correct type I error and can have comparable power to existing distance-based methods. For testing individual OTUs, our simulations indicate the LDM controlled the FDR well. In contrast, DESeq2 often had inflated FDR; MetagenomeSeq generally had the lowest sensitivity. The flexibility of the LDM for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies.
We also show that our implementation of PERMANOVA can outperform existing implementations. AVAILABILITY: The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
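The single analysis path described above can be illustrated by computing a global statistic and OTU-level statistics from one fitted projection and evaluating all of them on the same permutations. A minimal sketch, not the LDM package itself: it assumes a plain least-squares decomposition of a centered OTU matrix, takes the global statistic as explained over residual sums of squares pooled across OTUs, permutes the raw trait (a simplification), and omits the LDM's FDR step, transformations, and omnibus test. Names and data are hypothetical:

```python
import numpy as np


def proj(m):
    """Orthogonal projection onto the column space of m."""
    return m @ np.linalg.pinv(m)


def ldm_style_test(otu, confounders, trait, n_perm=999, seed=0):
    """Global and per-OTU permutation p-values from one shared set of permutations.

    otu: (n, p) matrix, e.g. relative abundances; confounders: (n, k); trait: (n,).
    The trait is permuted directly, which is exact only when the trait is
    independent of the confounders.
    """
    rng = np.random.default_rng(seed)
    n = otu.shape[0]
    z = np.column_stack([np.ones(n), confounders])
    otu_c = otu - otu.mean(axis=0)
    h_z = proj(z)

    def stats(t):
        h_full = proj(np.column_stack([z, t]))
        explained = (((h_full - h_z) @ otu_c) ** 2).sum(axis=0)    # trait after confounders
        residual = (((np.eye(n) - h_full) @ otu_c) ** 2).sum(axis=0)
        return explained.sum() / residual.sum(), explained / residual

    global_obs, otu_obs = stats(trait)
    global_hits, otu_hits = 0, np.zeros(otu.shape[1])
    for _ in range(n_perm):
        g, f = stats(rng.permutation(trait))
        global_hits += g >= global_obs
        otu_hits += f >= otu_obs
    return (global_hits + 1) / (n_perm + 1), (otu_hits + 1) / (n_perm + 1)


rng = np.random.default_rng(2)
trait = np.repeat([0.0, 1.0], 20)                # hypothetical binary trait, n = 40
conf = rng.normal(size=(40, 1))                  # one hypothetical confounder
otu = rng.normal(size=(40, 5))
otu[:, 0] += 3.0 * trait                         # only the first OTU responds
p_global, p_otu = ldm_style_test(otu, conf, trait, n_perm=199)
print(p_global, p_otu.round(3))
```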
Stability of the vaginal, oral, and gut microbiota across pregnancy among African American women: the effect of socioeconomic status and antibiotic exposure.
Dunlop AL , Knight AK , Satten GA , Cutler AJ , Wright ML , Mitchell RM , Read TD , Mulle J , Hertzberg VS , Hill CC , Smith AK , Corwin EJ . PeerJ 2019 7 e8004 Objective: A growing body of research has investigated the human microbiota and pregnancy outcomes, especially preterm birth. Most studies of the prenatal microbiota have focused on the vagina, with fewer investigating other body sites during pregnancy. Although pregnancy involves profound hormonal, immunological and metabolic changes, few studies have investigated either shifts in microbiota composition across pregnancy at different body sites or variation in composition at any site that may be explained by maternal characteristics. The purpose of this study was to investigate: (1) the stability of the vaginal, oral, and gut microbiota from early (8-14 weeks) through later (24-30 weeks) pregnancy among African American women according to measures of socioeconomic status, accounting for prenatal antibiotic use; (2) whether measures of socioeconomic status are associated with changes in microbiota composition over pregnancy; and (3) whether exposure to prenatal antibiotics mediates any observed associations between measures of socioeconomic status and stability of the vaginal, oral, and gut microbiota across pregnancy. Methods: We used paired vaginal, oral, or gut samples available for 16S rRNA gene sequencing from two time points in pregnancy (8-14 and 24-30 weeks) to compare within-woman changes in measures of alpha diversity (Shannon and Chao1) and beta diversity (Bray-Curtis dissimilarity) among pregnant African American women (n = 110). Multivariable linear regression was used to examine the effect of level of education and prenatal health insurance as explanatory variables for changes in diversity, considering antibiotic exposure as a mediator, adjusting for age, obstetrical history, and weeks between sampling.
Results: For the oral and gut microbiota, there were no significant associations between measures of socioeconomic status or prenatal antibiotic use and change in Shannon or Chao1 diversity. For the vaginal microbiota, low level of education (high school or less) was associated with an increase in Shannon and Chao1 diversity over pregnancy, with minimal attenuation when controlling for prenatal antibiotic use. Conversely, for within-woman Bray-Curtis dissimilarity for early compared to later pregnancy, low level of education and prenatal antibiotics were associated with greater dissimilarity for the oral and gut sites, with minimal attenuation when controlling for prenatal antibiotics, and no difference in dissimilarity for the vaginal site. Conclusions: Measures of maternal socioeconomic status are variably associated with changes in diversity across pregnancy for the vaginal, oral, and gut microbiota, with minimal attenuation by prenatal antibiotic exposure. Studies that evaluate stability of the microbiota across pregnancy in association with health outcomes themselves associated with socioeconomic status (such as preterm birth) should incorporate measures of socioeconomic status to avoid finding spurious relationships.
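The three measures named above can be computed from per-sample read counts. A minimal sketch; the abstract does not state which Chao1 variant was used, so the bias-corrected form is an assumption, and the count vectors are hypothetical. A within-woman change in alpha diversity is a later value minus an early value, while Bray-Curtis compares the two samples directly:

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum p_i ln p_i over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 (F1 - 1) / (2 (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)   # singletons
    f2 = sum(1 for c in counts if c == 2)   # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


early = [10, 5, 1, 1, 0]    # hypothetical counts at 8-14 weeks
late = [4, 6, 2, 0, 3]      # hypothetical counts at 24-30 weeks, same taxa order
print("change in Shannon:", round(shannon(late) - shannon(early), 3))
print("change in Chao1:", chao1(late) - chao1(early))
print("Bray-Curtis:", bray_curtis(early, late))
```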
Multisample adjusted U-statistics that account for confounding covariates
Satten GA , Kong M , Datta S . Stat Med 2018 37 (23) 3357-3372 Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants.
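The reweighting idea above can be illustrated with the simplest two-sample U-statistic. A minimal sketch, assuming individual weights have already been computed (for example from a stratification or propensity score) and omitting the asymptotic variance needed for a formal test; the function names and data are hypothetical, and another kernel, such as a similarity between categorical values, can be passed in place of the default:

```python
def mann_whitney_kernel(a, b):
    """1 if a > b, 1/2 if tied, 0 otherwise."""
    return 1.0 if a > b else (0.5 if a == b else 0.0)


def weighted_two_sample_u(x1, w1, x2, w2, kernel=mann_whitney_kernel):
    """Weighted two-sample U-statistic.

    Each pair (i from group 1, j from group 2) contributes kernel(x1[i], x2[j])
    with weight w1[i] * w2[j]. The weights stand in for stratification-score or
    propensity-score weights, whose estimation is outside this sketch.
    """
    num = sum(wi * wj * kernel(a, b) for a, wi in zip(x1, w1) for b, wj in zip(x2, w2))
    return num / (sum(w1) * sum(w2))


cases, controls = [3, 4, 5], [1, 2, 3]
print(weighted_two_sample_u(cases, [1, 1, 1], controls, [1, 1, 1]))    # unweighted
print(weighted_two_sample_u(cases, [1, 1, 1], controls, [0.5, 1, 2]))  # reweighted controls
```

With equal weights the statistic is the usual Mann-Whitney estimate of P(X1 > X2) + 0.5 P(X1 = X2); reweighting shifts each comparison toward the covariate distribution implied by the weights.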
Robust Inference of Population Structure from Next-Generation Sequencing Data with Systematic Differences in Sequencing.
Liao P , Satten GA , Hu YJ . Bioinformatics 2017 34 (7) 1157-1163 Motivation: Inferring population structure is important for both population genetics and genetic epidemiology. Principal components analysis (PCA) has been effective in ascertaining population structure with array genotype data but can be difficult to use with sequencing data, especially when low depth leads to uncertainty in called genotypes. Because PCA is sensitive to differences in variability, PCA using sequencing data can result in components that correspond to differences in sequencing quality (read depth and error rate), rather than differences in population structure. We demonstrate that even existing methods for PCA specifically designed for sequencing data can still yield biased conclusions when used with data having sequencing properties that are systematically different across different groups of samples (i.e., sequencing groups). This situation can arise in population genetics when combining sequencing data from different studies, or in genetic epidemiology when using historical controls such as samples from the 1000 Genomes Project. Results: To allow inference on population structure using PCA in these situations, we provide an approach that is based on using sequencing reads directly without calling genotypes. Our approach is to adjust the data from different sequencing groups to have the same read depth and error rate so that PCA does not generate spurious components representing sequencing quality. To accomplish this, we have developed a subsampling procedure to match the depth distributions in different sequencing groups, and a read-flipping procedure to match the error rates. We average over subsamples and read flips to minimize loss of information. We demonstrate the utility of our approach using two datasets from 1000 Genomes, and further evaluate it using simulation studies. 
Availability and Implementation: TASER-PC software is publicly available at http://web1.sph.emory.edu/users/yhu30/software.html. Contact: yijuan.hu@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
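The depth-matching and error-matching steps described above can be sketched for a single biallelic site. A minimal sketch, assuming reads are coded 0/1 for the two alleles and per-group error rates are known; the flip probability follows from requiring the flipped reads of the lower-error group to have the higher-error group's error rate. Function names and data are hypothetical and this is not the TASER-PC code; in the described approach, results are averaged over many subsamples and flips:

```python
import random


def flip_probability(e_low, e_high):
    """Per-read flip probability that raises error rate e_low to e_high (biallelic site).

    A read flipped with probability q is wrong with probability
    e_low * (1 - q) + (1 - e_low) * q = e_low + q * (1 - 2 * e_low);
    setting this equal to e_high and solving gives q.
    """
    if not 0 <= e_low <= e_high < 0.5:
        raise ValueError("need 0 <= e_low <= e_high < 0.5")
    return (e_high - e_low) / (1 - 2 * e_low)


def match_group(reads, target_depth, q, rng):
    """Subsample 0/1 allele reads to target_depth, then flip each read with probability q."""
    kept = rng.sample(reads, min(target_depth, len(reads)))
    return [1 - r if rng.random() < q else r for r in kept]


rng = random.Random(0)
q = flip_probability(0.01, 0.02)             # hypothetical per-group error rates
reads = [0] * 25 + [1] * 15                  # hypothetical allele calls at one site
print(round(q, 5), match_group(reads, 20, q, rng))
```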
Changes in vaginal community state types reflect major shifts in the microbiome
Brooks JP , Buck GA , Chen G , Diao L , Edwards DJ , Fettweis JM , Huzurbazar S , Rakitin A , Satten GA , Smirnova E , Waks Z , Wright ML , Yanover C , Zhou YH . Microb Ecol Health Dis 2017 28 (1) 1303265 Background: Recent studies of various human microbiome habitats have revealed thousands of bacterial species and the existence of large variation in communities of microorganisms in the same habitats across individual human subjects. Previous efforts to summarize this diversity, notably in the human gut and vagina, have categorized microbiome profiles by clustering them into community state types (CSTs). The functional relevance of specific CSTs has not been established. Objective: We investigate whether CSTs can be used to assess dynamics in the microbiome. Design: We conduct a re-analysis of five sequencing-based microbiome surveys derived from vaginal samples with repeated measures. Results: We observe that detection of a CST transition is largely insensitive to choices in methods for normalization or clustering. We find that healthy subjects persist in a CST for two to three weeks or more on average, while those with evidence of dysbiosis tend to change more often. Changes in CST can be gradual or occur over less than one day. Upcoming CST changes and switches to high-risk CSTs can be predicted with high accuracy in certain scenarios. Finally, we observe that presence of Gardnerella vaginalis is a strong predictor of an upcoming CST change. Conclusion: Overall, our results show that the CST concept is useful for studying microbiome dynamics.
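Once each sample has been assigned to a CST, persistence and switching can be summarized by a first-order transition matrix. A minimal sketch, assuming CST labels are already assigned and visits are equally spaced (the surveys' sampling need not be); names and sequences are hypothetical. Under this Markov assumption, the expected number of consecutive visits spent in a CST with stay probability p is 1/(1 - p):

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate CST-to-CST transition probabilities from per-subject state sequences.

    sequences: one time-ordered list of CST labels per subject.
    Returns {from_state: {to_state: probability}}; rows with no data are all zero.
    """
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):       # consecutive sampling visits
            counts[(a, b)] += 1
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix[a] = {b: counts[(a, b)] / row_total if row_total else 0.0 for b in states}
    return matrix


subjects = [["I", "I", "III", "III"], ["IV", "IV", "IV", "III"]]   # hypothetical
m = transition_matrix(subjects, ["I", "III", "IV"])
stay = m["IV"]["IV"]
print(m["IV"], "expected run length in IV:", round(1 / (1 - stay), 2), "visits")
```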
PhredEM: a phred-score-informed genotype-calling approach for next-generation sequencing studies.
Liao P , Satten GA , Hu YJ . Genet Epidemiol 2017 41 (5) 375-387 A fundamental challenge in analyzing next-generation sequencing (NGS) data is to determine an individual's genotype accurately, as the accuracy of the inferred genotype is essential to downstream analyses. Correctly estimating the base-calling error rate is critical to accurate genotype calls. Phred scores that accompany each call can be used to decide which calls are reliable. Some genotype callers, such as GATK and SAMtools, directly calculate the base-calling error rates from phred scores or recalibrated base quality scores. Others, such as SeqEM, estimate error rates from the read data without using any quality scores. It is also a common quality control procedure to filter out reads with low phred scores. However, choosing an appropriate phred score threshold is problematic, as too high a threshold may discard data, while too low a threshold may introduce errors. We propose a new likelihood-based genotype-calling approach that exploits all reads and estimates the per-base error rates by incorporating phred scores through a logistic regression model. The approach, which we call PhredEM, uses the expectation-maximization (EM) algorithm to obtain consistent estimates of genotype frequencies and logistic regression parameters. It also includes a simple, computationally efficient screening algorithm to identify loci that are estimated to be monomorphic, so that only loci estimated to be nonmonomorphic require application of the EM algorithm. Like GATK, PhredEM can be used together with a linkage-disequilibrium-based method such as Beagle, which can further improve genotype calling as a refinement step. We evaluate the performance of PhredEM using both simulated data and real sequencing data from the UK10K project and the 1000 Genomes project.
The results demonstrate that PhredEM performs better than either GATK or SeqEM, and that PhredEM is an improved, robust, and widely applicable genotype-calling approach for NGS studies. The relevant software is freely available.
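Phred scores encode error probabilities as Q = -10 log10(e), so e = 10^(-Q/10). A minimal sketch of per-site genotype log-likelihoods that uses each read's phred-implied error rate directly, assuming a biallelic site with errors spread evenly over the three other bases; PhredEM instead models the error rate as a logistic function of the phred score and estimates it by EM, which this sketch does not attempt. Names and reads are hypothetical:

```python
import math


def phred_to_error(q):
    """Base-calling error probability implied by a phred score: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def genotype_log_likelihoods(bases, quals, ref="A", alt="G"):
    """Log-likelihoods of genotypes RR, RA, AA at a biallelic site.

    Each read's error rate e comes from its phred score; a miscalled base is
    assumed equally likely to be any of the three other bases.
    """
    ll = {"RR": 0.0, "RA": 0.0, "AA": 0.0}
    for base, q in zip(bases, quals):
        e = phred_to_error(q)
        p_ref = 1 - e if base == ref else e / 3     # P(read | true base is ref)
        p_alt = 1 - e if base == alt else e / 3     # P(read | true base is alt)
        ll["RR"] += math.log(p_ref)
        ll["AA"] += math.log(p_alt)
        ll["RA"] += math.log(0.5 * p_ref + 0.5 * p_alt)
    return ll


ll = genotype_log_likelihoods("AAGG", [30, 30, 30, 30])
print(max(ll, key=ll.get), {g: round(v, 2) for g, v in ll.items()})
```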
Restoring the Duality between Principal Components of a Distance Matrix and Linear Combinations of Predictors, with Application to Studies of the Microbiome.
Satten GA , Tyx RE , Rivera AJ , Stanfill S . PLoS One 2017 12 (1) e0168131 Appreciation of the importance of the microbiome is increasing, as sequencing technology has made it possible to ascertain the microbial content of a variety of samples. Studies that sequence the 16S rRNA gene, ubiquitous in and nearly exclusive to bacteria, have proliferated in the medical literature. After sequences are binned into operational taxonomic units (OTUs) or species, data from these studies are summarized in a data matrix with the observed counts from each OTU for each sample. Analysis often reduces these data further to a matrix of pairwise distances or dissimilarities; plotting the first two or three principal components (PCs) of this distance matrix often reveals meaningful groupings in the data. However, once the distance matrix is calculated, it is no longer clear which OTUs or species are important to the observed clustering; further, the PCs are hard to interpret and cannot be calculated for subsequent observations. We show how to construct approximate decompositions of the data matrix that pair PCs with linear combinations of OTU or species frequencies, and show how these decompositions can be used to construct biplots, select important OTUs and partition the variability in the data matrix into contributions corresponding to PCs of an arbitrary distance or dissimilarity matrix. To illustrate our approach, we conduct an analysis of the bacteria found in 45 smokeless tobacco samples.
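The PCs of a distance matrix referred to above come from classical principal coordinates analysis (PCoA). A minimal sketch of PCoA, followed by the covariance of each OTU with each PC score as one simple way to ask which OTUs drive a PC; this is an illustration, not the decomposition proposed in the paper, and the data are random. With Euclidean distances, PCoA reproduces PCA of the centered data, which serves as a check:

```python
import numpy as np


def pcoa(d, k=2):
    """Classical principal coordinates of a distance matrix (double centering)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    g = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(g)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]         # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0, None)), vals


def otu_covariances(x, scores):
    """Covariance of each OTU column with each PC score: a simple link from PCs to OTUs."""
    xc = x - x.mean(axis=0)
    sc = scores - scores.mean(axis=0)
    return xc.T @ sc / (x.shape[0] - 1)


rng = np.random.default_rng(3)
x = rng.random((10, 4))                                            # hypothetical OTU frequencies
d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))   # Euclidean distances
scores, vals = pcoa(d, k=2)
print(otu_covariances(x, scores).round(3))
```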
Dysbiosis, inflammation, and response to treatment: a longitudinal study of pediatric subjects with newly diagnosed inflammatory bowel disease.
Shaw KA , Bertha M , Hofmekler T , Chopra P , Vatanen T , Srivatsa A , Prince J , Kumar A , Sauer C , Zwick ME , Satten GA , Kostic AD , Mulle JG , Xavier RJ , Kugathasan S . Genome Med 2016 8 (1) 75 BACKGROUND: Gut microbiome dysbiosis has been demonstrated in subjects with newly diagnosed and chronic inflammatory bowel disease (IBD). In this study we sought to explore longitudinal changes in dysbiosis and to ascertain associations between dysbiosis and markers of disease activity and treatment outcome. METHODS: We performed a prospective cohort study of 19 treatment-naive pediatric IBD subjects and 10 healthy controls, measuring fecal calprotectin and assessing the gut microbiome via repeated stool samples. Associations between clinical characteristics and the microbiome were tested using generalized estimating equations. Random forest classification was used to predict ultimate treatment response (presence of mucosal healing at follow-up colonoscopy) or non-response using patients' pretreatment samples. RESULTS: Patients with Crohn's disease had increased markers of inflammation and dysbiosis compared to controls. Patients with ulcerative colitis had even higher inflammation and dysbiosis compared to those with Crohn's disease. For all cases, the gut microbial dysbiosis index was significantly associated with clinical and biological measures of disease severity, but was not associated with treatment response. We found differences in specific gut microbiome genera between cases/controls and responders/non-responders, including Akkermansia, Coprococcus, Fusobacterium, Veillonella, Faecalibacterium, and Adlercreutzia. Using pretreatment microbiome data in a weighted random forest classifier, we obtained 76.5% accuracy for prediction of responder status. CONCLUSIONS: Patient dysbiosis improved over time but persisted even among those who responded to treatment and achieved mucosal healing. Although the dysbiosis index was not significantly different between responders and non-responders, we found specific genus-level differences. We found that pretreatment microbiome signatures are a promising avenue for prediction of remission and response to treatment. |
Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls.
Hu YJ , Liao P , Johnston HR , Allen AS , Satten GA . PLoS Genet 2016 12 (5) e1006040 Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistics for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype-calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood-onset obesity. The relevant software is freely available. |
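For readers unfamiliar with the burden statistic, the sketch below computes the conventional version from called genotypes: a weighted sum over variants of per-variant score statistics U_j = sum_i (y_i - ybar) G_ij. This is the genotype-calling practice the abstract warns can inflate false positives when sequencing differs between cases and controls; the paper's method instead works from the reads and assesses significance by bootstrap. Assumptions made only for this example: a binary trait with no covariates, Beta(1, 25) minor-allele-frequency weights (the default popularised by SKAT), and a normal approximation to the null distribution. The function name is hypothetical.

```python
import math

import numpy as np


def weighted_burden_test(G, y, weights=None):
    """Score-type weighted burden test for a binary trait with no covariates.

    G: n x m matrix of called genotypes (0/1/2 minor-allele counts).
    y: length-n vector of 0/1 disease status.
    Returns (T, z, two-sided p-value) under a normal approximation.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    if weights is None:
        maf = G.mean(axis=0) / 2
        maf = np.minimum(maf, 1 - maf)
        weights = 25 * (1 - maf) ** 24  # Beta(1, 25) density evaluated at each MAF
    burden = G @ weights                # per-subject weighted variant count
    ybar = y.mean()
    # T = sum_j w_j U_j, with per-variant scores U_j = sum_i (y_i - ybar) G_ij
    T = float((y - ybar) @ burden)
    var = ybar * (1 - ybar) * float(((burden - burden.mean()) ** 2).sum())
    if var == 0:
        raise ValueError("burden score does not vary across subjects")
    z = T / math.sqrt(var)
    return T, z, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    G = np.zeros((20, 3))
    G[:5, 0] = 1                # 5 carriers of variant 0, all of them cases
    y = np.array([1] * 10 + [0] * 10)
    print(weighted_burden_test(G, y))
```

If cases and controls were sequenced at different depths, errors in the called G would enter T directly, which is why the authors model reads rather than plugging in called genotypes.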
Characterization of Bacterial Communities in Selected Smokeless Tobacco Products Using 16S rDNA Analysis.
Tyx RE , Stanfill SB , Keong LM , Rivera AJ , Satten GA , Watson CH . PLoS One 2016 11 (1) e0146939 The bacterial communities present in smokeless tobacco (ST) products have not previously been reported. In this study, we used next-generation sequencing to study the bacteria present in U.S.-made dry snuff, moist snuff and Sudanese toombak. Sample diversity and taxonomic abundances were investigated in these products. A total of 33 bacterial families from four phyla, Actinobacteria, Firmicutes, Proteobacteria and Bacteroidetes, were identified. U.S.-produced dry snuff products contained a diverse distribution of all four phyla. Moist snuff products were dominated by Firmicutes. Toombak samples contained mainly Actinobacteria and Firmicutes (Aerococcaceae, Enterococcaceae, and Staphylococcaceae). The program PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) was used to impute the prevalence of genes encoding selected bacterial toxins, antibiotic resistance genes and other pro-inflammatory molecules. PICRUSt also predicted the presence of specific nitrate reductase genes, whose products can contribute to the formation of carcinogenic nitrosamines. Characterization of microbial community abundances and their associated genomes gives us an indication of the presence or absence of pathways of interest and can be used as a foundation for further investigation into the unique microbiological and chemical environments of smokeless tobacco products. |
Impact of the 5As brief counseling on smoking cessation among pregnant clients of Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) clinics in Ohio
Olaiya O , Sharma AJ , Tong VT , Dee D , Quinn C , Agaku IT , Conrey EJ , Kuiper NM , Satten GA . Prev Med 2015 81 438-43 OBJECTIVES: We assessed whether smoking cessation improved among pregnant smokers who attended Women, Infants and Children (WIC) Supplemental Nutrition Program clinics trained to implement a brief smoking cessation counseling intervention, the 5As: ask, advise, assess, assist, arrange. METHODS: In Ohio, staff in 38 WIC clinics were trained to deliver the 5As from 2006 through 2010. Using 2005-2011 Pregnancy Nutrition Surveillance System data, we performed conditional logistic regression, stratified by clinic, to estimate the relationship between women's exposure to the 5As and the odds of self-reported quitting during pregnancy. Reporting bias for quitting was assessed by examining whether differences in infants' birth weight by quit status differed by clinic training status. RESULTS: Of 71,526 pregnant smokers at WIC enrollment, 23% quit. Odds of quitting were higher among women who attended a clinic after versus before clinic staff were trained (adjusted odds ratio, 1.16; 95% confidence interval, 1.04-1.29). The adjusted mean infant birth weight was, on average, 96 g higher among women who reported quitting (P<0.0001), regardless of clinic training status. CONCLUSIONS: Training all Ohio WIC clinics to deliver the 5As may promote quitting among pregnant smokers, and thus is an important strategy to improve maternal and child health outcomes. |
A statistical approach for rare-variant association testing in affected sibships.
Epstein MP , Duncan R , Ware EB , Jhun MA , Bielak LF , Zhao W , Smith JA , Peyser PA , Kardia SL , Satten GA . Am J Hum Genet 2015 96 (4) 543-54 Sequencing and exome-chip technologies have motivated development of novel statistical tests to identify rare genetic variation that influences complex diseases. Although many rare-variant association tests exist for case-control or cross-sectional studies, far fewer methods exist for testing association in families. This is unfortunate, because cosegregation of rare variation and disease status in families can amplify association signals for rare variants. Many researchers have begun sequencing (or genotyping via exome chips) familial samples that were either recently collected or previously collected for linkage studies. Because many linkage studies of complex diseases sampled affected sibships, we propose a strategy for association testing of rare variants for use in this study design. The logic behind our approach is that rare susceptibility variants should be found more often on regions shared identical by descent by affected sibling pairs than on regions not shared identical by descent. We propose both burden and variance-component tests of rare variation that are applicable to affected sibships of arbitrary size and that do not require genotype information from unaffected siblings or independent controls. Our approaches are robust to population stratification and produce analytic p values, thereby enabling our approach to scale easily to genome-wide studies of rare variation. We illustrate our methods by using simulated data and exome chip data from sibships ascertained for hypertension collected as part of the Genetic Epidemiology Network of Arteriopathy (GENOA) study. |
Genetic Analysis Workshop 18: Methods and strategies for analyzing human sequence and phenotype data in members of extended pedigrees.
Bickeboller H , Bailey JN , Beyene J , Cantor RM , Cordell HJ , Culverhouse RC , Engelman CD , Fardo DW , Ghosh S , Konig IR , Lorenzo Bermejo J , Melton PE , Santorico SA , Satten GA , Sun L , Tintle NL , Ziegler A , MacCluer JW , Almasy L . BMC Proc 2014 8 S1 Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication. |
Population-based association and gene by environment interactions in Genetic Analysis Workshop 18.
Satten GA , Biswas S , Papachristou C , Turkmen A , Konig IR . Genet Epidemiol 2014 38 Suppl 1 S49-56 In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study. |
Utilizing population controls in rare-variant case-parent association tests.
Jiang Y , Satten GA , Han Y , Epstein MP , Heinzen EL , Goldstein DB , Allen AS . Am J Hum Genet 2014 94 (6) 845-53 There is great interest in detecting associations between human traits and rare genetic variation. To address the low power implicit in single-locus tests of rare genetic variants, many rare-variant association approaches attempt to accumulate information across a gene, often by taking linear combinations of single-locus contributions to a statistic. Using the right linear combination is key: an optimal test will up-weight true causal variants, down-weight neutral variants, and correctly assign the direction of effect for causal variants. Here, we propose a procedure that exploits data from population controls to estimate the linear combination to be used in a case-parent trio rare-variant association test. Specifically, we estimate the linear combination by comparing population control allele frequencies with allele frequencies in the parents of affected offspring. These estimates are then used to construct a rare-variant transmission disequilibrium test (rvTDT) in the case-parent data. Because the rvTDT is conditional on the parents' data, using parental data in estimating the linear combination does not affect the validity or asymptotic distribution of the rvTDT. By using simulation, we show that our new population-control-based rvTDT can dramatically improve power over rvTDTs that do not use population control information across a wide variety of genetic architectures. It also remains valid under population stratification. We apply the approach to a cohort of epileptic encephalopathy (EE) trios and find that dominant (or additive) inherited rare variants are unlikely to play a substantial role within EE genes previously identified through de novo mutation studies. |
Robust regression analysis of copy number variation data based on a univariate score.
Satten GA , Allen AS , Ikeda M , Mulle JG , Warren ST . PLoS One 2014 9 (2) e86272 MOTIVATION: The discovery that copy number variants (CNVs) are widespread in the human genome has motivated development of numerous algorithms that attempt to detect CNVs from intensity data. However, all approaches are plagued by high false discovery rates. Further, because CNVs are characterized by two dimensions (length and intensity) it is unclear how to order called CNVs to prioritize experimental validation. RESULTS: We developed a univariate score that correlates with the likelihood that a CNV is true. This score can be used to order CNV calls in such a way that calls having larger scores are more likely to overlap a true CNV. We developed cnv.beast, a computationally efficient algorithm for calling CNVs that uses robust backward elimination regression to keep CNV calls with scores that exceed a user-defined threshold. Using an independent dataset that was measured using a different platform, we validated our score and showed that our approach performed better than six other currently-available methods. AVAILABILITY: cnv.beast is available at http://www.duke.edu/~asallen/Software.html. |
Effects of maternal smokeless tobacco use on selected pregnancy outcomes in Alaska Native women: a case-control study
England LJ , Kim SY , Shapiro-Mendoza CK , Wilson HG , Kendrick JS , Satten GA , Lewis CA , Tucker MJ , Callaghan WM . Acta Obstet Gynecol Scand 2013 92 (6) 648-55 OBJECTIVE: To examine the potential effects of prenatal smokeless tobacco use on selected birth outcomes. DESIGN: A population-based, case-control study using a retrospective medical record review. POPULATION: Singleton deliveries 1997-2005 to Alaska Native women residing in western Alaska. METHODS: Hospital discharge codes were used to identify potential case deliveries and a random control sample. Data on tobacco use and confirmation of pregnancy outcomes were abstracted from medical records for 1123 deliveries. Logistic regression was used to examine associations between tobacco use and pregnancy outcomes. Adjusted odds ratios (OR), 95% confidence intervals (95% CI), and p-values were calculated. MAIN OUTCOMES MEASURES: Preterm delivery, pregnancy-associated hypertension, and placental abruption. RESULTS: In unadjusted analysis, smokeless tobacco use was not significantly associated with preterm delivery (OR 1.44, 95% CI 0.97-2.15). After adjustment for parity, pre-pregnancy body mass index, and maternal age, the point estimate was attenuated and remained non-significant. No significant associations were observed between smokeless tobacco use and pregnancy-associated hypertension (adjusted OR 0.92, 95% CI 0.56-1.51) or placental abruption (adjusted OR 1.11, 95% CI 0.53-2.33). CONCLUSIONS: Prenatal smokeless tobacco use does not appear to reduce risk of pregnancy-associated hypertension or to substantially increase risk of abruption. An association between smokeless tobacco and preterm delivery could not be ruled out. Components in tobacco other than nicotine likely play a major role in decreased pre-eclampsia risk in smokers. Nicotine adversely affects fetal neurodevelopment and our results should not be construed to mean that smokeless tobacco use is safe during pregnancy. |
A permutation procedure to correct for confounders in case-control studies, including tests of rare variation.
Epstein MP , Duncan R , Jiang Y , Conneely KN , Allen AS , Satten GA . Am J Hum Genet 2012 91 (2) 215-23 Many case-control tests of rare variation are implemented in statistical frameworks that make correction for confounders like population stratification difficult. Simple permutation of disease status is unacceptable for resolving this issue because the replicate data sets do not have the same confounding as the original data set. These limitations make it difficult to apply rare-variant tests to samples in which confounding most likely exists, e.g., samples collected from admixed populations. To enable the use of such rare-variant methods in structured samples, as well as to facilitate permutation tests for any situation in which case-control tests require adjustment for confounding covariates, we propose to establish the significance of a rare-variant test via a modified permutation procedure. Our procedure uses Fisher's noncentral hypergeometric distribution to generate permuted data sets with the same structure present in the actual data set such that inference is valid in the presence of confounding factors. We use simulated sequence data based on coalescent models to show that our permutation strategy corrects for confounding due to population stratification that, if ignored, would otherwise inflate the size of a rare-variant test. We further illustrate the approach by using sequence data from the Dallas Heart Study of energy metabolism traits. Researchers can implement our permutation approach by using the R package BiasedUrn. |
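The permutation idea can be sketched in Python under stated assumptions. Suppose each subject's odds of disease, w_i, have already been estimated from a logistic regression of disease status on the confounders. Replicate case sets are then drawn with probability proportional to the product of w_i over the chosen cases, which is the multivariate Fisher noncentral hypergeometric distribution with one "ball" per subject. The authors point to the R package BiasedUrn for this step; the Metropolis swap sampler below is a hypothetical stand-in that is only approximately exact for a finite number of steps, and the function names are illustrative.

```python
import numpy as np


def sample_case_labels(odds, n_cases, rng, n_steps=1000):
    """Draw 0/1 case labels with P(case set S) proportional to prod_{i in S} odds[i].

    Metropolis sampler: propose swapping a random case with a random control.
    The proposal is symmetric, so accept with probability min(1, odds[j] / odds[i]).
    """
    odds = np.asarray(odds, dtype=float)
    n = len(odds)
    if not 0 < n_cases < n:
        raise ValueError("need at least one case and one control")
    perm = rng.permutation(n)
    cases, controls = perm[:n_cases].copy(), perm[n_cases:].copy()
    for _ in range(n_steps):
        a = rng.integers(len(cases))
        b = rng.integers(len(controls))
        i, j = cases[a], controls[b]
        if rng.random() < odds[j] / odds[i]:
            cases[a], controls[b] = j, i  # control j becomes a case, case i a control
    labels = np.zeros(n, dtype=int)
    labels[cases] = 1
    return labels


def permutation_pvalue(stat, y, odds, n_perm=200, n_steps=1000, seed=0):
    """Permutation p-value for stat(labels), using confounder-aware replicate labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=int)
    observed = stat(y)
    hits = sum(
        stat(sample_case_labels(odds, int(y.sum()), rng, n_steps)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    g = np.array([1] * 10 + [0] * 10)  # hypothetical carrier indicator
    y = g.copy()                       # every carrier is a case

    def carrier_difference(labels):
        return g[labels == 1].mean() - g[labels == 0].mean()

    print(permutation_pvalue(carrier_difference, y, odds=np.ones(20)))
```

With all odds equal, the sampler reduces to an ordinary permutation of disease labels; unequal fitted odds make subjects whose covariates predict disease more likely to be relabelled as cases, so each replicate keeps the confounding present in the observed data.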
Age-associated DNA methylation in pediatric populations.
Alisch RS , Barwick BG , Chopra P , Myrick LK , Satten GA , Conneely KN , Warren ST . Genome Res 2012 22 (4) 623-32 DNA methylation (DNAm) plays diverse roles in human biology, but this dynamic epigenetic mark remains far from fully characterized. Although earlier studies uncovered loci that undergo age-associated DNAm changes in adults, little is known about such changes during childhood. Despite profound DNAm plasticity during embryogenesis, monozygotic twins show indistinguishable childhood methylation, suggesting that DNAm is highly coordinated throughout early development. Here we examine the methylation of 27,578 CpG dinucleotides in peripheral blood DNA from a cross-sectional study of 398 boys, aged 3-17 yr, and find significant age-associated changes in DNAm at 2078 loci. These findings correspond well with pyrosequencing data and replicate in a second pediatric population (N = 78). Moreover, we report a deficit of age-related loci on the X chromosome, a preference for specific nucleotides immediately surrounding the interrogated CpG dinucleotide, and a primary association with developmental and immune ontological functions. Meta-analysis (N = 1158) with two adult populations reveals that despite a significant overlap of age-associated loci, most methylation changes do not follow a lifelong linear pattern due to a threefold to fourfold higher rate of change in children compared with adults; consequently, the vast majority of changes are more accurately modeled as a function of logarithmic age. We therefore conclude that age-related DNAm changes in peripheral blood occur more rapidly during childhood and are imperfectly accounted for by statistical corrections that are linear in age, further suggesting that future DNAm studies should be matched closely for age. |
California Very Preterm Birth Study: design and characteristics of the population- and biospecimen bank-based nested case-control study
Kharrazi M , Pearl M , Yang J , Delorenze GN , Bean CJ , Callaghan WM , Grant A , Lackritz E , Romero R , Satten GA , Simhan H , Torres AR , Westover JB , Yolken R , Williamson DM . Paediatr Perinat Epidemiol 2012 26 (3) 250-263 Very preterm birth (VPTB) is a leading cause of infant mortality, morbidity and racial disparity in the US. The underlying causes of VPTB are multiple and poorly understood. The California Very Preterm Birth Study was conducted to discover maternal and infant genetic and environmental factors associated with VPTB. This paper describes the study design, population, data and specimen collection, laboratory methods and characteristics of the study population. Using a large, population-based cohort created through record linkage of livebirths delivered from 2000 to 2007 in five counties of southern California, and existing data and banked specimens from statewide prenatal and newborn screening, 1100 VPTB cases and 796 control mother-infant pairs were selected for study (385/200 White, 385/253 Hispanic and 330/343 Black cases/controls, respectively). Medical record abstraction of cases was conducted at over 50 hospitals to identify spontaneous VPTB, improve accuracy of gestational age, obtain relevant clinical data and exclude cases that did not meet eligibility criteria. VPTB was defined as birth at <32 weeks in Whites and Hispanics and <34 weeks in Blacks. Approximately 55% of all VPTBs were spontaneous and 45% had medical indications or other exclusions. Of the spontaneous VPTBs, approximately 41% were reported to have chorioamnionitis. 
While the current focus of the California Very Preterm Birth Study is to assess the role of candidate genetic markers on spontaneous VPTB, its design enables the pursuit of other research opportunities to identify social, clinical and biological determinants of different types of VPTB with the ultimate aim of reducing infant mortality, morbidity and racial disparities in these health outcomes in the US and elsewhere. |
Maternal smokeless tobacco use in Alaska Native women and singleton infant birth size
England LJ , Kim SY , Shapiro-Mendoza CK , Wilson HG , Kendrick JS , Satten GA , Lewis CA , Whittern P , Tucker MJ , Callaghan WM . Acta Obstet Gynecol Scand 2011 91 (1) 93-103 OBJECTIVE: To examine the effects of maternal prenatal smokeless tobacco use on infant birth size. DESIGN: A retrospective medical record review of 502 randomly selected deliveries. POPULATION: Singleton deliveries to Alaska Native women residing in a defined geographical region in western Alaska, 1997-2005. MATERIAL AND METHODS: A regional medical center's electronic records were used to identify singleton deliveries. Data on maternal tobacco exposure and pregnancy outcomes were abstracted from medical records. Logistic models were used to estimate adjusted mean birthweight, length, and head circumference for deliveries to women who used no tobacco (n=121), used smokeless tobacco (n=237), or smoked cigarettes (n=59). Differences in mean birthweight, length, and head circumference, 95% confidence intervals, and p-values were calculated using non-users as the reference group. MAIN OUTCOME MEASURES: Infant birthweight, crown-heel length, and head circumference. RESULTS: After adjustment for gestational age and other potential confounders, the mean birthweight of infants of smokeless tobacco users was reduced by 78 g compared with that of infants of non-users (p=0.18), and by 331 g in infants of smokers (p<0.01). No association was found between maternal smokeless tobacco use and infant length or infant head circumference. CONCLUSIONS: We found a modest but non-significant reduction in the birthweight of infants of smokeless tobacco users compared with infants of tobacco non-users. Because smokeless tobacco contains many toxic compounds that could affect other pregnancy outcomes, results of this study should not be construed to mean that smokeless tobacco use is safe during pregnancy. |
- Page last reviewed: Feb 1, 2024
- Page last updated: Oct 28, 2024
- Powered by CDC PHGKB Infrastructure