Last data update: Oct 28, 2024. (Total: 48004 publications since 2009)
Records 1-13 (of 13 Records) |
Query Trace: Mirel L[original query] |
---|
Proposed framework for adopting privacy-preserving record linkage for public health action
Pathak A , Serrer L , Bhalla M , King R , Mirel LB , Srinivasan A , Baier P , Zapata D , David-Ferdon C , Luxenberg S , Gundlapalli AV . J Public Health Manag Pract 2024 OBJECTIVES: To propose a framework for adoption of privacy-preserving record linkage (PPRL) for public health applications. METHODS: Twelve interviews with subject matter experts (SMEs) were conducted virtually and coded using an inductive approach. A collaborative session was conducted with SMEs to identify key steps in the PPRL project lifecycle which informed development of a PPRL implementation checklist. RESULTS: This framework has 2 decision-making levels: the organization level and the project or program level. Organization-level considerations include PPRL governance, the optimal choice among approved PPRL solutions, the need for longitudinal linkages, the potential issue of vendor lock-in, and costs. Program-level considerations include characteristics of the PPRL use case, linkage quality and accuracy, data privacy and use, security thresholds, compatibility with data owners' data architecture, and trade-offs between open-source and commercial PPRL solutions. A PPRL implementation checklist was developed to guide public health practitioners considering PPRL for data linkage. CONCLUSIONS: The framework may be considered by public health entities to guide adoption and implementation of PPRL in public health research and surveillance. Public health experts may refer to this framework and the PPRL implementation checklist when determining the appropriateness of PPRL for specific use cases and implementation planning. |
Privacy preserving record linkage for public health action: opportunities and challenges
Pathak A , Serrer L , Zapata D , King R , Mirel LB , Sukalac T , Srinivasan A , Baier P , Bhalla M , David-Ferdon C , Luxenberg S , Gundlapalli AV . J Am Med Inform Assoc 2024 OBJECTIVES: To understand the landscape of privacy preserving record linkage (PPRL) applications in public health, assess estimates of PPRL accuracy and privacy, and evaluate factors for PPRL adoption. MATERIALS AND METHODS: A literature scan examined the accuracy, data privacy, and scalability of PPRL in public health. Twelve interviews with subject matter experts were conducted and coded using an inductive approach to identify factors related to PPRL adoption. RESULTS: PPRL has a high level of linkage quality and accuracy. PPRL linkage quality was comparable to that of clear text linkage methods (requiring direct personally identifiable information [PII]) for linkage across various settings and research questions. Accuracy of PPRL depended on several components, such as PPRL technique, and the proportion of missingness and errors in underlying data. Strategies to increase adoption include increasing understanding of PPRL, improving data owner buy-in, establishing governance structure and oversight, and developing a public health implementation strategy for PPRL. DISCUSSION: PPRL protects privacy by eliminating the need to share PII for linkage, but the accuracy and linkage quality depend on factors including the choice of PPRL technique and specific PII used to create encrypted identifiers. Large-scale implementations of PPRL linking millions of observations, including PCORnet, the National Institutes of Health N3C, and the Centers for Disease Control and Prevention COVID-19 project, have demonstrated the scalability of PPRL for public health applications. CONCLUSIONS: Applications of PPRL in public health have demonstrated their value for the public health community.
Although gaps must be addressed before wide implementation, PPRL is a promising solution to data linkage challenges faced by the public health ecosystem. |
Evaluating data quality for blended data using a data quality framework
Parker JD , Mirel LB , Lee P , Mintz R , Tungate A , Vaidyanathan A . Stat J IAOS 2024 40 (1) 125-136 In 2020 the U.S. Federal Committee on Statistical Methodology (FCSM) released 'A Framework for Data Quality', organized by 11 dimensions of data quality grouped among three domains of quality (utility, objectivity, integrity). This paper addresses the use of the FCSM Framework for data quality assessments of blended data. The FCSM Framework applies to all types of data, however best practices for implementation have not been documented. We applied the FCSM Framework for three health-research related case studies. For each case study, assessments of data quality dimensions were performed to identify threats to quality, possible mitigations of those threats, and trade-offs among them. From these assessments the authors concluded: 1) data quality assessments are more complex in practice than anticipated and expert guidance and documentation are important; 2) each dimension may not be equally important for different data uses; 3) data quality assessments can be subjective and having a quantitative tool could help explain the results, however, quantitative assessments may be closely tied to the intended use of the dataset; 4) there are common trade-offs and mitigations for some threats to quality among dimensions. This paper is one of the first to apply the FCSM Framework to specific use-cases and illustrates a process for similar data uses. © 2024 - IOS Press. All rights reserved. |
A methodological assessment of privacy preserving record linkage using survey and administrative data
Mirel LB , Resnick DM , Aram J , Cox CS . Stat J IAOS 2022 38 (2) 413-421 BACKGROUND: The National Center for Health Statistics (NCHS) links data from surveys to administrative data sources, but privacy concerns make accessing new data sources difficult. Privacy-preserving record linkage (PPRL) is an alternative to traditional linkage approaches that may overcome this barrier. However, prior to implementing PPRL techniques it is important to understand their effect on data quality. METHODS: Results from PPRL were compared to results from an established linkage method, which uses unencrypted (plain text) identifiers and both deterministic and probabilistic techniques. The established method was used as the gold standard. Links performed with PPRL were evaluated for precision and recall. An initial assessment and a refined approach were implemented. The impact of PPRL on secondary data analysis, including match and mortality rates, was assessed. RESULTS: The match rates for all approaches were similar, 5.1% for the gold standard, 5.4% for the initial PPRL and 5.0% for the refined PPRL approach. Precision ranged from 93.8% to 98.9% and recall ranged from 98.7% to 97.8%, depending on the selection of tokens from PPRL. The impact of PPRL on secondary data analysis was minimal. DISCUSSION: The findings suggest PPRL works well to link patient records to the National Death Index (NDI) since both sources have a high level of non-missing personally identifiable information, especially among adults 65 and older who may also have a higher likelihood of linking to the NDI. CONCLUSION: The results from this study are encouraging for first steps for a statistical agency in the implementation of PPRL approaches, however, future research is still needed. © 2022-IOS Press. All rights reserved. |
Using supervised machine learning to identify efficient blocking schemes for record linkage.
Campbell SR , Resnick DM , Cox CS , Mirel LB . Stat J IAOS 2021 37 (2) 673-680 Record linkage enables survey data to be integrated with other data sources, expanding the analytic potential of both sources. However, depending on the number of records being linked, the processing time can be prohibitive. This paper describes a case study using a supervised machine learning algorithm, known as the Sequential Coverage Algorithm (SCA). The SCA was used to develop the join strategy for two data sources, the National Center for Health Statistics' (NCHS) 2016 National Hospital Care Survey (NHCS) and the Center for Medicare & Medicaid Services (CMS) Enrollment Database (EDB), during record linkage. Due to the size of the CMS data, common record joining methods (i.e. blocking) were used to reduce the number of pairs that need to be evaluated to identify the vast majority of matches. NCHS conducted a case study examining how the SCA improved the efficiency of blocking. This paper describes how the SCA was used to design the blocking used in this linkage. © 2021-IOS Press. All rights reserved. |
Using synthetic data to replace linkage derived elements: A case study
Resnick DM , Cox CS , Mirel LB . Health Serv Outcomes Res Methodol 2021 21 389-406 While record linkage can expand analyses performable from survey microdata, it also incurs greater risk of privacy-encroaching disclosure. One way to mitigate this risk is to replace some of the information added through linkage with synthetic data elements. This paper describes a case study using the National Hospital Care Survey (NHCS), which collects patient records under a pledge of protecting patient privacy from a sample of U.S. hospitals for statistical analysis purposes. The NHCS data were linked to the National Death Index (NDI) to enhance the survey with mortality information. The added information from NDI linkage enables survival analyses related to hospitalization, but as the death information includes dates of death and detailed causes of death, having it joined with the patient records increases the risk of patient re-identification (albeit only for deceased persons). For this reason, an approach was tested to develop synthetic data that uses models from survival analysis to replace vital status and actual dates-of-death with synthetic values and uses classification tree analysis to replace actual causes of death with synthesized causes of death. The degree to which analyses performed on the synthetic data replicate results from analysis on the actual data is measured by comparing survival analysis parameter estimates from both data files. Because synthetic data only have value to the degree that they can be used to produce statistical estimates that are like those based on the actual data, this evaluation is an essential first step in assessing the potential utility of synthetic mortality data. |
Cancer Informatics for Cancer Centers (CI4CC): Building a Community Focused on Sharing Ideas and Best Practices to Improve Cancer Care and Patient Outcomes.
Barnholtz-Sloan JS , Rollison DE , Basu A , Borowsky AD , Bui A , DiGiovanna J , Garcia-Closas M , Genkinger JM , Gerke T , Induni M , Lacey JV Jr , Mirel L , Permuth JB , Saltz J , Shenkman EA , Ulrich CM , Zheng WJ , Nadaf S , Kibbe WA . JCO Clin Cancer Inform 2020 4 108-116 Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501(c)(3) organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. Although each of the participating cancer centers is structured differently, and leaders' titles vary, we know firsthand there are similarities in both the issues we face and the solutions we achieve. As a consortium, we have initiated a dedicated listserv, an open-initiatives program, and targeted biannual face-to-face meetings. These meetings are a place to review our priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues we, as informatics leaders, individually face at our respective institutions and cancer centers. Here we provide a brief history of the CI4CC organization and meeting highlights from the latest CI4CC meeting that took place in Napa, California from October 14-16, 2019. The focus of this meeting was "intersections between informatics, data science, and population science." We conclude with a discussion on "hot topics" on the horizon for cancer informatics. |
Using linked survey paradata to improve sampling strategies in the Medical Expenditure Panel Survey
Mirel LB , Chowdhury SR . J Off Stat 2017 33 (2) 367-383 Using paradata from a prior survey that is linked to a new survey can help a survey organization develop more effective sampling strategies. One example of this type of linkage or subsampling is between the National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS). MEPS is a nationally representative sample of the U.S. civilian, noninstitutionalized population based on a complex multi-stage sample design. Each year a new sample is drawn as a subsample of households from the prior year’s NHIS. The main objective of this article is to examine how paradata from a prior survey can be used in developing a sampling scheme in a subsequent survey. A framework for optimal allocation of the sample in substrata formed for this purpose is presented and evaluated for the relative effectiveness of alternative substratification schemes. The framework is applied, using real MEPS data, to illustrate how utilizing paradata from the linked survey offers the possibility of making improvements to the sampling scheme for the subsequent survey. The improvements aim to reduce the data collection costs while maintaining or increasing effective responding sample sizes and response rates for a harder to reach population. |
Characterization of large structural genetic mosaicism in human autosomes.
Machiela MJ , Zhou W , Sampson JN , Dean MC , Jacobs KB , Black A , Brinton LA , Chang IS , Chen C , Chen C , Chen K , Cook LS , Crous Bou M , De Vivo I , Doherty J , Friedenreich CM , Gaudet MM , Haiman CA , Hankinson SE , Hartge P , Henderson BE , Hong YC , Hosgood HD 3rd , Hsiung CA , Hu W , Hunter DJ , Jessop L , Kim HN , Kim YH , Kim YT , Klein R , Kraft P , Lan Q , Lin D , Liu J , Le Marchand L , Liang X , Lissowska J , Lu L , Magliocco AM , Matsuo K , Olson SH , Orlow I , Park JY , Pooler L , Prescott J , Rastogi R , Risch HA , Schumacher F , Seow A , Setiawan VW , Shen H , Sheng X , Shin MH , Shu XO , VanDen Berg D , Wang JC , Wentzensen N , Wong MP , Wu C , Wu T , Wu YL , Xia L , Yang HP , Yang PC , Zheng W , Zhou B , Abnet CC , Albanes D , Aldrich MC , Amos C , Amundadottir LT , Berndt SI , Blot WJ , Bock CH , Bracci PM , Burdett L , Buring JE , Butler MA , Carreon T , Chatterjee N , Chung CC , Cook MB , Cullen M , Davis FG , Ding T , Duell EJ , Epstein CG , Fan JH , Figueroa JD , Fraumeni JF Jr , Freedman ND , Fuchs CS , Gao YT , Gapstur SM , Patino-Garcia A , Garcia-Closas M , Gaziano JM , Giles GG , Gillanders EM , Giovannucci EL , Goldin L , Goldstein AM , Greene MH , Hallmans G , Harris CC , Henriksson R , Holly EA , Hoover RN , Hu N , Hutchinson A , Jenab M , Johansen C , Khaw KT , Koh WP , Kolonel LN , Kooperberg C , Krogh V , Kurtz RC , LaCroix A , Landgren A , Landi MT , Li D , Liao LM , Malats N , McGlynn KA , McNeill LH , McWilliams RR , Melin BS , Mirabello L , Peplonska B , Peters U , Petersen GM , Prokunina-Olsson L , Purdue M , Qiao YL , Rabe KG , Rajaraman P , Real FX , Riboli E , Rodriguez-Santiago B , Rothman N , Ruder AM , Savage SA , Schwartz AG , Schwartz KL , Sesso HD , Severi G , Silverman DT , Spitz MR , Stevens VL , Stolzenberg-Solomon R , Stram D , Tang ZZ , Taylor PR , Teras LR , Tobias GS , Viswanathan K , Wacholder S , Wang Z , Weinstein SJ , Wheeler W , White E , Wiencke JK , Wolpin BM , Wu X , Wunder JS , Yu K , Zanetti KA , 
Zeleniuch-Jacquotte A , Ziegler RG , de Andrade M , Barnes KC , Beaty TH , Bierut LJ , Desch KC , Doheny KF , Feenstra B , Ginsburg D , Heit JA , Kang JH , Laurie CA , Li JZ , Lowe WL , Marazita ML , Melbye M , Mirel DB , Murray JC , Nelson SC , Pasquale LR , Rice K , Wiggs JL , Wise A , Tucker M , Perez-Jurado LA , Laurie CC , Caporaso NE , Yeager M , Chanock SJ . Am J Hum Genet 2015 96 (3) 487-97 Analyses of genome-wide association study (GWAS) data have revealed that detectable genetic mosaicism involving large (>2 Mb) structural autosomal alterations occurs in a fraction of individuals. We present results for a set of 24,849 genotyped individuals (total GWAS set II [TGSII]) in whom 341 large autosomal abnormalities were observed in 168 (0.68%) individuals. Merging data from the new TGSII set with data from two prior reports (the Gene-Environment Association Studies and the total GWAS set I) generated a large dataset of 127,179 individuals; we then conducted a meta-analysis to investigate the patterns of detectable autosomal mosaicism (n = 1,315 events in 925 [0.73%] individuals). Restricting to events >2 Mb in size, we observed an increase in event frequency as event size decreased. The combined results underscore that the rate of detectable mosaicism increases with age (p value = 5.5 × 10^-31) and is higher in men (p value = 0.002) but lower in participants of African ancestry (p value = 0.003). In a subset of 47 individuals from whom serial samples were collected up to 6 years apart, complex changes were noted over time and showed an overall increase in the proportion of mosaic cells as age increased. Our large combined sample allowed for a unique ability to characterize detectable genetic mosaicism involving large structural events and strengthens the emerging evidence of non-random erosion of the genome in the aging population. |
The prevalence of using iodine-containing supplements is low among reproductive-age women, National Health and Nutrition Examination Survey 1999-2006
Gahche JJ , Bailey RL , Mirel LB , Dwyer JT . J Nutr 2013 143 (6) 872-7 During pregnancy, the iodine requirement rises to meet demands for neurological development and fetal growth. If these requirements are not met, irreversible pathological cognitive and behavioral changes to the fetus may ensue. This study estimated the prevalence of iodine-containing dietary supplement (DS) use and intakes of iodine from DSs among pregnant women and nonpregnant women of reproductive age (15-39 y) who were interviewed and examined in NHANES 1999-2006 (n = 6404). Although 77.5% of pregnant women reported taking one or more DSs in the past 30 d, only 22.3% consumed an iodine-containing supplement. Most pregnant women reported using one DS and reported taking this product daily. The vast majority of iodine-containing DSs reported by pregnant women claimed an iodine content of 150 mcg iodine/serving on the label. Pregnant women using at least one DS containing iodine had a mean daily iodine intake of 122 mcg/d from supplements; the median value was 144 mcg/d. Median urinary iodine concentrations (UICs) were similar for pregnant and nonpregnant women in the population aged 15-39 y. The median UIC was 148 mcg/L for pregnant women and 133 mcg/L for nonpregnant women. The WHO has established a cutoff for insufficient iodine intake at <150 mcg/L for pregnant women and <100 mcg/L for those who are not pregnant. This suggests that as a population, we may not be meeting adequate intakes of iodine for pregnant women. More research is needed on the iodine intakes of pregnant women and women of reproductive age on their total iodine intake from all sources, not just DSs. |
Serum soluble transferrin receptor concentrations in US preschool children and non-pregnant women of childbearing age from the National Health and Nutrition Examination Survey 2003-2010
Mei Z , Pfeiffer CM , Looker AC , Flores-Ayala RC , Lacher DA , Mirel LB , Grummer-Strawn LM . Clin Chim Acta 2012 413 1479-84 BACKGROUND: Serum soluble transferrin receptor (sTfR) is recommended as a sensitive and accurate measure of iron deficiency (ID) in populations when only a single indicator can be used. The lack of assay standardization and of representative data on the distribution of sTfR in at-risk populations currently limits its utility. METHODS: Using data from NHANES 2003-2010, we examined the distribution of sTfR and developed assay-specific cutoff values for defining elevated sTfR in 2 US population groups: children aged 1-5 y (n=2820) and non-pregnant women aged 15-49 y (n=6575). RESULTS: On average, children had higher geometric mean sTfR concentrations (4.09mg/l; 95% CI: 4.04-4.14) than non-pregnant women (3.31mg/l; 95% CI: 3.26-3.35) (p<0.001). Among children, those aged 1-2 y (compared to those aged 3-5 y), boys (compared to girls), and non-Hispanic black (NHB) children (compared to non-Hispanic white (NHW) and Mexican-American (MA) children) had higher sTfR concentrations. Among non-pregnant women, adolescents (15-19 y) had higher sTfR concentrations than adults aged 20-34 y but not compared to adults aged 35-49 y; NHB women (compared to NHW and MA women) and multiparous women (compared to nulliparous women) had higher sTfR concentrations. The derived cutoff values (97.5th percentile in a defined healthy reference population) for defining elevated sTfR in the US were 6.00mg/l for children 1-5 y and 5.33mg/l for non-pregnant women 15-49 y. CONCLUSIONS: A different sTfR cutoff value may be needed in children and non-pregnant women to define ID. |
Levels of plasma trans-fatty acids in non-Hispanic white adults in the United States in 2000 and 2009
Vesper HW , Kuiper HC , Mirel LB , Johnson CL , Pirkle JL . JAMA 2012 307 (6) 562-3 Levels of trans-fatty acids (TFAs) in blood come from natural sources, such as milk, and industrial sources, such as partially hydrogenated vegetable oils. Dietary intake of TFAs increases low-density lipoprotein cholesterol (LDL-C) and has other adverse metabolic effects. Changing to a diet low in TFAs may lower the LDL-C level and decrease the risk for cardiovascular disease. To assist consumers, the Food and Drug Administration amended its regulations in 2003 to require that TFA content be declared on the nutrition label of foods and dietary supplements. Some community and state health departments have required restaurants to limit TFAs and reductions have been shown in supermarket and restaurant products. The public health impact of these changes on TFA blood levels in the population is unknown. A preliminary study was conducted to determine plasma concentrations of TFAs in a subset of non-Hispanic white adults in the National Health and Nutrition Examination Survey (NHANES) in 2000 and 2009. |
Multiple imputation of missing dual-energy X-ray absorptiometry data in the National Health and Nutrition Examination Survey
Schenker N , Borrud LG , Burt VL , Curtin LR , Flegal KM , Hughes J , Johnson CL , Looker AC , Mirel L . Stat Med 2010 30 (3) 260-76 In 1999, dual-energy x-ray absorptiometry (DXA) scans were added to the National Health and Nutrition Examination Survey (NHANES) to provide information on soft tissue composition and bone mineral content. However, in 1999-2004, DXA data were missing in whole or in part for about 21 per cent of the NHANES participants eligible for the DXA examination; and the missingness is associated with important characteristics such as body mass index and age. To handle this missing-data problem, multiple imputation of the missing DXA data was performed. Several features made the project interesting and challenging statistically, including the relationship between missingness on the DXA measures and the values of other variables; the highly multivariate nature of the variables being imputed; the need to transform the DXA variables during the imputation process; the desire to use a large number of non-DXA predictors, many of which had small amounts of missing data themselves, in the imputation models; the use of lower bounds in the imputation procedure; and relationships between the DXA variables and other variables, which helped both in creating and evaluating the imputations. This paper describes the imputation models, methods, and evaluations for this publicly available data resource and demonstrates properties of the imputations via examples of analyses of the data. The analyses suggest that imputation helps to correct biases that occur in estimates based on the data without imputation, and that it helps to increase the precision of estimates as well. Moreover, multiple imputation usually yields larger estimated standard errors than those obtained with single imputation. Published in 2010 by John Wiley & Sons, Ltd. |
- Page last reviewed: Feb 1, 2024
- Page last updated: Oct 28, 2024