Last data update: Apr 18, 2025. (Total: 49119 publications since 2009)
Records 1-8 (of 8 Records)
Query Trace: Oganian A [original query]
Interior-point methods for monotone linear complementarity problems based on the new kernel function with applications to control tabular adjustment problem
Lesaja G, Oganian A, Williams T, Iacob I, Iqbal M. Stat Optim Inf Comput 2025 13 (3) 900-921
We present a feasible kernel-based interior-point method (IPM) for the monotone linear complementarity problem (LCP), based on an eligible kernel function with a new logarithmic barrier term. This kernel function defines both the new search direction and the neighborhood of the central path. We show global convergence of the algorithm and derive iteration bounds for its short- and long-step versions. We apply the method to a continuous Controlled Tabular Adjustment (CTA) problem, an important Statistical Disclosure Limitation (SDL) model for the protection of tabular data. Numerical results on a test example show that the algorithm is a viable alternative to existing methods for solving continuous CTA problems. We also apply the algorithm to a set of randomly generated monotone LCPs, on which the initial implementation performs well. However, this limited numerical testing is for illustration only; an extensive numerical study is needed to draw firmer conclusions about the behavior of the algorithm. © (2025), (International Academic Press). All rights reserved.
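As a rough illustration of the general idea, and not the kernel-based method of the paper, the sketch below applies a basic primal-dual path-following scheme with the classical logarithmic barrier to a monotone LCP: find x, s ≥ 0 with s = Mx + q and xᵀs = 0. The function name `solve_lcp`, the centering parameter `sigma`, the starting point x = s = e, and the fraction-to-boundary factor 0.95 are illustrative assumptions; the paper's new kernel function would change how the search direction and the neighborhood are defined.

```python
import numpy as np

def solve_lcp(M, q, sigma=0.2, tol=1e-9, max_iter=100):
    """Hypothetical sketch: infeasible primal-dual path-following IPM for the
    monotone LCP  s = M x + q,  x >= 0,  s >= 0,  x^T s = 0.
    Uses the classical log-barrier Newton direction, not the paper's kernel."""
    n = len(q)
    x = np.ones(n)
    s = np.ones(n)
    for _ in range(max_iter):
        r = s - M @ x - q            # feasibility residual
        mu = x @ s / n               # complementarity measure
        if mu < tol and np.linalg.norm(r) < tol:
            break
        # Newton system for  s - Mx - q = 0  and  x_i s_i = sigma * mu:
        #   ds - M dx = -r,   S dx + X ds = sigma*mu*e - x*s
        # Substituting ds = M dx - r gives (S + X M) dx = sigma*mu*e - x*s + x*r
        J = np.diag(s) + np.diag(x) @ M
        dx = np.linalg.solve(J, sigma * mu - x * s + x * r)
        ds = M @ dx - r
        # Fraction-to-boundary rule keeps x and s strictly positive
        alpha = 1.0
        for v, dv in ((x, dx), (s, ds)):
            neg = dv < 0
            if np.any(neg):
                alpha = min(alpha, 0.95 * np.min(-v[neg] / dv[neg]))
        x = x + alpha * dx
        s = s + alpha * ds
    return x, s
```

For M = [[2, 1], [1, 2]] and q = (-1, -1), the complementary solution is x = (1/3, 1/3), s = 0, which the sketch approaches as the complementarity measure is driven toward zero.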
Grouping of variables to facilitate statistical disclosure limitation methods in multivariate data sets
Oganian A, Iacob I, Lesaja G. Priv Stat Databases 2018
Data sets subject to Statistical Disclosure Limitation (SDL) often have many variables of different types that must be altered to limit disclosure. To produce a good-quality public data set, the data protector needs to account for the relationships between the variables. Ideally, therefore, SDL methods should be multivariate, handling many variables at the same time, rather than univariate, treating each variable independently of the others. However, when a data set has many variables, as most government survey data do, developing and implementing a multivariate SDL approach becomes difficult. In this paper we propose a pre-masking data processing procedure that clusters the variables of high-dimensional data sets so that different groups of variables can be masked independently, reducing the complexity of SDL. We consider several hierarchical clustering methods, including our own version of a hierarchical clustering algorithm, which we call K-Link, and outline how the data protector can choose an appropriate number of clusters for these methods. We implemented these methods and applied them to two genuine multivariate data sets. The experiments show that K-Link has the potential to solve this problem efficiently. Its success, however, depends on the correlation structure of the data: for data sets in which most variables are correlated, clustering the variables and then applying SDL methods independently to different clusters may attenuate correlations in the masked data, even with efficient clustering methods. The proposed approach is thus a trade-off between the computational complexity of multivariate SDL methods and the loss of data utility caused by treating different clusters independently. Keywords and phrases: Statistical disclosure limitation (SDL), hierarchical clustering, dimensionality reduction.
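A minimal sketch of the pre-masking step, assuming ordinary average-linkage agglomerative clustering on the distance d(i, j) = 1 − |r(i, j)| between variables. This is a standard choice swapped in for illustration, not the authors' K-Link algorithm, and the function name `cluster_variables` and the cut-off `max_dist` are hypothetical.

```python
import numpy as np

def cluster_variables(X, max_dist=0.5):
    """Group the columns of X by average-linkage agglomerative clustering
    with distance d(i, j) = 1 - |corr(X_i, X_j)|. Merging stops when the
    closest pair of clusters is farther apart than max_dist."""
    D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        if best > max_dist:
            break
        a, b = pair                  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Each resulting group of columns could then be passed separately to a multivariate masking method. As the abstract notes, correlations between variables that end up in different groups are what this trade-off puts at risk.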
A Full Nesterov-Todd Step Infeasible Interior-point Method for Symmetric Optimization in the Wider Neighborhood of the Central Path
Lesaja G, Wang GQ, Oganian A. Stat Optim Inf Comput 2021 9 (2) 250-267
In this paper, an improved Interior-Point Method (IPM) for solving symmetric optimization problems is presented. Symmetric optimization (SO) problems are linear optimization problems over symmetric cones. In particular, the method can be applied efficiently to an important instance of SO, the Controlled Tabular Adjustment (CTA) problem, a method used for Statistical Disclosure Limitation (SDL) of tabular data. The presented method is a full Nesterov-Todd step infeasible IPM for SO. The algorithm converges to an ε-approximate solution from any starting point, feasible or infeasible. Each iteration consists of a feasibility step and several centering steps. The main improvement of the method is that the iterates lie in a wider neighborhood of the central path than in similar algorithms of this type, while the best iteration bound currently known for infeasible short-step methods is still achieved.
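For the simplest symmetric cone, the nonnegative orthant (linear optimization), the Nesterov-Todd direction coincides with the usual primal-dual Newton direction. The sketch below assumes a strictly feasible pair (x, s) for min cᵀx subject to Ax = b, x ≥ 0, with Aᵀy + s = c. It computes the proximity measure δ(x, s; μ) = ½‖v⁻¹ − v‖ with v = √(xs/μ), which is commonly used in full-Newton-step IPM analyses, and performs one full centering step. It covers only this LP special case of a centering step, not the feasibility step or the wider neighborhood of the paper; the helper names are hypothetical.

```python
import numpy as np

def proximity(x, s, mu):
    # delta(x, s; mu) = 0.5 * || v^{-1} - v ||,  v = sqrt(x * s / mu)
    v = np.sqrt(x * s / mu)
    return 0.5 * np.linalg.norm(1.0 / v - v)

def full_newton_centering_step(A, x, s, mu):
    """One full (unit step length) Newton step toward the mu-center of a
    feasible LP pair: A x = b, A^T y + s = c, x > 0, s > 0."""
    d2 = x / s                      # diagonal scaling X S^{-1}
    rhs = mu / s - x                # from S dx + X ds = mu*e - X S e
    # ds = -A^T dy keeps dual feasibility; A dx = 0 keeps primal feasibility
    dy = np.linalg.solve(A @ (d2[:, None] * A.T), -A @ rhs)
    ds = -A.T @ dy
    dx = rhs - d2 * ds
    return x + dx, s + ds
```

Close to the μ-center the full step reduces δ roughly quadratically, which is why full-step methods can dispense with a line search; the paper's contribution is to keep this behavior while allowing iterates in a wider neighborhood.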
On Different Formulations of a Continuous CTA Model
Lesaja G, Iacob I, Oganian A. Priv Stat Databases 2020 12276 166-179
In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the safe (masked) table closest to the original table that contains sensitive information. Closeness is usually measured using the ℓ₁ or ℓ₂ norm. However, the norm-based CTA model offers no control over how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of "closeness" between the masked and original tables, one that attempts to change certain statistics used in the analysis of the table as little as possible. The Chi-square statistic is among the most widely used measures for analyzing data in two-dimensional tables. We therefore propose a Chi-square CTA model that minimizes an objective function depending on the difference between the Chi-square statistics of the original and masked tables. This model is nonlinear and nonconvex, and therefore harder to solve, which prompted us to also consider a modification that can be transformed into a linear programming model and solved more efficiently. We present numerical results for a two-dimensional table that illustrate the novel approach and compare it with norm-based CTA models.
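The statistic that the Chi-square CTA model tries to keep unchanged is Pearson's χ² = Σᵢⱼ (nᵢⱼ − Eᵢⱼ)²/Eᵢⱼ, with expected counts Eᵢⱼ = nᵢ· n·ⱼ / n computed from the row and column totals. The sketch below only evaluates the statistic and a candidate objective for a given masked table; the function names, and the choice of the absolute difference |χ²(original) − χ²(masked)| as the objective, are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def chi_square(table):
    """Pearson chi-square statistic of a two-dimensional contingency table."""
    t = np.asarray(table, dtype=float)
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
    return np.sum((t - expected) ** 2 / expected)

def chi_square_gap(original, masked):
    """Hypothetical CTA-style objective: change in the chi-square statistic."""
    return abs(chi_square(original) - chi_square(masked))
```

For the table [[10, 20], [30, 40]] the expected counts are [[12, 18], [28, 42]], giving χ² = 50/63 ≈ 0.794; replacing the table by its expected counts drives χ² to 0, so that masked table has a gap of about 0.794 even though its margins are unchanged.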
Multivariate Top-Coding for Statistical Disclosure Limitation
Oganian A, Iacob I, Lesaja G. Priv Stat Databases 2020 12276 136-148
One of the most challenging problems for national statistical agencies is how to release microdata sets with a large number of attributes to the public while keeping the disclosure risk of sensitive information about data subjects under control. When statistical agencies alter microdata to limit disclosure risk, they need to take the relationships between variables into account to produce a good-quality public data set. Hence, Statistical Disclosure Limitation (SDL) methods should preferably be multivariate, handling several variables at the same time, rather than univariate (treating each variable independently of the others). Statistical agencies are often concerned about the disclosure risk associated with extreme values of numerical variables, so such observations are often top- or bottom-coded in public use files. Top-coding replaces extreme observations of a numerical variable with a threshold, for example the 99th percentile of that variable. Bottom-coding is defined similarly but applies to values in the lower tail of the distribution. We argue that a univariate form of top/bottom-coding may not offer adequate protection for subpopulations that differ, in terms of the top-coded variable, from other subpopulations or from the whole population. In this paper, we propose a multivariate form of top-coding: the variables are first clustered into groups according to some metric of closeness between variables, and rules for the multivariate top-codes are then formed within each cluster using Association Rule Mining techniques. Bottom-coding procedures can be defined in a similar way. We illustrate our method on a genuine multivariate data set of realistic size.
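The concern about subpopulations can be illustrated with the sketch below. `top_code` caps values above the 99th percentile at that percentile, as in the abstract's example; `top_code_by_group` is a simplified, hypothetical stand-in that applies the same rule separately within each subpopulation. It is not the authors' method, which derives multivariate rules from variable clusters using Association Rule Mining.

```python
import numpy as np

def top_code(values, pct=99.0):
    """Univariate top-coding: cap values above the pct-th percentile."""
    v = np.asarray(values, dtype=float)
    threshold = np.percentile(v, pct)
    return np.minimum(v, threshold)

def top_code_by_group(values, groups, pct=99.0):
    """Illustrative variant: a separate threshold for each subpopulation, so a
    value that is extreme within a small group is capped even if it is not
    extreme in the population as a whole."""
    v = np.asarray(values, dtype=float)
    g = np.asarray(groups)
    out = v.copy()
    for label in np.unique(g):
        mask = g == label
        out[mask] = top_code(v[mask], pct)
    return out
```

For example, in a population where one large group takes values 1 to 1000 and a small group takes values 1 to 9 plus a single 50, the population-wide threshold (about 990) leaves the 50 untouched, although it clearly stands out within its own group; the group-wise variant caps it.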
Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion
Oganian A, Domingo-Ferrer J. Trans Data Priv 2017 10 (1) 61-81
Before releasing databases that contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them to avoid disclosing sensitive information about any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records so as to minimize the risk of disclosing sensitive information while still providing data users with accurate information about the population of interest. In this paper we propose a new disclosure limitation scheme based on the idea of local synthesis of data. Our approach is predicated on model-based clustering. The proposed method satisfies the requirements of k-anonymity; specifically, we use a variant of the k-anonymity privacy model, probabilistic k-anonymity, by incorporating constraints on cluster cardinality. Regarding data utility, for continuous attributes we preserve the means and covariances of the original data exactly, while approximately preserving higher-order moments and analyses on subdomains (defined by clusters and cluster combinations). For both continuous and categorical data, our experiments with medical data sets show that, in terms of data utility, local synthesis compares very favorably with other disclosure limitation methods, including the sequential regression approach to synthetic data generation.
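The exact preservation of means and covariances can be sketched as follows, assuming cluster labels are already available (for example from model-based clustering), every cluster has at least k records, and every cluster has more records than attributes so its sample covariance is nonsingular. Within each cluster, Gaussian noise is centered, whitened, and then linearly transformed to reproduce that cluster's sample mean and covariance exactly. Because cluster sizes and means are unchanged, the overall mean and covariance are preserved as well. The function name `local_synthesis` and the Gaussian generator are assumptions, not the authors' synthesis model.

```python
import numpy as np

def local_synthesis(X, labels, k, seed=0):
    """Replace each cluster of records by synthetic records that reproduce the
    cluster's sample mean and covariance exactly (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Y = np.empty_like(X)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < k:
            raise ValueError(f"cluster {c} has fewer than k={k} records")
        Xc = X[idx]
        mean = Xc.mean(axis=0)
        L_x = np.linalg.cholesky(np.atleast_2d(np.cov(Xc, rowvar=False)))
        Z = rng.standard_normal(Xc.shape)
        Z -= Z.mean(axis=0)                       # zero sample mean
        L_z = np.linalg.cholesky(np.atleast_2d(np.cov(Z, rowvar=False)))
        W = np.linalg.solve(L_z, Z.T).T           # whitened: sample covariance = I
        Y[idx] = W @ L_x.T + mean                 # recolored: covariance = cov(Xc)
    return Y
```

The minimum cluster size is where the k-anonymity constraint enters: each synthetic record is generated from a group of at least k original records rather than from any single one.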
A second order cone formulation of continuous CTA model
Lesaja G, Castro J, Oganian A. Priv Stat Databases 2016 9867 41-53
In this paper we consider a minimum-distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the safe table closest to an original tabular data set that contains sensitive information. Closeness is usually measured using the ℓ₁ or ℓ₂ norm, each with its own advantages and disadvantages. Recently, in [4], a regularization of ℓ₁-CTA using the Pseudo-Huber function was introduced in an attempt to combine the positive characteristics of ℓ₁-CTA and ℓ₂-CTA. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). IPMs are known to work better, in general, on well-structured problems such as conic optimization problems, so reformulating these CTA models as conic optimization problems may be advantageous. We present reformulations of Pseudo-Huber-CTA and ℓ₁-CTA as Second-Order Cone (SOC) optimization problems and test the validity of the approach on a small two-dimensional tabular data set.
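To see why the Pseudo-Huber function lends itself to a second-order cone formulation, take the common parameterization with δ > 0: φ_δ(z) = δ²(√(1 + (z/δ)²) − 1) = δ√(δ² + z²) − δ². Minimizing φ_δ(z) is therefore equivalent to minimizing δt − δ² subject to the cone constraint ‖(δ, z)‖₂ ≤ t. The sketch below checks this identity numerically and exhibits the ℓ₂-like behavior (about z²/2) for small |z| and the ℓ₁-like behavior (about δ|z|) for large |z|. The parameterization and function names are assumptions; the exact model in the paper may be scaled differently.

```python
import numpy as np

def pseudo_huber(z, delta):
    """Pseudo-Huber function: delta^2 * (sqrt(1 + (z/delta)^2) - 1)."""
    z = np.asarray(z, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (z / delta) ** 2) - 1.0)

def pseudo_huber_soc(z, delta):
    """Same value via the second-order cone epigraph: the smallest t with
    ||(delta, z)||_2 <= t is t = sqrt(delta^2 + z^2), giving delta*t - delta^2."""
    z = np.asarray(z, dtype=float)
    t = np.hypot(delta, z)
    return delta * t - delta**2
```

In a CTA setting, z would play the role of a cell deviation, with one cone constraint per cell, which is the kind of structure conic IPMs handle well.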
Propensity score based conditional group swapping for disclosure limitation of strata-defining variables
Oganian A, Lesaja G. Priv Stat Databases 2016 6896 69-80
In this paper we propose a method for statistical disclosure limitation of categorical variables that we call Conditional Group Swapping. This approach is suitable for design and strata-defining variables whose cross-classification forms important groups or subpopulations. These groups are considered important because, from the point of view of data analysis, it is desirable to preserve analytical characteristics within them. In general, data swapping can be quite distorting [13,16,20], especially to the relationships between variables, not only within the subpopulations but also in the overall data. To reduce the damage incurred by swapping, we propose choosing the records to swap using conditional probabilities that depend on the characteristics of the exchanged records. In particular, our approach uses propensity score methodology to compute the swapping probabilities. The experimental results presented in the paper show good utility properties of the method.
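A rough sketch of the two ingredients, under stated assumptions: propensity scores P(group = 1 | covariates) are estimated by logistic regression fitted with Newton's method in NumPy, and a record's swap partner is drawn from the other group with probability decaying in the difference of propensity scores, so that records with similar characteristics are more likely to be exchanged. The kernel exp(−|pᵢ − pⱼ|/τ), the parameter τ, and both function names are hypothetical; the paper's actual probability formulas may differ.

```python
import numpy as np

def propensity_scores(X, in_group, n_iter=25):
    """Logistic-regression estimate of P(in_group = 1 | X), fitted by Newton's method."""
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (in_group - p)                # score of the log-likelihood
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-A @ beta))

def swap_probabilities(scores, groups, i, tau=0.05):
    """Probability of choosing each record as the swap partner of record i:
    zero within i's own group, otherwise proportional to exp(-|p_i - p_j| / tau).
    Assumes at least one record belongs to another group."""
    w = np.exp(-np.abs(scores - scores[i]) / tau)
    w[groups == groups[i]] = 0.0
    return w / w.sum()
```

A small τ concentrates the swaps on records with nearly equal propensity scores, which limits distortion; a larger τ spreads them out and yields more perturbation.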
- Page last reviewed: Feb 1, 2024
- Page last updated: Apr 18, 2025
- Content source:
- Powered by CDC PHGKB Infrastructure