	att_abstract={{Data glitches are errors in a data set; they are
complex entities that often span multiple attributes and records.
When they co-occur in data, the presence of one type of glitch can
hinder the detection of another type of glitch. This phenomenon
is called masking.
In this paper, we define two important types of masking, and
we propose a novel, statistically rigorous indicator called the masking
index for quantifying the hidden glitches in four cases of masking:
outliers masked by missing values, outliers masked by duplicates,
duplicates masked by missing values and duplicates masked by
outliers. The masking index is critical for data quality profiling
and exploration and enables a user to measure the extent of
masking and hence the confidence in the data. In this sense, it is
a valuable data quality index for measuring the true cleanliness of
the data. It also provides an objective, quantitative basis for choosing
the glitch detection method best suited to the glitches
present in a given data set.
We demonstrate the utility and effectiveness of the masking
index through extensive experiments on synthetic and real-world
data.}},
	att_copyright_notice={{This version of the work is reprinted here with permission of IEEE for your personal use. Not for redistribution. The definitive version was published in 2013 (2013-12-15).}},
	att_tags={Anomaly detection, data cleaning, duplicate record identification, masking, missing values, outlier detection},
	author={Ji Meng Loh and Tamraparni Dasu and Laure Berti-Equille},
	institution={{ICDM 2013}},
	title={{A Masking Index for Quantifying Hidden Glitches}},