att_abstract={{Data integration is a challenging task due to the large numbers of
autonomous data sources. This necessitates the development of
techniques to reason about the benefits and costs of acquiring and
integrating data. Recently the problem of source selection (i.e.,
identifying the subset of sources that maximizes the profit from integration)
was introduced as a preprocessing step before the actual
integration. The problem was studied for static sources and used
the accuracy of data fusion to quantify the integration profit.

In this paper, we study the problem of source selection considering
dynamic data sources whose content changes over time. We
define a set of time-dependent metrics, including coverage, freshness
and accuracy, to characterize the quality of integrated data.
We show how statistical models for the evolution of sources can
be used to estimate these metrics. While source selection is NP-complete,
we show that near-optimal solutions can be found for a large class of
practical cases. We propose an algorithmic framework with theoretical
guarantees for our problem and show its effectiveness in an extensive
experimental evaluation on both real-world and synthetic data.}},
	att_categories={C_BB.1, C_NSS.2, C_IIS.1, C_IIS.5},
	att_copyright_notice={{(c) ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 2014 (2014-06-22).}},
	author={Divesh Srivastava and Theodoros Rekatsinas and Xin Luna Dong},
	institution={{ACM SIGMOD 2014}},
	title={{Characterizing and Selecting Fresh Data Sources}},