att_abstract={{Data is becoming a commodity of tremendous value for many domains. This is leading to a rapid increase in the number of data sources and public access data services, such as cloud-based data markets and data portals, that facilitate the collection, publishing and trading of data. Data sources typically exhibit wide variety and heterogeneity in the types or schemas of the data they provide, their quality, and the fees they charge for accessing their data. Users who want to build upon such publicly available data must (i) identify sources that are relevant to their applications, (ii) discover sources that collectively satisfy the quality and budget requirements of their applications, with few effective clues about the quality of the sources, and (iii) repeatedly invest many person-hours in assessing the eventual usefulness of data sources. All three steps require investigating the content of sources manually, integrating them and evaluating the actual benefit of the integration result for a desired application. Unfortunately, when the number of data sources is large, humans have a limited capability of reasoning about the actual quality of sources and the trade-offs between the benefits and costs of acquiring and integrating sources. In this paper we explore the problems of automatically appraising the quality of data sources and identifying the most valuable sources for diverse applications. We introduce our vision for a new data source management system that automatically assesses the quality of data sources based on a collection of rigorous data quality metrics and enables the automated and interactive discovery of valuable sources for user applications. We argue that the proposed system can dramatically simplify the Discover-Appraise-Evaluate interaction loop that many users follow today to discover sources for their applications.}},
	att_categories={C_BB.1, C_NSS.2, C_IIS.5, C_IIS.6},
	att_copyright_notice={{(c) ACM, 2014. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 2014 {{, 2015-01-01}}.}},
	author={Divesh Srivastava and Theodoros Rekatsinas and Xin Luna Dong and Lise Getoor},
	institution={{Conference on Innovative Data Systems Research (CIDR)}},
	title={{Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration}},