A key step in validating a proposed idea or system
is to evaluate it on a suitable dataset. However, to date there
have been no useful tools that help researchers understand which
datasets have been used for what purpose, or in which prior
work. Instead, researchers have to manually browse through papers
to find suitable datasets and their URLs, which is laborious and
inefficient. To aid the data discovery process and provide a
better understanding of how and where datasets have been used,
we propose a framework to effectively identify datasets within the
scientific corpus. The key technical challenges are the identification
of datasets and the discovery of the association between a dataset
and the URLs where it can be accessed. On top of this framework, we
have built a user-friendly web-based search interface that lets users
conveniently explore dataset-paper relationships and find
relevant datasets and their properties.
	A Dataset Search Engine for the Research Document Corpus