
180 Park Ave - Building 103
Florham Park, NJ
Summary Graphs for Relational Database Schemas
Cecilia Procopiuc, Divesh Srivastava, Xiaoyan Yang
VLDB Conference,
2011.
VLDB Foundation Copyright
The definitive version was published in Very Large Databases, 2011-08-29.
Increasingly complex databases need ever more sophisticated tools
to help users understand their schemas and interact with the data.
Existing tools fall short of either providing the "big picture,"
or of presenting useful connectivity information.
In this paper we define summary graphs, a novel approach for
summarizing schemas. Given a set of user-specified query tables,
the summary graph automatically computes the most relevant tables
and joins for that query set. The output preserves the most
informative join paths between the query tables, while meeting
size constraints. In the process, we define a novel
information-theoretic measure over join edges. Unlike most
subgraph extraction work, we allow metaedges (i.e., edges in the
transitive closure) to help reduce output complexity. We prove
that the problem is NP-Hard, and solve it as an integer program.
Our extensive experimental study shows that our method returns
high-quality summaries under independent quality measures.

Automatic Discovery of Attributes in Relational Databases
Meihui Zhang, Beng Chin Ooi, Divesh Srivastava, Cecilia Procopiuc, Marios Hadjieleftheriou
ACM SIGMOD 2011,
2011.
ACM Copyright
(c) ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM SIGMOD 2011, 2011-06-12, http://www.acm.org/.
In this work we try to cluster relational columns into attributes, i.e., to identify strong relationships between columns
based on the common properties and characteristics of the values they contain.
For example, we want to identify whether a certain set of columns refers
to telephone numbers versus social security numbers, or names of customers versus names of employees.
Traditional relational database schema languages use very limited primitive data types and
simple foreign key constraints to express relationships between columns. Object-oriented schema
languages allow the definition of custom data types, but still, certain relationships between
columns might be unknown at design time or they might appear only in a particular database instance.
Nevertheless, these relationships are an invaluable tool for schema matching, and generally for
better understanding and working with the data. Here, we introduce data-oriented solutions (we do not consider solutions that
assume the existence of any external knowledge) that use statistical measures to identify strong relationships
between the values of a set of columns. Interpreting the
database as a graph where nodes correspond to database columns and edges correspond to column
relationships, we decompose the graph into connected components and cluster sets of columns into
attributes. To test the quality of our solutions, we also provide a
comprehensive experimental evaluation using real and synthetic datasets.

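The graph step described in the abstract above (columns as nodes, relationships as edges, connected components as attributes) can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard overlap of value sets, the 0.5 threshold, and the names `jaccard` and `cluster_columns` are assumptions chosen for illustration, standing in for whatever statistical measure the paper actually uses.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two value collections (assumed stand-in measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_columns(columns, threshold=0.5):
    """Group columns into connected components of a similarity graph.

    columns: dict mapping a column name to an iterable of its values.
    Two columns are joined by an edge when their similarity is at
    least `threshold` (a hypothetical cutoff).
    """
    # Union-find over column names.
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for u, v in combinations(columns, 2):
        if jaccard(columns[u], columns[v]) >= threshold:
            parent[find(u)] = find(v)

    # Collect the members of each component.
    groups = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    example = {
        "customer.phone": ["555-0100", "555-0101", "555-0102"],
        "employee.phone": ["555-0101", "555-0102", "555-0103"],
        "customer.ssn": ["123-45-6789", "987-65-4321"],
    }
    print(cluster_columns(example))
```

In this toy input the two phone columns share half of their combined values and end up in one cluster, while the social security column stays on its own. A pairwise comparison of all columns is quadratic in the number of columns, so a real system would likely need some form of pruning.
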
Rectangular Layouts and Contact Graphs
Adam L. Buchsbaum, Emden R. Gansner, Cecilia Magdalena Procopiuc, Suresh Venkatasubramanian
CoRR,
vol. abs/cs/0611107,
2006.
Computer Systems, Methods And Computer Program Products For Data Anonymization For Aggregate Query Answering,
Tue Feb 07 12:50:29 EST 2012
Computer program products are provided for anonymizing a database that includes tuples. A respective tuple includes at least one quasi-identifier and sensitive attributes associated with the quasi-identifier. These computer program products include computer readable program code that is configured to (k,e)-anonymize the tuples over a number k of different values in a range e of values, while preserving the coupling of at least two of the sensitive attributes to one another in the sets of attributes that are anonymized, to provide a (k,e)-anonymized database. Related computer systems and methods are also provided.