Spatial Transformations of Maps
Dan Keim, Stephen North, Christian Panse, Mike Sips
Many data sources are geographic:
census statistics,
public health records,
environmental data
and business transactions.
When data has geographic coordinates, they are often a key
to understanding trends, clusters and other patterns.
Cartographic software and GIS (Geographic Information Systems)
offer practical methods of exploring geo-related data.
Conventional maps only show data in
relation to land area, not population or data set size.
For visual data mining, we are interested in map transformations
that show data in proportion to the size of the data set under study.
Some spatial transformation is actually part of any map, because a
section of the earth's spherical surface is projected onto a
2-D page or screen.
This implies some nonlinear distortion, though usually the goal
is to minimize its effects when showing land features such
as land area, mountains, rivers and cities.
On the other hand, to visualize geospatial statistical data sets,
cartographers have proposed intentionally distorting individual map
regions (such as states, provinces, and counties) so that their areas
are proportional to an input parameter such as population, wealth or
occurrence of disease.
These spatially transformed maps are called
cartograms.
In a cartogram, the size of map regions depends on the data
under study, not raw land area. This is clearly relevant to
visual data mining with maps.
The challenge in making cartograms automatically
is to preserve the shapes of the input map
(individual regions as well as the overall map)
while making each region's area close to its statistical target
and not changing the connectivity between regions.
This may be formulated as a nonlinear optimization problem,
where the objective is to minimize some function of the total
shape and area error, subject to constraints that preserve
the input map's topology. Approaching the solution by general
optimization techniques, though, has been unsuccessful due to
the computational complexity of this problem. (In fact, it is
infeasible in general: a "perfect" solution is not even possible,
so some constraints must be relaxed to get an approximate answer.)
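
To make the area part of that objective concrete, here is a minimal
Python sketch. It is an illustration only, not the error function used
by any particular cartogram system: the function names, the choice of
mean relative error, and the two-square example map are all assumptions.

```python
def polygon_area(vertices):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def relative_area_error(regions, values):
    """Mean relative gap between each region's area and its target area.

    regions: dict mapping region name -> list of (x, y) vertices
    values:  dict mapping region name -> positive statistic (e.g. population)
    A region's target area is its share of the statistic times the total
    map area, so the overall map size is left unchanged.
    """
    areas = {name: polygon_area(v) for name, v in regions.items()}
    total_area = sum(areas.values())
    total_value = sum(values.values())
    errors = []
    for name, area in areas.items():
        target = total_area * values[name] / total_value
        errors.append(abs(area - target) / target)
    return sum(errors) / len(errors)


# Hypothetical map: two adjacent unit squares with populations 3 and 1.
example_regions = {
    "a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "b": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(relative_area_error(example_regions, {"a": 3, "b": 1}))  # roughly 0.67
```

A full objective would also add a shape-error term and enforce the
topology constraints; both are omitted here.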
In 2001-02, Keim, Panse and North developed the
CartoDraw heuristic. This method stretches or shrinks parts of a
map with respect to scanlines placed through some of the map's regions.
The heuristic is effective because candidate adjustments are inexpensive
to compute, and a high enough proportion actually improve the solution
without violating topological constraints. Scanlines can be generated
automatically, or even placed interactively - the interactive option
permits some manual control over the optimization. CartoDraw is scalable
enough to make a cartogram of the 3000 counties of the United States.
(This scale is well beyond previous techniques that explicitly compute
shape error, though admittedly, with 3000 regions, the output starts to
resemble the continuous case that can be dealt with by simpler
"rubber-sheet" models that ignore shapes.)
Note that rectangular cartograms are a valuable alternative
to classical shape-preserving cartograms. They make it easier to
visually compare areas, and they avoid visual noise by drawing each
region cleanly. They may relax some of the topological properties
of the original map, allowing some adjacent regions to be separate.
Dan Keim's group recently contributed a rectangular
cartogram heuristic, RecMap, to the CartoView system. (An impressive
topology-preserving approach proposed by van Kreveld and Speckmann
demonstrates the difficulty of this problem and the value of fast heuristics.)
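
As a rough illustration of the scanline idea described for CartoDraw
above, the following Python sketch stretches a polygon about a single
horizontal scanline and picks, from a few cheap candidate factors, the
one that brings the area closest to a target. This is a toy under
strong assumptions, not the published CartoDraw algorithm: the function
names, the horizontal scanline, the piecewise-linear stretch, and the
omission of shape-error and topology checks are all simplifications.

```python
def shoelace_area(vertices):
    """Unsigned area of a simple polygon."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def stretch_about_scanline(vertices, scan_y, factor, reach):
    """Stretch vertices away from (or toward) the line y = scan_y.

    Within `reach` of the scanline, the vertical offset is multiplied by
    `factor`. Beyond `reach`, vertices are shifted by a constant so the
    mapping stays continuous and keeps the vertical order of vertices.
    """
    moved = []
    for x, y in vertices:
        d = y - scan_y
        if abs(d) <= reach:
            new_y = scan_y + factor * d
        else:
            sign = 1.0 if d > 0 else -1.0
            new_y = y + sign * (factor - 1.0) * reach
        moved.append((x, new_y))
    return moved


def pick_factor(vertices, target_area, scan_y, reach, candidates):
    """Return the candidate factor whose result is closest to target_area."""
    def area_gap(factor):
        stretched = stretch_about_scanline(vertices, scan_y, factor, reach)
        return abs(shoelace_area(stretched) - target_area)
    return min(candidates, key=area_gap)


# Hypothetical region: a unit square with a scanline through its middle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
best = pick_factor(square, target_area=1.5, scan_y=0.5, reach=0.5,
                   candidates=(0.5, 1.0, 1.5, 2.0))
print(best, shoelace_area(stretch_about_scanline(square, 0.5, best, 0.5)))
```

Each candidate costs only one pass over the affected vertices, which is
the property that makes trying many adjustments affordable.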
PixelMaps - Revealing Clusters in Dense Point Sets
Many real-world data sets are much too large to show completely.
Worse, they are highly non-uniform, with
interesting patterns concentrated in very dense areas.
Some of our recent work addresses problems of non-uniformity
and scale in geographic data sets.
Occlusion of data items due to overplotting is a significant problem.
One common way to avoid that is by aggregating items.
For example, when showing household income, we can aggregate households
up to zip-5 (five digit postal code) regions.
The drawback is that interesting small clusters and outliers are lost.
For example, a small low or high income cluster may not be visible
at the postal code level.
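
A small Python sketch shows how aggregation hides such a cluster. The
region label and income figures are invented for illustration, and the
function name is an assumption.

```python
def mean_income_by_region(records):
    """Average income per region from (region, income) pairs."""
    sums = {}
    counts = {}
    for region, income in records:
        sums[region] = sums.get(region, 0) + income
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}


# Hypothetical postal region: 98 households at 50,000 and a small
# high-income cluster of 2 households at 500,000.
households = [("region-A", 50_000)] * 98 + [("region-A", 500_000)] * 2
print(mean_income_by_region(households))  # one average; the cluster is gone
```

The single regional average sits close to the typical household, and
nothing in it reveals that two households earn ten times as much.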
Daniel Keim, Christian Panse, and Mike Sips (of the University of
Konstanz), with Stephen North, proposed
PixelMaps as a way of overcoming this difficulty. The PixelMap
heuristic is based on clustering. Each point is assigned to a unique pixel,
so data is not lost. Pixel placement is adjusted locally, but in
a way that not only respects but intentionally "pulls out" clusters.
For example, if there is a small cluster of outbreaks of disease
within a large data set, we can form those points into a distinct,
spatially coherent cluster so the cluster is more noticeable and not
lost in the noise. The PixelMap heuristic allows setting the tradeoffs
between absolute position preservation of related data points, relative
position preservation, and clustering factors (size and affinity).
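
The "one pixel per point" property can be sketched in a few lines of
Python. This is not the PixelMap heuristic itself, which also uses
clustering and tunable tradeoffs; it is a simplified stand-in that
resolves collisions by moving a point to a free pixel in the nearest
surrounding ring. The function names are hypothetical, and the sketch
assumes integer points that already lie inside the pixel grid.

```python
def nearest_free_pixel(x, y, taken, width, height):
    """Find a free pixel near (x, y), searching square rings outward."""
    for radius in range(max(width, height)):
        candidates = []
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                if max(abs(dx), abs(dy)) != radius:
                    continue  # only pixels on the current ring
                px, py = x + dx, y + dy
                inside = 0 <= px < width and 0 <= py < height
                if inside and (px, py) not in taken:
                    candidates.append((dx * dx + dy * dy, px, py))
        if candidates:
            # Within the ring, prefer the pixel closest to the original.
            _, px, py = min(candidates)
            return (px, py)
    raise ValueError("no free pixel left")


def assign_unique_pixels(points, width, height):
    """Give every point its own pixel so no data item is overplotted."""
    if len(points) > width * height:
        raise ValueError("more points than pixels")
    taken = set()
    placed = []
    for x, y in points:
        pixel = nearest_free_pixel(x, y, taken, width, height)
        taken.add(pixel)
        placed.append(pixel)
    return placed


# Three data items at the same location end up on three distinct pixels.
print(assign_unique_pixels([(2, 2), (2, 2), (2, 2)], width=5, height=5))
```

Points are placed in input order, so earlier points keep their exact
positions and later ones are displaced; a real implementation would
choose the order and displacement to emphasize clusters.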
The above examples show the year 2000 census block-level median
household income in New York State. Note how it is difficult to
find income patterns in the raw data (left). In the PixelMap view
(right), one can view and compare patterns in the entire state.
Income clusters on the east side of Central Park and Long Island
can be identified and compared with others in Syracuse and Buffalo.