Title:
Vista: Looking Into the Clusters in Very Large Multidimensional Datasets
Authors
Chen, Keke
Liu, Ling
Abstract
Information visualization is commonly recognized as a useful method for
understanding the complexity of large datasets. In this paper, we introduce
an efficient and flexible clustering approach that combines visual
clustering and fast disk labelling for very large datasets. This paper makes
three contributions. First, we propose a framework, Vista, that incorporates
information visualization methods into the clustering process in order to
enhance the understanding of the intermediate clustering results and allow
users to revise the clustering results before the disk-labelling phase.
Second, we introduce a fast
and flexible disk-labelling algorithm, ClusterMap, which utilizes the visual
clustering result to improve the overall performance of clustering on very
large datasets. Third, we develop a visualization model that maps a
multidimensional dataset to a 2D visualization while preserving or partially
preserving clusters. Experiments show that Vista, combined with ClusterMap,
is faster and has a lower error rate than existing algorithms for very large
datasets. It is also flexible because the "cluster map" can be easily
adjusted to meet application-specific clustering requirements.
Date Issued
2002
Extent
752783 bytes
Resource Type
Text
Resource Subtype
Technical Report