Title:
Vista: Looking Into the Clusters in Very Large Multidimensional Datasets

Author(s)
Chen, Keke
Liu, Ling
Abstract
Information visualization is commonly recognized as a useful method for understanding the complexity of large datasets. In this paper, we introduce an efficient and flexible clustering approach that combines visual clustering with fast disk labelling for very large datasets. The paper makes three contributions. First, we propose a framework, Vista, that incorporates information visualization methods into the clustering process in order to improve the understanding of intermediate clustering results and to allow the user to revise those results before the disk-labelling phase. Second, we introduce a fast and flexible disk-labelling algorithm, ClusterMap, which uses the visual clustering result to improve the overall performance of clustering on very large datasets. Third, we develop a visualization model that maps a multidimensional dataset to a 2D visualization while preserving, or partially preserving, its clusters. Experiments show that Vista, combined with ClusterMap, is faster and has a lower error rate than existing algorithms for very large datasets. It is also flexible, because the "cluster map" can be easily adjusted to meet application-specific clustering requirements.
Date Issued
2002
Extent
752783 bytes
Resource Type
Text
Resource Subtype
Technical Report