Vista: Looking Into the Clusters in Very Large Multidimensional Datasets

Chen, Keke; Liu, Ling

Title:

Vista: Looking Into the Clusters in Very Large Multidimensional Datasets

Files

GIT-CC-02-30.pdf (735.14 KB)

Author(s)

Chen, Keke
Liu, Ling

Associated Organization(s)

Organizational Unit

College of Computing

Abstract

Information visualization is commonly recognized as a useful method for understanding complex structure in large datasets. In this paper, we introduce an efficient and flexible clustering approach that combines visual clustering and fast disk labelling for very large datasets. This paper makes three contributions. First, we propose a framework, Vista, that incorporates information visualization methods into the clustering process in order to enhance the understanding of the intermediate clustering results and to allow the user to revise the clustering results before the disk-labelling phase. Second, we introduce a fast and flexible disk-labelling algorithm, ClusterMap, which utilizes the visual clustering result to improve the overall performance of clustering on very large datasets. Third, we develop a visualization model that maps a multidimensional dataset to a 2D visualization while preserving or partially preserving clusters. Experiments show that Vista, combined with ClusterMap, is faster and has a lower error rate than existing algorithms for very large datasets. It is also flexible because the "cluster map" can be easily adjusted to meet application-specific clustering requirements.

Date Issued

2002

Extent

752783 bytes

Resource Type

Text

Resource Subtype

Technical Report

Georgia Tech Library
