Title:
Towards Finding Optimal Partitions of Categorical Datasets

dc.contributor.author Chen, Keke en_US
dc.contributor.author Liu, Ling
dc.date.accessioned 2005-06-17T17:38:12Z
dc.date.available 2005-06-17T17:38:12Z
dc.date.issued 2003 en_US
dc.description.abstract A considerable amount of work has been dedicated to clustering numerical datasets, but only a handful of categorical clustering algorithms have been reported to date. Furthermore, almost none of them addresses the following two important cluster validity problems: (1) Given a dataset and a clustering algorithm that partitions the dataset into k clusters, how can we determine the best k with respect to the given dataset? (2) Given a dataset and a set of clustering algorithms with a fixed k, how can we determine which one will produce k clusters of the best quality? In this paper, we investigate the entropy and expected-entropy concepts for clustering categorical data and propose a cluster validity method based on the characteristics of expected-entropy. In addition, we develop an agglomerative hierarchical algorithm (HierEntro) that incorporates the proposed cluster validity method into the clustering process. We report our initial experimental results showing the effectiveness of the proposed cluster validity method and the benefits of the HierEntro clustering algorithm. en_US
dc.format.extent 171223 bytes
dc.format.mimetype application/pdf
dc.identifier.uri http://hdl.handle.net/1853/6516
dc.language.iso en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries CC Technical Report; GIT-CC-03-56 en_US
dc.subject Cluster validity
dc.title Towards Finding Optimal Partitions of Categorical Datasets en_US
dc.type Text
dc.type.genre Technical Report
dspace.entity.type Publication
local.contributor.author Liu, Ling
local.contributor.corporatename College of Computing
local.relation.ispartofseries College of Computing Technical Report Series
relation.isAuthorOfPublication 96391b98-ac42-4e2c-93ee-79a5e16c2dfb
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isSeriesOfPublication 35c9e8fc-dd67-4201-b1d5-016381ef65b8
Files
Original bundle
Name: GIT-CC-03-56.pdf
Size: 167.21 KB
Format: Adobe Portable Document Format