Title:
Towards Finding Optimal Partitions of Categorical Datasets
dc.contributor.author | Chen, Keke | en_US |
dc.contributor.author | Liu, Ling | |
dc.date.accessioned | 2005-06-17T17:38:12Z | |
dc.date.available | 2005-06-17T17:38:12Z | |
dc.date.issued | 2003 | en_US |
dc.description.abstract | A considerable amount of work has been dedicated to clustering numerical datasets, but only a handful of categorical clustering algorithms have been reported to date. Furthermore, almost none has addressed the following two important cluster validity problems: (1) Given a dataset and a clustering algorithm that partitions the dataset into k clusters, how can we determine the best k with respect to the given dataset? (2) Given a dataset and a set of clustering algorithms with a fixed k, how can we determine which algorithm will produce k clusters of the best quality? In this paper, we investigate the entropy and expected-entropy concepts for clustering categorical data, and propose a cluster validity method based on the characteristics of expected-entropy. In addition, we develop an agglomerative hierarchical algorithm (HierEntro) to incorporate the proposed cluster validity method into the clustering process. We report our initial experimental results showing the effectiveness of the proposed cluster validity method and the benefits of the HierEntro clustering algorithm. | en_US |
dc.format.extent | 171223 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1853/6516 | |
dc.language.iso | en_US | |
dc.publisher | Georgia Institute of Technology | en_US |
dc.relation.ispartofseries | CC Technical Report; GIT-CC-03-56 | en_US |
dc.subject | Cluster validity | |
dc.title | Towards Finding Optimal Partitions of Categorical Datasets | en_US |
dc.type | Text | |
dc.type.genre | Technical Report | |
dspace.entity.type | Publication | |
local.contributor.author | Liu, Ling | |
local.contributor.corporatename | College of Computing | |
local.relation.ispartofseries | College of Computing Technical Report Series | |
relation.isAuthorOfPublication | 96391b98-ac42-4e2c-93ee-79a5e16c2dfb | |
relation.isOrgUnitOfPublication | c8892b3c-8db6-4b7b-a33a-1b67f7db2021 | |
relation.isSeriesOfPublication | 35c9e8fc-dd67-4201-b1d5-016381ef65b8 |