Title:
A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data
dc.contributor.author | Nguyen, Minh Quoc | |
dc.contributor.author | Omiecinski, Edward | |
dc.contributor.author | Mark, Leo | |
dc.contributor.corporatename | Georgia Institute of Technology. College of Computing | |
dc.date.accessioned | 2010-10-01T16:55:32Z | |
dc.date.available | 2010-10-01T16:55:32Z | |
dc.date.issued | 2010 | |
dc.description | Research area: Databases | en_US |
dc.description | Research topic: Data Mining | |
dc.description.abstract | Local density-based outlier detection (LOF) is a useful method for detecting outliers because it is model-free and locally based. However, the method is very slow for high-dimensional datasets. In this paper, we introduce a randomization method that can compute LOF very efficiently for high-dimensional datasets. Based on a consistency property of outliers, random points are selected to partition a data set so that outlier candidates can be computed locally. Since the probability that a point is isolated from its neighbors is small, we apply multiple iterations with random partitions to prune false outliers. Experiments on a variety of real and synthetic datasets show that the randomization is effective in computing LOF. The experiments also show that our method can compute LOF very efficiently on very high-dimensional data. | en_US |
dc.identifier.uri | http://hdl.handle.net/1853/35004 | |
dc.language.iso | en_US | en_US |
dc.publisher | Georgia Institute of Technology | en_US |
dc.relation.ispartofseries | CC Technical Report; GT-CS-10-09 | en_US |
dc.subject | Data mining | en_US |
dc.subject | Randomization | en_US |
dc.subject | Outliers | en_US |
dc.title | A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data | en_US |
dc.type | Text | |
dc.type.genre | Technical Report | |
dspace.entity.type | Publication | |
local.contributor.corporatename | College of Computing | |
local.contributor.corporatename | School of Computer Science | |
local.relation.ispartofseries | College of Computing Technical Report Series | |
local.relation.ispartofseries | School of Computer Science Technical Report Series | |
relation.isOrgUnitOfPublication | c8892b3c-8db6-4b7b-a33a-1b67f7db2021 | |
relation.isOrgUnitOfPublication | 6b42174a-e0e1-40e3-a581-47bed0470a1e | |
relation.isSeriesOfPublication | 35c9e8fc-dd67-4201-b1d5-016381ef65b8 | |
relation.isSeriesOfPublication | 26e8e5bc-dc81-469c-bd15-88e6f98f741d |
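
The idea summarized in the abstract (partition the data with randomly chosen points, compute outlier candidates locally, then repeat with fresh random partitions to prune false outliers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two-pivot splitting rule, the partition size limits, the LOF threshold, the intersection across iterations, and all function and parameter names below are assumptions made for illustration only, and the LOF routine is a simplified brute-force version that uses exactly k neighbours and ignores distance ties.

```python
import math
import random


def lof_scores(points, k):
    """Brute-force LOF for a small list of points (simplified: exactly k neighbours)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    # k nearest neighbours of each point as (distance, index) pairs, self excluded
    knn = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn.append(dists[:k])
    k_dist = [nb[-1][0] for nb in knn]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in knn[i])
        lrd.append(k / reach if reach > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # duplicate points: treat as an inlier
        else:
            scores.append(sum(lrd[j] for _, j in knn[i]) / (k * lrd[i]))
    return scores


def random_partition(indices, points, min_size, max_size, rng):
    """Assumed splitting rule: pick two random pivots, assign each point to the closer one."""
    if len(indices) <= max_size:
        return [indices]
    a, b = rng.sample(indices, 2)
    left, right = [], []
    for i in indices:
        closer_to_a = math.dist(points[i], points[a]) <= math.dist(points[i], points[b])
        (left if closer_to_a else right).append(i)
    # keep the set whole if a side would be too small for a meaningful local LOF
    if len(left) < min_size or len(right) < min_size:
        return [indices]
    return (random_partition(left, points, min_size, max_size, rng)
            + random_partition(right, points, min_size, max_size, rng))


def randomized_lof_candidates(points, k=3, min_size=10, max_size=20,
                              threshold=1.5, iterations=5, seed=0):
    """Return indices flagged as outlier candidates in every random partitioning."""
    rng = random.Random(seed)
    all_indices = list(range(len(points)))
    survivors = set(all_indices)
    for _ in range(iterations):
        flagged = set()
        for part in random_partition(all_indices, points, min_size, max_size, rng):
            scores = lof_scores([points[i] for i in part], k)
            flagged.update(i for i, s in zip(part, scores) if s > threshold)
        # a point flagged only because one partition cut it off from its
        # neighbours is unlikely to be flagged again, so intersect the sets
        survivors &= flagged
    return survivors


# Hypothetical data: 60 points in the unit square plus one distant point (index 60)
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(60)] + [(10.0, 10.0)]
candidates = randomized_lof_candidates(data)
print(sorted(candidates))
```

In this sketch the distant point stays in the candidate set across iterations because its local density is far lower than that of its neighbours in any partition it lands in, while a point that is flagged only because a particular split separated it from its neighbours tends to be dropped by the intersection. The minimum partition size matters: if a partition holds barely more than k points, the neighbours of a true outlier end up counting the outlier among their own nearest neighbours, and the LOF contrast disappears.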