Title:
A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data

dc.contributor.author Nguyen, Minh Quoc
dc.contributor.author Omiecinski, Edward
dc.contributor.author Mark, Leo
dc.contributor.corporatename Georgia Institute of Technology. College of Computing
dc.date.accessioned 2010-10-01T16:55:32Z
dc.date.available 2010-10-01T16:55:32Z
dc.date.issued 2010
dc.description Research area: Databases en_US
dc.description Research topic: Data Mining
dc.description.abstract The local density-based outlier factor (LOF) is a useful method for detecting outliers because it is model-free and locally based. However, the method is very slow for high-dimensional datasets. In this paper, we introduce a randomization method that can compute LOF very efficiently for high-dimensional datasets. Based on a consistency property of outliers, random points are selected to partition a dataset so that outlier candidates can be computed locally. Since the probability that a point is isolated from its neighbors is small, we apply multiple iterations with random partitions to prune false outliers. Experiments on a variety of real and synthetic datasets show that the randomization is effective in computing LOF. The experiments also show that our method can compute LOF very efficiently on very high-dimensional data. en_US
dc.identifier.uri http://hdl.handle.net/1853/35004
dc.language.iso en_US en_US
dc.publisher Georgia Institute of Technology en_US
dc.relation.ispartofseries CC Technical Report; GT-CS-10-09 en_US
dc.subject Data mining en_US
dc.subject Randomization en_US
dc.subject Outliers en_US
dc.title A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data en_US
dc.type Text
dc.type.genre Technical Report
dspace.entity.type Publication
local.contributor.corporatename College of Computing
local.contributor.corporatename School of Computer Science
local.relation.ispartofseries College of Computing Technical Report Series
local.relation.ispartofseries School of Computer Science Technical Report Series
relation.isOrgUnitOfPublication c8892b3c-8db6-4b7b-a33a-1b67f7db2021
relation.isOrgUnitOfPublication 6b42174a-e0e1-40e3-a581-47bed0470a1e
relation.isSeriesOfPublication 35c9e8fc-dd67-4201-b1d5-016381ef65b8
relation.isSeriesOfPublication 26e8e5bc-dc81-469c-bd15-88e6f98f741d
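Illustrative sketch (not taken from the report): the abstract describes partitioning the data with randomly selected points, computing outlier candidates locally, and repeating with fresh random partitions to prune false outliers. The Python below is one minimal way to read that outline, under stated assumptions. The function names, the two-pivot nearest-point split, the `max_size` and `min_size` parameters, and the choice to keep each point's minimum score across iterations are all hypothetical and may differ from the method in GT-CS-10-09. The LOF routine is a plain brute-force version that ignores distance ties and does not handle duplicate points robustly.

```python
import math
import random


def lof_scores(pts, k):
    """Brute-force LOF for a small list of points (ties beyond k are ignored)."""
    n = len(pts)
    neigh, kdist = [], []
    for i in range(n):
        nearest = sorted(
            (math.dist(pts[i], pts[j]), j) for j in range(n) if j != i
        )[:k]
        neigh.append([j for _, j in nearest])
        kdist.append(nearest[-1][0])  # distance to the k-th nearest neighbour
    lrd = []  # local reachability density of each point
    for i in range(n):
        reach = sum(max(kdist[j], math.dist(pts[i], pts[j])) for j in neigh[i])
        lrd.append(len(neigh[i]) / reach if reach > 0 else math.inf)
    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # assumption: coincident points count as inliers
        else:
            scores.append(sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]))
    return scores


def partition(idx, pts, max_size, min_size, rng):
    """Recursively split indices by their nearer of two randomly chosen points."""
    if len(idx) <= max_size:
        return [idx]
    a, b = rng.sample(idx, 2)
    left, right = [], []
    for i in idx:
        closer_to_a = math.dist(pts[i], pts[a]) <= math.dist(pts[i], pts[b])
        (left if closer_to_a else right).append(i)
    # Assumption: refuse splits that leave a side too small for a meaningful LOF.
    if len(left) < min_size or len(right) < min_size:
        return [idx]
    return (partition(left, pts, max_size, min_size, rng)
            + partition(right, pts, max_size, min_size, rng))


def randomized_lof(pts, k=5, max_size=40, min_size=20, iterations=10, seed=0):
    """Keep, for every point, its smallest local LOF over several random partitions."""
    if len(pts) <= k or min_size <= k:
        raise ValueError("need more than k points overall and in every partition")
    rng = random.Random(seed)
    best = [math.inf] * len(pts)
    for _ in range(iterations):
        for part in partition(list(range(len(pts))), pts, max_size, min_size, rng):
            local = lof_scores([pts[i] for i in part], k)
            for i, score in zip(part, local):
                best[i] = min(best[i], score)
    return best
```

Taking the minimum reflects the pruning idea in the abstract: a point that is cut off from its true neighbors in one partition may look like an outlier there, but it is unlikely to be cut off in every iteration, so a score that stays high across all partitions is more credible. Keeping `min_size` well above `k` matters in this sketch, because in a partition with only about `k + 1` points a true outlier becomes a nearest neighbor of every other point and its score collapses toward 1.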
Files
Original bundle
Name: GT-CS-10-09.pdf
Size: 278.83 KB
Format: Adobe Portable Document Format