Title:
A Fast Randomized Method for Local Density-based Outlier Detection in High Dimensional Data
Authors
Nguyen, Minh Quoc
Omiecinski, Edward
Mark, Leo
Abstract
Local density-based outlier detection (LOF) is a useful method for detecting outliers because it is model-free and locally based. However, the method is very slow on high-dimensional datasets. In this paper, we introduce a randomized method that can compute LOF very efficiently for high-dimensional datasets. Based on a consistency property of outliers, random points are selected to partition a dataset so that outlier candidates can be computed locally. Because the probability that a point is isolated from its neighbors by a partition is small, we apply multiple iterations with random partitions to prune false outliers. Experiments on a variety of real and synthetic datasets show that the randomization is effective in computing LOF, and that our method can compute LOF very efficiently on very high-dimensional data.
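The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the splitting rule (two random pivot points, each point assigned to the nearer pivot), the use of the minimum score across iterations to prune false outliers, and the names `lof_scores`, `random_partition`, `randomized_lof`, `max_size`, `min_size`, and `n_iter` are all assumptions made for illustration. The LOF routine is a simplified brute-force version that ignores ties in the k-distance.

```python
import math
import random


def lof_scores(points, k):
    """Simplified brute-force LOF for a small list of points (ties ignored)."""
    n = len(points)
    k = min(k, n - 1)
    if k < 1:
        return [1.0] * n
    # k nearest neighbors of each point as sorted (distance, index) pairs
    knn = []
    for i in range(n):
        d = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn.append(d[:k])
    k_dist = [nb[-1][0] for nb in knn]
    # local reachability density; the epsilon guards against duplicate points
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dij) for dij, j in knn[i])
        lrd.append(k / (reach + 1e-12))
    # LOF = average neighbor density divided by the point's own density
    return [sum(lrd[j] for _, j in knn[i]) / (k * lrd[i]) for i in range(n)]


def random_partition(idx, points, max_size, min_size, rng, tries=10):
    """Recursively split indices around two random pivots (assumed rule)."""
    if len(idx) <= max_size:
        return [idx]
    for _ in range(tries):
        a, b = rng.sample(idx, 2)
        left, right = [], []
        for i in idx:
            near_a = math.dist(points[i], points[a]) <= math.dist(points[i], points[b])
            (left if near_a else right).append(i)
        # reject splits that leave a side too small to hold k neighbors
        if min(len(left), len(right)) >= min_size:
            return (random_partition(left, points, max_size, min_size, rng, tries)
                    + random_partition(right, points, max_size, min_size, rng, tries))
    return [idx]  # no balanced split found; treat the whole set as one partition


def randomized_lof(points, k=5, max_size=40, n_iter=10, seed=0):
    """Per-point minimum of the local LOF scores over n_iter random partitions."""
    rng = random.Random(seed)
    best = [float("inf")] * len(points)
    for _ in range(n_iter):
        parts = random_partition(list(range(len(points))), points,
                                 max_size, 2 * k, rng)
        for part in parts:
            scores = lof_scores([points[i] for i in part], k)
            for i, s in zip(part, scores):
                # a point scoring low in any partition is pruned as a false outlier
                best[i] = min(best[i], s)
    return best
```

Taking the minimum reflects the pruning argument: a normal point may look like an outlier in one iteration if a partition boundary cuts it off from its neighbors, but it is unlikely to be cut off in every iteration, whereas a true outlier should score high regardless of the partition.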
Date Issued
2010
Resource Type
Text
Resource Subtype
Technical Report