Title
Evolution in Data Streams
Author(s)
Omiecinski, Edward
Mark, Leo
Huang, Weiyun
Advisor(s)
Editor(s)
Collections
Supplementary to
Permanent Link
Abstract
Conventional data mining deals with static data stored on disk, for example, the current state of a data warehouse. In addition, the data may be read multiple times to accomplish the mining task. Recently, the data stream paradigm has become the focus of study, where data arrives continuously as a sequence of elements and the data mining task has to be done in a single pass. An example is constructing one or more models of the data, as in clustering or classification, in a single pass and with limited memory. Under the data stream model, data arrives as one of multiple, potentially infinite streams. Data streams can flow at variable rates, and the underlying models often change with time. Current work in data stream mining does not focus on change ("evolution"), and that is precisely our main focus. Monitoring the changes in the models becomes as important as obtaining the models. Therefore, stream data mining not only needs to mine data incrementally and decrementally (in order to keep track of recent data), but also has to provide methods to monitor and detect changes in the underlying models. We refer to this problem as "data evolution." Of equal importance, the mining algorithms themselves need to be adaptive when the flow rate of the data streams changes dramatically. That is, the algorithms should be able to reduce accuracy in order to handle a data burst, or perform a more thorough analysis when data flow is slow. We refer to this problem as "algorithm evolution." We will study both data evolution and algorithm evolution. We will provide efficient algorithms to incrementally and decrementally mine stream data, good techniques to store data models and to detect and monitor their changes, and a set of algorithms that can switch from "high resolution" to "low resolution" in order to adapt to the flow rate.
Sponsor
Date Issued
2003
Extent
273457 bytes
Resource Type
Text
Resource Subtype
Technical Report