Lecture 5 Data Stream Mining
Outline
• What is a data stream?
• What is concept drift?
• Data stream classification
• Data stream clustering
[Figure: DATA STREAM sources: Internet, Industry, Surveillance, Sensor, Network Intrusion, Smart Phone, Mobile, Spam Filtering]
*Note: some pictures derived from the internet
Potential Applications
• Telecommunication calling records
• Business: credit card transaction flows
• Network monitoring and traffic engineering
• Financial market: stock exchange
• Engineering & industrial processes: power supply & manufacturing
• Sensor, monitoring & surveillance: video streams, RFIDs
• Security monitoring
• Web logs and Web page click streams
What is a data stream?
A data stream is a massive sequence of data objects with some unique features:
• One by one
• Potentially unbounded
• Concept drift
[Figure: Data stream (data4, data3, data2, data1) → Data mining system]
Challenges
Data Stream: (a) Infinite Length (b) Evolving Nature
• Single Pass Handling
• Memory Limitation
• Low Time Complexity
• Concept Drift
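To make the single-pass and memory constraints concrete, here is a minimal sketch of a summary that is updated one object at a time in constant memory. It uses Welford's running mean/variance method, which is swapped in for illustration and is not taken from the lecture; the `RunningStats` class and the `stream()` generator (with its sample values) are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass mean/variance (Welford's method) using O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Constant-time update; the raw value x is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two objects have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def stream():
    """Hypothetical data source that yields objects one by one."""
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield x


stats = RunningStats()
for x in stream():
    stats.update(x)  # each object is seen exactly once, then discarded

print(stats.n, stats.mean, stats.variance)
```

The summary never grows with the length of the stream, so the same loop would work on a potentially unbounded source.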
What is concept drift?
In predictive analytics and machine learning, concept drift means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. In short, the probability distribution changes.
• Change in P(C)
• Change in P(X)
• Change in P(C|X)
Real concept drift vs. Virtual concept drift
• Real concept drift: p(y|X) changes
• Virtual drift: p(X) changes, but not p(y|X)
$P(C_i \mid X) = \frac{P(C_i)\, P(X \mid C_i)}{P(X)}$
[Figure panels: Original data, Real concept drift, Virtual drift]
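The difference between the two kinds of drift can be checked numerically. The sketch below assumes a toy setting that is not from the lecture: a binary feature X, two classes, and made-up probabilities. The helper `prior_and_posterior` is a hypothetical name; it derives P(C=1) and P(X|C=1) from the joint distribution and then recovers P(C=1|X) through Bayes' rule as written above.

```python
def prior_and_posterior(p_x, p_c1_given_x):
    """Return P(C=1) and P(C=1 | X=x), the latter recovered via Bayes' rule."""
    # Joint probability P(X=x, C=1) for each feature value x.
    joint = {x: p_x[x] * p_c1_given_x[x] for x in p_x}
    p_c1 = sum(joint.values())                        # class prior P(C=1)
    likelihood = {x: joint[x] / p_c1 for x in p_x}    # P(X=x | C=1)
    posterior = {x: p_c1 * likelihood[x] / p_x[x] for x in p_x}
    return p_c1, posterior


# Original data (illustrative numbers only).
p_x_orig = {0: 0.5, 1: 0.5}
concept_orig = {0: 0.1, 1: 0.9}   # P(C=1 | X=x)

# Virtual drift: P(X) changes, P(C|X) stays the same.
p_x_virtual = {0: 0.8, 1: 0.2}

# Real drift: P(C|X) changes (here the labels flip), P(X) stays the same.
concept_real = {0: 0.9, 1: 0.1}

prior_o, post_o = prior_and_posterior(p_x_orig, concept_orig)
prior_v, post_v = prior_and_posterior(p_x_virtual, concept_orig)
prior_r, post_r = prior_and_posterior(p_x_orig, concept_real)

print("original:", prior_o, post_o)
print("virtual: ", prior_v, post_v)
print("real:    ", prior_r, post_r)
```

In this toy setting, virtual drift shifts P(X) and P(C) but leaves the decision rule intact, whereas real drift changes the rule itself even though P(X) and P(C) are unchanged.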
Example: Concept-Drift
[Figure: a data chunk with negative instances (●) and positive instances (o), the previous hyperplane, the current hyperplane, and the instances that are victims of concept drift]
1. Concept Drift Detection