
Shanghai Jiao Tong University: Public Administration "Professional English" course teaching resources (reading material): Occupancy data analytics and prediction: a case study


X. Liang et al. / Building and Environment 102 (2016) 179-192

acquire, harmonize, rescale, clean and format data. Due to the failure of sensors and other interference factors, the raw data may contain missing data, erroneous data and unstructured data. Before data mining, the raw data should be pre-processed to obtain valid data. In this study, the missing data is removed from the data set. Statistical methods (i.e., box plot and mean value) are used to investigate the characteristics of the data before data mining.

Step 3: methodology selection. Data mining involves various kinds of methods. Different methods target problems at different levels. According to the specific problem and data source, appropriate methods can be selected. In this study, a machine learning method is adopted to discover patterns of occupant presence, while rule induction is used to summarize rules within the patterns.
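The data preparation described above (removing missing readings, then inspecting the data with box-plot statistics and the mean) can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: the sample readings, the function names `drop_missing` and `box_plot_summary`, and the 1.5 x IQR fence convention are assumptions that do not come from the paper.

```python
from statistics import mean, quantiles


def drop_missing(readings):
    """Discard missing readings (None or NaN), as the study removes missing data."""
    # NaN is the only float value that is not equal to itself.
    return [r for r in readings if r is not None and r == r]


def box_plot_summary(values):
    """Return the mean plus the quartiles and fences a box plot would draw."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": mean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        # Assumed convention: points beyond 1.5 * IQR are flagged as outliers.
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }


# Hypothetical occupant counts with two missing sensor readings.
raw = [0, 0, 2, None, 15, 18, float("nan"), 17, 12, 3]
clean = drop_missing(raw)
summary = box_plot_summary(clean)
```

Removing missing rows is the simplest policy and matches what the text states; imputing values instead would be a different design choice with its own assumptions.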
Software selection is essential to analyze data. Matlab 2015 and RapidMiner 6.5 are applied on a standard PC with Windows 7 to perform the data processing and data mining, respectively. RapidMiner is open-source software with a visual interface and modular operators for analytics and data mining. Due to its flexibility and accessibility, RapidMiner has been widely used in industry and academia.

Step 4: learning. This step is to discover the patterns of the occupancy schedule and abstract the rules within the patterns. Clustering and decision trees are applied for pattern recognition and rule induction, respectively. The details of the processes and results of each step are illustrated in the learning phase in Fig. 2.

Step 5: prediction. The observed data is split into a training set and a test set. The training set is used to train the model and identify the rules, as shown in the predicting phase in Fig. 2. Based on the identified patterns and rules of occupant presence, the occupancy schedule can be predicted.

Step 6: validation. This step is to compare the prediction result to the test data set, as shown in the validating phase in Fig. 2. The more similar the two sets are, the better the method is. To quantitatively validate the proposed method, several metrics can be

Steps | Methods/Tools | Outcomes
1. Problem Framing | Literature review; expert interview | Problem statement, assumption and key metrics
2. Acquire and Prepare Data | Acquire, harmonize, rescale, clean and format data | Valid data
3. Methodology Selection | Identify problem solving approaches and software | Selected approaches and software tools
4. Learning | Machine learning; rule induction | Patterns and rules of occupancy schedule
5. Prediction | Prediction method based on occupancy pattern | Results of occupancy presence prediction
6. Validation | Compare prediction results to observed data | Effect of the proposed method

Fig. 1. Framework of the proposed method for occupancy schedule learning and predicting.
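Steps 5 and 6 (split the observed data, learn rules from the training set, predict, then compare the prediction with the test set) can be sketched as below. This is a hedged illustration only: a per-slot majority vote stands in for the paper's clustering and decision-tree models, the agreement ratio is an assumed metric (the excerpt breaks off before the paper names its own), and the presence data and all function names are hypothetical.

```python
def split_days(days, train_fraction=0.75):
    """Chronological split: earlier days form the training set, later days the test set."""
    cut = int(len(days) * train_fraction)
    return days[:cut], days[cut:]


def learn_slot_rule(train_days):
    """Stand-in rule: predict a time slot as occupied if it was occupied on most training days."""
    n_slots = len(train_days[0])
    return [
        1 if sum(day[s] for day in train_days) * 2 > len(train_days) else 0
        for s in range(n_slots)
    ]


def agreement(predicted, observed_days):
    """Share of (day, slot) cells where the prediction equals the observation."""
    total = sum(len(day) for day in observed_days)
    matches = sum(
        p == o for day in observed_days for p, o in zip(predicted, day)
    )
    return matches / total


# Hypothetical presence data: 4 days x 6 time slots, 1 = occupied, 0 = vacant.
days = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]
train, test = split_days(days)
rule = learn_slot_rule(train)
score = agreement(rule, test)
```

A higher agreement means the predicted schedule is closer to the observed one, which is the sense in which the text says the method is better when the two sets are more similar.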

