[Fig. 2. Processes of the proposed method and results (flowchart of the Learning, Training and Predicting, and Validating phases).]

applied to measure similarity between prediction results and observed data, including the mean, median, bias, RMSE (root mean squared error) and RTE (relative total error). The details of these metrics and of the validation procedure are introduced in Section 3.5.
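As a small illustration of the standard error metrics listed above, the sketch below computes the mean, median, bias and RMSE of a predicted occupancy series against an observed one. The numbers are made up for illustration, and RTE is omitted here because its exact definition is only given in Section 3.5.

import numpy as np

# Hypothetical hourly occupancy fractions; the values are illustrative only.
observed  = np.array([0.0, 0.1, 0.8, 0.9, 0.9, 0.7, 0.2, 0.0])
predicted = np.array([0.0, 0.2, 0.7, 0.9, 0.8, 0.6, 0.3, 0.0])

mean_pred   = predicted.mean()                                # mean of the prediction
median_pred = np.median(predicted)                            # median of the prediction
bias        = (predicted - observed).mean()                   # mean error (systematic over/under-prediction)
rmse        = np.sqrt(((predicted - observed) ** 2).mean())   # root mean squared error

print(f"mean={mean_pred:.3f}  median={median_pred:.3f}  bias={bias:.3f}  RMSE={rmse:.3f}")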
2.2. Machine learning

Machine learning is an important data mining method [27], which allows computers to learn from data and make predictions on them via observation, experience, analysis and self-training [27,28]. It operates by building a model that makes data-driven predictions or decisions, rather than following strictly static program instructions [29].

There are two types of machine learning, namely supervised learning and unsupervised learning [30]. The former refers to the traditional learning methods with training data, i.e., a known labeled data set of inputs and outputs. In a standard supervised learning problem, training samples (X, Y) = {(x1, y1), ..., (xm, ym)} are given for an unknown function Y = F(X). X denotes the "input" variables, also called input features, and Y denotes the "output" or target variables to be predicted. The values xi are typically vectors of the form (xi1, xi2, ..., xin), whose components are the features of xi, such as weight, color, shape and so on. The notation xij refers to the j-th feature of xi. The goal of supervised learning is to learn a general rule F(X) that maps inputs X to outputs Y, as shown in Fig. 3(a). Typical supervised learning algorithms include regression, Bayesian statistics and decision trees.

Unsupervised learning refers to methods in which no labels are given to the learning algorithm, leaving it on its own to find structure in its input. In unsupervised learning, there is no "output" Y with which to train the function F(X). The goal of unsupervised learning is to discover hidden patterns in the input data X from its own features, as shown in Fig. 3(b). In reality, a priori information about the outputs is unavailable for many problems, so unsupervised learning has recently been widely used to solve them.

This study uses both supervised and unsupervised learning, in two steps. At the beginning, the occupancy schedule data carry no labels, so an unsupervised learning method (i.e., clustering) is applied to identify patterns of occupant presence from the features of the data. After that, the presence data are labeled with the identified patterns. Then, a supervised learning method (i.e., a decision tree) is applied to induce rules from the labeled data.
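To make the two-step procedure concrete, the following is a minimal sketch of the pipeline shape using scikit-learn, where KMeans stands in for the clustering step that assigns each day's presence profile to a pattern, and DecisionTreeClassifier then learns rules mapping calendar features to those pattern labels. The synthetic 24-hour profiles, the choice of three clusters and the single day-of-week feature are illustrative assumptions, not the configuration used in the study.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: one 24-dimensional hourly presence profile (0/1 per hour) per day for 8 weeks.
n_days = 56
day_of_week = np.arange(n_days) % 7                      # 0 = Monday, ..., 6 = Sunday
profiles = np.zeros((n_days, 24))
for d in range(n_days):
    if day_of_week[d] < 5:                               # weekdays: present roughly 09:00-18:00
        profiles[d, 9:18] = (rng.random(9) > 0.1).astype(float)
    else:                                                 # weekends: mostly absent
        profiles[d, 10:13] = (rng.random(3) > 0.7).astype(float)

# Step 1 (unsupervised): cluster the unlabeled daily profiles into presence patterns.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)
pattern_labels = kmeans.labels_                          # each day now carries a pattern label

# Step 2 (supervised): induce rules that map calendar features to the pattern labels.
features = day_of_week.reshape(-1, 1)                    # illustrative feature: day of week only
tree = DecisionTreeClassifier(max_depth=3).fit(features, pattern_labels)

# Predict the presence pattern for a new Wednesday (day_of_week = 2).
print("predicted pattern:", tree.predict([[2]])[0])

The features actually fed to the decision tree in this study are described later in the paper; the sketch only shows the cluster-then-classify structure.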
2.2.1. Cluster analysis

Cluster analysis is a typical unsupervised machine learning method, which aims to group data into a few cohesive clusters [31]. The criterion of clustering is the similarity among samples: samples should have high similarity within the same cluster but low similarity across different clusters. Similarity is normally measured by distance; the shorter the distance between samples, the more similar they are.

There are various distance definitions, including the Euclidean distance, the Chebyshev distance, the Hamming distance, the dynamic time warping distance and the correlation distance [32]. An appropriate distance type should be selected according to the specific problem. For example, the Euclidean distance is commonly used for direct geometrical distance, the correlation distance is good at triangle similarity, and dynamic time warping is commonly used for the similarity of time-shifted sequences. This study compares three kinds of distances, shown in Fig. 10, and selects the Euclidean distance due to its best performance.
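To illustrate why the choice of distance matters for presence profiles, the sketch below contrasts the Euclidean, correlation and dynamic time warping distances on two 24-hour profiles that have the same shape but are shifted by one hour. The profiles and the small dynamic-programming DTW routine are illustrative assumptions, not the implementation behind the comparison in Fig. 10.

import numpy as np
from scipy.spatial.distance import euclidean, correlation

def dtw_distance(a, b):
    """Plain O(n*m) dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two illustrative 24-hour presence profiles: the same occupancy pattern, shifted by one hour.
profile_a = np.zeros(24); profile_a[9:18] = 1.0           # present 09:00-18:00
profile_b = np.zeros(24); profile_b[10:19] = 1.0          # same pattern, one hour later

print("Euclidean  :", euclidean(profile_a, profile_b))     # penalises the shift hour by hour
print("Correlation:", correlation(profile_a, profile_b))   # 1 - Pearson correlation of the shapes
print("DTW        :", dtw_distance(profile_a, profile_b))  # ~0: warping absorbs the time shift

The Euclidean distance reacts to the one-hour shift, while DTW treats the two days as nearly identical; which behaviour is appropriate depends on whether such shifts should count as differences between occupancy patterns.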