X. Liang et al. / Building and Environment 102 (2016) 179–192, p. 183

[Fig. 3. Mechanism of machine learning. (a) Supervised learning: outputs Y = F(X) computed from inputs (x1, x2, …, xn) are compared with targets (y1, y2, …, yn), and the rule is adjusted. (b) Unsupervised learning: an optimization algorithm adjusts toward the patterns of the inputs.]
performance.

There are various clustering models, and for each of these models, different algorithms can be given [33]. Typical cluster models include connectivity-based models (e.g., hierarchical clustering), centroid-based models (e.g., k-means clustering), distribution-based models (e.g., Gaussian distribution fitting) and density-based models (e.g., density-based spatial clustering of applications with noise) [34]. Among numerous clustering algorithms, k-means clustering is the most commonly used, which is defined as follows.

1. Initialize cluster centroids $\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^n$.
2. Repeat until convergence: {

For every $i$, set

$$c_i := \arg\min_j \lVert x_i - \mu_j \rVert^2 \qquad (1)$$

For every $j$, set

$$\mu_j := \frac{\sum_{i=1}^{m} a_i\, x_i}{\sum_{i=1}^{m} a_i}, \qquad a_i = \begin{cases} 1 & \text{if } c_i = j \\ 0 & \text{if } c_i \neq j \end{cases} \qquad (2)$$

}

In the k-means algorithm, k (a parameter of the algorithm) is the preset number of clusters. The cluster centroids $\mu_j$ represent the positions of the centers of the clusters. Step 1 initializes the cluster centroids, either randomly or by a specific method. Step 2 finds the optimal cluster centroids and the samples assigned to them; two operations are implemented iteratively until convergence in this step. One operation assigns each training sample $x_i$ to the closest cluster centroid $\mu_j$, as shown in Eq. (1). The other moves each cluster centroid $\mu_j$ to the mean of the points assigned to it, as shown in Eq. (2).

The appropriate clustering algorithm for a particular problem needs to be chosen experimentally, since there is no defined "best" clustering algorithm [33]. The most appropriate algorithm for a certain problem can be selected by its performance, which can be measured by how well defined the clusters are, namely the ratio of intra-cluster distance to inter-cluster distance. The Davies–Bouldin index (DBI) is used to evaluate the different methods in this study. This index is defined in Eq. (3):

$$DB = \frac{1}{n} \sum_{i=1}^{n} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \qquad (3)$$

where n is the number of clusters, $c_i$ is the centroid of cluster i, $\sigma_i$ is the average distance of all elements in cluster i to centroid $c_i$, and $d(c_i, c_j)$ is the distance between centroids $c_i$ and $c_j$. A lower DBI value means lower intra-cluster distances (higher intra-cluster similarity) and higher inter-cluster distances (lower inter-cluster similarity); therefore, the clustering algorithm with the smallest DBI is considered the best under this criterion.

2.2.2. Decision tree learning

This study uses a decision tree to induce the rules of occupant presence.
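As an illustration of the clustering procedure, the iteration of Eqs. (1) and (2) and the DBI of Eq. (3) can be sketched in a few lines of Python. This is a minimal sketch on made-up 2-D points, not the authors' implementation; the data and parameter choices are assumptions for demonstration only.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means on tuples of coordinates (Eqs. (1) and (2))."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: initialize centroids
    for _ in range(iters):
        # Eq. (1): assign each sample to the closest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Eq. (2): move each centroid to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members)
                                           for x in zip(*members)))
            else:  # keep the old centroid if its cluster emptied
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels

def davies_bouldin(points, labels, centroids):
    """Eq. (3): mean over clusters of max_{j != i} (sigma_i + sigma_j) / d(c_i, c_j)."""
    k = len(centroids)
    # sigma_i: average distance of cluster members to their centroid
    sigma = []
    for i in range(k):
        members = [p for p, c in zip(points, labels) if c == i]
        sigma.append(sum(math.dist(p, centroids[i]) for p in members) / len(members))
    return sum(max((sigma[i] + sigma[j]) / math.dist(centroids[i], centroids[j])
                   for j in range(k) if j != i)
               for i in range(k)) / k

# Two well-separated blobs: compact, far-apart clusters give a small DBI.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
cents, labs = kmeans(pts, k=2)
print(round(davies_bouldin(pts, labs, cents), 3))  # → 0.032
```

Because the two blobs are compact and far apart, the intra-cluster spreads $\sigma_i$ are small relative to the centroid distance $d(c_i, c_j)$, so the DBI is close to zero; overlapping or diffuse clusters would push it up.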
Decision tree learning is a typical supervised machine learning algorithm in data mining [35]. It uses a tree-like structure to model the rules and their possible consequences. A main advantage of the decision tree method is that it can represent the rules
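To make the tree representation concrete, the following fragment shows how a learned tree for occupant presence might read once written out as nested tests. The split attributes (hour of day, workday flag) and thresholds are illustrative assumptions, not results from this study.

```python
def occupant_present(hour, is_workday):
    """Hypothetical decision tree: each internal node tests one
    attribute, and each leaf is a predicted presence state."""
    if not is_workday:   # root node: split on day type
        return False     # leaf: absent on non-workdays
    if hour < 8:         # workday branch: split on hour of day
        return False     # leaf: absent before working hours
    if hour < 18:
        return True      # leaf: present during working hours
    return False         # leaf: absent in the evening

# The tree reads directly as rules, e.g.
# "IF workday AND 8 <= hour < 18 THEN present".
print(occupant_present(10, True), occupant_present(10, False))  # → True False
```

This readability is the advantage noted above: each root-to-leaf path is an explicit, human-checkable rule, unlike the opaque parameters of many other supervised models.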