Fig. 4. Graphical structure of the decision tree model.

Fig. 4 illustrates the structure of the decision tree model, which includes three types of nodes (i.e., root node, leaf node and terminal node) and branches between nodes. The leaf nodes denote attributes of the input, while the branches denote the conditions on these attributes. Each terminal node is a subset of the target variables Y, which indicates two kinds of information: (1) the classification of the target variables Y, and (2) the probability of each subset. Based on the classification and probability, the rules of prediction can be induced.

Most algorithms for generating decision trees are variations of a core algorithm that employs a top-down, greedy search through the entire space of possible decision trees. The ID3 algorithm [36] and its successor C4.5 [37] are the most widely used methods. The key step in these algorithms is the choice of the best attribute at each node. To measure the classification effect of a given attribute, a metric called information gain is defined as follows [38]:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \quad (4)$$

where

$$\mathrm{Entropy}(S) = \sum_{i} -p_i \log_2 p_i \quad (5)$$

Gain(S, A) represents the information gain of an attribute A relative to a collection of samples S. Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v, namely S_v = {s ∈ S | A(s) = v}. p_i represents the proportion of S belonging to class i, and Entropy is a measure of the impurity of a collection of training samples. Given the definition of Entropy, Gain(S, A) in Eq. (4) is the reduction in entropy caused by knowledge of attribute A. Namely, Gain(S, A) is the contribution of attribute A to the information of samples S. The highest value of information gain indicates the best attribute A at a specific node.
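To make Eqs. (4) and (5) concrete, the following is a minimal sketch in Python (not from the paper; the function names and sample data are illustrative) that computes entropy and information gain for a single categorical attribute:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, Eq. (5): sum_i -p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Information gain of `attribute` over sample set S, Eq. (4).

    `samples` is a list of dicts mapping attribute names to values;
    `labels` holds the corresponding target classes.
    """
    n = len(samples)
    # Partition S into subsets S_v, one per observed value v of the attribute.
    subsets = {}
    for s, y in zip(samples, labels):
        subsets.setdefault(s[attribute], []).append(y)
    weighted = sum((len(ys) / n) * entropy(ys) for ys in subsets.values())
    return entropy(labels) - weighted

# Illustrative example: "day" perfectly separates the two occupancy classes,
# so its gain equals the full entropy of the label set (~0.971 bits).
samples = [{"day": "weekday"}] * 6 + [{"day": "weekend"}] * 4
labels = ["occupied"] * 6 + ["vacant"] * 4
print(information_gain(samples, labels, "day"))
```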
There are two steps in decision tree generation. The first step is learning rules from the training data based on the aforementioned C4.5 algorithm. The gain ratio method is employed to identify the best attribute at each node by minimizing the entropy. The confidence is set to 0.25 and the minimal gain to 0.1. The second step is predicting based on the rules learned in the first step, and validating the results against the testing data. If the accuracy is satisfactory, the process is finished. Otherwise, the two steps are repeated to update the decision tree until the result is satisfactory. A cross-validation method [8] is used to evaluate the performance of the decision tree in this study. The data set is divided into ten subsets: seven subsets are used for training and the other three are used for testing. The procedure is then repeated with the subsets exchanged. Cross-validation improves the accuracy and robustness of the decision tree model.
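C4.5 itself is rarely available in mainstream Python libraries, so the sketch below uses scikit-learn's CART implementation with the entropy criterion as a rough stand-in, since it splits on the same information-gain idea. The toy data, the `min_impurity_decrease` threshold, and the ten random 70/30 splits are assumptions that only approximate the paper's C4.5 pruning parameters (confidence 0.25, minimal gain 0.1) and its subset-exchanging validation scheme:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Toy stand-in data: rows are (hour_of_day, day_of_week); the target is a
# binary occupancy class (occupied during weekday working hours).
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 24, 500), rng.integers(0, 7, 500)])
y = ((X[:, 0] >= 8) & (X[:, 0] <= 18) & (X[:, 1] < 5)).astype(int)

# The entropy criterion mirrors the information-gain splitting of ID3/C4.5;
# min_impurity_decrease is only a loose analogue of C4.5's minimal-gain pruning.
tree = DecisionTreeClassifier(criterion="entropy", min_impurity_decrease=0.01)

# Ten random 70/30 train/test partitions approximate the paper's scheme of
# seven training subsets and three testing subsets, repeated by exchange.
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = cross_val_score(tree, X, y, cv=cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```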
2.3. Case study

A case study was conducted to demonstrate the proposed method. The office building of the case study is Building 101 in the Navy Yard, Philadelphia, U.S., shown in Fig. 5. The building is one of the nation's most highly instrumented commercial buildings. Building 101 in the Navy Yard is the temporary headquarters of the U.S. Department of Energy's Energy Efficient Building Hub (EEB Hub) [39]. Various sensors have been installed by the EEB Hub since 2012 to acquire building data on occupants, facilities, energy consumption and environment. The profile of Building 101 is shown in Table 1.

Fig. 5. Photo of Building 101.

Four sensors are installed at the gates of the building to record the number of occupants entering and exiting. The sensors are located on the first floor of Building 101, as shown in Fig. 6. The data format of the raw sensor records is shown in Table 2. The set (N_i1, N_i3, N_i5, N_i7) denotes the numbers of entering occupants, while the set (N_i2, N_i4, N_i6, N_i8) denotes the numbers of exiting occupants at the i-th time step. Therefore, the number of total occupants in the building at the i-th time step can be derived from these entering and exiting counts.
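Given records in the Table 2 format, the whole-building occupancy can be recovered by accumulating the difference between entering and exiting counts at each time step. A minimal sketch follows; the clamp at zero as a guard against sensor miscounts is an assumption, not from the paper:

```python
def occupancy_series(records):
    """Accumulate net building occupancy from per-interval gate counts.

    `records` is a sequence of 8-tuples (N_i1, ..., N_i8), one per time step i:
    even indices 0, 2, 4, 6 (N_i1, N_i3, N_i5, N_i7) count entering occupants,
    odd indices 1, 3, 5, 7 (N_i2, N_i4, N_i6, N_i8) count exiting occupants.
    """
    total = 0
    series = []
    for rec in records:
        entering = sum(rec[0::2])  # N_i1, N_i3, N_i5, N_i7
        exiting = sum(rec[1::2])   # N_i2, N_i4, N_i6, N_i8
        total += entering - exiting
        total = max(total, 0)      # assumed guard against sensor miscounts
        series.append(total)
    return series

# Example: two time steps across the four gates.
print(occupancy_series([(5, 0, 3, 1, 0, 0, 2, 0),    # 10 in, 1 out -> 9
                        (1, 4, 0, 2, 0, 1, 0, 0)]))  # 1 in, 7 out -> 3
```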