Motif Difficulty (MD):A Predictive Measure of Problem Difficulty for Evolutionary Algorithms using Network Motifs Jing Liu jing.liu@adfa.edu.au School of Engineering and Information Technology,The University of New South Wales at the Australian Defence Force Academy,Canberra,ACT 2600,Australia Hussein A.Abbass h.abbass@adfa.edu.au School of Engineering and Information Technology,The University of New South Wales at the Australian Defence Force Academy,Canberra,ACT 2600,Australia David G.Green david.green@monash.edu Clayton School of Information Technology,Monash University,Clayton 3800, Australia Weicai Zhong w.zhong@adfa.edu.au School of Engineering and Information Technology,The University of New South Wales at the Australian Defence Force Academy,Canberra,ACT 2600,Australia Abstract One of the major challenges in the field of Evolutionary Algorithms(EAs)is to char- acterise which kinds of problems are easy and which are not.Researchers have been attracted to predict the behaviour of EAs in different domains.Based on fitness land- scape networks(FLNs)built by operators satisfying specific requirements,we define a new predictive measure,namely Motif Difficulty (MD),for comparison-based EAs. Since exhaustive computation on whole networks becomes quickly impractical,we propose a sampling technique for calculating the approximate MD.Extensive experi- ments on binary search spaces are conducted to show both advantages and limitations of MD.Multidimensional knapsack problems(MKPs)are also used to validate the performance of approximate MD on FLNs with different topologies.The effect of two representations,namely binary and permutation,on the difficulty of MKPs is analysed. Keytinryalort,roblem difficully,predictive measure,fies landscape, complex network,network motif. 1 Introduction Intrinsically,evolutionary algorithms (EAs)(Forgel,1999;Back et al.,1997;Goldberg, 1989)are still stochastic algorithms.Thus,one of the major challenges in this field is to characterise which kinds of problems are easy for a given algorithm to solve and which are not.In this direction,researchers have been attracted to predict the behaviour of EAs in different domains and have proposed predictive measures to quantify problem difficulty.The major tool used in available predictive measures is fitness landscapes (FLs).The concept of a FL was introduced in theoretical genetics (Wright,1932)as a way to visualise evolutionary dynamics.FLs were connected to EAs via a neigh- bourhood structure based on operators used in EAs,which highlights the association C200X by the Massachusetts Institute of Technology Evolutionary Computationx(x):xxx-xxx
Motif Difficulty (MD): A Predictive Measure of Problem Difficulty for Evolutionary Algorithms using Network Motifs Jing Liu jing.liu@adfa.edu.au School of Engineering and Information Technology, The University of New South Wales at the Australian Defence Force Academy, Canberra, ACT 2600, Australia Hussein A. Abbass h.abbass@adfa.edu.au School of Engineering and Information Technology, The University of New South Wales at the Australian Defence Force Academy, Canberra, ACT 2600, Australia David G. Green david.green@monash.edu Clayton School of Information Technology, Monash University, Clayton 3800, Australia Weicai Zhong w.zhong@adfa.edu.au School of Engineering and Information Technology, The University of New South Wales at the Australian Defence Force Academy, Canberra, ACT 2600, Australia Abstract One of the major challenges in the field of Evolutionary Algorithms (EAs) is to characterise which kinds of problems are easy and which are not. Researchers have been attracted to predict the behaviour of EAs in different domains. Based on fitness landscape networks (FLNs) built by operators satisfying specific requirements, we define a new predictive measure, namely Motif Difficulty (MD), for comparison-based EAs. Since exhaustive computation on whole networks becomes quickly impractical, we propose a sampling technique for calculating the approximate MD. Extensive experiments on binary search spaces are conducted to show both advantages and limitations of MD. Multidimensional knapsack problems (MKPs) are also used to validate the performance of approximate MD on FLNs with different topologies. The effect of two representations, namely binary and permutation, on the difficulty of MKPs is analysed. Keywords Evolutionary algorithm, problem difficulty, predictive measure, fitness landscape, complex network, network motif. 1 Introduction Intrinsically, evolutionary algorithms (EAs) (Forgel, 1999; Ba¨ck et al., 1997; Goldberg, 1989) are still stochastic algorithms. Thus, one of the major challenges in this field is to characterise which kinds of problems are easy for a given algorithm to solve and which are not. In this direction, researchers have been attracted to predict the behaviour of EAs in different domains and have proposed predictive measures to quantify problem difficulty. The major tool used in available predictive measures is fitness landscapes (FLs). The concept of a FL was introduced in theoretical genetics (Wright, 1932) as a way to visualise evolutionary dynamics. FLs were connected to EAs via a neighbourhood structure based on operators used in EAs, which highlights the association °c 200X by the Massachusetts Institute of Technology Evolutionary Computation x(x): xxx-xxx
J.Liu,H.A.Abbass,D.G.Green,and W.Zhong between search spaces and fitness spaces.With the property of neighbourhood struc- ture in mind and at some level of granularity,each FL can form a network.In such a network,each node corresponds to a point in the search space,and each edge con- nects one node to one of its neighbours.The fitness value can be viewed as the weight of each node.Therefore,different problems correspond to different networks,and the process of EAs solving problems is equivalent to navigating through these networks. From this viewpoint,problem difficulty can be predicted by analysing the features of FL networks(FLNs).Although it is well-known that FLs are related to networks,no predictive measure has been proposed based on the features of FLNs. A number of global features of complex networks,such as the small-world prop- erty,the clustering property,and the scale free property have been studied thoroughly. In addition to these global features,"network motifs"(Milo et al.,2002)were proposed to uncover the structural design principles and resulted in a deeper understanding of complex networks.Motifs are connected sub-graphs occurring in complex networks at numbers that are significantly higher than those in random networks.It has been shown that network motifs exist widely in various complex networks,and different complex networks have different types of motifs.Thus,network motifs are in fact an intrinsic property of complex networks,and can be used to differentiate different net- works.For example,Ghoneim et al.(2008)used network motifs to discriminate two- player strategy games.For EAs,FLNs corresponding to various problems are expected to be different in some intrinsic properties.Therefore,we propose a predictive diffi- culty measure for EAs,namely Motif Difficulty (MD),by extracting motif properties from directed FLNs.We define MD by synthesising the effect of different classes of distance motifs on the searching process.Our experimental results show that MD can quantify the difficulty of different problems into the range of-1.0(easiest)to 1.0(most difficult),and performs especially well on some counterexamples for other measures. This paper is organised as follows.Section 2 discusses related work.Preliminaries on fitness landscapes and network motifs are given in Section 3.Section 4 presents the definition of motifs in FLNs,and Section 5 presents a qualification of problem difficulty based on the motifs defined in Section 4.Experiments on exact and approximate MD are given in Sections 6 and 7,respectively.Finally,Section 8 summarises the work in this paper,and discusses both advantages and disadvantages of MD. 2 Related Work In general,the study of factors affecting the performance of EAs can be divided into two classes(Borenstein and Poli,2005a).The first class focuses on the properties of a particular algorithm,while the second class focuses on the problem itself,and partic- ularly on FLs.In the first class,the BB hypothesis(Goldberg,1989),which is famous in the Genetic Algorithm(GA)community,states that a GA tries to combine low order and highly fit schemata.Following the BB hypothesis,the notion of deception has been defined (Goldberg,1989;Forrest and Mitchell,1993).Epistasis variance (Davidor,1991) and epistasis correlation(Naudts,1998)try to assess the GA-hardness of problems from the perspective of theoretical genetics. In the second class,methods focus on using statistical properties of FLs to char- acterise problem difficulty.The first study in this class proposed isolation(needle-in- a-haystack)(Forrest and Mitchell,1993)and multimodality (Davidor,1991).The other popular method is Fitness Distance Correlation (Jones and Forrest,1995),which mea- sures the hardness of a landscape according to the correlation between the distance from the optimum and the fitness value of the solution.However,none of the available 2 Evolutionary Computation Volume x,Number x
J. Liu, H. A. Abbass, D. G. Green, and W. Zhong between search spaces and fitness spaces. With the property of neighbourhood structure in mind and at some level of granularity, each FL can form a network. In such a network, each node corresponds to a point in the search space, and each edge connects one node to one of its neighbours. The fitness value can be viewed as the weight of each node. Therefore, different problems correspond to different networks, and the process of EAs solving problems is equivalent to navigating through these networks. From this viewpoint, problem difficulty can be predicted by analysing the features of FL networks (FLNs). Although it is well-known that FLs are related to networks, no predictive measure has been proposed based on the features of FLNs. A number of global features of complex networks, such as the small-world property, the clustering property, and the scale free property have been studied thoroughly. In addition to these global features, “network motifs” (Milo et al., 2002) were proposed to uncover the structural design principles and resulted in a deeper understanding of complex networks. Motifs are connected sub-graphs occurring in complex networks at numbers that are significantly higher than those in random networks. It has been shown that network motifs exist widely in various complex networks, and different complex networks have different types of motifs. Thus, network motifs are in fact an intrinsic property of complex networks, and can be used to differentiate different networks. For example, Ghoneim et al. (2008) used network motifs to discriminate twoplayer strategy games. For EAs, FLNs corresponding to various problems are expected to be different in some intrinsic properties. Therefore, we propose a predictive diffi- culty measure for EAs, namely Motif Difficulty (MD), by extracting motif properties from directed FLNs. We define MD by synthesising the effect of different classes of distance motifs on the searching process. Our experimental results show that MD can quantify the difficulty of different problems into the range of −1.0 (easiest) to 1.0 (most difficult), and performs especially well on some counterexamples for other measures. This paper is organised as follows. Section 2 discusses related work. Preliminaries on fitness landscapes and network motifs are given in Section 3. Section 4 presents the definition of motifs in FLNs, and Section 5 presents a qualification of problem difficulty based on the motifs defined in Section 4. Experiments on exact and approximate MD are given in Sections 6 and 7, respectively. Finally, Section 8 summarises the work in this paper, and discusses both advantages and disadvantages of MD. 2 Related Work In general, the study of factors affecting the performance of EAs can be divided into two classes (Borenstein and Poli, 2005a). The first class focuses on the properties of a particular algorithm, while the second class focuses on the problem itself, and particularly on FLs. In the first class, the BB hypothesis (Goldberg, 1989), which is famous in the Genetic Algorithm (GA) community, states that a GA tries to combine low order and highly fit schemata. Following the BB hypothesis, the notion of deception has been defined (Goldberg, 1989; Forrest and Mitchell, 1993). Epistasis variance (Davidor, 1991) and epistasis correlation (Naudts, 1998) try to assess the GA-hardness of problems from the perspective of theoretical genetics. In the second class, methods focus on using statistical properties of FLs to characterise problem difficulty. The first study in this class proposed isolation (needle-ina-haystack) (Forrest and Mitchell, 1993) and multimodality (Davidor, 1991). The other popular method is Fitness Distance Correlation (Jones and Forrest, 1995), which measures the hardness of a landscape according to the correlation between the distance from the optimum and the fitness value of the solution. However, none of the available 2 Evolutionary Computation Volume x, Number x
Motif Difficulty:A Problem Difficulty Measure measures fully achieved success.Isolation might be sufficient,but it is not a necessary condition for a landscape to be difficult to search.Multimodality is neither necessary nor sufficient for a landscape to be difficult to search(Kallel et al.,2001).FDC has achieved some success,but is still not able to predict the performance in some scenar- ios(Naudts and Kallel,2000;Jansen,2001). In addition to FLs,Borenstein and Poli(2005a)pointed out that a limitation of the original FL approach is that it does not provide a way to quantify the amount of infor- mation available in a landscape nor to asses its quality.Thus,they proposed Informa- tion Landscapes(ILs)based on tournament selection in GAs.Using ILs,they proposed a method to predict GA hardness and a theoretical model to study search algorithms (Borenstein and Poli,2005b,c).Based on the observation that FLs can actually form net- works,Ochoa et al.(2008)proposed a network characterisation of combinatorial FLs, and use the well-known family of NK landscapes as an example,and exhaustively ex- tract local optima networks on NK landscape instances.This work is the first attempt at using network analysis techniques in connection with the study of FLs and problem difficulty.However,they did not propose predictive measures. He et al.(2007)gave a rigorous definition of difficulty measures in black-box op- timisation,and proved that in general predictive difficulty measures that run in poly- nomial time do not exist unless certain complexity-theoretical assumptions are wrong. However,there are still some successful applications of using predictive difficulty mea- sures to guide the design of new algorithms.For example,Merz and Freisleben(2000) and Tavares et al.(2008)conducted a FL analysis for quadratic assignment problems and multidimensional knapsack problems(MKPs),respectively;Yang et al.(2006)pre- sented an attempt at characterising the search space difficulties in red teaming by using fitness landscapes. 3 Preliminaries 3.1 Fitness Landscapes Formally,a fitness landscape FL is defined by a tuple of three components: FL=(S,f,N) (1) where S is the search space.In this paper,we consider only a finite discrete search space for combinatorial optimisation problems.The fitness function f:S-R assigns a numeric value to each configuration in S,and we consider maximisation problems here.N is a neighbourhood structure defined over S as follows. s∈S,N(s)={y∈SlP(y∈operator(s)>O} (2) where P(e)denotes the occurrence probability of event e,and operator(s)denotes the set of configurations that can be obtained by performing operator on s,namely the set of neighbours of s.Since EAs are determined by various operators,the above neigh- bourhood structure defined over operators reflects the features of different EAs,and correspondingly,these features can be reflected in the resulted FLs. If we treat each configuration in the search space as a node,and link each node to all its neighbours,then each FL can be mapped into a network,namly a fitness land- scape network FLN,which is also defined by a tuple of three components: FLN =(V,E,f) (3) Evolutionary Computation Volume x,Number x 3
Motif Difficulty: A Problem Difficulty Measure measures fully achieved success. Isolation might be sufficient, but it is not a necessary condition for a landscape to be difficult to search. Multimodality is neither necessary nor sufficient for a landscape to be difficult to search (Kallel et al., 2001). FDC has achieved some success, but is still not able to predict the performance in some scenarios (Naudts and Kallel, 2000; Jansen, 2001). In addition to FLs, Borenstein and Poli (2005a) pointed out that a limitation of the original FL approach is that it does not provide a way to quantify the amount of information available in a landscape nor to asses its quality. Thus, they proposed Information Landscapes (ILs) based on tournament selection in GAs. Using ILs, they proposed a method to predict GA hardness and a theoretical model to study search algorithms (Borenstein and Poli, 2005b,c). Based on the observation that FLs can actually form networks, Ochoa et al. (2008) proposed a network characterisation of combinatorial FLs, and use the well-known family of NK landscapes as an example, and exhaustively extract local optima networks on NK landscape instances. This work is the first attempt at using network analysis techniques in connection with the study of FLs and problem difficulty. However, they did not propose predictive measures. He et al. (2007) gave a rigorous definition of difficulty measures in black-box optimisation, and proved that in general predictive difficulty measures that run in polynomial time do not exist unless certain complexity-theoretical assumptions are wrong. However, there are still some successful applications of using predictive difficulty measures to guide the design of new algorithms. For example, Merz and Freisleben (2000) and Tavares et al. (2008) conducted a FL analysis for quadratic assignment problems and multidimensional knapsack problems (MKPs), respectively; Yang et al. (2006) presented an attempt at characterising the search space difficulties in red teaming by using fitness landscapes. 3 Preliminaries 3.1 Fitness Landscapes Formally, a fitness landscape FL is defined by a tuple of three components: FL = (S, f, N) (1) where S is the search space. In this paper, we consider only a finite discrete search space for combinatorial optimisation problems. The fitness function f : S → R assigns a numeric value to each configuration in S, and we consider maximisation problems here. N is a neighbourhood structure defined over S as follows. ∀s ∈ S, N(s) = {y ∈ S|P(y ∈ operator(s)) > 0} (2) where P(e) denotes the occurrence probability of event e, and operator(s) denotes the set of configurations that can be obtained by performing operator on s, namely the set of neighbours of s. Since EAs are determined by various operators, the above neighbourhood structure defined over operators reflects the features of different EAs, and correspondingly, these features can be reflected in the resulted FLs. If we treat each configuration in the search space as a node, and link each node to all its neighbours, then each FL can be mapped into a network, namly a fitness landscape network FLN, which is also defined by a tuple of three components: FLN = (V, E, f) (3) Evolutionary Computation Volume x, Number x 3
J.Liu,H.A.Abbass,D.G.Green,and W.Zhong >≥>2≥ >>>>≥房 Figure 1:All 13 types of three-node connected subgraphs(Milo et al 2002). where V is the finite set of nodes and each node corresponds to a configuration in S. f is the same as that in (1).E is a set of undirected edges and each edge connects one configuration to one of its neighbours;that is, E={x,y}lx,y∈SA(x∈N(y)Vy∈Nx)} (4) 3.2 Network Motifs Network motifs are patterns of inter-connections,which can be reflected by connected subgraphs(Milo et al.,2002).Milo et al.(2002)showed that there are 13 types of three- node connected subgraphs(Figure 1)for all directed networks.In a directed network, a pattern of inter-connections can be viewed as a network motif only when its number of occurrences in this network is significantly higher than that in the corresponding random network.More specifically,network motifs are those connected subgraphs for which the probability of appearing in a random network an equal or greater number of times than in the real network is lower than a cutoff value 0.01(Milo et al.,2002). To detect n-node network motifs in a network,the network was scanned for all of the possible n-node subgraphs,and the number of occurrences of each subgraph was recorded(Milo et al.,2002).Milo et al.(2002)showed that several networks(i.e., gene regulation,neurons,food webs,electronic circuits,and world wide web)exhibit different types of network motifs,and frequencies of different network motifs vary from one network to another.Motifs may thus define universal classes of networks, and are basic building blocks of most networks.Therefore,network motifs have been widely used in studying complex systems and in characterising features on the system level by analysing locally how the substructures are formed. 4 Motifs in Directed Fitness Landscape Networks In EAs,when an individual moves from one node to its neighbour under the effect of an operator,it is equivalent to exploring the local structures or subgraphs.Thus, subgraphs,which can be reflected by motifs,that an EA has visited during the whole evolutionary process have a close relationship with its performance.Since network motifs are just connected subgraphs,obviously,the number of possible motifs in undi- rected networks is much lower than that in directed networks.Moreover,the topology of FLNs built by the same operator is identical if the fitness value of each node is ig- nored.Thus,we first convert undirected FLNs to directed FLNs(DFLNs)by making use of fitness values so that more different types of network motifs can be extracted. Second,we propose another way to consider the 13 types of possible three-nodes mo- tifs presented in Figure 1,and define six types of basic motifs.Finally,based on these basic motifs,a new type of motifs,namely distance motifs,is designed by taking global optima as references. Evolutionary Computation Volume x,Number x
J. Liu, H. A. Abbass, D. G. Green, and W. Zhong Figure 1: All 13 types of three-node connected subgraphs (Milo et al., 2002). where V is the finite set of nodes and each node corresponds to a configuration in S. f is the same as that in (1). E is a set of undirected edges and each edge connects one configuration to one of its neighbours; that is, E = {{x, y}|x, y ∈ S ∧ (x ∈ N(y) ∨ y ∈ N(x))} (4) 3.2 Network Motifs Network motifs are patterns of inter-connections, which can be reflected by connected subgraphs (Milo et al., 2002). Milo et al. (2002) showed that there are 13 types of threenode connected subgraphs (Figure 1) for all directed networks. In a directed network, a pattern of inter-connections can be viewed as a network motif only when its number of occurrences in this network is significantly higher than that in the corresponding random network. More specifically, network motifs are those connected subgraphs for which the probability of appearing in a random network an equal or greater number of times than in the real network is lower than a cutoff value 0.01 (Milo et al., 2002). To detect n-node network motifs in a network, the network was scanned for all of the possible n-node subgraphs, and the number of occurrences of each subgraph was recorded (Milo et al., 2002). Milo et al. (2002) showed that several networks (i.e., gene regulation, neurons, food webs, electronic circuits, and world wide web) exhibit different types of network motifs, and frequencies of different network motifs vary from one network to another. Motifs may thus define universal classes of networks, and are basic building blocks of most networks. Therefore, network motifs have been widely used in studying complex systems and in characterising features on the system level by analysing locally how the substructures are formed. 4 Motifs in Directed Fitness Landscape Networks In EAs, when an individual moves from one node to its neighbour under the effect of an operator, it is equivalent to exploring the local structures or subgraphs. Thus, subgraphs, which can be reflected by motifs, that an EA has visited during the whole evolutionary process have a close relationship with its performance. Since network motifs are just connected subgraphs, obviously, the number of possible motifs in undirected networks is much lower than that in directed networks. Moreover, the topology of FLNs built by the same operator is identical if the fitness value of each node is ignored. Thus, we first convert undirected FLNs to directed FLNs (DFLNs) by making use of fitness values so that more different types of network motifs can be extracted. Second, we propose another way to consider the 13 types of possible three-nodes motifs presented in Figure 1, and define six types of basic motifs. Finally, based on these basic motifs, a new type of motifs, namely distance motifs, is designed by taking global optima as references. 4 Evolutionary Computation Volume x, Number x
Motif Difficulty:A Problem Difficulty Measure 4.1 Directed Fitness Landscape Networks and Basic Motifs Definition 1:Define a fitness landscape network as FLN =(V,E,f),then the corre- sponding Directed Fitness Landscape Network is: 瓦N=(V,E) (5) where E={(x,y)lx,y}EAf(x)<f(y)}is a set of directed edges. Given two nodes x and y,y is the neighbour of x as long as the probability of converting x to y by operator is larger than 0,then the edge {x,y}exists in the corre- sponding network.In this way,a complete graph will be generated when using the mutation operator in which each bit will be flipped with probability 1/n for a binary string of length n(labelled as 1/n mutation in the following text),and the probabilities of transforming this string to different neighbours are not always the same.These incur some problems in detecting network motifs.First,if an operator leads to a complete graph,the topology of fitness landscapes for all problems is the same.Second,if the probabilities of transforming a string to different neighbours are not always the same, which is equivalent to the fact that edges have different weights,the edges in a motif may have different importance,and we cannot just count the number of each type of motifs.Therefore,in the following text,for the sake of simplicity,the operator used to build DFLNs must satisfy the conditions in (6)and (7).That is to say,the number of neighbours of each node must be much smaller than the number of nodes in the network,and the probabilities of converting one node to different neighbours must be identical.Edges in DFLNs use only the relative difference between the fitness val- ues of two nodes rather than the absolute fitness values.Thus,DFLNs are suitable for analysing comparison-based search algorithms,such as(1+1)EA(Droste et al.,2002). x∈V,N(xI《V (6) Vy,ZE N(x),P(operator(x)=y)=P(operator(x)=z) (7) The original network motifs are defined on the difference in frequency from ran- dom networks(Milo et al.,2002).However,random FLNs are built from a random fitness function.The difficulty of this random fitness function for EAs is also a part of study,so we cannot take it as a reference to detect network motifs in DFLNs.In fact,if we further check the 13 types of possible three-node motifs listed in Figure 1,we can find that motifs of types 1,2,3,4,7,and 8 are subsets of motifs of types of 5,6,9,10, 11,12,or 13.Therefore,in this study,for a three-node subgraph,we consider only the edges between two pairs of nodes,namely motifs of types 1,2,3,4,7,and 8 in Figure 1,which are named as Basic Motifs. Definition 2:Given a directed fitness landscape network FLN=(V,E),a Basic Motif,labelled as Mb,in FLN is a connected three-node subgraph, M={VMb={n1,2,3}CV,EMb={(h,2),(2,n1),(2,g),(n3,2)}nE}(⑧) To let Mo be a connected subgraph,EM should satisfy (1,2)∈EMV(2,m1)∈EMb)A(2,g)∈EMV(n3,2)∈EMb)(9) Evolutionary Computation Volume x,Number x
Motif Difficulty: A Problem Difficulty Measure 4.1 Directed Fitness Landscape Networks and Basic Motifs Definition 1: Define a fitness landscape network as FLN = (V, E, f), then the corresponding Directed Fitness Landscape Network is: −−→FLN = (V, −→E ) (5) where −→E = {(x, y)|{x, y} ∈ E ∧ f(x) ≤ f(y)} is a set of directed edges. Given two nodes x and y, y is the neighbour of x as long as the probability of converting x to y by operator is larger than 0, then the edge {x, y} exists in the corresponding network. In this way, a complete graph will be generated when using the mutation operator in which each bit will be flipped with probability 1/n for a binary string of length n (labelled as 1/n mutation in the following text), and the probabilities of transforming this string to different neighbours are not always the same. These incur some problems in detecting network motifs. First, if an operator leads to a complete graph, the topology of fitness landscapes for all problems is the same. Second, if the probabilities of transforming a string to different neighbours are not always the same, which is equivalent to the fact that edges have different weights, the edges in a motif may have different importance, and we cannot just count the number of each type of motifs. Therefore, in the following text, for the sake of simplicity, the operator used to build DFLNs must satisfy the conditions in (6) and (7). That is to say, the number of neighbours of each node must be much smaller than the number of nodes in the network, and the probabilities of converting one node to different neighbours must be identical. Edges in DFLNs use only the relative difference between the fitness values of two nodes rather than the absolute fitness values. Thus, DFLNs are suitable for analysing comparison-based search algorithms, such as (1+1)EA (Droste et al., 2002). ∀x ∈ V, |N(x)| ¿ |V| (6) ∀y, z ∈ N(x), P(operator(x) = y) = P(operator(x) = z) (7) The original network motifs are defined on the difference in frequency from random networks (Milo et al., 2002). However, random FLNs are built from a random fitness function. The difficulty of this random fitness function for EAs is also a part of study, so we cannot take it as a reference to detect network motifs in DFLNs. In fact, if we further check the 13 types of possible three-node motifs listed in Figure 1, we can find that motifs of types 1, 2, 3, 4, 7, and 8 are subsets of motifs of types of 5, 6, 9, 10, 11, 12, or 13. Therefore, in this study, for a three-node subgraph, we consider only the edges between two pairs of nodes, namely motifs of types 1, 2, 3, 4, 7, and 8 in Figure 1, which are named as Basic Motifs. Definition 2: Given a directed fitness landscape network −−→FLN = (V, −→E ), a Basic Motif, labelled as Mb , in −−→FLN is a connected three-node subgraph, Mb = {VMb = {n1, n2, n3} ⊂ V, −→E Mb = {(n1, n2), (n2, n1), (n2, n3), (n3, n2)} ∩ −→E } (8) To let Mb be a connected subgraph, −→E Mb should satisfy ((n1, n2) ∈ −→E Mb ∨ (n2, n1) ∈ −→E Mb ) ∧ ((n2, n3) ∈ −→E Mb ∨ (n3, n2) ∈ −→E Mb ) (9) Evolutionary Computation Volume x, Number x 5
J.Liu,H.A.Abbass,D.G.Green,and W.Zhong Table 1:The relationship between basic motifs and distance motifs Type of basic motifs Class of Distance Motifs No. Motifs Condition Class Condition Class Groupo Neutral Core dg≥d2 Guide d≥d Guide Group Core dad Deceptive Core d2≥dg Guide d≥d Guide Group? d2d3 Core Deceptive Deceptive Guide d≥d2and Core d1≥d3 Group3 d≥dg Guide di>d2 and dd3 Deceptive d2≥d1andd2≥ds Guide Group4 ● ● d2<d or d2<ds Deceptive d1≥d2 and d3≥d2 Guide Groups d<d2 or d3<d2 Deceptive Clearly,based on the edges between {n,n2}and {n2,n3},there are six types of basic motifs in total for all DFLNs,which are labelled as Groupo-Groups and shown in the first two columns of Table 1.These six types of basic motifs can be viewed as basic building blocks of a DFLN. 4.2 Distance Motifs An selection mechanism in EAs,such as binary tournament selection,causes the search heuristic to prefer high fitness regions.Thus,the problems would be easy if high fitness regions are close to global optima.On the contrary,if high fitness regions are far from global optima,the selection mechanism may lead the search heuristic in a wrong direc- tion.This indicates that the distance between candidate solutions and global optima is another important factor that impacts EAs'performance.Thus,we use the distance information to refine basic motifs.There are a number of ways to define the distance between a candidate solution and a global optimum.Naturally,the one that is most suitable for the searching process on fitness landscapes is the shortest path length.To calculate the shortest path length,we need to identify the nodes in a network that correspond to the global optima.However,for indirect encoding methods,like the per- mutation encoding for MKPs,the computational cost to match the points in the search 6 Evolutionary Computation Volume x,Number x
J. Liu, H. A. Abbass, D. G. Green, and W. Zhong Table 1: The relationship between basic motifs and distance motifs. Type of basic motifs Class of Distance Motifs No. Motifs Condition Class Condition Class Group0 Neutral Group1 d3 ≥ d2 Guide d3 ≥ d1 Core Guide d3 d1 Core Deceptive Group2 d2 ≥ d3 Guide d1 ≥ d3 Core Guide d2 d3 Core Deceptive Group3 d1 ≥ d3 Guide d1 ≥ d2 and Core d2 ≥ d3 Guide d1 d2 and Core d2 > d3 Deceptive Group4 d2 ≥ d1 and d2 ≥ d3 Guide d2 < d1 or d2 < d3 Deceptive Group5 d1 ≥ d2 and d3 ≥ d2 Guide d1 < d2 or d3 < d2 Deceptive Clearly, based on the edges between {n1, n2} and {n2, n3}, there are six types of basic motifs in total for all DFLNs, which are labelled as Group0 - Group5 and shown in the first two columns of Table 1. These six types of basic motifs can be viewed as basic building blocks of a DFLN. 4.2 Distance Motifs An selection mechanism in EAs, such as binary tournament selection, causes the search heuristic to prefer high fitness regions. Thus, the problems would be easy if high fitness regions are close to global optima. On the contrary, if high fitness regions are far from global optima, the selection mechanism may lead the search heuristic in a wrong direction. This indicates that the distance between candidate solutions and global optima is another important factor that impacts EAs’ performance. Thus, we use the distance information to refine basic motifs. There are a number of ways to define the distance between a candidate solution and a global optimum. Naturally, the one that is most suitable for the searching process on fitness landscapes is the shortest path length. To calculate the shortest path length, we need to identify the nodes in a network that correspond to the global optima. However, for indirect encoding methods, like the permutation encoding for MKPs, the computational cost to match the points in the search 6 Evolutionary Computation Volume x, Number x
Motif Difficulty:A Problem Difficulty Measure space to the global optima may be quite high.For binary encoding,since the expected number of bits that are flipped in the 1/n mutation is 1,we use the 1-bit flip operator to build the DFLN,and then the Hamming distance is used as the distance measure, which is equivalent to the shortest path length in this case.For other encoding meth- ods,if the computational cost to identify the nodes corresponding to global optima and calculate the shortest path length is feasible,the shortest path length can be used as the distance measure;otherwise,an approximate way should be designed. Definition 3:Given a basic motif,M={Vb=(m1,n2,n3),Eme),the corre- sponding Distance Motif is Ma={VM,E,{di,d2,ds}),where di,d2,and d3 are respectively the distances attached to n,n2,and n. When the searching process visits a basic motif from m to n2 to n3,we can check di, d2,and d3 to see whether the searching process is heading towards the right direction or not.Clearly,when we check a direction is correct or not,we should take into account the information of both fitness and distance.Since the information of fitness is reflected by edges in DFLNs,we first define paths in a distance motif.Then,based on their contributions to the searching process,distance motifs are divided into three classes. Definition 4: Given a distance motif,Ma {VMa (m,n2,n3},Emd,{d,d2,d3)).For Vni,nj E VMd,if ni can reach nj by only visiting edges in Ev4,then there is a Path between n;and n.Furthermore,for each edge on a path in Ma,if the edge with inverse direction does not exist,then this path is an Effective Path. Definition 5:Given a distance motif,Md ={VMa,Em4,{di,d2,d3)).If there is no effective paths in Ma,then Md is a Neutral Motif.If all effective paths with the largest length(the number of edges in the path)in Md satisfies that the distance of the start node is not less than that of the end node,then Ma is a Guide Motif;otherwise,it is a Deceptive Motif. In a path,the node with lower fitness value always points to the node with higher fitness value.When the distance of an end node on a path is not larger than that of the other end node,then it means that when the fitness value increases,the distance decreases.In such a case,the motif has a positive effect on the searching process,thus, it is a guide motif.Table 1 illustrates how the three classes of distance motifs are related to the six types of basic motifs.For Groupo,since the set of effective paths is empty,all motifs in Groupo are neutral motifs.For Group,the only effective path is between #3 and n2.Thus,if d3 d2,then they are guide motifs;otherwise,deceptive motifs.The case for Group?is similar.For Group,there are three effective paths,namely the paths between n and n2,n2 and n3,and m and n3,whose lengths are 1,1,and 2,respectively, and only the path with the largest length need to be considered.Thus,if di>d3,then they are guide motifs;otherwise,deceptive motifs.For Group,there are two effective paths,namely the paths between n2 and n1,n2 and n3,whose lengths are both 1,and both of them need to be considered.Thus,if both d2>di and d2>d3,then they are guide motifs;otherwise,deceptive motifs.The case for Group,is similar. If we further analyse guide and deceptive motifs,paths with length two exist in some cases.On such paths,if the difference in distances and fitness values of each pair of nodes connected by an edge is consistent with that of two end nodes,then the corresponding motifs have a more explicit impact on the searching process.Thus,we further define two sub-classes of guide and deceptive motifs. Definition 6:Given a guide motif,MG=[VMc,EMc,{di,d2,d3)).If there exists a path with length two,labelled as {(mi,nk),(nk,nj)},where ni,nk,njE VMG and diz dj,satisfies the conditions that di>dk when (nk,ni)g Emc and dk >dj when Evolutionary Computation Volume x,Number x
Motif Difficulty: A Problem Difficulty Measure space to the global optima may be quite high. For binary encoding, since the expected number of bits that are flipped in the 1/n mutation is 1, we use the 1-bit flip operator to build the DFLN, and then the Hamming distance is used as the distance measure, which is equivalent to the shortest path length in this case. For other encoding methods, if the computational cost to identify the nodes corresponding to global optima and calculate the shortest path length is feasible, the shortest path length can be used as the distance measure; otherwise, an approximate way should be designed. Definition 3: Given a basic motif, Mb = {VMb = {n1, n2, n3}, −→E Mb }, the corresponding Distance Motif is Md = {VMb , −→E Mb , {d1, d2, d3}}, where d1, d2, and d3 are respectively the distances attached to n1, n2, and n3. When the searching process visits a basic motif from n1 to n2 to n3, we can check d1, d2, and d3 to see whether the searching process is heading towards the right direction or not. Clearly, when we check a direction is correct or not, we should take into account the information of both fitness and distance. Since the information of fitness is reflected by edges in DFLNs, we first define paths in a distance motif. Then, based on their contributions to the searching process, distance motifs are divided into three classes. Definition 4: Given a distance motif, Md = {VMd = {n1, n2, n3}, −→E Md , {d1, d2, d3}}. For ∀ni , nj ∈ VMd , if ni can reach nj by only visiting edges in −→E Md , then there is a Path between ni and nj . Furthermore, for each edge on a path in Md , if the edge with inverse direction does not exist, then this path is an Effective Path. Definition 5: Given a distance motif, Md = {VMd , −→E Md , {d1, d2, d3}}. If there is no effective paths in Md , then Md is a Neutral Motif. If all effective paths with the largest length (the number of edges in the path) in Md satisfies that the distance of the start node is not less than that of the end node, then Md is a Guide Motif; otherwise, it is a Deceptive Motif. In a path, the node with lower fitness value always points to the node with higher fitness value. When the distance of an end node on a path is not larger than that of the other end node, then it means that when the fitness value increases, the distance decreases. In such a case, the motif has a positive effect on the searching process, thus, it is a guide motif. Table 1 illustrates how the three classes of distance motifs are related to the six types of basic motifs. For Group0 , since the set of effective paths is empty, all motifs in Group0 are neutral motifs. For Group1 , the only effective path is between n3 and n2. Thus, if d3 ≥ d2, then they are guide motifs; otherwise, deceptive motifs. The case for Group2 is similar. For Group3 , there are three effective paths, namely the paths between n1 and n2, n2 and n3, and n1 and n3, whose lengths are 1, 1, and 2, respectively, and only the path with the largest length need to be considered. Thus, if d1 ≥ d3, then they are guide motifs; otherwise, deceptive motifs. For Group4 , there are two effective paths, namely the paths between n2 and n1, n2 and n3, whose lengths are both 1, and both of them need to be considered. Thus, if both d2 ≥ d1 and d2 ≥ d3, then they are guide motifs; otherwise, deceptive motifs. The case for Group5 is similar. If we further analyse guide and deceptive motifs, paths with length two exist in some cases. On such paths, if the difference in distances and fitness values of each pair of nodes connected by an edge is consistent with that of two end nodes, then the corresponding motifs have a more explicit impact on the searching process. Thus, we further define two sub-classes of guide and deceptive motifs. Definition 6: Given a guide motif, MG = {VMG , −→E MG , {d1, d2, d3}}. If there exists a path with length two, labelled as {(ni , nk), (nk, nj )}, where ni , nk, nj ∈ VMG and di ≥ dj , satisfies the conditions that di ≥ dk when (nk, ni) 6∈ −→E MG and dk ≥ dj when Evolutionary Computation Volume x, Number x 7
J.Liu,H.A.Abbass,D.G.Green,and W.Zhong (nj,nk)EMG,then MG is a Core Guide Motif. Definition 7:Given a deceptive motif,M=(VD,ED,{d,d2,d3}).If there exists a path with length two,labelled as {(ni,n),(nk,nj)},where ni,nk,n;E VMD and dif(n)→(,n)是EMd (13) Thus,Md belongs to Groups,Group,or Groups. Case 1:Md belongs to Group3. Evolutionary Computation Volume x,Number x
J. Liu, H. A. Abbass, D. G. Green, and W. Zhong (nj , nk) 6∈ −→E MG , then MG is a Core Guide Motif. Definition 7: Given a deceptive motif, MD = {VMD , −→E MD , {d1, d2, d3}}. If there exists a path with length two, labelled as {(ni , nk), (nk, nj )}, where ni , nk, nj ∈ VMD and di f(ni) ⇒ (nj , ni) 6∈ −→E Md (13) Thus, Md belongs to Group3 , Group4 , or Group5 . Case 1: Md belongs to Group3 . 8 Evolutionary Computation Volume x, Number x
Motif Difficulty:A Problem Difficulty Measure Ewa={(1,2),(2,3)}→f(1)d2>d3;that is,Md is a core guide motif. Case 2:Md belongs to Group. Exa={(n2,m),(n2,n3)}f(n2)di and d2>d3;that is,Md is a guide motif. Case 3:Md belongs to Groups. EMa={,),(n,2}→fh)d2 and d3>d2;that is,Md is a guide motif. To summarise,in all the three cases,Md is a guide motif. ◇ Before we give Proposition 2,we first prove some properties of DFLNs built by the 1-bit flip operator. Lemma 1:In a fitness landscape network built by the 1-bit flip operator,(1)the number of edges connecting any three nodes is less than three,(2)each node belongs to 3n()basic motifs where n is the dimension of the search space,(3)the number of all basic motifs is 2n-n(n-1). Proof.(1)Suppose there are three nodes {n1,n2,n3}connected by three edges. Without loss of generality,let the set of edges connecting these three nodes be n,n2,n2,n3,n3,n,and the configurations corresponding to these three nodes be s1,s2,s3.Then,we have H(s1,s3)=1 (17) where H(,)denotes the Hamming distance.But we also have H(s1,s2)=1 and H(s2,s3)=1 H(s1,s3)=0 or 2 (18) Clearly,(17)and(18)contradict each other.Thus,the number of edges connecting any three nodes is less than three. (2)The number of edges connected to each node is n.According to Lemmal(1), any three nodes can form only 1 basic motif at most.Thus,all distance motifs that a node m belongs to can be divided into two cases;that is,(a)both n2 and n3 connect to n and (b)n connects to n2 and n2 connects to n3.The number of basic motifs that n belongs to is )+n(n-1)=3nm- (19) 2 (3)The proof for Lemmal(2)shows that basic motifs that each node belongs to can be divided into two cases.In fact,Case 2 for n is equivalent to Case 1 for n2. Therefore,to prevent repetition,we need to count only Case 1 for each node to calculate Evolutionary Computation Volume x,Numberx 9
Motif Difficulty: A Problem Difficulty Measure −→E Md = {(n1, n2), (n2, n3)} ⇒ f(n1) d2 > d3; that is, Md is a core guide motif. Case 2: Md belongs to Group4 . −→E Md = {(n2, n1), (n2, n3)} ⇒ f(n2) d1 and d2 > d3; that is, Md is a guide motif. Case 3: Md belongs to Group5 . −→E Md = {(n1, n2), (n3, n2)} ⇒ f(n1) d2 and d3 > d2; that is, Md is a guide motif. To summarise, in all the three cases, Md is a guide motif. Before we give Proposition 2, we first prove some properties of DFLNs built by the 1-bit flip operator. Lemma 1: In a fitness landscape network built by the 1-bit flip operator, (1) the number of edges connecting any three nodes is less than three, (2) each node belongs to 3n(n−1) 2 basic motifs where n is the dimension of the search space, (3) the number of all basic motifs is 2 n−1n(n − 1). Proof. (1) Suppose there are three nodes {n1, n2, n3} connected by three edges. Without loss of generality, let the set of edges connecting these three nodes be {{n1, n2}, {n2, n3}, {n3, n1}}, and the configurations corresponding to these three nodes be s1, s2, s3. Then, we have H(s1, s3) = 1 (17) where H(·, ·) denotes the Hamming distance. But we also have H(s1, s2) = 1 and H(s2, s3) = 1 ⇒ H(s1, s3) = 0 or 2 (18) Clearly, (17) and (18) contradict each other. Thus, the number of edges connecting any three nodes is less than three. (2) The number of edges connected to each node is n. According to Lemma1(1), any three nodes can form only 1 basic motif at most. Thus, all distance motifs that a node n1 belongs to can be divided into two cases; that is, (a) both n2 and n3 connect to n1 and (b) n1 connects to n2 and n2 connects to n3. The number of basic motifs that n1 belongs to is ( n 2 ) + n(n − 1) = 3n(n − 1) 2 (19) (3) The proof for Lemma1(2) shows that basic motifs that each node belongs to can be divided into two cases. In fact, Case 2 for n1 is equivalent to Case 1 for n2. Therefore, to prevent repetition, we need to count only Case 1 for each node to calculate Evolutionary Computation Volume x, Number x 9
J.Liu,H.A.Abbass,D.G.Green,and W.Zhong the number of basic motifs in the whole network.Since the number of nodes in the search space is 2",the number of all basic motifs is 2"()=2m-1n(n-1) (20) ◇ Proposition 2:Given a fitness landscape,FL ={S,f,N},where S is composed of binary strings with length n,f=NIAH,and the 1-bit flip operator is used to construct N.Then,the fraction of neutral motifs in the corresponding directed fitness landscape network is equal to or greater than 1-3 x 2-7. Proof.Given a distance motif Md,and the configurations corresponding to three nodes in Md are not the global optimum,then the fitness values of these three nodes are 0. Thus,Md belongs to Groupo,namely a neutral motif. That is to say,only motifs that include the node corresponding to the global opti- mum,labelled as n",can be a guide or deceptive motif.According to Lemma 1(2),the number of motifs that nbelongs to is 3n(.Thus,the number of neutral motifs is 2 no less than 3n(n-1 3 1-2n-1nn-d=1-2a (21) ▣ Proposition 3:Given a fitness landscape,FL ={S,f,N},where S is composed of binary strings with length n,f=TRAP,and the 1-bit flip operator is used to construct N;then,the fraction of deceptive motifs in the corresponding directed fitness landscape network is equal to or greater than 1-3 x 2-". Proof.Let a distance motif in the corresponding DFLN be Md {(m,n2,n3},Em,{di,d2,d3}},and the configurations corresponding to m,n2, n3 are not the global optimum,namely the string of all 0'.Let the configuration corresponding to ni,i=1,2,3,be (si,si2,...,sin),Then f)=∑s (22) In fact,this is equivalent to ONEMAX,and according to the proof for Proposition 1,we have that Md belongs to Groups,Group,or Groups. Case 1:Md belongs to Group3.According to(14),we obtained that the number of'1'in m is smaller than that of n2,and that of n2 is smaller than that of n3.Since the global optimum is the string of all '0',we have didi and d2>d3;that is,Md is a deceptive motif. 10 Evolutionary Computation Volume x,Number x
J. Liu, H. A. Abbass, D. G. Green, and W. Zhong the number of basic motifs in the whole network. Since the number of nodes in the search space is 2 n, the number of all basic motifs is 2 n ( n 2 ) = 2n−1n(n − 1) (20) Proposition 2: Given a fitness landscape, FL = {S, f, N}, where S is composed of binary strings with length n, f=NIAH , and the 1-bit flip operator is used to construct N. Then, the fraction of neutral motifs in the corresponding directed fitness landscape network is equal to or greater than 1 − 3 × 2 −n. Proof. Given a distance motif Md , and the configurations corresponding to three nodes in Md are not the global optimum, then the fitness values of these three nodes are 0. Thus, Md belongs to Group0 , namely a neutral motif. That is to say, only motifs that include the node corresponding to the global optimum, labelled as n ∗ , can be a guide or deceptive motif. According to Lemma 1(2), the number of motifs that n ∗ belongs to is 3n(n−1) 2 . Thus, the number of neutral motifs is no less than 1 − 3n(n−1) 2 2 n−1n(n − 1) = 1 − 3 2 n (21) Proposition 3: Given a fitness landscape, FL = {S, f, N}, where S is composed of binary strings with length n, f = TRAP, and the 1-bit flip operator is used to construct N; then, the fraction of deceptive motifs in the corresponding directed fitness landscape network is equal to or greater than 1 − 3 × 2 −n. Proof. Let a distance motif in the corresponding DFLN be Md = {{n1, n2, n3}, −→E Md , {d1, d2, d3}}, and the configurations corresponding to n1, n2, n3 are not the global optimum, namely the string of all ‘0’. Let the configuration corresponding to ni , i = 1, 2, 3, be (si1, si2, · · · , sin), Then f(ni) = Xn j=1 sij (22) In fact, this is equivalent to ONEMAX , and according to the proof for Proposition 1, we have that Md belongs to Group3 , Group4 , or Group5 . Case 1: Md belongs to Group3 . According to (14), we obtained that the number of ‘1’ in n1 is smaller than that of n2, and that of n2 is smaller than that of n3. Since the global optimum is the string of all ‘0’, we have d1 d1 and d2 > d3; that is, Md is a deceptive motif. 10 Evolutionary Computation Volume x, Number x