[Figures 1-3 are not reproduced in this extraction; only their captions are recoverable and are given below.]

Fig. 1 A network that has learned to detect mirror symmetry in the input vector. The numbers on the arcs are weights and the numbers inside the nodes are biases. The learning required 1,425 sweeps through the set of 64 possible input vectors, with the weights being adjusted on the basis of the accumulated gradient after each sweep. The values of the parameters in equation (9) were ε = 0.1 and α = 0.9. The initial weights were random and were uniformly distributed between −0.3 and 0.3. The key property of this solution is that for a given hidden unit, weights that are symmetric about the middle of the input vector are equal in magnitude and opposite in sign. So if a symmetrical pattern is presented, both hidden units will receive a net input of 0 from the input units, and, because the hidden units have a negative bias, both will be off. In this case the output unit, having a positive bias, will be on. Note that the weights on each side of the midpoint are in the ratio 1:2:4. This ensures that each of the eight patterns that can occur above the midpoint sends a unique activation sum to each hidden unit, so the only pattern below the midpoint that can exactly balance this sum is the symmetrical one. For all non-symmetrical patterns, both hidden units will receive non-zero activations from the input units. The two hidden units have identical patterns of weights but with opposite signs, so for every non-symmetric pattern one hidden unit will come on and suppress the output unit.

Fig. 2 Two isomorphic family trees. The information can be expressed as a set of triples of the form ⟨person 1⟩⟨relationship⟩⟨person 2⟩, where the possible relationships are {father, mother, husband, wife, son, daughter, uncle, aunt, brother, sister, nephew, niece}. A layered net can be said to 'know' these triples if it can produce the third term of each triple when given the first two. The first two terms are encoded by activating two of the input units, and the network must then complete the proposition by activating the output unit that represents the third term.

Fig. 3 Activity levels in a five-layer network after it has learned. The bottom layer has 24 input units on the left for representing ⟨person 1⟩ and 12 input units on the right for representing the relationship. The white squares inside these two groups show the activity levels of the units. There is one active unit in the first group representing Colin and one in the second group representing the relationship 'has-aunt'. Each of the two input groups is totally connected to its own group of 6 units in the second layer. These groups learn to encode people and relationships as distributed patterns of activity. The second layer is totally connected to the central layer of 12 units, and these are connected to the penultimate layer of 6 units. The activity in the penultimate layer must activate the correct output units, each of which stands for a particular ⟨person 2⟩. In this case, there are two correct answers (marked by black dots) because Colin has two aunts. Both the input units and the output units are laid out spatially with the English people in one row and the isomorphic Italians immediately below.
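To make the Fig. 1 solution concrete, here is a minimal sketch that checks the weight structure described in the caption. It is our own illustration, not the paper's code: it replaces the logistic units of equations (1) and (2) with a hard threshold, the weight magnitudes 1, 2, 4 are illustrative stand-ins for the trained values in the figure (which share the same 1:2:4 ratio and sign pattern), and the function names are invented.

```python
import itertools

# Hidden-unit weights with the structure described in the Fig. 1 caption:
# weights symmetric about the midpoint are equal in magnitude and opposite
# in sign, and the magnitudes on each side are in the ratio 1:2:4.
W_HIDDEN = [
    [1, 2, 4, -4, -2, -1],   # first hidden unit
    [-1, -2, -4, 4, 2, 1],   # second hidden unit: same pattern, signs flipped
]

def detects_symmetry(pattern):
    """Thresholded version of the Fig. 1 net. Each hidden unit has a
    negative bias, so it comes on only if its net input exceeds ~0.5;
    the output unit has a positive bias and is suppressed whenever any
    hidden unit is active."""
    hidden_on = [
        sum(w * x for w, x in zip(weights, pattern)) > 0.5
        for weights in W_HIDDEN
    ]
    return not any(hidden_on)

# For a symmetric pattern both net inputs are exactly 0, so both hidden
# units stay off and the output unit comes on. For any other pattern the
# 1:2:4 ratio guarantees a non-zero net input, so one hidden unit fires.
for p in itertools.product([0, 1], repeat=6):
    assert detects_symmetry(p) == (p == p[::-1])
print("all 64 input vectors classified correctly")
```

The assertion exercises the caption's uniqueness argument: with the 1:2:4 ratio, the net input to a hidden unit is zero if and only if the pattern below the midpoint mirrors the pattern above it.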
It is not necessary to use exactly the functions given in equations (1) and (2). Any input-output function which has a bounded derivative will do. However, the use of a linear function for combining the inputs to a unit before applying the nonlinearity greatly simplifies the learning procedure.

The aim is to find a set of weights that ensure that for each input vector the output vector produced by the network is the same as (or sufficiently close to) the desired output vector. If there is a fixed, finite set of input-output cases, the total error in the performance of the network with a particular set of weights can be computed by comparing the actual and desired output vectors for every case. The total error, E, is defined as

    E = \frac{1}{2} \sum_c \sum_j (y_{j,c} - d_{j,c})^2    (3)

where c is an index over cases (input-output pairs), j is an index over output units, y is the actual state of an output unit and d is its desired state. To minimize E by gradient descent it is necessary to compute the partial derivative of E with respect to each weight in the network. This is simply the sum of the partial derivatives for each of the input-output cases. For a given case, the partial derivatives of the error with respect to each weight are computed in two passes. We have already described the forward pass in which the units in each layer have their states determined by the input they receive from units in lower layers using equations (1) and (2). The backward pass which propagates derivatives from the top layer back to the bottom one is more complicated.

The backward pass starts by computing \partial E/\partial y for each of the output units. Differentiating equation (3) for a particular case, c, and suppressing the index c gives

    \partial E/\partial y_j = y_j - d_j    (4)

We can then apply the chain rule to compute \partial E/\partial x_j

    \partial E/\partial x_j = \partial E/\partial y_j \cdot dy_j/dx_j

Differentiating equation (2) to get the value of dy_j/dx_j and substituting gives

    \partial E/\partial x_j = \partial E/\partial y_j \, y_j (1 - y_j)    (5)

This means that we know how a change in the total input x to an output unit will affect the error. But this total input is just a linear function of the states of the lower level units and it is also a linear function of the weights on the connections, so it is easy to compute how the error will be affected by changing these states and weights. For a weight w_{ji}, from i to j, the derivative is

    \partial E/\partial w_{ji} = \partial E/\partial x_j \cdot \partial x_j/\partial w_{ji} = \partial E/\partial x_j \, y_i    (6)

and for the output of the ith unit the contribution to \partial E/\partial y_i resulting from the effect of i on j is simply \partial E/\partial x_j \, w_{ji}.
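The two passes can be summarized in a short sketch. This is illustrative code rather than the authors' implementation: it assumes a single layer of logistic output units fed by lower-level states, the function names (logistic, forward, error, backward) and the toy numbers are invented, the step size eps plays the role of ε in the paper's weight-update rule, and the momentum term α is omitted.

```python
import math

def logistic(x):
    """Equation (2): y_j = 1 / (1 + e^(-x_j))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(y_lower, w, biases):
    """Equations (1)-(2) for one layer: x_j = sum_i y_i w_ji + bias_j."""
    return [logistic(sum(wi * yi for wi, yi in zip(w_j, y_lower)) + b_j)
            for w_j, b_j in zip(w, biases)]

def error(y_out, d_out):
    """Equation (3) for a single case: E = 1/2 sum_j (y_j - d_j)^2."""
    return 0.5 * sum((y - d) ** 2 for y, d in zip(y_out, d_out))

def backward(y_lower, w, y_out, d_out):
    """Backward pass for one case, following equations (4)-(6).
    Returns dE/dw_ji for every weight and dE/dy_i for the lower layer."""
    dE_dy = [y - d for y, d in zip(y_out, d_out)]                # eq. (4)
    dE_dx = [g * y * (1.0 - y) for g, y in zip(dE_dy, y_out)]    # eq. (5)
    dE_dw = [[g * y_i for y_i in y_lower] for g in dE_dx]        # eq. (6)
    # Propagating to the layer below: dE/dy_i = sum_j dE/dx_j * w_ji,
    # so the same two steps can be repeated layer by layer.
    dE_dy_lower = [sum(g * w_j[i] for g, w_j in zip(dE_dx, w))
                   for i in range(len(y_lower))]
    return dE_dw, dE_dy_lower

# One gradient-descent step on a toy case (weights and targets invented).
y_lower = [1.0, 0.0, 1.0]
w = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
biases = [0.0, 0.0]
d_out = [1.0, 0.0]

y_out = forward(y_lower, w, biases)
dE_dw, _ = backward(y_lower, w, y_out, d_out)
eps = 0.1                                    # step size: dw = -eps * dE/dw
w = [[wij - eps * gij for wij, gij in zip(w_j, g_j)]
     for w_j, g_j in zip(w, dE_dw)]

print("E before:", error(y_out, d_out))
print("E after: ", error(forward(y_lower, w, biases), d_out))
```

Because dE_dy_lower has exactly the same form as the dE_dy that started the pass, stacking layers requires no new machinery: each layer converts the derivative it receives into weight gradients and a derivative for the layer beneath it.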