FIGURE 20.7 How to adapt the weights connected to the ith PE.
$$\frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial y}\,\frac{\partial y}{\partial \mathrm{net}}\,\frac{\partial \mathrm{net}}{\partial w_i} = -(d - y)\,f'(\mathrm{net})\,x_i = -e\,f'(\mathrm{net})\,x_i \qquad (20.5)$$

where $f'(\mathrm{net})$ is the derivative of the nonlinearity computed at the operating point. Equation (20.5) is known as the delta rule, and it will train the perceptron [Haykin, 1994]. Note that throughout the derivation we skipped the pattern index p for simplicity, but this rule is applied for each input pattern. However, the delta rule cannot train MLPs since it requires the knowledge of the error signal at each PE.

The principle of the ordered derivatives can be extended to multilayer networks, provided we organize the computations in flows of activation and error propagation. The principle is very easy to understand, but a little complex to formulate in equation form [Haykin, 1994].

Suppose that we want to adapt the weights connected to a hidden layer PE, the ith PE (Fig. 20.7). One can decompose the computation of the partial derivative of the cost with respect to the weight $w_{ij}$ as

$$\frac{\partial J}{\partial w_{ij}} = \underbrace{\frac{\partial J}{\partial y_i}}_{1}\;\underbrace{\frac{\partial y_i}{\partial \mathrm{net}_i}\,\frac{\partial \mathrm{net}_i}{\partial w_{ij}}}_{2} \qquad (20.6)$$

i.e., the partial derivative with respect to the weight is the product of the partial derivative with respect to the PE state (part 1 in Eq. (20.6)) times the partial derivative of the local activation with respect to the weights (part 2 in Eq. (20.6)). This last quantity is exactly the same as for the nonlinear PE ($f'(\mathrm{net}_i)\,x_j$), so the big issue is the computation of $\partial J/\partial y_i$. For an output PE, $\partial J/\partial y$ becomes the injected error e in Eq. (20.4). For the hidden ith PE, $\partial J/\partial y_i$ is evaluated by summing all the errors that reach the PE from the top layer through the topology when the injected errors $e_k$ are clamped at the top layer, or in an equation

$$\frac{\partial J}{\partial y_i} = \sum_k \frac{\partial J}{\partial y_k}\,\frac{\partial y_k}{\partial \mathrm{net}_k}\,\frac{\partial \mathrm{net}_k}{\partial y_i} = -\sum_k e_k\,f'(\mathrm{net}_k)\,w_{ki} \qquad (20.7)$$

Substituting back in Eq. (20.6) we finally get

$$\frac{\partial J}{\partial w_{ij}} = -x_j\,f'(\mathrm{net}_i)\left(\sum_k e_k\,f'(\mathrm{net}_k)\,w_{ki}\right) \qquad (20.8)$$
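To make the flows of activation and error propagation concrete, the following Python sketch applies Eqs. (20.5) through (20.8) to a single-hidden-layer MLP. It is not taken from the text: the tanh nonlinearity, the omission of bias terms, the step size eta, and all names (backprop_step, W_hidden, W_output) are illustrative assumptions, chosen only to show the injected errors e_k being clamped at the output layer and propagated back through the weights w_ki.

import numpy as np

def f(net):
    # PE nonlinearity (tanh is an assumption; the equations hold for any smooth f)
    return np.tanh(net)

def f_prime(net):
    # Derivative of the nonlinearity at the operating point, f'(net)
    return 1.0 - np.tanh(net) ** 2

def backprop_step(x, d, W_hidden, W_output, eta=0.1):
    """One on-line gradient step for a single input pattern x with target d."""
    # Forward pass: activations flow from input to output.
    net_hidden = W_hidden @ x          # net_i for the hidden PEs
    y_hidden = f(net_hidden)           # y_i
    net_out = W_output @ y_hidden      # net_k for the output PEs
    y_out = f(net_out)                 # y_k

    # Output PEs: injected error e_k = d_k - y_k, delta rule of Eq. (20.5):
    # dJ/dw_ki = -e_k f'(net_k) y_i
    e = d - y_out
    delta_out = e * f_prime(net_out)            # e_k f'(net_k)
    grad_out = -np.outer(delta_out, y_hidden)

    # Hidden PEs: errors clamped at the top layer propagate through the topology,
    # dJ/dy_i = -sum_k e_k f'(net_k) w_ki                      (Eq. 20.7)
    dJ_dy_hidden = -(W_output.T @ delta_out)
    # dJ/dw_ij = -x_j f'(net_i) sum_k e_k f'(net_k) w_ki       (Eq. 20.8)
    grad_hidden = np.outer(dJ_dy_hidden * f_prime(net_hidden), x)

    # Gradient descent: step against the partial derivatives of the cost.
    W_output = W_output - eta * grad_out
    W_hidden = W_hidden - eta * grad_hidden
    return W_hidden, W_output

A call such as backprop_step(x, d, W_hidden, W_output), with x of length equal to the number of inputs and weight matrices of matching shapes, performs one update for one pattern, in keeping with the remark above that the rule is applied for each input pattern.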