
《Artificial Intelligence: A Modern Approach》 teaching resources (lecture notes, English): chapter20b-6pp


McCulloch-Pitts "unit" Output isaqshedlinear function of the inputs: ←9=g, NEURAL NETWORKS ai-gi) CHAPTER 20,SECTION 5 了-< 器。mo空 ACesoeIenpiaienoredneurcbgt5plpoeno Outline Activation functions ◇Brains g(mna ◇Neural net ◇Perceptrons Multilayer perceptrons (a) (a)is a step function or threshold functio (b)is a sigmoid function 1/(1 Brains Implementing logical functions =03 -05 ○○○ AND OR NOT

Neural networks
Chapter 20, Section 5

Outline
♦ Brains
♦ Neural networks
♦ Perceptrons
♦ Multilayer perceptrons
♦ Applications of neural networks

Brains
10^11 neurons of > 20 types, 10^14 synapses, 1 ms–10 ms cycle time
Signals are noisy "spike trains" of electrical potential
[Figure: schematic neuron showing the cell body (soma), nucleus, dendrites, axon, axonal arborization, and a synapse with an axon from another cell]

McCulloch–Pitts "unit"
Output is a "squashed" linear function of the inputs:
    a_i ← g(in_i) = g(Σ_j W_{j,i} a_j)
[Figure: unit diagram with input links a_j, weights W_{j,i}, bias weight W_{0,i} on the fixed input a_0 = −1, input function Σ, activation function g, and output a_i = g(in_i)]
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do

Activation functions
[Figure: (a) a step (threshold) function and (b) a sigmoid function, each plotted as g(in_i) against in_i]
(a) is a step function or threshold function
(b) is a sigmoid function 1/(1 + e^{−x})
Changing the bias weight W_{0,i} moves the threshold location

Implementing logical functions
AND: W_0 = 1.5, W_1 = 1, W_2 = 1
OR:  W_0 = 0.5, W_1 = 1, W_2 = 1
NOT: W_0 = −0.5, W_1 = −1
McCulloch and Pitts: every Boolean function can be implemented
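To make the unit and the AND/OR/NOT weight settings above concrete, here is a minimal Python sketch (not part of the original slides); the helper `threshold_unit` and the truth-table check are illustrative assumptions.

```python
# Minimal sketch of a McCulloch-Pitts threshold unit (illustrative, not from the slides).
# Slide convention: a fixed input a0 = -1 carries the bias weight W0, and the unit
# outputs 1 when in = sum_j Wj*aj > 0 (step activation), else 0.

def threshold_unit(bias_weight, weights, inputs):
    in_value = bias_weight * -1 + sum(w * a for w, a in zip(weights, inputs))
    return 1 if in_value > 0 else 0

# Weight settings from the "Implementing logical functions" slide.
AND = lambda a1, a2: threshold_unit(1.5, [1, 1], [a1, a2])
OR  = lambda a1, a2: threshold_unit(0.5, [1, 1], [a1, a2])
NOT = lambda a1:     threshold_unit(-0.5, [-1], [a1])

# Truth-table check.
for a1 in (0, 1):
    print("NOT", a1, "->", NOT(a1))
    for a2 in (0, 1):
        print(a1, a2, ": AND ->", AND(a1, a2), ", OR ->", OR(a1, a2))
```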


Network structures
Feed-forward networks:
– single-layer perceptrons
– multi-layer perceptrons
Feed-forward networks implement functions, have no internal state
Recurrent networks:
– Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}); g(x) = sign(x), a_i = ±1; holographic associative memory
– Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
– recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate etc.

Feed-forward example
[Figure: network with input units 1 and 2, hidden units 3 and 4, output unit 5, and weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}, W_{3,5}, W_{4,5}]
Feed-forward network = a parameterized family of nonlinear functions:
    a_5 = g(W_{3,5} · a_3 + W_{4,5} · a_4)
        = g(W_{3,5} · g(W_{1,3} · a_1 + W_{2,3} · a_2) + W_{4,5} · g(W_{1,4} · a_1 + W_{2,4} · a_2))
Adjusting weights changes the function: do learning this way! (A short code sketch of this network follows this group of slides.)

Single-layer perceptrons
[Figure: input units wired directly to output units by weights W_{j,i}; surface plot of perceptron output as a function of x_1 and x_2]
Output units all operate separately—no shared weights
Adjusting weights moves the location, orientation, and steepness of the cliff

Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)
Can represent AND, OR, NOT, majority, etc., but not XOR
Represents a linear separator in input space:
    Σ_j W_j x_j > 0   or   W · x > 0
[Figure: (a) x_1 and x_2 and (b) x_1 or x_2 are linearly separable in the (x_1, x_2) plane; (c) x_1 xor x_2 is not]
Minsky & Papert (1969) pricked the neural network balloon

Perceptron learning
Learn by adjusting weights to reduce error on training set
The squared error for an example with input x and true output y is
    E = ½ Err² ≡ ½ (y − h_W(x))²
Perform optimization search by gradient descent:
    ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y − g(Σ_{j=0}^n W_j x_j)) = −Err × g′(in) × x_j
Simple weight update rule (see the learning-rule sketch after these slides):
    W_j ← W_j + α × Err × g′(in) × x_j
E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on −ve inputs

Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set
[Figure: proportion correct on test set vs. training set size, perceptron vs. decision tree, for MAJORITY on 11 inputs and for the RESTAURANT data]
Perceptron learns majority function easily, DTL is hopeless
DTL learns restaurant function easily, perceptron cannot represent it
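A minimal sketch of the "Feed-forward example" slide above, assuming sigmoid activations and made-up weight values; it simply evaluates the nested expression for a_5.

```python
import math

def g(x):
    """Sigmoid activation 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(a1, a2, W):
    # a5 = g(W35*a3 + W45*a4), where a3 and a4 are squashed sums of the inputs.
    a3 = g(W["1,3"] * a1 + W["2,3"] * a2)
    a4 = g(W["1,4"] * a1 + W["2,4"] * a2)
    return g(W["3,5"] * a3 + W["4,5"] * a4)

# Made-up weights: changing them changes the nonlinear function the network represents.
W = {"1,3": 0.5, "2,3": -1.0, "1,4": 1.5, "2,4": 0.3, "3,5": 2.0, "4,5": -0.7}
print(feed_forward(1.0, 0.0, W))
```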

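A minimal sketch of the gradient-descent rule W_j ← W_j + α × Err × g′(in) × x_j from the "Perceptron learning" slide, assuming a sigmoid g; the toy OR data set, learning rate, and epoch count are illustrative choices, not from the slides.

```python
import math
import random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))      # sigmoid; g'(in) = g(in) * (1 - g(in))

def perceptron_learn(examples, alpha=0.5, epochs=2000, seed=0):
    """Gradient-descent perceptron learning; x0 = -1 carries the bias weight W0."""
    rnd = random.Random(seed)
    n = len(examples[0][0])
    W = [rnd.uniform(-0.5, 0.5) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, y in examples:
            xs = [-1.0] + list(x)                       # bias input x0 = -1, then x1..xn
            in_ = sum(w * xi for w, xi in zip(W, xs))   # in = sum_j Wj*xj
            err = y - g(in_)                            # Err = y - hW(x)
            gprime = g(in_) * (1 - g(in_))              # g'(in) for the sigmoid
            W = [w + alpha * err * gprime * xi for w, xi in zip(W, xs)]
    return W

# Toy run: OR of two inputs (linearly separable, so a perceptron can represent it).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
W = perceptron_learn(data)
print([round(g(-W[0] + W[1] * x1 + W[2] * x2)) for (x1, x2), _ in data])
```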

Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand
[Figure: input units a_k, hidden units a_j, and output units a_i, connected by weights W_{k,j} and W_{j,i}]

Expressiveness of MLPs
All continuous functions w/ 2 layers, all functions w/ 3 layers
[Figure: surface plots of h_W(x_1, x_2) showing a ridge and a bump]
Combine two opposite-facing threshold functions to make a ridge
Combine two perpendicular ridges to make a bump
Add bumps of various sizes and locations to fit any surface
Proof requires exponentially many hidden units (cf DTL proof)

Back-propagation learning
Output layer: same as for single-layer perceptron,
    W_{j,i} ← W_{j,i} + α × a_j × Δ_i
where Δ_i = Err_i × g′(in_i)
Hidden layer: back-propagate the error from the output layer:
    Δ_j = g′(in_j) Σ_i W_{j,i} Δ_i
Update rule for weights in hidden layer:
    W_{k,j} ← W_{k,j} + α × a_k × Δ_j
(Most neuroscientists deny that back-propagation occurs in the brain)

Back-propagation derivation
The squared error on a single example is defined as
    E = ½ Σ_i (y_i − a_i)²,
where the sum is over the nodes in the output layer.
    ∂E/∂W_{j,i} = −(y_i − a_i) ∂a_i/∂W_{j,i} = −(y_i − a_i) ∂g(in_i)/∂W_{j,i}
                = −(y_i − a_i) g′(in_i) ∂in_i/∂W_{j,i}
                = −(y_i − a_i) g′(in_i) ∂/∂W_{j,i} (Σ_j W_{j,i} a_j)
                = −(y_i − a_i) g′(in_i) a_j = −a_j Δ_i

Back-propagation derivation contd.
    ∂E/∂W_{k,j} = −Σ_i (y_i − a_i) ∂a_i/∂W_{k,j} = −Σ_i (y_i − a_i) ∂g(in_i)/∂W_{k,j}
                = −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_{k,j} = −Σ_i Δ_i ∂/∂W_{k,j} (Σ_j W_{j,i} a_j)
                = −Σ_i Δ_i W_{j,i} ∂a_j/∂W_{k,j} = −Σ_i Δ_i W_{j,i} ∂g(in_j)/∂W_{k,j}
                = −Σ_i Δ_i W_{j,i} g′(in_j) ∂in_j/∂W_{k,j}
                = −Σ_i Δ_i W_{j,i} g′(in_j) ∂/∂W_{k,j} (Σ_k W_{k,j} a_k)
                = −Σ_i Δ_i W_{j,i} g′(in_j) a_k = −a_k Δ_j

Back-propagation learning contd.
At each epoch, sum gradient updates for all examples and apply
Training curve for 100 restaurant examples: finds exact fit
[Figure: total error on the training set vs. number of epochs, falling to zero within about 400 epochs]
Typical problems: slow convergence, local minima
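A minimal sketch of the back-propagation update rules above, for one sigmoid hidden layer and a single sigmoid output; the bias handling, hyperparameters, and the XOR toy data are illustrative assumptions, not part of the slides.

```python
import math
import random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))       # sigmoid; g'(in) = g(in) * (1 - g(in))

def train_mlp(examples, n_hidden=2, alpha=0.5, epochs=10000, seed=0):
    """Online back-propagation for a one-hidden-layer MLP with a single output.
    Each weight row includes a bias weight applied to a constant input of 1."""
    rnd = random.Random(seed)
    n_in = len(examples[0][0])
    W_kj = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W_ji = [rnd.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in examples:
            xs = [1.0] + list(x)                                    # bias + inputs a_k
            in_j = [sum(w * a for w, a in zip(W_kj[j], xs)) for j in range(n_hidden)]
            a_j = [1.0] + [g(v) for v in in_j]                      # bias + hidden activations
            a_i = g(sum(w * a for w, a in zip(W_ji, a_j)))          # output activation
            delta_i = (y - a_i) * a_i * (1 - a_i)                   # Delta_i = Err_i * g'(in_i)
            delta_j = [a_j[j + 1] * (1 - a_j[j + 1]) * W_ji[j + 1] * delta_i
                       for j in range(n_hidden)]                    # Delta_j = g'(in_j) * W_ji * Delta_i
            for j in range(n_hidden + 1):
                W_ji[j] += alpha * a_j[j] * delta_i                 # W_ji <- W_ji + alpha * a_j * Delta_i
            for j in range(n_hidden):
                for k in range(n_in + 1):
                    W_kj[j][k] += alpha * xs[k] * delta_j[j]        # W_kj <- W_kj + alpha * a_k * Delta_j
    return W_kj, W_ji

# Toy run on XOR (not linearly separable, so a hidden layer is needed).
# As the slides note, convergence can be slow and may get stuck in a local minimum.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
W_kj, W_ji = train_mlp(xor)
```

This sketch applies the update after every example (online); the slide's batch variant instead sums the gradient updates over all examples in an epoch and applies them once.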


Back-propagation learning contd.
Learning curve for MLP with 4 hidden units:
[Figure: proportion correct on test set vs. training set size on the RESTAURANT data, comparing a decision tree with the multilayer network]
MLPs are quite good for complex pattern recognition tasks, but resulting hypotheses cannot be understood easily

Handwritten digit recognition
3-nearest-neighbor = 2.4% error
400–300–10 unit MLP = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error
Current best (kernel machines, vision algorithms) ≈ 0.6% error

Summary
Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)
Perceptrons (one-layer networks) insufficiently expressive
Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
Many applications: speech, driving, handwriting, fraud detection, etc.
Engineering, cognitive modelling, and neural system modelling subfields have largely diverged

