works. Each unit in the hidden layer employs a radial basis function, such as a Gaussian kernel, as the activation function. The radial basis function (or kernel function) is centered at the point specified by the weight vector associated with the unit. Both the positions and the widths of these kernels must be learned from training patterns. There are usually many fewer kernels in the RBF network than there are training patterns. Each output unit implements a linear combination of these radial basis functions. From the point of view of function approximation, the hidden units provide a set of functions that constitute a basis set for representing input patterns in the space spanned by the hidden units.

There are a variety of learning algorithms for the RBF network.3 The basic one employs a two-step learning strategy, or hybrid learning. It estimates kernel positions and kernel widths using an unsupervised clustering algorithm, followed by a supervised least mean square (LMS) algorithm to determine the connection weights between the hidden layer and the output layer. Because the output units are linear, a noniterative algorithm can be used. After this initial solution is obtained, a supervised gradient-based algorithm can be used to refine the network parameters.

This hybrid learning algorithm for training the RBF network converges much faster than the back-propagation algorithm for training multilayer perceptrons. However, for many problems, the RBF network often involves a larger number of hidden units. This implies that the runtime (after training) speed of the RBF network is often slower than the runtime speed of a multilayer perceptron. The efficiencies (error versus network size) of the RBF network and the multilayer perceptron are, however, problem-dependent. It has been shown that the RBF network has the same asymptotic approximation power as a multilayer perceptron.
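To make the two-step hybrid strategy concrete, the following Python sketch places Gaussian kernels with a plain k-means pass and then solves the linear output layer non-iteratively by least squares (standing in for the LMS step mentioned above). The function names, the choice of k-means as the clustering algorithm, and all parameter defaults are illustrative assumptions, not prescriptions from the text.

import numpy as np

def gaussian_design(X, centers, widths):
    """Hidden-layer outputs: one Gaussian kernel per hidden unit."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(dist ** 2) / (2.0 * widths ** 2))

def train_rbf(X, y, n_kernels=12, n_iters=50, seed=0):
    """Two-step hybrid learning for an RBF network (illustrative sketch).

    Step 1: an unsupervised clustering pass (plain k-means) positions the
            kernels and sets their widths from the spread of each cluster.
    Step 2: because the output units are linear, the hidden-to-output
            weights are obtained non-iteratively by least squares.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_kernels, replace=False)].copy()
    for _ in range(n_iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)              # nearest-kernel assignment
        for k in range(n_kernels):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    # Kernel width = mean distance of a cluster's patterns to its center.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    widths = np.array([dist[labels == k, k].mean() if np.any(labels == k) else 1.0
                       for k in range(n_kernels)])
    widths = np.maximum(widths, 1e-6)
    # Linear output layer: solve for the weights (plus a bias) in one shot.
    H = np.hstack([gaussian_design(X, centers, widths), np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, widths, w

def rbf_predict(X, centers, widths, w):
    H = np.hstack([gaussian_design(X, centers, widths), np.ones((len(X), 1))])
    return H @ w

# Example: approximate a 1D function with far fewer kernels than patterns.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(2 * X[:, 0])
params = train_rbf(X, y, n_kernels=12)
y_hat = rbf_predict(X, *params)

With a dozen kernels and a few hundred training patterns, this small example already reflects the point made above: the network typically uses far fewer kernels than training patterns, and the linear output layer can be fit without iteration.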
Issues

There are many issues in designing feed-forward networks, including

- how many layers are needed for a given task,
- how many units are needed per layer,
- how the network will perform on data not included in the training set (generalization ability), and
- how large the training set should be for "good" generalization.

Although multilayer feed-forward networks using back-propagation have been widely employed for classification and function approximation,2 many design parameters still must be determined by trial and error. Existing theoretical results provide only very loose guidelines for selecting these parameters in practice.

KOHONEN'S SELF-ORGANIZING MAPS

The self-organizing map (SOM)16 has the desirable property of topology preservation, which captures an important aspect of the feature maps in the cortex of highly developed animal brains. In a topology-preserving mapping, nearby input patterns should activate nearby output units on the map. Figure 4 shows the basic network architecture of Kohonen's SOM. It basically consists of a two-dimensional array of units, each connected to all n input nodes. Let $w_{ij}$ denote the n-dimensional weight vector associated with the unit at location (i, j) of the 2D array. Each neuron computes the Euclidean distance between the input vector x and the stored weight vector $w_{ij}$.

This SOM is a special type of competitive learning network that defines a spatial neighborhood for each output unit. The shape of the local neighborhood can be square, rectangular, or circular. The initial neighborhood size is often set to one half to two thirds of the network size and shrinks over time according to a schedule (for example, an exponentially decreasing function). During competitive learning, all the weight vectors associated with the winner and its neighboring units are updated (see the "SOM learning algorithm" sidebar).

SOM learning algorithm

1. Initialize weights to small random numbers; set the initial learning rate and neighborhood.
2. Present a pattern x, and evaluate the network outputs.
3. Select the unit $(c_1, c_2)$ with the minimum output:
   $\| x - w_{c_1 c_2} \| = \min_{i,j} \| x - w_{ij} \|$.
4. Update all weights according to the following learning rule:
   $w_{ij}(t+1) = w_{ij}(t) + \alpha(t)\,[x(t) - w_{ij}(t)]$ if $(i, j) \in N_{c_1 c_2}(t)$,
   $w_{ij}(t+1) = w_{ij}(t)$ otherwise,
   where $N_{c_1 c_2}(t)$ is the neighborhood of the unit $(c_1, c_2)$ at time t, and $\alpha(t)$ is the learning rate.
5. Decrease the value of $\alpha(t)$ and shrink the neighborhood $N_{c_1 c_2}(t)$.
6. Repeat steps 2 through 5 until the change in weight values is less than a prespecified threshold or a maximum number of iterations is reached.
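A compact Python rendering of the sidebar's six steps follows. It assumes a rectangular grid, a circular neighborhood defined by a distance threshold, and exponentially decaying schedules for the learning rate and neighborhood radius; the function names, grid size, and decay schedules are illustrative choices, not requirements of the algorithm.

import numpy as np

def train_som(X, grid=(10, 10), n_epochs=20, alpha0=0.5, seed=0):
    """SOM learning loop following the sidebar's steps (sketch)."""
    rng = np.random.default_rng(seed)
    rows, cols, n = grid[0], grid[1], X.shape[1]
    # Step 1: small random weights; initial learning rate and neighborhood.
    W = rng.normal(scale=0.1, size=(rows, cols, n))
    radius0 = max(rows, cols) / 2.0           # roughly half the network size
    t, t_max = 0, n_epochs * len(X)
    for _ in range(n_epochs):
        for x in X:                           # Step 2: present a pattern x.
            # Step 3: winner = unit whose weight vector is closest to x.
            dists = np.linalg.norm(W - x, axis=2)
            c1, c2 = np.unravel_index(dists.argmin(), dists.shape)
            # One possible realization of step 5: exponentially decaying
            # learning rate and neighborhood radius.
            alpha = alpha0 * np.exp(-t / t_max)
            radius = radius0 * np.exp(-t / t_max)
            # Step 4: update the winner and all units in its neighborhood.
            for i in range(rows):
                for j in range(cols):
                    if np.hypot(i - c1, j - c2) <= radius:
                        W[i, j] += alpha * (x - W[i, j])
            t += 1
    # Step 6 would stop earlier once weight changes fall below a threshold.
    return W

# Example: map 3D input patterns (e.g., colors) onto a 10 x 10 grid.
X = np.random.default_rng(1).random((500, 3))
weights = train_som(X)

After training, grid units that are close to each other respond to input patterns that are close in the input space, which is the topology-preservation property described above.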
Kohonen's SOM can be used for projection of multivariate data, density approximation, and clustering. It has been successfully applied in the areas of speech recognition, image processing, robotics, and process control.2 The design parameters include the dimensionality of the neuron array, the number of neurons in each dimension, the shape of the neighborhood, the shrinking schedule of the neighborhood, and the learning rate.

ADAPTIVE RESONANCE THEORY MODELS

Recall that the stability-plasticity dilemma is an important issue in competitive learning. How do we learn new things (plasticity) and yet retain the stability to ensure that existing knowledge is not erased or corrupted? Carpenter and Grossberg's Adaptive Resonance Theory models (ART1, ART2, and ARTMap) were developed in an attempt to overcome this dilemma. The network has a sufficient supply of output units, but they are not used until deemed necessary. A unit is said to be committed (uncommitted) if it is (is not) being used. The learning algorithm updates