An introduction to neural networks

Kevin Gurney
University of Sheffield

Routledge
London and New York

© Kevin Gurney 1997
This book is copyright under the Berne Convention. No reproduction without permission. All rights reserved.

First published in 1997 by UCL Press
UCL Press Limited
11 New Fetter Lane
London EC4P 4EE
Contents

Preface

1 Neural networks—an overview
1.1 What are neural networks?
1.2 Why study neural networks?
1.3 Summary
1.4 Notes

2 Real and artificial neurons
2.1 Real neurons: a review
2.2 Artificial neurons: the TLU
2.3 Resilience to noise and hardware failure
2.4 Non-binary signal communication
2.5 Introducing time
2.6 Summary
2.7 Notes

3 TLUs, linear separability and vectors
3.1 Geometric interpretation of TLU action
3.2 Vectors
3.3 TLUs and linear separability revisited
3.4 Summary
3.5 Notes
4 Training TLUs: the perceptron rule
4.1 Training networks
4.2 Training the threshold as a weight
4.3 Adjusting the weight vector
4.4 The perceptron
4.5 Multiple nodes and layers
4.6 Some practical matters
4.7 Summary
4.8 Notes

5 The delta rule
5.1 Finding the minimum of a function: gradient descent
5.2 Gradient descent on an error
5.3 The delta rule
5.4 Watching the delta rule at work
5.5 Summary

6 Multilayer nets and backpropagation
6.1 Training rules for multilayer nets
6.2 The backpropagation algorithm
6.3 Local versus global minima
6.4 The stopping criterion
6.5 Speeding up learning: the momentum term
6.6 More complex nets
6.7 The action of well-trained nets
6.8 Taking stock
6.9 Generalization and overtraining
6.10 Fostering generalization
6.11 Applications
6.12 Final remarks
6.13 Summary
6.14 Notes

7 Associative memories: the Hopfield net
7.1 The nature of associative memory
7.2 Neural networks and associative memory
7.3 A physical analogy with memory
7.4 The Hopfield net
7.5 Finding the weights
7.6 Storage capacity
7.7 The analogue Hopfield model
7.8 Combinatorial optimization
7.9 Feedforward and recurrent associative nets
7.10 Summary
7.11 Notes

8 Self-organization
8.1 Competitive dynamics
8.2 Competitive learning
8.3 Kohonen's self-organizing feature maps
8.4 Principal component analysis
8.5 Further remarks
8.6 Summary
8.7 Notes

9 Adaptive resonance theory: ART
9.1 ART's objectives
9.2 A hierarchical description of networks
9.3 ART1
9.4 The ART family
9.5 Applications
9.6 Further remarks
9.7 Summary
9.8 Notes

10 Nodes, nets and algorithms: further alternatives
10.1 Synapses revisited
10.2 Sigma-pi units
10.3 Digital neural networks
10.4 Radial basis functions
10.5 Learning by exploring the environment
10.6 Summary
10.7 Notes

11 Taxonomies, contexts and hierarchies
11.1 Classifying neural net structures
11.2 Networks and the computational hierarchy
11.3 Networks and statistical analysis
11.4 Neural networks and intelligent systems: symbols versus neurons
11.5 A brief history of neural nets
11.6 Summary
11.7 Notes

A The cosine function
References
Index
Preface

This book grew out of a set of course notes for a neural networks module given as part of a Masters degree in "Intelligent Systems". The people on this course came from a wide variety of intellectual backgrounds (from philosophy, through psychology to computer science and engineering) and I knew that I could not count on their being able to come to grips with the largely technical and mathematical approach which is often used (and in some ways easier to do). As a result I was forced to look carefully at the basic conceptual principles at work in the subject and try to recast these using ordinary language, drawing on the use of physical metaphors or analogies, and pictorial or graphical representations. I was pleasantly surprised to find that, as a result of this process, my own understanding was considerably deepened; I had now to unravel, as it were, condensed formal descriptions and say exactly how these were related to the "physical" world of artificial neurons, signals, computational processes, etc. However, I was acutely aware that, while a litany of equations does not constitute a full description of fundamental principles, without some mathematics a purely descriptive account runs the risk of dealing only with approximations and cannot be sharpened up to give any formulaic prescriptions. Therefore, I introduced what I believed was just sufficient mathematics to bring the basic ideas into sharp focus.

To allay any residual fears that the reader might have about this, it is useful to distinguish two contexts in which the word "maths" might be used. The first refers to the use of symbols to stand for quantities and is, in this sense, merely a shorthand. For example, suppose we were to calculate the difference between a target neural output and its actual output and then multiply this difference by a constant learning rate (it is not important that the reader knows what these terms mean just now). If t stands for the target, y the actual output, and the learning rate is denoted by α (Greek "alpha") then the output difference is just (t-y) and the verbose description of the calculation may be reduced to α(t-y). In this example the symbols refer to numbers but it is quite possible they may refer to other mathematical quantities or objects. The two instances of this used here are vectors and function gradients. However, both these ideas are described at some length in the main body of the text and assume no prior knowledge in this respect. In each case, only enough is given for the purpose in hand; other related, technical material may have been useful but is not considered essential, and it is not one of the aims of this book to double as a mathematics primer.

The other way in which we commonly understand the word "maths" goes one step further and deals with the rules by which the symbols are manipulated. The only rules used in this book are those of simple arithmetic (in the above example we have a subtraction and a multiplication). Further, any manipulations (and there
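As a minimal illustration of this shorthand, the calculation just described can be written out directly. The function name and the numerical values below are purely illustrative; they do not come from the text.

```python
def scaled_error(t, y, alpha):
    """Compute alpha * (t - y): the difference between the target t
    and the actual output y, scaled by a constant learning rate alpha."""
    return alpha * (t - y)

# Illustrative values: target t = 1.0, actual output y = 0.25,
# learning rate alpha = 0.5, giving 0.5 * 0.75 = 0.375.
print(scaled_error(1.0, 0.25, 0.5))  # 0.375
```

The point is only that the expression α(t-y) is a compact name for this small piece of arithmetic, nothing more.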
aren't many of them) will be performed step by step. Much of the traditional "fear of maths" stems, I believe, from the apparent difficulty in inventing the right manipulations to go from one stage to another; the reader will not, in this book, be called on to do this for him- or herself.

One of the spin-offs from having become familiar with a certain amount of mathematical formalism is that it enables contact to be made with the rest of the neural network literature. Thus, in the above example, the use of the Greek letter α may seem gratuitous (why not use a, the reader asks) but it turns out that learning rates are often denoted by lower-case Greek letters and α is not an uncommon choice. To help in this respect, Greek symbols will always be accompanied by their name on first use.

In deciding how to present the material I have started from the bottom up by describing the properties of artificial neurons (Ch. 2), which are motivated by looking at the nature of their real counterparts. This emphasis on the biology is intrinsically useful from a computational neuroscience perspective and helps people from all disciplines appreciate exactly how "neural" (or not) are the networks they intend to use. Chapter 3 moves to networks and introduces the geometric perspective on network function offered by the notion of linear separability in pattern space. There are other viewpoints that might have been deemed primary (function approximation is a favourite contender) but linear separability relates directly to the function of single threshold logic units (TLUs) and enables a discussion of one of the simplest learning rules (the perceptron rule) in Chapter 4. The geometric approach also provides a natural vehicle for the introduction of vectors. The inadequacies of the perceptron rule lead to a discussion of gradient descent and the delta rule (Ch. 5), culminating in a description of backpropagation (Ch. 6). This introduces multilayer nets in full and is the natural point at which to discuss networks as function approximators, feature detection and generalization.

This completes a large section on feedforward nets. Chapter 7 looks at Hopfield nets and introduces the idea of state-space attractors for associative memory and its accompanying energy metaphor. Chapter 8 is the first of two on self-organization and deals with simple competitive nets, Kohonen self-organizing feature maps, linear vector quantization and principal component analysis. Chapter 9 continues the theme of self-organization with a discussion of adaptive resonance theory (ART). This is a somewhat neglected topic (especially in more introductory texts) because it is often thought to contain rather difficult material. However, a novel perspective on ART which makes use of a hierarchy of analysis is aimed at helping the reader in understanding this worthwhile area. Chapter 10 comes full circle and looks again at alternatives to the artificial neurons introduced in Chapter 2. It also briefly reviews some other feedforward network types and training algorithms so
that the reader does not come away with the impression that backpropagation has a monopoly here. The final chapter tries to make sense of the seemingly disparate collection of objects that populate the neural network universe by introducing a series of taxonomies for network architectures, neuron types and algorithms. It also places the study of nets in the general context of that of artificial intelligence and closes with a brief history of its research.

The usual provisos about the range of material covered and introductory texts apply; it is neither possible nor desirable to be exhaustive in a work of this nature. However, most of the major network types have been dealt with and, while there is a plethora of training algorithms that might have been included (but weren't), I believe that an understanding of those presented here should give the reader a firm foundation for understanding others they may encounter elsewhere.
Chapter One
Neural networks—an overview

The term "Neural networks" is a very evocative one. It suggests machines that are something like brains and is potentially laden with the science fiction connotations of the Frankenstein mythos. One of the main tasks of this book is to demystify neural networks and show how, while they indeed have something to do with brains, their study also makes contact with other branches of science, engineering and mathematics. The aim is to do this in as non-technical a way as possible, although some mathematical notation is essential for specifying certain rules, procedures and structures quantitatively. Nevertheless, all symbols and expressions will be explained as they arise so that, hopefully, these should not get in the way of the essentials: that is, concepts and ideas that may be described in words.

This chapter is intended for orientation. We attempt to give simple descriptions of what networks are and why we might study them. In this way, we have something in mind right from the start, although the whole of this book is, of course, devoted to answering these questions in full.