The Hong Kong University of Science and Technology
Introduction to Deep Learning
Professor Qiang Yang
Outline • Introduction • Supervised Learning – Convolutional Neural Network – Sequence Modelling: RNN and its extensions • Unsupervised Learning – Autoencoder – Stacked Denoising Autoencoder • Reinforcement Learning – Deep Reinforcement Learning – Two applications: Playing Atari & AlphaGo
Introduction • Traditional pattern recognition models use hand-crafted features and a relatively simple trainable classifier. • This approach has the following limitations: – It is very tedious and costly to develop hand-crafted features – The hand-crafted features are usually highly dependent on one application, and cannot be transferred easily to other applications
[Figure: hand-crafted feature extractor → "simple" trainable classifier → output]
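To make the pipeline concrete, here is a minimal Python sketch of the traditional approach; the intensity-histogram feature, the random toy data, and the use of scikit-learn's LogisticRegression as the "simple" classifier are illustrative assumptions, not from the slides.

    # Traditional pipeline: a FIXED hand-crafted feature extractor
    # feeding a "simple" trainable classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def handcrafted_features(images):
        # Hand-crafted feature: a 16-bin intensity histogram per image.
        # This extractor is designed by hand and is NOT learned from data.
        return np.stack([np.histogram(img, bins=16, range=(0.0, 1.0))[0]
                         for img in images]).astype(float)

    rng = np.random.default_rng(0)
    images = rng.random((100, 28, 28))         # toy grayscale "images"
    labels = rng.integers(0, 2, size=100)      # toy binary labels

    X = handcrafted_features(images)           # fixed feature extractor
    clf = LogisticRegression().fit(X, labels)  # only the classifier is trained
    print(clf.score(X, labels))

Note that only the final classifier adapts to the data; the feature extractor stays fixed, which is exactly the limitation the slide points out.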
Deep Learning • Deep learning (a.k.a. representation learning) seeks to learn rich hierarchical representations (i.e., features) automatically through a multi-stage feature learning process.
[Figure: low-level features → mid-level features → high-level features → trainable classifier → output. Feature visualization of a convolutional net trained on ImageNet (Zeiler and Fergus, 2013)]
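As a contrast with the traditional pipeline above, here is a minimal PyTorch sketch (an illustrative assumption, not code from the slides) in which every stage is trainable, from the lowest-level features up to the classifier:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5, padding=2),   # stage 1: low-level features (edges)
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=5, padding=2),  # stage 2: mid-level features (motifs)
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                    # trainable classifier on top
    )
    x = torch.randn(1, 3, 32, 32)    # one toy 32x32 RGB image
    print(model(x).shape)            # torch.Size([1, 10]): class scores

Because the convolution weights and the classifier are optimized jointly, the features themselves are learned rather than hand-designed.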
Learning Hierarchical Representations • Hierarchy of representations with increasing level of abstraction. Each stage is a kind of trainable nonlinear feature transform. • Image recognition – Pixel → edge → texton → motif → part → object • Text – Character → word → word group → clause → sentence → story
[Figure: low-level features → mid-level features → high-level features → trainable classifier → output, with increasing level of abstraction]
The Mammalian Visual Cortex is Hierarchical • It is good to be inspired by nature, but not too much. • We need to understand which details are important, and which details are merely the result of evolution. • Each module in Deep Learning transforms its input representation into a higher-level one, in a way similar to the human cortex. (van Essen and Gallant, 1994)
Supervised Learning • Convolutional Neural Network • Sequence Modelling – Why do we need RNNs? – What are RNNs? – RNN Extensions – What can RNNs do?
Convolutional Neural Network • Input can have very high dimension. Using a fully-connected neural network would need a large number of parameters. • Inspired by the neurophysiological experiments conducted by [Hubel & Wiesel 1962], CNNs are a special type of neural network whose hidden units are only connected to a local receptive field. The number of parameters needed by CNNs is much smaller. • Example: 200x200 image – a) fully connected: 40,000 hidden units => 1.6 billion parameters – b) CNN: 5x5 kernel, 100 feature maps => 2,500 parameters
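The slide's parameter counts can be verified with a few lines of arithmetic (biases are ignored, matching the slide's figures):

    # Fully connected layer: every input pixel connects to every hidden unit.
    inputs = 200 * 200               # a 200x200 image has 40,000 input pixels
    hidden = 40_000                  # 40,000 fully connected hidden units
    fc_params = inputs * hidden      # one weight per (input pixel, hidden unit) pair
    print(f"{fc_params:,}")          # 1,600,000,000 -> "1.6 billion parameters"

    # CNN layer: each unit sees a small local patch, and the kernel
    # weights are shared across all spatial positions in a feature map.
    kernel = 5 * 5                   # 5x5 local receptive field
    feature_maps = 100               # one shared kernel per feature map
    cnn_params = kernel * feature_maps
    print(f"{cnn_params:,}")         # 2,500 parameters

The saving comes from two ideas: local connectivity (5x5 instead of the whole image) and weight sharing (one kernel reused at every spatial position).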
Three Stages of a Convolutional Layer 1. Convolution stage 2. Nonlinearity: a nonlinear transform such as rectified linear or tanh 3. Pooling: output a summary statistic of local input, such as max pooling and average pooling
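A minimal PyTorch sketch of the three stages applied to one toy input (the specific channel counts and sizes are illustrative assumptions):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 28, 28)             # one toy grayscale image

    conv = nn.Conv2d(1, 100, kernel_size=5)   # 1. convolution stage (affine transform)
    detector = nn.ReLU()                      # 2. nonlinearity: rectified linear
    pool = nn.MaxPool2d(kernel_size=2)        # 3. pooling: max over 2x2 windows

    h = conv(x)        # shape: [1, 100, 24, 24]
    h = detector(h)    # same shape, negative responses set to zero
    h = pool(h)        # shape: [1, 100, 12, 12], a local summary statistic
    print(h.shape)

tanh could replace ReLU in stage 2, and average pooling could replace max pooling in stage 3, exactly as the slide lists.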
Convolution Operation in CNN • Input: an image (2-D array) x • Convolution kernel/operator (2-D array of learnable parameters): w • Feature map (2-D array of processed data): s • Convolution operation in 2-D domains:

$$ s[i, j] = (x * w)[i, j] = \sum_{m=-M}^{M} \sum_{n=-N}^{N} x[i+m,\, j+n]\; w[m, n] $$

[Figure: worked example of a 2-D convolution, where each output cell is the weighted sum of a kernel-sized input patch]
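The formula can be implemented directly; below is a minimal NumPy sketch (an illustration, not the slides' code) that computes s only at "valid" positions, where the kernel fits entirely inside the image:

    import numpy as np

    def conv2d(x, w):
        # In the slide's notation the kernel spans m = -M..M and n = -N..N.
        M, N = w.shape[0] // 2, w.shape[1] // 2
        H, W = x.shape
        s = np.zeros((H - 2 * M, W - 2 * N))
        for i in range(s.shape[0]):
            for j in range(s.shape[1]):
                # s[i, j] = sum_{m, n} x[i + m, j + n] * w[m, n], with indices
                # shifted so the patch starts at (i, j) instead of (i - M, j - N).
                s[i, j] = np.sum(x[i:i + 2 * M + 1, j:j + 2 * N + 1] * w)
        return s

    x = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
    w = np.ones((3, 3)) / 9.0           # 3x3 averaging kernel (M = N = 1)
    print(conv2d(x, w))                 # 2x2 feature map of local averages

As written, the slide's formula (and this sketch) is technically cross-correlation, which is what most CNN libraries compute under the name "convolution".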