Vanishing Gradient Problem Smaller gradients Small output 0.5 Large +△w input Intuitive way to compute the derivatives .. al △L △WVanishing Gradient Problem 1 x 2 x …… Nx…… …… …… …… …… …… …… 𝑦1 𝑦2 𝑦𝑀 …… 𝑦 ො 1 𝑦 ො 2 𝑦 ො 𝑀 𝑙 Intuitive way to compute the derivatives … 𝜕𝑙 𝜕𝑤 =? +∆𝑤 +∆𝑙 ∆𝑙 ∆𝑤 Smaller gradients Large input Small output