Vanishing Gradient Problem Y1 V2 : YM Smaller gradients Larger gradients Learn very slow Learn very fast Almost random Already converge based on random!? Vanishing Gradient Problem Larger gradients Almost random Already converge based on random!? Learn very slow Learn very fast 1 x 2 x …… Nx…… …… …… …… …… …… …… y1 y2 yM Smaller gradients