https://docs.google.com/presentation/d/1siUFXARYRpNiMeSRwgFbt7mZVjkMPhR5od09w0Z8xaU/edit#slide=id.g3532c09be1_0_382
Training stuck ≠ Small Gradient
• People believe training is stuck because the parameters are near a critical point …
[Figure: two plots against iteration. Top: loss. Bottom: norm of gradient.]
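A minimal sketch of one way the loss can stop decreasing while the gradient norm stays large. This is not taken from the slide: the quadratic loss, the curvature `a`, the learning rate `lr`, and the function names are all illustrative assumptions. The learning rate is chosen so that each gradient-descent step jumps to the opposite wall of the valley, so the loss never changes even though the gradient is far from zero.

```python
# Toy example (assumed, not from the slide): L(w) = 0.5 * a * w^2.
# With lr = 2 / a, each update maps w to -w, so training "bounces"
# between the two walls of the valley instead of descending.

def loss(w, a=4.0):
    return 0.5 * a * w * w

def grad(w, a=4.0):
    return a * w

def run_gradient_descent(w0=1.0, lr=0.5, steps=10, a=4.0):
    """Record the loss and gradient norm at every iteration."""
    w = w0
    losses, grad_norms = [], []
    for _ in range(steps):
        g = grad(w, a)
        losses.append(loss(w, a))
        grad_norms.append(abs(g))
        w = w - lr * g  # plain gradient descent update
    return losses, grad_norms

losses, grad_norms = run_gradient_descent()
print("loss:         ", losses)
print("gradient norm:", grad_norms)
```

In this toy setting the loss curve is flat, which looks like being stuck, yet the gradient norm never gets small, so the parameters are not near a critical point. Logging the gradient norm alongside the loss is a simple way to tell the two situations apart.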