Introduction: Machine Learning for Big Data

For big data applications, first-order methods have become much more popular than higher-order methods for learning (optimization). Gradient descent methods are the most representative first-order methods.

(Deterministic) gradient descent (GD):
$$w_{t+1} \leftarrow w_t - \eta_t \left[ \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(w_t) \right],$$
where $t$ is the iteration number.
- Linear convergence rate: $O(\rho^t)$
- Iteration cost is $O(n)$

Stochastic gradient descent (SGD): in the $t$-th iteration, randomly choose an example $i_t \in \{1, 2, \ldots, n\}$, then update
$$w_{t+1} \leftarrow w_t - \eta_t \nabla f_{i_t}(w_t)$$
- Iteration cost is $O(1)$
- The convergence rate is sublinear: $O(1/t)$

Wu-Jun Li (http://cs.nju.edu.cn/lwj) PDSL CS, NJU 4/36
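The GD/SGD contrast above can be sketched in code. The following is a minimal illustration (not from the slides) on a least-squares problem with $f_i(w) = \frac{1}{2}(x_i^\top w - y_i)^2$; all names, step sizes, and iteration counts are illustrative assumptions. GD averages all $n$ gradients per step ($O(n)$ cost), while SGD uses a single randomly chosen example ($O(1)$ cost).

```python
import numpy as np

# Synthetic least-squares problem: y = X @ w_true (noise-free, for illustration).
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

def full_gradient(w):
    # (1/n) * sum_i grad f_i(w): one full pass over the data -> O(n) per iteration
    return X.T @ (X @ w - y) / n

def gd(eta=0.1, iters=500):
    # Deterministic gradient descent with a constant step size (assumed).
    w = np.zeros(d)
    for _ in range(iters):
        w = w - eta * full_gradient(w)
    return w

def sgd(eta=0.01, iters=5000):
    # Stochastic gradient descent: one random example per step -> O(1) per iteration.
    w = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        w = w - eta * X[i] * (X[i] @ w - y[i])
    return w

print("GD error: ", np.linalg.norm(gd() - w_true))
print("SGD error:", np.linalg.norm(sgd() - w_true))
```

Each SGD iteration is $n$ times cheaper than a GD iteration, which is what makes it attractive at big-data scale, at the price of the slower sublinear convergence rate noted above.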