Maximum Likelihood = Minimize KL Divergence

Sample $\{x^1, x^2, \ldots, x^m\}$ from $P_{data}(x)$.

$$
\begin{aligned}
\theta^* &= \arg\max_\theta \prod_{i=1}^{m} P_\theta(x^i) = \arg\max_\theta \log \prod_{i=1}^{m} P_\theta(x^i) \\
&= \arg\max_\theta \sum_{i=1}^{m} \log P_\theta(x^i) \\
&\approx \arg\max_\theta E_{x \sim P_{data}}\left[\log P_\theta(x)\right] \\
&= \arg\max_\theta \int_x P_{data}(x) \log P_\theta(x)\,dx - \int_x P_{data}(x) \log P_{data}(x)\,dx \\
&= \arg\max_\theta \int_x P_{data}(x) \log \frac{P_\theta(x)}{P_{data}(x)}\,dx \\
&= \arg\min_\theta KL(P_{data} \,\|\, P_\theta)
\end{aligned}
$$

The second integral, $\int_x P_{data}(x) \log P_{data}(x)\,dx$, is not related to $\theta$, so subtracting it does not change the arg max.

$KL(P_{data} \,\|\, P_\theta)$ is the difference between $P_{data}$ and $P_\theta$.
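The equivalence can be checked numerically on a toy problem. The sketch below is only an illustration under assumptions that are not part of the slide: the data distribution is a hypothetical Bernoulli with parameter 0.3, the model family is Bernoulli with parameter theta, and the arg max / arg min are found by a simple grid search. The names `P_TRUE`, `GRID`, `avg_log_likelihood`, `expected_log_likelihood` and `kl_divergence` are made up for this example.

```python
import math
import random

P_TRUE = 0.3                             # assumed P_data: Bernoulli(0.3), hypothetical
GRID = [i / 100 for i in range(1, 100)]  # candidate theta values 0.01 .. 0.99


def avg_log_likelihood(n_ones, m, theta):
    """(1/m) * sum_i log P_theta(x^i) for m Bernoulli samples containing n_ones ones."""
    return (n_ones * math.log(theta) + (m - n_ones) * math.log(1 - theta)) / m


def expected_log_likelihood(theta):
    """E_{x ~ P_data}[log P_theta(x)], computed exactly (no sampling)."""
    return P_TRUE * math.log(theta) + (1 - P_TRUE) * math.log(1 - theta)


def kl_divergence(theta):
    """KL(P_data || P_theta) for the two Bernoulli distributions."""
    return (P_TRUE * math.log(P_TRUE / theta)
            + (1 - P_TRUE) * math.log((1 - P_TRUE) / (1 - theta)))


# Draw m samples from P_data with a fixed seed so the run is repeatable.
rng = random.Random(0)
m = 100_000
n_ones = sum(1 for _ in range(m) if rng.random() < P_TRUE)

# Maximum likelihood on the samples (the sum-of-logs line of the derivation).
theta_mle = max(GRID, key=lambda t: avg_log_likelihood(n_ones, m, t))
# Arg max of the exact expectation (the line after the approximation step).
theta_exp = max(GRID, key=expected_log_likelihood)
# Arg min of the KL divergence (the last line of the derivation).
theta_kl = min(GRID, key=kl_divergence)

print("theta from sample likelihood:  ", theta_mle)
print("theta from expected likelihood:", theta_exp)
print("theta from minimum KL:         ", theta_kl)
```

The expected log-likelihood and the KL divergence differ only by the entropy term that does not depend on theta, so `theta_exp` and `theta_kl` agree exactly. `theta_mle` comes from a finite sample, so it only approximates them, which mirrors the approximation step in the derivation.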