Stochastic Multi-Armed Bandit (MAB)

MAB: A player faces $K$ arms. At each time $t$, the player pulls one arm $a \in [K]$ and then receives a reward $r_t(a) \in [0,1]$.

Example with $K = 3$ arms and $T = 5$ rounds. Each entry is $r_t(a)$. The numbers are the rewards the player received from the arm pulled in that round; the other entries are not revealed.

        t=1      t=2      t=3      t=4      t=5
Arm 1   r_1(1)   r_2(1)   0.6      r_4(1)   r_5(1)
Arm 2   1        r_2(2)   r_3(2)   0.2      r_5(2)
Arm 3   r_1(3)   0.7      r_3(3)   r_4(3)   0.3

● Stochastic: Each arm $a \in [K]$ has an unknown distribution $D_a$ with mean $\mu(a)$, such that the rewards $r_1(a), r_2(a), \dots, r_T(a)$ are i.i.d. samples from $D_a$.

Advanced Optimization (Fall 2023), Lecture 12. Stochastic Bandits
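The protocol above can be simulated in a few lines. The following is a minimal Python sketch, not the lecture's own code, and it rests on these assumptions:

- Each $D_a$ is taken to be Bernoulli($\mu(a)$), though any distribution on $[0,1]$ would fit the definition.
- Arms are numbered $1, \dots, K$ to match $[K]$.
- `StochasticBandit`, `play`, `round_robin` and `empirical_means` are hypothetical names introduced for illustration.

Because the samples $r_1(a), r_2(a), \dots$ of a fixed arm are i.i.d. with mean $\mu(a)$, averaging the rewards observed from that arm gives a natural estimate of $\mu(a)$. The sketch applies this to the example trace (arm 2 at $t=1$ with reward 1, arm 3 at $t=2$ with 0.7, and so on) and to a longer simulated run.

```python
import random
from typing import Callable, List, Optional, Tuple

# One round of history: (pulled arm, observed reward r_t(arm)).
Round = Tuple[int, float]


class StochasticBandit:
    """K-armed stochastic bandit; arm a in [K] = {1, ..., K} has a fixed distribution D_a.

    Assumption (illustration only): D_a is Bernoulli(mu(a)), so every reward
    r_t(a) is 0.0 or 1.0 and therefore lies in [0, 1].
    """

    def __init__(self, means: List[float], seed: int = 0) -> None:
        if not means or not all(0.0 <= m <= 1.0 for m in means):
            raise ValueError("need at least one arm, with every mean in [0, 1]")
        self._means = list(means)  # mu(1), ..., mu(K): hidden from the player
        self._rng = random.Random(seed)

    @property
    def k(self) -> int:
        return len(self._means)

    def pull(self, arm: int) -> float:
        """Return a fresh i.i.d. draw r_t(arm) ~ D_arm; other arms stay unobserved."""
        if not 1 <= arm <= self.k:
            raise IndexError(f"arm must be in 1..{self.k}")
        return 1.0 if self._rng.random() < self._means[arm - 1] else 0.0


def play(bandit: StochasticBandit,
         policy: Callable[[int, List[Round]], int],
         horizon: int) -> List[Round]:
    """Run rounds t = 1..horizon: pick one arm, observe only that arm's reward."""
    history: List[Round] = []
    for t in range(1, horizon + 1):
        arm = policy(t, history)
        history.append((arm, bandit.pull(arm)))
    return history


def empirical_means(history: List[Round], k: int) -> List[Optional[float]]:
    """Average observed reward per arm (None if the arm was never pulled)."""
    totals = [0.0] * k
    counts = [0] * k
    for arm, reward in history:
        totals[arm - 1] += reward
        counts[arm - 1] += 1
    return [totals[i] / counts[i] if counts[i] else None for i in range(k)]


def round_robin(k: int) -> Callable[[int, List[Round]], int]:
    """Hypothetical policy: pull arms 1, 2, ..., k, 1, 2, ... in turn."""
    return lambda t, history: (t - 1) % k + 1


if __name__ == "__main__":
    # Rewards observed in the slide's example: (arm, reward) for t = 1..5.
    trace = [(2, 1.0), (3, 0.7), (1, 0.6), (2, 0.2), (3, 0.3)]
    print(empirical_means(trace, 3))  # approximately [0.6, 0.6, 0.5]

    env = StochasticBandit([0.3, 0.5, 0.7], seed=42)  # hypothetical means
    hist = play(env, round_robin(env.k), horizon=3000)
    print(empirical_means(hist, env.k))  # should be close to [0.3, 0.5, 0.7]
```

Keeping the means private inside `StochasticBandit` mirrors the point that $D_a$ and $\mu(a)$ are unknown. The player only ever sees the `(arm, reward)` pairs returned by `play`.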