9.8.1 The Multi-Armed Bandit Framework · Regret(T) = Tp - Reward(T), where Tp is the reward from the best arm if you pull it T times, and Reward(T) is your actual reward after T pulls.
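To make the definition concrete, here is a minimal Python sketch that computes regret as the gap between the best arm's reward over T pulls and the reward actually collected. Several things here are assumptions for illustration and not from the source: the names `regret` and `pull_uniformly`, the Bernoulli (0/1) reward model, the arm means `[0.2, 0.5, 0.8]`, and the naive policy of picking an arm uniformly at random on each pull.

```python
import random


def regret(total_pulls, best_arm_mean, actual_reward):
    """Regret after T pulls: T times the best arm's mean reward, minus the reward obtained."""
    return total_pulls * best_arm_mean - actual_reward


def pull_uniformly(arm_means, total_pulls, seed=0):
    """Hypothetical baseline policy: choose an arm uniformly at random on each pull.

    Each arm is assumed to pay 1 with probability arm_means[arm], else 0.
    Returns the total reward collected over total_pulls pulls.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    reward = 0
    for _ in range(total_pulls):
        arm = rng.randrange(len(arm_means))
        if rng.random() < arm_means[arm]:
            reward += 1
    return reward


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # assumed success probabilities, one per arm
    T = 1000
    reward = pull_uniformly(arm_means, T)
    print("Reward(T):", reward)
    print("Regret(T):", regret(T, max(arm_means), reward))
```

Because the random policy spreads its pulls across all arms, its reward per pull is on average about the mean of the arm probabilities, so its regret under these assumptions tends to grow roughly linearly in T. A policy that learns which arm is best should do better.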