Vojtěch Tóth
Upper-Confidence-Bound Action Selection
Feb 20, 2026
Action-Value Methods
$$
A_t = \operatorname*{arg\,max}_a \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]
$$
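The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the note's own implementation. It reads $Q_t(a)$ as the current value estimate of action $a$, $N_t(a)$ as the number of times $a$ has been selected so far, and $c > 0$ as the exploration constant. The function name `ucb_select`, its parameter names, the default `c=2.0`, and the convention that an untried action ($N_t(a) = 0$) is selected first are all assumptions made for this example.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Pick the action maximizing Q_t(a) + c * sqrt(ln t / N_t(a)).

    q_values: current value estimates Q_t(a), one per action
    counts:   selection counts N_t(a), one per action
    t:        current time step (t >= 1)
    c:        exploration constant (hypothetical default)
    """
    # Assumption: an action that has never been tried is treated as
    # maximizing, which also avoids dividing by zero.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    # Value estimate plus the confidence bonus for each action.
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    # Index of the highest score; ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `c=0.0` the bonus disappears and the rule reduces to greedy selection on `q_values`. A larger `c` favors actions with small `counts`.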