Vojtěch Tóth

Upper-Confidence-Bound Action Selection

Feb 20, 2026

Action-Value Methods

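$$
A_t = \underset{a}{\arg\max} \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]
$$

Here $Q_t(a)$ is the current value estimate of action $a$, $N_t(a)$ is the number of times $a$ has been selected before time $t$, and $c > 0$ controls the degree of exploration. An action with $N_t(a) = 0$ is treated as a maximizing action.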
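The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the standard square-root form of the exploration bonus, $c \sqrt{\ln t / N_t(a)}$, and the function name `ucb_select`, its parameters, and the default `c=2.0` are hypothetical choices made for this example.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Pick an action index by the upper-confidence-bound rule.

    q_values: current estimates Q_t(a), one per action
    counts:   selection counts N_t(a), one per action
    t:        current time step (t >= 1)
    c:        exploration strength (assumed default, not from the note)
    """
    # An action that has never been tried is treated as maximizing,
    # which also avoids dividing by zero in the bonus term.
    for a, n in enumerate(counts):
        if n == 0:
            return a

    # Score = value estimate + exploration bonus c * sqrt(ln t / N_t(a)).
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    # Return the index of the highest score (first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)
```

With `c=0` the rule reduces to greedy selection on the value estimates; larger `c` favors actions that have been selected rarely.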
