Vojtěch Tóth

Upper-Confidence-Bound Action Selection

Feb 20, 2026 · 1 min read

Action-Value Methods

$$
A_t = \underset{a}{\arg\max} \left[ Q_t(a) + c \sqrt{\frac{\ln t}{N_t(a)}} \right]
$$
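The selection rule can be illustrated with a minimal Python sketch. It assumes the usual reading of the symbols: `q_values[a]` is the current estimate $Q_t(a)$, `counts[a]` is $N_t(a)$ (how many times action $a$ has been chosen so far), `t` is the current time step, and `c > 0` sets the amount of exploration. The function name `ucb_select`, the default `c=2.0`, and the rule that an untried action (count of zero) is picked first are assumptions made for illustration, not part of the note.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Return the index a maximising Q_t(a) + c * sqrt(ln t / N_t(a)).

    q_values: estimated action values Q_t(a), one per action
    counts:   N_t(a), number of times each action was selected so far
    t:        current time step (t >= 1)
    c:        exploration coefficient (hypothetical default)
    """
    # Assumption: an action with N_t(a) == 0 is treated as maximising,
    # which also avoids dividing by zero in the bonus term.
    for a, n in enumerate(counts):
        if n == 0:
            return a

    # Value estimate plus the uncertainty bonus for every action.
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]

    # Index of the highest score; ties resolve to the lowest index.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `c=0` the bonus vanishes and the rule reduces to greedy selection. Larger `c` favours actions with small $N_t(a)$, whose estimates are less certain.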
