Bug fix: update params of epsilon greedy policy #107

fullflu · 2021-06-12T17:19:19Z

`reward_counts` in the epsilon-greedy policy should hold the accumulated reward (the count) for each action, not the mean reward.

If `reward_counts` holds the mean, then `predicted_rewards` in `select_action`, computed as `reward_counts / action_counts`, is no longer the expected reward (cumulative reward / `action_counts`) but cumulative reward / `action_counts`^2.
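Written out, with $R_a$ the cumulative reward observed for action $a$ and $n_a$ its entry in `action_counts` (notation introduced here for illustration), the two update rules give:

$$
\begin{aligned}
\text{fixed:}\quad & \texttt{reward\_counts}_a = R_a, & \frac{R_a}{n_a} &= \hat{\mu}_a \\
\text{buggy:}\quad & \texttt{reward\_counts}_a = \hat{\mu}_a = \frac{R_a}{n_a}, & \frac{\hat{\mu}_a}{n_a} &= \frac{R_a}{n_a^2}
\end{aligned}
$$

For example, an action pulled 4 times with 2 rewards scores $2/4 = 0.5$ with the fixed rule but $0.5/4 = 0.125$ with the buggy one. Because the buggy score shrinks as $n_a$ grows, the greedy step favors actions that have been tried less often rather than those with the highest mean reward.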

I also fixed the test script for the epsilon-greedy policy.
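The actual test change is not shown in this thread. As a hypothetical, standalone illustration (the class `SimpleEpsilonGreedy`, its `store_mean` flag, and the usage code are invented for this sketch and are not the library's API), the following compares the two update rules and shows that only the fixed rule makes `reward_counts / action_counts` equal the empirical mean reward:

```python
import numpy as np


class SimpleEpsilonGreedy:
    """Hypothetical minimal epsilon-greedy policy (not the library's class).

    reward_counts holds the cumulative reward per action (the fixed behavior);
    store_mean=True reproduces the buggy running-mean update instead.
    """

    def __init__(self, n_actions: int, epsilon: float = 0.1, seed: int = 0,
                 store_mean: bool = False):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.store_mean = store_mean
        self.rng = np.random.default_rng(seed)
        self.action_counts = np.zeros(n_actions, dtype=int)
        self.reward_counts = np.zeros(n_actions, dtype=float)

    def select_action(self) -> int:
        # Explore with probability epsilon, or until every action has been tried once.
        if self.rng.random() < self.epsilon or self.action_counts.min() == 0:
            return int(self.rng.integers(self.n_actions))
        # Exploit: this ratio is the mean reward only if reward_counts is a sum.
        predicted_rewards = self.reward_counts / self.action_counts
        return int(np.argmax(predicted_rewards))

    def update_params(self, action: int, reward: float) -> None:
        self.action_counts[action] += 1
        n = self.action_counts[action]
        if self.store_mean:
            # Buggy: store the running mean, so select_action computes sum / n**2.
            self.reward_counts[action] += (reward - self.reward_counts[action]) / n
        else:
            # Fixed: accumulate the total reward for this action.
            self.reward_counts[action] += reward


# epsilon=0.0 makes select_action deterministic once both actions have been tried.
fixed = SimpleEpsilonGreedy(n_actions=2, epsilon=0.0)
buggy = SimpleEpsilonGreedy(n_actions=2, epsilon=0.0, store_mean=True)

# Action 0: mean reward 0.5 over 4 pulls; action 1: reward 0.3 over 1 pull.
for policy in (fixed, buggy):
    for r in [1.0, 0.0, 1.0, 0.0]:
        policy.update_params(0, r)
    policy.update_params(1, 0.3)

print(fixed.reward_counts / fixed.action_counts, fixed.select_action())
print(buggy.reward_counts / buggy.action_counts, buggy.select_action())
```

With the buggy update, action 0 scores 0.5 / 4 = 0.125 and loses to action 1 (0.3) despite having the higher mean reward, which is the `reward_counts / action_counts^2` effect described in the PR.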

usaito · 2021-06-13T01:37:13Z

@fullflu Thanks!

fix bug of update params (reward_counts should not be the mean, but the count

9a98923

usaito merged commit ea4d9b8 into st-tech:master Jun 13, 2021
