Add some arguments to SyntheticBanditDataset #123

usaito · 2021-08-28T16:20:11Z

new features

add the following two arguments to obp.dataset.SyntheticBanditDataset
- reward_std: Standard deviation of the reward distribution.
- tau: A temperature hyperparameter to control the behavior policy.

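these arguments make it easier to control the properties of the generated synthetic data: reward_std sets how noisy the rewards are, and tau sets how peaked the behavior policy is.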
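
The effect of the two arguments can be sketched outside of obp. The snippet below is a minimal illustration, not the code in this PR: it assumes tau divides the logits before a softmax (the usual temperature convention) and that reward_std is the scale of Gaussian noise around the expected reward. The helper names softmax_with_temperature and sample_continuous_reward are hypothetical.

```python
import numpy as np


def softmax_with_temperature(logits, tau=1.0):
    """Row-wise softmax of logits / tau (hypothetical helper, not obp's API)."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = np.asarray(logits, dtype=float) / tau
    # Subtract the row-wise max for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def sample_continuous_reward(expected_reward, reward_std=1.0, random_state=None):
    """Draw Gaussian rewards around expected_reward (assumed noise model)."""
    rng = np.random.default_rng(random_state)
    expected_reward = np.asarray(expected_reward, dtype=float)
    return rng.normal(loc=expected_reward, scale=reward_std)


# One round with three actions; the logits are made up for illustration.
logits = np.array([[1.0, 2.0, 3.0]])
sharp = softmax_with_temperature(logits, tau=0.1)   # near-deterministic policy
flat = softmax_with_temperature(logits, tau=10.0)   # near-uniform policy

# Larger reward_std spreads the rewards further from their expectation.
rewards = sample_continuous_reward(np.full(10_000, 0.5), reward_std=2.0, random_state=0)
```

Under these assumptions, a small tau concentrates the behavior policy on the highest-scoring action, while a large tau pushes it toward uniform.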

add the obp.utils.sample_action_fast function, which speeds up sampling actions from a behavior policy when generating synthetic bandit data.
add corresponding tests
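
One common way to make this kind of sampling fast is to vectorize it over rounds with inverse-CDF sampling, instead of drawing one action per row in a Python loop. The sketch below illustrates that idea with NumPy only. It is an assumption about the general technique, not a copy of obp.utils.sample_action_fast, and the name sample_actions_vectorized is hypothetical.

```python
import numpy as np


def sample_actions_vectorized(action_dist, random_state=None):
    """Sample one action per row of a (n_rounds, n_actions) probability matrix.

    Hypothetical helper illustrating inverse-CDF sampling; each row of
    action_dist is assumed to be non-negative and to sum to one.
    """
    rng = np.random.default_rng(random_state)
    action_dist = np.asarray(action_dist, dtype=float)
    # One uniform draw in [0, 1) per round, shaped for broadcasting.
    uniform = rng.uniform(size=(action_dist.shape[0], 1))
    cum_dist = action_dist.cumsum(axis=1)
    # Guard against rounding: force the last cumulative value to exactly 1.
    cum_dist[:, -1] = 1.0
    # The sampled action is the first index whose cumulative mass exceeds the draw.
    return (cum_dist > uniform).argmax(axis=1)


# Deterministic rows must always return their single supported action.
one_hot = np.eye(3)
deterministic_actions = sample_actions_vectorized(one_hot, random_state=0)

# Many rounds sharing one distribution: empirical frequencies should match it.
probs = np.tile([0.2, 0.3, 0.5], (100_000, 1))
actions = sample_actions_vectorized(probs, random_state=0)
frequencies = np.bincount(actions, minlength=3) / len(actions)
```

The whole batch is handled by a single cumulative sum and comparison, so the cost per round is a few array operations rather than a separate sampler call.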

usaito added 3 commits August 28, 2021 12:17

add some arguments

962a92f

add sample_action_fact func

501b0c9

add some tests about SyntheticBanditDataset

d15e992

usaito changed the base branch from master to update-benchmark August 28, 2021 16:21

fix a bug and typo in tests

d4d4684

usaito merged commit 5fcf7ca into update-benchmark Aug 28, 2021

usaito deleted the custom-synthetic branch August 28, 2021 19:38

usaito mentioned this pull request Sep 3, 2021

Update benchmark experiments #120

Merged