Add some arguments to SyntheticBanditDataset #123

Merged
merged 4 commits into from
Aug 28, 2021

usaito
Contributor

@usaito usaito commented Aug 28, 2021

new features

  • add the following two arguments to obp.dataset.SyntheticBanditDataset
    • reward_std: Standard deviation of the reward distribution.
    • tau: A temperature hyperparameter to control the behavior policy.

these arguments make it easier to customize the synthetic data that is generated.
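
A rough sketch of what these two knobs typically control, written in plain NumPy rather than against the obp API. It assumes that tau divides the action scores before a softmax and that reward_std is the scale of Gaussian noise around the expected reward; the helper names `softmax_with_temperature` and `sample_noisy_rewards` are hypothetical and are not part of obp.

```python
import numpy as np


def softmax_with_temperature(scores, tau=1.0):
    """Row-wise softmax of scores / tau (assumed role of `tau`)."""
    scaled = np.asarray(scores, dtype=float) / tau
    # Subtract the row maximum for numerical stability.
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def sample_noisy_rewards(expected_reward, reward_std=1.0, seed=None):
    """Draw Gaussian rewards centered on the expected reward (assumed role of `reward_std`)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=expected_reward, scale=reward_std)


scores = np.array([[1.0, 2.0, 3.0]])
sharp = softmax_with_temperature(scores, tau=0.1)   # close to deterministic
flat = softmax_with_temperature(scores, tau=10.0)   # close to uniform
noiseless = sample_noisy_rewards(np.array([0.5, 1.5]), reward_std=0.0, seed=0)
```

Under these assumptions, a smaller tau concentrates the behavior policy on the highest-scoring action, and a larger reward_std makes the observed rewards noisier.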

  • add obp.utils.sample_action_fast function, which speeds up sampling actions from a behavior policy when generating synthetic bandit data.

  • add corresponding tests
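
For context on the faster sampling mentioned above, here is one common way to vectorize the draw: inverse-CDF sampling over all rounds at once instead of a per-round loop. This is a minimal standalone sketch, not necessarily how obp.utils.sample_action_fast is implemented; the function name `sample_actions_vectorized` and its `seed` parameter are hypothetical.

```python
import numpy as np


def sample_actions_vectorized(action_dist, seed=None):
    """Sample one action per row of a (n_rounds, n_actions) probability matrix."""
    action_dist = np.asarray(action_dist, dtype=float)
    rng = np.random.default_rng(seed)
    # One uniform draw in [0, 1) per round, shaped for broadcasting.
    uniform = rng.uniform(size=action_dist.shape[0])[:, np.newaxis]
    cum_dist = action_dist.cumsum(axis=1)
    # Guard against floating-point drift leaving the last cumulative value below 1.
    cum_dist[:, -1] = 1.0
    # The first column whose cumulative probability exceeds the draw is the sampled action.
    return (cum_dist > uniform).argmax(axis=1)


# Degenerate distributions make the result easy to check by hand.
one_hot = np.eye(3)
actions = sample_actions_vectorized(one_hot, seed=12345)
```

Avoiding a Python-level loop over rounds is what makes this approach cheap when the number of rounds is large.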

@usaito usaito changed the base branch from master to update-benchmark August 28, 2021 16:21
@usaito usaito merged commit 5fcf7ca into update-benchmark Aug 28, 2021
@usaito usaito deleted the custom-synthetic branch August 28, 2021 19:38