Review: SyntheticSlateBanditDataset (w/ new feature: calc_ground_truth_policy_value) #98

Merged (11 commits) on May 19, 2021

aiueola (Contributor) commented May 18, 2021

new feature

Implemented the calc_ground_truth_policy_value() function in SyntheticSlateBanditDataset.
The calculation proceeds as follows.

  1. Receive evaluation_policy_logit of shape (n_rounds, n_unique_action) and context of shape (n_rounds, dim_context).
  2. Enumerate all combinatorial slate actions.
  3. For each slate action, calculate its (combinatorial) pscore and expected reward.
  4. Sum the expected rewards, weighted by their pscores.
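
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `slate_pscore`, `ground_truth_policy_value`, and `expected_reward_fn` are hypothetical, and it assumes that slates are ordered lists sampled without replacement via a sequential softmax over the logits, and that the policy value is the sum of slot-level expected rewards averaged over rounds.

```python
from itertools import permutations

import numpy as np


def slate_pscore(logit, slate):
    """Probability of an ordered slate, ASSUMING sequential softmax
    sampling without replacement (hypothetical helper)."""
    available = np.ones(logit.shape[0], dtype=bool)
    prob = 1.0
    for action in slate:
        # softmax restricted to the actions that are still available
        shifted = logit - logit[available].max()
        prob *= np.exp(shifted[action]) / np.exp(shifted[available]).sum()
        available[action] = False
    return prob


def ground_truth_policy_value(
    evaluation_policy_logit, context, len_list, expected_reward_fn
):
    """Brute-force policy value (hypothetical sketch).

    expected_reward_fn(context_i, slate) is an assumed callable that
    returns the expected reward of each slot, shape (len_list,).
    """
    # step 1: receive logits (n_rounds, n_unique_action) and contexts
    n_rounds, n_unique_action = evaluation_policy_logit.shape
    total = 0.0
    for i in range(n_rounds):
        # step 2: enumerate every ordered slate of length len_list
        for slate in permutations(range(n_unique_action), len_list):
            # step 3: combinatorial pscore and expected reward of the slate
            pscore = slate_pscore(evaluation_policy_logit[i], slate)
            expected_reward = np.sum(expected_reward_fn(context[i], slate))
            # step 4: accumulate the expected reward weighted by the pscore
            total += pscore * expected_reward
    return total / n_rounds
```

Because every ordered slate is enumerated for every round, the cost grows with n_rounds times the number of permutations, so a brute-force version like this is only practical for small n_unique_action and len_list.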

Note that we use the expected reward instead of the sampled reward in order to replicate the click models.
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L655

refactor

tests

  • Fixed and added corresponding tests.

others

  • Minor fixes to typos and docstrings.

usaito (Contributor) commented May 18, 2021

@aiueola

usaito (Contributor) commented May 18, 2021

@aiueola

@usaito usaito merged commit 117aa9a into st-tech:master May 19, 2021
2 participants