Review: SyntheticSlateBanditDataset (w/ new feature: calc_ground_truth_policy_value) #98
new feature
- Implemented `calc_ground_truth_policy_value()` in `SyntheticSlateBanditDataset`. The calculation procedure is as follows: the inputs are `evaluation_policy_logit` (n_rounds, n_unique_action) and `context` (n_rounds, dim_context). Note that we use the expected reward instead of the sampled reward to replicate click models.
  https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L655
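The sketch below shows one way a ground-truth policy value can be computed from these inputs. It is not the code in the linked file, and several details are assumptions. The evaluation policy fills a slate by sampling actions without replacement through a softmax over `evaluation_policy_logit`. The slate reward is taken to be the sum of the per-slot expected rewards. `expected_reward_fn` is a hypothetical stand-in for the dataset's context-dependent click model. Every ordered slate is enumerated, and its expected reward is weighted by its pscore. `slate_pscore` plays the role described for `_calc_pscore_given_action_list()`: it returns the combinatorial pscore of a given action list.

```python
import itertools
from typing import Callable, Sequence

import numpy as np


def slate_pscore(policy_logit_i_: np.ndarray, action_list: Sequence[int]) -> float:
    """Hypothetical helper: probability that a softmax policy samples
    `action_list`, in this exact order, without replacement."""
    logits = np.asarray(policy_logit_i_, dtype=float)
    available = np.ones(len(logits), dtype=bool)  # actions not yet placed in the slate
    pscore = 1.0
    for action in action_list:
        remaining = logits[available]
        shift = remaining.max()  # subtract the max for numerical stability
        # conditional softmax probability among the still-available actions
        pscore *= np.exp(logits[action] - shift) / np.exp(remaining - shift).sum()
        available[action] = False  # an action appears at most once per slate
    return float(pscore)


def calc_ground_truth_policy_value_sketch(
    evaluation_policy_logit: np.ndarray,  # (n_rounds, n_unique_action)
    context: np.ndarray,  # (n_rounds, dim_context)
    expected_reward_fn: Callable[[np.ndarray, tuple], np.ndarray],  # hypothetical click model
    len_list: int,
) -> float:
    """Average over rounds of sum_{slate} pscore(slate) * expected slate reward."""
    n_rounds, n_unique_action = evaluation_policy_logit.shape
    # every ordered slate of length len_list with distinct actions
    slates = list(itertools.permutations(range(n_unique_action), len_list))
    values = np.zeros(n_rounds)
    for i in range(n_rounds):
        for slate in slates:
            pscore = slate_pscore(evaluation_policy_logit[i], slate)
            # expected (not sampled) reward per slot, summed into a slate reward
            values[i] += pscore * expected_reward_fn(context[i], slate).sum()
    return float(values.mean())
```

Full enumeration visits n_unique_action! / (n_unique_action - len_list)! slates per round. Exact computation is therefore only practical for small action sets and short slates.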
refactor
- Renamed `calc_item_position_pscore()` to `_calc_pscore_given_action_list()`, as it does not directly calculate the marginal pscore. Rather, it calculates the combinatorial pscore of a given action list internally.
  https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L319
- Renamed the `behavior_policy_logit_i_` argument of `_calc_pscore_given_action_list()` to `policy_logit_i_`, since we also use this function for the evaluation policy when calling `calc_ground_truth_policy_value()`.
  https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L320
- `tau` in `obtain_batch_bandit_feedback()`.
  https://github.com/aiueola/zr-obp/blob/d52501696038def2189ae1a61abb106fdd8db688/obp/dataset/synthetic_slate.py#L474
- Renamed the `standard_exponential` and `cascade_exponential` reward structures to `standard_decay` and `cascade_decay`, respectively. In addition, introduced `decay_function`, which is either `exponential` or `inverse`.
  https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L50
  https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L1241
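The sketch below shows one way the two `decay_function` options could weight interactions between slots. It is a minimal illustration under stated assumptions, not the implementation in the linked file. It assumes that `exponential` maps a slot distance d to exp(-d) and that `inverse` maps it to 1/d. It also assumes that `standard_decay` lets every other slot influence slot k, while `cascade_decay` only lets the slots above k influence it. `position_decay_weights` is a hypothetical helper.

```python
import numpy as np


def position_decay_weights(
    len_list: int, decay_function: str = "exponential", cascade: bool = False
) -> np.ndarray:
    """Hypothetical helper: W[k, j] is the weight with which the item at slot j
    affects the reward at slot k."""
    positions = np.arange(len_list)
    distance = np.abs(positions[:, None] - positions[None, :]).astype(float)
    off_diagonal = distance > 0  # a slot does not interact with itself
    weights = np.zeros((len_list, len_list))
    if decay_function == "exponential":
        weights[off_diagonal] = np.exp(-distance[off_diagonal])
    elif decay_function == "inverse":
        weights[off_diagonal] = 1.0 / distance[off_diagonal]
    else:
        raise ValueError("decay_function must be 'exponential' or 'inverse'")
    if cascade:
        # cascade-style (assumed): keep only j < k, i.e. slots shown above slot k
        weights = np.tril(weights, k=-1)
    return weights
```

For example, `position_decay_weights(3, "inverse")` returns `[[0, 1, 0.5], [1, 0, 1], [0.5, 1, 0]]`. With `cascade=True`, everything on and above the diagonal is zeroed.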
tests
others