Review: SyntheticSlateBanditDataset (w/ new feature: calc_ground_truth_policy_value) #98

aiueola · 2021-05-18T16:11:53Z

new feature

Implemented the calc_ground_truth_policy_value() method in SyntheticSlateBanditDataset.
The calculation proceeds as follows.

1. Receive evaluation_policy_logit of shape (n_rounds, n_unique_action) and context of shape (n_rounds, dim_context).
2. Enumerate all combinatorial slate actions.
3. For each slate action list, calculate its (combinatorial) pscore and expected reward.
4. Sum the expected rewards weighted by pscore.

Note that we use the expected reward instead of a sampled reward in order to replicate the click models.
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L655
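
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: slate_pscore, ground_truth_policy_value, and expected_reward_fn are hypothetical names, and the sketch assumes that slates are drawn position by position without replacement from a softmax over the logits and that the slate-level reward is the sum of the slot-level expected rewards.

```python
from itertools import permutations

import numpy as np


def slate_pscore(logit, slate):
    """Probability of drawing `slate` slot by slot, without replacement,
    from a softmax over `logit` (an assumption of this sketch)."""
    remaining = np.ones(logit.shape[0], dtype=bool)
    pscore = 1.0
    for action in slate:
        # Mask out actions that were already placed in an earlier slot.
        masked = np.where(remaining, logit, -np.inf)
        probs = np.exp(masked - masked.max())
        probs /= probs.sum()
        pscore *= probs[action]
        remaining[action] = False
    return pscore


def ground_truth_policy_value(policy_logit, context, expected_reward_fn, len_list):
    """Average over rounds of sum_slate pscore(slate) * expected reward(slate).

    expected_reward_fn(context_i, slate) is a hypothetical callable that
    returns the expected reward of each slot as a (len_list,) array.
    """
    n_rounds, n_unique_action = policy_logit.shape
    # Enumerate the slates once, outside the loop over rounds, as an ndarray.
    slates = np.array(list(permutations(range(n_unique_action), len_list)))
    value = 0.0
    for i in range(n_rounds):
        for slate in slates:
            pscore = slate_pscore(policy_logit[i], slate)
            value += pscore * expected_reward_fn(context[i], slate).sum()
    return value / n_rounds
```

Because every slate is enumerated, the cost grows with the number of permutations of n_unique_action actions over len_list slots, so a sketch like this is only practical for small action sets.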

refactor

Renamed calc_item_position_pscore() to _calc_pscore_given_action_list(), because it does not directly calculate the marginal pscore. Rather, it calculates the combinatorial pscore of a given action list.
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L319
Renamed the behavior_policy_logit_i_ argument of _calc_pscore_given_action_list() to policy_logit_i_, because calc_ground_truth_policy_value() also calls this function for the evaluation policy.
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L320
Removed the unused argument tau from obtain_batch_bandit_feedback().
https://github.com/aiueola/zr-obp/blob/d52501696038def2189ae1a61abb106fdd8db688/obp/dataset/synthetic_slate.py#L474
Renamed the standard_exponential and cascade_exponential reward structures to standard_decay and cascade_decay, respectively. In addition, introduced decay_function, which is either exponential or inverse.
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L50
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L1241
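
To illustrate the decay_function option, here is a small sketch of how the two choices might weight slots by their distance from a given position. The function names, the dictionary, and the exact formulas (exp(-d) and 1 / (d + 1)) are assumptions made for this example; see the linked lines for the actual definitions.

```python
import numpy as np


def exponential_decay(distance: np.ndarray) -> np.ndarray:
    # Assumed form: the weight halves roughly every 0.69 units of distance.
    return np.exp(-distance)


def inverse_decay(distance: np.ndarray) -> np.ndarray:
    # Assumed form: heavier tail than the exponential decay.
    return 1.0 / (distance + 1.0)


# Hypothetical lookup from the decay_function option to a callable.
DECAY_FUNCTIONS = {"exponential": exponential_decay, "inverse": inverse_decay}


def decay_weights(len_list: int, position: int, decay_function: str = "exponential") -> np.ndarray:
    """Weight of every slot relative to `position`, by absolute distance."""
    if decay_function not in DECAY_FUNCTIONS:
        raise ValueError("decay_function must be either 'exponential' or 'inverse'")
    distance = np.abs(np.arange(len_list) - position).astype(float)
    return DECAY_FUNCTIONS[decay_function](distance)
```

Under these assumed forms, both functions give weight 1 at distance 0, and the inverse decay shrinks more slowly than the exponential one as the distance grows.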

tests

Fixed the existing tests and added new ones for the changes above.

others

Minor fixes to typos and docstrings.

usaito · 2021-05-18T17:33:05Z

@aiueola

These lines can be moved outside of the for i in range(n_rounds): loop above, right?
https://github.com/aiueola/zr-obp/blob/f3dc59c19a7f4298662ae8e201e6cec743d83e8f/obp/dataset/synthetic_slate.py#L690-L693
enumerated_slate_actions has to be an np.ndarray, but here it is still a list. You should put enumerated_slate_actions = np.array(enumerated_slate_actions) between lines 713 and 714.
https://github.com/aiueola/zr-obp/blob/f3dc59c19a7f4298662ae8e201e6cec743d83e8f/obp/dataset/synthetic_slate.py#L718
The error message here should read "distance must be 1-dimensional ndarray".
https://github.com/aiueola/zr-obp/blob/bf6f6c8ca4ec76c84ef660263870369eb568f831/obp/dataset/synthetic_slate.py#L1265
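
For reference, a check that matches the message quoted above could look like the sketch below. check_distance is a hypothetical helper written for this example, not a function from the PR.

```python
import numpy as np


def check_distance(distance):
    """Reject anything that is not a 1-dimensional np.ndarray."""
    if not isinstance(distance, np.ndarray) or distance.ndim != 1:
        raise ValueError("distance must be 1-dimensional ndarray")
    return distance
```

Testing both the type and ndim means that a plain list and a 2-dimensional array fail with the same message.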

usaito · 2021-05-18T23:35:05Z

@aiueola

Regarding the above comment, I think it would be good to add the base_reward_function=None case to the same test function.
https://github.com/aiueola/zr-obp/blob/afd6577e64a0b6ad14a6273ce24ae80bce604236/tests/dataset/test_synthetic_slate.py#L1368
How about adding comparisons with varying eta? For example, for both "pbm" and "cascade", a larger value of eta should lead to a lower ground-truth policy value given the same eval_policy, right?
https://github.com/aiueola/zr-obp/blob/afd6577e64a0b6ad14a6273ce24ae80bce604236/tests/dataset/test_synthetic_slate.py#L1441-L1442
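
The shape of such a comparison test might look like the sketch below. It does not call the dataset class. toy_policy_value is a hypothetical stand-in that assumes eta enters as an exponent on a position-based examination weight (1 / k) ** eta, which may differ from how the click models in the PR use eta.

```python
import numpy as np


def toy_policy_value(expected_reward: np.ndarray, eta: float) -> float:
    """Sum of slot-level expected rewards discounted by (1 / k) ** eta.

    The discount form is an assumption made only for this illustration.
    """
    position = np.arange(1, expected_reward.shape[0] + 1)
    examination = (1.0 / position) ** eta
    return float((examination * expected_reward).sum())


def test_larger_eta_lowers_policy_value():
    # Same base expected rewards (i.e. the same evaluation policy) for every eta.
    base = np.array([0.6, 0.5, 0.4])
    values = [toy_policy_value(base, eta) for eta in (0.5, 1.0, 2.0)]
    # The value should decrease strictly as eta increases.
    assert values[0] > values[1] > values[2]
```

In the real test, toy_policy_value would be replaced by calc_ground_truth_policy_value() on datasets that differ only in eta.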

aiueola added 8 commits May 17, 2021 18:56

add decay_function

cb7f3ca

bug fix

61620f8

relocate function

d27f196

implement calc_ground_truth_policy_value function

4021407

add ValueError

c862ebb

bug fix

3e79e69

add tests and bug fix

bf6f6c8

fix flake8

f3dc59c

aiueola added 2 commits May 19, 2021 06:43

fix ValueError

c1166fb

black

afd6577

add tests and bug fix

01de3ad

usaito merged commit 117aa9a into st-tech:master May 19, 2021
