Review: SyntheticSlateBanditDataset #93

Merged
merged 19 commits into from
May 8, 2021

@aiueola (Contributor) commented on May 5, 2021

important

  1. It may be better to fix the condition used when calculating pscore_item_position_i_l:
```python
# from
if sampled_action_index not in action_list:
# to
if sampled_action != action_list[position_]:
```

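This is because the position-wise pscore is defined as $\pi_k(a_k \mid x) = \sum_{a} \pi(a \mid x) \mathbb{1}\{a(k) = a_k\}$, where $a(k)$ is the action at position $k$ of slate $a$. Only slates that place $a_k$ at position $k$ should contribute, not every slate that contains $a_k$.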
https://github.com/aiueola/zr-obp/blob/57cae57736f00a8f4887c4d3ac41a6544b880b34/obp/dataset/synthetic_slate.py#L376

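  2. I also revised pscore_item_position_i_l for the uniform random policy (i.e., when self.behavior_policy_function is None). Under uniform sampling without replacement, a given item occupies a given position with probability 1 / n_unique_action. By contrast, len_list / n_unique_action is the probability that the item appears anywhere in the slate: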
```python
# from
pscore_item_position_i_l = self.len_list / self.n_unique_action
# to
pscore_item_position_i_l = 1 / self.n_unique_action
```

https://github.com/aiueola/zr-obp/blob/57cae57736f00a8f4887c4d3ac41a6544b880b34/obp/dataset/synthetic_slate.py#L368
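
The sketch below illustrates both points on a toy action set. It enumerates every ordered slate with itertools.permutations and compares the two conditions. The helper names (uniform_slate_prob, pscore_item_position, pscore_item_in_slate) are hypothetical and are not the obp API. The membership variant reflects our reading of the original snippet. Under a uniform policy, the corrected condition yields 1 / n_unique_action. The membership condition yields len_list / n_unique_action.

```python
from itertools import permutations
from math import perm  # requires Python 3.8+


def uniform_slate_prob(action_list, n_unique_action):
    """Probability of one ordered slate under uniform sampling without replacement."""
    # there are n! / (n - K)! equally likely ordered slates of length K
    return 1.0 / perm(n_unique_action, len(action_list))


def pscore_item_position(sampled_action, position_, n_unique_action, len_list, slate_prob):
    """pi_k(a_k | x) = sum_a pi(a | x) * 1{a(k) = a_k} (hypothetical helper)."""
    total = 0.0
    for action_list in permutations(range(n_unique_action), len_list):
        if sampled_action != action_list[position_]:  # corrected condition
            continue
        total += slate_prob(action_list, n_unique_action)
    return total


def pscore_item_in_slate(sampled_action, n_unique_action, len_list, slate_prob):
    """Membership condition: sums over every slate containing sampled_action anywhere."""
    total = 0.0
    for action_list in permutations(range(n_unique_action), len_list):
        if sampled_action not in action_list:  # original (membership) condition
            continue
        total += slate_prob(action_list, n_unique_action)
    return total


n_unique_action, len_list = 5, 3
print(round(pscore_item_position(2, 1, n_unique_action, len_list, uniform_slate_prob), 10))  # 0.2
print(round(pscore_item_in_slate(2, n_unique_action, len_list, uniform_slate_prob), 10))  # 0.6
```

The slate probability is passed in as a callable. This keeps the enumeration independent of the behavior policy, so the same loop would work for a non-uniform policy. Enumeration is only feasible for small n_unique_action and len_list.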

refactor

  1. Removed the return_exact_uniform_pscore_item_position parameter from obtain_batch_bandit_feedback() and sample_action_and_obtain_pscore().
```python
# from
if return_exact_uniform_pscore_item_position:
# to
if self.behavior_policy_function is None:  # uniform random
```

It seems more intuitive to use the ground-truth (exact) pscore, rather than an approximated one, by default for the uniform random policy.
https://github.com/aiueola/zr-obp/blob/57cae57736f00a8f4887c4d3ac41a6544b880b34/obp/dataset/synthetic_slate.py#L447

  2. Renamed self.behavior_policy to self.uniform_behavior_policy, since it always represents the uniform behavior policy.
```python
# from
self.behavior_policy = np.ones(self.n_unique_action) / self.n_unique_action
# to
self.uniform_behavior_policy = np.ones(self.n_unique_action) / self.n_unique_action
```

https://github.com/aiueola/zr-obp/blob/57cae57736f00a8f4887c4d3ac41a6544b880b34/obp/dataset/synthetic_slate.py#L245

  3. Renamed action_interaction_matrix to action_interaction_weight_matrix.

https://github.com/aiueola/zr-obp/blob/35f893fd660f627b0bb470d24eaf970aeb541109/obp/dataset/synthetic_slate.py#L49
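
Once return_exact_uniform_pscore_item_position is removed, the uniform case (behavior_policy_function is None) can return exact propensities directly. The sketch below assumes slates are drawn uniformly without replacement from n_unique_action items. The function name exact_uniform_pscores and its per-position output arrays (joint, cascade, and item-position pscores) are illustrative assumptions, not taken from the obp code.

```python
import numpy as np


def exact_uniform_pscores(n_unique_action: int, len_list: int):
    """Exact propensities of a uniform random slate policy (hypothetical helper).

    Assumes sampling without replacement; each returned array has length len_list.
    """
    # probability of the chosen action at position k, given the k actions above it: 1 / (n - k)
    step_probs = 1.0 / (n_unique_action - np.arange(len_list))
    # cascade pscore at position k: probability of the top-(k + 1) actions of the slate
    pscore_cascade = np.cumprod(step_probs)
    # joint pscore of the whole slate, repeated for every position
    pscore = np.full(len_list, pscore_cascade[-1])
    # position-wise pscore: any item sits at a given position with probability 1 / n
    pscore_item_position = np.full(len_list, 1.0 / n_unique_action)
    return pscore, pscore_cascade, pscore_item_position


pscore, pscore_cascade, pscore_item_position = exact_uniform_pscores(n_unique_action=5, len_list=3)
print(pscore_cascade)  # cumulative products 1/5, 1/20, 1/60
```

All three quantities have closed forms under the uniform policy, so no permutation enumeration or approximation is needed in this branch.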

tests

Fixed the expected pscore_item_position for the uniform random policy:

```python
# from
pscore_item_position = len_list / n_unique_action
# to
pscore_item_position = 1 / n_unique_action
```

https://github.com/aiueola/zr-obp/blob/d46cd4c9a626020a14c73014a21b8749b9dd88ad/tests/dataset/test_synthetic_slate.py#L229
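
The corrected target can also be sanity-checked empirically. The sketch below is not the test in test_synthetic_slate.py. It assumes uniform random slates, generated here by argsorting i.i.d. uniforms and keeping the first len_list entries. It then checks that the frequency of an item at each position is close to 1 / n_unique_action. The function name empirical_item_position_freq is hypothetical.

```python
import numpy as np


def empirical_item_position_freq(n_unique_action, len_list, item, n_rounds=100_000, seed=0):
    """Empirical P(item at position k), k = 0..len_list-1, under uniform random slates."""
    rng = np.random.default_rng(seed)
    # argsort of i.i.d. uniforms gives a uniformly random permutation per row;
    # its first len_list entries form a uniform ordered slate without replacement
    slates = np.argsort(rng.random((n_rounds, n_unique_action)), axis=1)[:, :len_list]
    return (slates == item).mean(axis=0)


n_unique_action, len_list = 5, 3
freq = empirical_item_position_freq(n_unique_action, len_list, item=2)
# each entry should be close to 1 / n_unique_action = 0.2, not len_list / n_unique_action = 0.6
assert np.allclose(freq, 1 / n_unique_action, atol=0.01)
print(freq)
```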

others

Minor fixes to typos and docstrings.

@usaito merged commit 57b4bc7 into st-tech:master on May 8, 2021