new feature: is_factorizable option in SyntheticSlateBanditDataset #100

Merged
merged 5 commits into from
May 28, 2021

Conversation

aiueola
Contributor

@aiueola aiueola commented May 28, 2021

new feature

  • implemented the is_factorizable option:
    • if is_factorizable=True, the action at each slot is sampled independently of the other slots (i.e., :math:`\pi(a_k | x)`). With this option, actions within a slate may be duplicated. (Newly implemented, to avoid the computational cost of calculating the pscore.)
    • if is_factorizable=False, the action at each slot is sampled conditionally on the actions already placed in earlier slots (i.e., :math:`\pi(a_k | x, a_1, \ldots, a_{k-1})`). With this option, actions within a slate are never duplicated. (This is the originally implemented behavior.)
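To illustrate the difference between the two modes, here is a minimal sketch using only NumPy. It is not the code in synthetic_slate.py: the helper names `softmax` and `sample_slate` are hypothetical, and the masked-softmax renormalization in the non-factorizable branch is an assumption about how the conditional distribution could be formed.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()


def sample_slate(logits, len_list, is_factorizable, rng):
    """Sample one slate and return (actions, joint pscore).

    Hypothetical helper for illustration, not the OBP implementation.
    """
    n_actions = logits.shape[0]
    if is_factorizable:
        # Every slot draws from the same pi(a_k | x); duplicates are possible.
        probs = softmax(logits)
        actions = rng.choice(n_actions, size=len_list, replace=True, p=probs)
        return actions, float(np.prod(probs[actions]))
    # Each slot draws from pi(a_k | x, a_1, ..., a_{k-1}): actions already
    # chosen are masked out and the remaining logits are renormalized
    # (an assumed form of the conditional distribution).
    remaining = np.ones(n_actions, dtype=bool)
    actions, pscore = [], 1.0
    for _ in range(len_list):
        probs = np.zeros(n_actions)
        probs[remaining] = softmax(logits[remaining])
        a = rng.choice(n_actions, p=probs)
        pscore *= probs[a]
        remaining[a] = False
        actions.append(a)
    return np.array(actions), float(pscore)
```

In the factorizable case the joint pscore is a plain product of per-slot probabilities, so no per-slot renormalization is needed; that is where the computational saving mentioned above would come from.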

https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L97

  • specifically, the following functions were changed to support the is_factorizable option:

.sample_action_and_obtain_pscore()
https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L496

.obtain_pscore_given_evaluation_policy_logit()
https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L399

.calc_ground_truth_policy_value()
https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L815

.generate_evaluation_policy_pscore()
https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L884

refactor

  • renamed the argument evaluation_policy_logit to evaluation_policy_logit_, because both names were being used in different functions with the same meaning.

tests

  • added corresponding tests for the is_factorizable option.

others

  • minor fixes to typos and docstrings.

@usaito
Contributor

usaito commented May 28, 2021

@aiueola

Thanks!

# current
factorizable_pscore = softmax(evaluation_policy_logit_[i : i + 1])[0]            
for action_list in enumerated_slate_actions:                
    if self.is_factorizable:
        pscores.append(
            np.cumprod([factorizable_pscore[a_] for a_ in action_list])[-1]                   
        )
    else:

# my proposal
if self.is_factorizable:
    factorizable_pscore = softmax(evaluation_policy_logit_[i : i + 1])[0]        
    for action_list in enumerated_slate_actions:       
        pscores.append(
            np.cumprod([factorizable_pscore[a_] for a_ in action_list])[-1]                   
        )
else:

https://github.com/aiueola/zr-obp/blob/ea995f4337b94945baac7aa6547e36bd7e6d69d5/obp/dataset/synthetic_slate.py#L814-816
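For reference, a minimal standalone sketch of the factorizable branch with the check hoisted out of the loop, as in the proposal. The function name `factorizable_pscores` is hypothetical, and enumerating the slates with `itertools.product` is an assumption about how `enumerated_slate_actions` could be built when duplicates are allowed.

```python
from itertools import product

import numpy as np


def factorizable_pscores(logit, len_list):
    """Joint probability of every slate under an independent-per-slot policy."""
    e = np.exp(logit - logit.max())
    slot_probs = e / e.sum()  # pi(a_k | x), computed once and shared by all slots
    # Enumerate slates with replacement, so duplicated actions are included.
    slates = list(product(range(logit.shape[0]), repeat=len_list))
    # np.prod gives the same value as np.cumprod(...)[-1] without building
    # the intermediate cumulative array.
    pscores = np.array([np.prod(slot_probs[list(s)]) for s in slates])
    return slates, pscores
```

Because the slots are independent, the pscores over all enumerated slates sum to 1, which makes a convenient sanity check for this branch.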

@usaito usaito merged commit 6dc904c into st-tech:master May 28, 2021
2 participants