
Implement SyntheticBanditDataWithActionEmbeds and Marginalized IPS #155


Merged
16 commits merged into master on Apr 3, 2022

Conversation


@usaito usaito commented Feb 13, 2022

Overview

  • Implement obp.ope.MarginalizedInverseProbabilityWeighting and obp.ope.SelfNormalizedMarginalizedInverseProbabilityWeighting. The former is the main proposal of the reference paper (called MIPS in that paper), and the latter is its trivial self-normalized extension.
  • Implement obp.dataset.SyntheticBanditDatasetWithActionEmbeds, which generates synthetic bandit data with action embeddings consisting of discrete random variables (e.g., the category information of movies). A usage sketch follows below. The data-generating process of this class follows the synthetic experiments in the reference paper.
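
As a rough illustration of how the new dataset class might be used: the class name comes from this PR, and obtain_batch_bandit_feedback is the standard interface of obp's dataset classes, but the constructor arguments below (n_cat_dim, n_cat_per_dim, etc.) are assumptions based on obp's other synthetic dataset classes and may differ from the merged signature.

```python
# Hypothetical usage sketch; constructor argument names are assumptions,
# not necessarily the signature merged in this PR.
from obp.dataset import SyntheticBanditDatasetWithActionEmbeds

dataset = SyntheticBanditDatasetWithActionEmbeds(
    n_actions=100,    # large action space
    dim_context=10,   # dimension of the context vectors
    n_cat_dim=3,      # number of discrete embedding dimensions (assumption)
    n_cat_per_dim=5,  # number of categories per embedding dimension (assumption)
)
# standard obp dataset interface: draw a batch of logged bandit feedback
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10_000)
```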

Reference

Yuta Saito and Thorsten Joachims.
"Off-Policy Evaluation for Large Action Spaces via Embeddings." 2022.
https://arxiv.org/abs/2202.06317
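
For intuition about what MIPS computes, here is a minimal, self-contained sketch of the point estimate, assuming a finite action set and a single categorical embedding dimension whose distribution p(e|a) is context-independent (as in the synthetic setup of the paper). The function and argument names are illustrative, not the API merged in this PR.

```python
import numpy as np

def mips_estimate(
    reward: np.ndarray,        # observed rewards r_i, shape (n,)
    action_embed: np.ndarray,  # observed embedding index e_i per round, shape (n,)
    pi_b: np.ndarray,          # behavior policy pi_0(a|x_i), shape (n, n_actions)
    pi_e: np.ndarray,          # target policy pi_e(a|x_i), shape (n, n_actions)
    p_e_a: np.ndarray,         # embedding distribution p(e|a), shape (n_actions, n_cat)
    self_normalize: bool = False,
) -> float:
    # marginal probability of the observed embedding under each policy:
    # p(e_i | x_i, pi) = sum_a pi(a | x_i) * p(e_i | a)
    p_e_pi_e = (pi_e * p_e_a[:, action_embed].T).sum(axis=1)
    p_e_pi_b = (pi_b * p_e_a[:, action_embed].T).sum(axis=1)
    w = p_e_pi_e / p_e_pi_b  # marginal importance weights w(x_i, e_i)
    if self_normalize:
        # self-normalized variant (SNMIPS): normalize by the weight sum
        return float((w * reward).sum() / w.sum())
    return float((w * reward).mean())  # MIPS: (1/n) * sum_i w(x_i, e_i) * r_i
```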

@usaito usaito changed the title from "[WIP] Implement Marginalized IPS" to "Implement Marginalized IPS" on Feb 15, 2022
@usaito usaito changed the title from "Implement Marginalized IPS" to "Implement SyntheticBanditDataWithActionEmbeds and Marginalized IPS" on Feb 15, 2022

@fullflu fullflu left a comment


I'm sorry for the late review...!

My 1st review mainly focuses on the estimators.

In the 2nd review, I will check the testing details.

```python
else:
    exclude_d_idx = np.where(current_feat != exclude_d, True, False)
    return theta_j[-1]
```
@fullflu (Contributor)


[ask]

This line seems to exclude a feature that has the smallest CDF.
Is this a reasonable algorithm?
(Intuitively, the feature with the smallest lower bound does have some reason to be excluded.)

@usaito usaito Mar 27, 2022


Yes, this is intended. theta_j[-1] is the last element of theta_j (or, equivalently, the last element of theta_list); it is not necessarily the element with the smallest CDF.

@fullflu (Contributor)


> theta_j[-1] is the last element of theta_j (or, equivalently, the last element of theta_list); it is not necessarily the element with the smallest CDF.

That is OK!
In my understanding, the feature exclusion is implemented in the following steps:

  1. The last element of idx_list indicates the feature that has the smallest CDF among current_feat (idx_list = np.argsort(cnf_list_)[::-1]).
  2. If the function does not return, the last element of idx_list is excluded (idx_without_d = np.where(current_feat != excluded_dim, True, False)).
  3. That is why the feature that has the smallest CDF seems to be excluded.

Are these steps correct?
I also cannot understand why the last element of idx_list is excluded in step 3.

@usaito usaito Apr 2, 2022


@fullflu Thanks for clarifying your point!

> The last element of idx_list indicates the feature that has the smallest CDF among current_feat

Let me first clarify that the last element of idx_list is the feature that leads to the smallest CDF when it is excluded from OPE. If all features pass the criterion (np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all(), we know that we can exclude any feature from current_feat and proceed to the next round. Then, I think we should exclude the feature that leads to the smallest CDF, as it has already satisfied the criterion. (If this is not true, which feature do you think is appropriate to exclude when proceeding to the next iteration?)

@fullflu (Contributor)


Thank you for your thorough explanation!
I read the algorithm again and found no problem with the process!

(I think that excluding the feature that leads to the smallest CDF or to the largest CDF would both be greedy choices, and we cannot decide which one is better.)

@usaito (Author)


@fullflu Thanks for checking again!

> ... the smallest CDF or to the largest CDF would both be greedy choices, and we cannot decide which one is better.

I agree, but I thought the feature that leads to the smallest CDF should be removed, because passing (np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all() is most difficult with the smallest CDF, and if the feature with the smallest CDF passes the criterion, then we can be aggressive and simply remove that feature.
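
To make the criterion in this thread concrete, here is a small paraphrase of the pairwise agreement check, assuming theta holds the candidate OPE estimates (one per excludable feature) and cnf their confidence bounds; the names are illustrative, not the merged code. When it returns True, every candidate agrees with every other within the combined bounds, so the greedy step discussed above can safely drop the feature whose exclusion leads to the smallest CDF.

```python
import numpy as np

def passes_criterion(theta: np.ndarray, cnf: np.ndarray, C: float = 1.0) -> bool:
    # pairwise agreement: |theta_j - theta_i| <= cnf_i + C * cnf_j for all (i, j)
    for theta_i, cnf_i in zip(theta, cnf):
        if not (np.abs(theta - theta_i) <= cnf_i + C * cnf).all():
            return False
    return True
```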

@usaito usaito commented Mar 27, 2022

@fullflu Thanks for the detailed feedback!

@usaito usaito merged commit 628c8c8 into master Apr 3, 2022