Implement SyntheticBanditDataWithActionEmbeds and Marginalized IPS #155
Conversation
I'm sorry for the late review...!
My first review mainly focuses on the estimators. In the second review, I will check the testing details.
obp/ope/estimators_embed.py (Outdated)

```python
else:
    exclude_d_idx = np.where(current_feat != exclude_d, True, False)
    return theta_j[-1]
```
[ask]
This line seems to exclude the feature that has the smallest CDF. Is this a reasonable algorithm?
(Intuitively, there may be some reason to exclude the feature with the smallest lower bound.)
Yes, this is intended. `theta_j[-1]` is the last element of `theta_j` (or equivalently the last element of `theta_list`); it is not necessarily the element having the smallest CDF.
> theta_j[-1] is the last element of theta_j (or equivalently the last element of theta_list), this is not necessarily the element having the smallest CDF.

That is OK!
In my understanding, the feature extraction is implemented in the following steps:

1. The last element of `idx_list` indicates the feature that has the smallest CDF among `current_feature` (`idx_list = np.argsort(cnf_list_)[::-1]`).
2. The last element of `idx_list` is excluded if not returned (`idx_without_d = np.where(current_feat != excluded_dim, True, False)`).
3. That is why the feature that has the smallest CDF seems to be excluded.

Are these steps correct? And I cannot understand why the last element of `idx_list` is excluded in step 3.
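For concreteness, the steps above can be sketched as follows. This is only an illustrative stand-in for the actual code in `estimators_embed.py`: the helper name `exclude_smallest_cdf_feature` and the example values are assumptions, not the real implementation.

```python
import numpy as np

def exclude_smallest_cdf_feature(current_feat, cnf_list_):
    """Illustrative sketch of the three steps discussed above.

    current_feat: indices of the embedding dimensions still in play.
    cnf_list_: confidence (CDF-related) value obtained when the
        corresponding dimension is excluded.
    """
    # Step 1: sort in descending order, so the LAST element of idx_list
    # points at the dimension with the smallest value in cnf_list_.
    idx_list = np.argsort(cnf_list_)[::-1]
    excluded_dim = current_feat[idx_list[-1]]
    # Step 2: build a boolean mask that drops that dimension.
    idx_without_d = np.where(current_feat != excluded_dim, True, False)
    # Step 3: the surviving dimensions no longer contain excluded_dim,
    # i.e. the smallest-CDF feature has been excluded.
    return current_feat[idx_without_d]

current_feat = np.array([0, 1, 2, 3])
cnf_list_ = np.array([0.4, 0.1, 0.9, 0.3])  # dimension 1 has the smallest value
print(exclude_smallest_cdf_feature(current_feat, cnf_list_))  # -> [0 2 3]
```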
@fullflu Thanks for clarifying your point!

> The last element of idx_list indicates the feature that has the smallest CDF among the current_feature

Let me first clarify that the last element of `idx_list` is the feature that leads to the smallest CDF when it is excluded from OPE. And if all features pass the criterion `(np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all()`, we know that we can exclude any feature from `current_feature` and proceed to the next round. Then, I think we should exclude the feature that leads to the smallest CDF, as it has already satisfied the criterion. (If this is not true, which feature do you think is appropriate to exclude when proceeding to the next iteration?)
Thank you for your thorough explanation!
I read the algorithm again, and I found that the process has no problem!
(I think that excluding the feature that leads to the smallest CDF or the largest CDF might both be greedy choices, and we cannot decide which one is better.)
@fullflu Thanks for checking again!

> the smallest CDF or the largest CDF might be a greedy algorithm, and we cannot decide which one is better.

I agree, but I thought the feature that leads to the smallest CDF should be removed, because passing `(np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all()` is the most difficult with the smallest CDF; if the feature with the smallest CDF passes the criterion, then we can be aggressive and simply remove that feature.
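A tiny numeric illustration of why a small `cnf_j` makes the criterion hardest to pass: the allowed deviation `cnf_i + C * cnf_j` shrinks with `cnf_j`. All values below are made up for illustration; they are not from the estimator.

```python
import numpy as np

# Illustrative stand-ins: candidate estimates and their confidence widths.
C = 1.0
theta_i = np.array([1.00, 1.02, 0.98])
cnf_i = np.array([0.01, 0.01, 0.01])

def passes_criterion(theta_j, cnf_j):
    # The deviation test from the discussion: every candidate estimate
    # theta_i must lie within cnf_i + C * cnf_j of theta_j.
    return bool((np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all())

# A small cnf_j makes the tolerance tight, so it is the hardest to pass;
# if even that feature passes, removing it is the "aggressive" choice.
print(passes_criterion(theta_j=1.01, cnf_j=0.01))  # -> False (tight tolerance)
print(passes_criterion(theta_j=1.01, cnf_j=0.10))  # -> True (loose tolerance)
```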
@fullflu Thanks for the detailed feedback!
Overview

This PR implements the following:

- `obp.ope.MarginalizedInverseProbabilityWeighting` and `obp.ope.SelfNormalizedMarginalizedInverseProbabilityWeighting`. The first one is the main proposal of the reference paper (called MIPS in that paper), and the latter is a trivial extension.
- `obp.dataset.SyntheticBanditDatasetWithActionEmbeds`, which generates synthetic bandit data with action embeddings consisting of some discrete random variables (e.g., category information of movies). The data generating process of this class follows the synthetic experiment conducted in the reference paper.

Reference
Yuta Saito and Thorsten Joachims.
"Off-Policy Evaluation for Large Action Spaces via Embeddings.", 2022.
https://arxiv.org/abs/2202.06317
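As a rough, self-contained sketch of what the MIPS estimator computes: the vanilla importance weight `pi_e(a|x) / pi_b(a|x)` is replaced by the marginal weight `p(e|pi_e) / p(e|pi_b)` over action embeddings `e`. This minimal numpy version assumes context-free policies and a known embedding distribution `p(e|a)`; it is an illustration of the idea, not the obp API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds, n_actions, n_embed = 5000, 20, 4

# Assumed-known embedding distribution p(e|a) and two context-free policies.
p_e_given_a = rng.dirichlet(np.ones(n_embed), size=n_actions)  # (n_actions, n_embed)
pi_b = rng.dirichlet(np.ones(n_actions))  # behavior policy
pi_e = rng.dirichlet(np.ones(n_actions))  # evaluation policy

# Log bandit data under pi_b: actions, embeddings e ~ p(e|a), binary rewards.
actions = rng.choice(n_actions, size=n_rounds, p=pi_b)
embeds = np.array([rng.choice(n_embed, p=p_e_given_a[a]) for a in actions])
q_e = np.linspace(0.1, 0.9, n_embed)  # reward depends on the action only via e
rewards = rng.binomial(1, q_e[embeds])

# Marginal importance weight w(e) = p(e|pi_e) / p(e|pi_b): marginalizing the
# action out of the weight is what shrinks variance in large action spaces.
p_e_pi_e = pi_e @ p_e_given_a
p_e_pi_b = pi_b @ p_e_given_a
w = p_e_pi_e[embeds] / p_e_pi_b[embeds]

mips_estimate = float(np.mean(w * rewards))
true_value = float(p_e_pi_e @ q_e)
print(mips_estimate, true_value)  # the estimate should be close to the true value
```

The self-normalized variant simply divides `np.sum(w * rewards)` by `np.sum(w)` instead of using the plain mean.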