Feature Implement QLearner #144

Merged: usaito merged 10 commits into master from feature/q-learning on Nov 11, 2021

usaito commented Oct 26, 2021

New Features

  • implement QLearner in the policy module, which first estimates the expected reward function (the q function) from the logged bandit data, and then uses that estimator to make new decisions on the test data (for example, via argmax or softmax)
  • see Example 1 in Appendix A of https://arxiv.org/pdf/2002.08536v2.pdf for the definition of the QLearner implemented here
  • add tests for the inputs and performance of the QLearner class in tests/policy
  • implement the "Gumbel softmax trick" (see Section 4.1 of https://arxiv.org/pdf/2105.00855.pdf) in IPWLearner/QLearner/NNPolicyLearner (in obp.policy) to efficiently sample a ranking of actions from the Plackett-Luce ranking distribution
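
The two ideas in the list above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the code added to obp.policy in this PR: the per-action ridge regression used as the q-function estimator, and all function names (`fit_q_function`, `predict_q`, `greedy_actions`, `softmax_policy`, `sample_ranking_gumbel`) and parameters (`ridge`, `tau`, `len_list`), are hypothetical choices made for the example.

```python
import numpy as np


def fit_q_function(context, action, reward, n_actions, ridge=1.0):
    """Estimate q(x, a) with one ridge regression per action (an assumed model).

    Returns a weight matrix of shape (n_actions, dim_context + 1).
    """
    n_rounds, dim = context.shape
    X = np.hstack([context, np.ones((n_rounds, 1))])  # append an intercept column
    weights = np.zeros((n_actions, dim + 1))
    for a in range(n_actions):
        mask = action == a
        Xa, ra = X[mask], reward[mask]
        # closed-form ridge solution: (Xa^T Xa + ridge * I)^{-1} Xa^T ra
        gram = Xa.T @ Xa + ridge * np.eye(dim + 1)
        weights[a] = np.linalg.solve(gram, Xa.T @ ra)
    return weights


def predict_q(weights, context):
    """Return estimated rewards with shape (n_rounds, n_actions)."""
    X = np.hstack([context, np.ones((context.shape[0], 1))])
    return X @ weights.T


def greedy_actions(q_hat):
    """Deterministic decision: pick the action with the largest estimate."""
    return q_hat.argmax(axis=1)


def softmax_policy(q_hat, tau=1.0):
    """Stochastic decision: action probabilities proportional to exp(q_hat / tau)."""
    z = q_hat / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def sample_ranking_gumbel(q_hat, len_list, tau=1.0, rng=None):
    """Sample a ranking of `len_list` distinct actions per round.

    Adding i.i.d. Gumbel(0, 1) noise to the scores and sorting in descending
    order yields a draw from the Plackett-Luce distribution whose weights are
    exp(q_hat / tau), without sampling positions one at a time.
    """
    rng = np.random.default_rng() if rng is None else rng
    perturbed = q_hat / tau + rng.gumbel(size=q_hat.shape)
    return np.argsort(-perturbed, axis=1)[:, :len_list]
```

In this sketch, a caller would fit the weights once on the logged data (`context`, `action`, `reward`), call `predict_q` on the test contexts, and then choose between `greedy_actions`, `softmax_policy`, or `sample_ranking_gumbel` depending on whether a single action, a probability distribution, or a ranked list is needed.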

Minor Fix

  • unify some test functions in tests/policy/test_offline_learner_performance.py to avoid repeatedly training the same policy, which reduces the time needed to run the tests
  • fix some numpy/pandas/sklearn warnings (the tests now run without any warnings)

fullflu left a comment

Nits only (no warnings occurred).

I'll add a review of QLearner in the next comment.

usaito commented Nov 11, 2021

@fullflu Thanks!

usaito changed the title from "[WIP] Feature Implement QLearner" to "Feature Implement QLearner" on Nov 11, 2021
usaito merged commit fc5d628 into master on Nov 11, 2021
usaito deleted the feature/q-learning branch on November 11, 2021 at 07:02

fullflu left a comment

Review of QLearner

usaito commented Nov 11, 2021

@fullflu Thanks again for your thoughtful comments! I'll address the recent minor comments in the ongoing follow-up PR #145.

usaito restored the feature/q-learning branch on November 12, 2021 at 15:02