Feature Implement QLearner #144

usaito · 2021-10-26T11:57:17Z

New Features

Implement QLearner in the policy module. It first estimates the expected reward function (the q function) from the logged bandit data, and then uses that estimator to make decisions on the test data (for example via argmax or softmax).
See Example 1 in Appendix A of https://arxiv.org/pdf/2002.08536v2.pdf for the definition of the QLearner implemented here.
Add tests in tests/policy that cover the inputs and performance of the QLearner class.
Implement the "Gumbel softmax trick" (see Section 4.1 of https://arxiv.org/pdf/2105.00855.pdf) in IPWLearner/QLearner/NNPolicyLearner (of obp.policy) to efficiently sample a ranking of actions from the Plackett-Luce ranking distribution.
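For readers unfamiliar with the two ideas above, here is a minimal, standard-library-only sketch. It is not the code added in this PR: the function names (`fit_q_table`, `greedy_action`, `softmax`, `sample_ranking`) and the toy data are hypothetical, and the q function is estimated with a simple per-(context, action) mean rather than the regression model that obp uses. The ranking sampler perturbs the scaled scores with Gumbel(0, 1) noise and sorts them, which is the standard way to draw from a Plackett-Luce distribution with weights exp(q / tau).

```python
import math
import random


def fit_q_table(contexts, actions, rewards, n_actions):
    """Estimate q(x, a) as the mean logged reward of each (context, action) pair.

    Hypothetical tabular stand-in for a regression model; unseen pairs get 0.0.
    """
    sums, counts = {}, {}
    for x, a, r in zip(contexts, actions, rewards):
        sums[(x, a)] = sums.get((x, a), 0.0) + r
        counts[(x, a)] = counts.get((x, a), 0) + 1

    def q_hat(x):
        return [
            sums[(x, a)] / counts[(x, a)] if (x, a) in counts else 0.0
            for a in range(n_actions)
        ]

    return q_hat


def greedy_action(q_values):
    """Argmax decision rule: pick the action with the largest estimated reward."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax(q_values, tau=1.0):
    """Softmax decision rule with temperature tau (numerically stabilised)."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_ranking(q_values, len_list, tau=1.0, rng=random):
    """Sample a ranking of len_list distinct actions via Gumbel perturbation."""
    perturbed = []
    for a, q in enumerate(q_values):
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((q / tau + gumbel, a))
    perturbed.sort(reverse=True)  # largest perturbed score is ranked first
    return [a for _, a in perturbed[:len_list]]


# Toy logged bandit data (made up for illustration).
contexts = ["new", "new", "new", "returning", "returning", "returning"]
actions = [0, 1, 2, 0, 1, 2]
rewards = [0.0, 1.0, 0.0, 1.0, 0.0, 0.5]

q_hat = fit_q_table(contexts, actions, rewards, n_actions=3)
q_new = q_hat("new")
best = greedy_action(q_new)
probs = softmax(q_new, tau=0.5)
ranking = sample_ranking(q_new, len_list=2, tau=0.5, rng=random.Random(0))
```

Sorting once costs O(n log n) per ranking, whereas sampling positions one at a time from a renormalised softmax requires a fresh normalisation at every position. That difference is the efficiency gain this kind of trick is usually credited with.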

Minor Fix

Unify some test functions in tests/policy/test_offline_learner_performance.py so that the same policy is not trained repeatedly, which reduces the time needed to run the tests.
Fix some numpy/pandas/sklearn warnings (the tests now run without any warnings).

fullflu

Nits (no warnings occurred).

I'll add a review of QLearner in the next comment.

obp/dataset/synthetic_slate.py

obp/utils.py

usaito · 2021-11-11T05:58:43Z

@fullflu Thanks!

fullflu

Review of QLearner

obp/policy/offline.py

usaito · 2021-11-11T13:34:18Z

@fullflu Thanks again for your thoughtful comments! I'll address some of the recent minor comments in the ongoing follow-up PR #145.

usaito added 9 commits October 26, 2021 07:56

add q-learner

240a6ff

add tests

adf796c

unify some test funcs

fdd4ca2

fix circular import error

7aca078

fix numpy warnings

7d2dab2

implement the gumble softmax trick to sample rankings of actions

d5d1070

fix runtime warnings

9e5234c

fix docs

1d6351b

fix warnings

b0851b1

fullflu reviewed Nov 11, 2021

obp/dataset/synthetic_slate.py (outdated)

obp/utils.py (outdated)

fix typos

0583583

usaito changed the title ~~[WIP] Feature Implement QLearner~~ Feature Implement QLearner Nov 11, 2021

usaito merged commit fc5d628 into master Nov 11, 2021

usaito deleted the feature/q-learning branch November 11, 2021 07:02

fullflu reviewed Nov 11, 2021

usaito restored the feature/q-learning branch November 12, 2021 15:02

usaito mentioned this pull request Nov 12, 2021

Modify Synthetic Reward/Behavior Policy Functions #145

Merged
