Feature: Implementing Continuous NN Policy Learner #114

usaito · 2021-07-07T07:22:53Z

new features

implement BaseContinuousOfflinePolicyLearner in policy/base.py
implement ContinuousNNPolicyLearner in policy/offline_continuous.py, which trains a decision-making policy parameterized by a neural network on logged bandit data with continuous actions. The class is used as follows:

from obp.policy.offline_continuous import ContinuousNNPolicyLearner

# define a neural network policy
nn_policy = ContinuousNNPolicyLearner(
    dim_context=10,
    pg_method="dpg",  # pg_method: policy gradient method
    hidden_layer_size=(10, 10, 10),
    activation="relu",
    solver="adam",
)

# train the policy on logged bandit feedback data
nn_policy.fit(
    context=bandit_feedback_train["context"],
    action=bandit_feedback_train["action"],
    reward=bandit_feedback_train["reward"],
    pscore=bandit_feedback_train["pscore"],
)

# predict new continuous actions for the test contexts
predicted_actions = nn_policy.predict(
    context=bandit_feedback_test["context"],
)

where bandit_feedback_train and bandit_feedback_test are assumed to be generated by SyntheticContinuousBanditDataset.
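Assuming `pg_method="dpg"` refers to the deterministic policy gradient, the idea can be illustrated with a minimal pure-Python sketch on a toy problem. This is not the code in this PR: it swaps the neural network for a linear policy a = theta * x, and it assumes a known, hypothetical reward model q(x, a) = -(a - 2x)^2, whereas an offline learner would presumably have to estimate q from the logged data.

```python
import random


def q_grad_action(x: float, a: float) -> float:
    """d/da of the hypothetical reward model q(x, a) = -(a - 2x)^2."""
    return -2.0 * (a - 2.0 * x)


def dpg_train(contexts: list[float], lr: float = 0.1, n_steps: int = 200) -> float:
    """Gradient ascent on theta for the linear deterministic policy pi(x) = theta * x."""
    theta = 0.0
    for _ in range(n_steps):
        # DPG: grad_theta J = E_x[ d pi(x)/d theta * dq/da evaluated at a = pi(x) ],
        # and d pi(x)/d theta = x for a linear policy.
        grad = sum(x * q_grad_action(x, theta * x) for x in contexts) / len(contexts)
        theta += lr * grad
    return theta


rng = random.Random(0)
contexts = [rng.uniform(-1.0, 1.0) for _ in range(500)]
theta = dpg_train(contexts)
print(f"learned theta: {theta:.3f}")  # the optimal policy here is a = 2x, so theta approaches 2
```

Because the policy is deterministic, the gradient flows through the action into the reward model rather than through a log-probability, which is why the quality of the q estimate matters so much in the offline setting.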

reference
Nathan Kallus and Masatoshi Uehara. "Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies", NeurIPS 2020.

tests

test_offline_continuous.py validates the inputs to the methods of ContinuousNNPolicyLearner.
test_offline_learner_performance.py checks whether ContinuousNNPolicyLearner outperforms the uniform random policy (which samples continuous actions uniformly from the action space) in a simple synthetic setting.
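The comparison behind the performance test can be sketched as follows: compute the ground-truth value of a learned policy and of a uniform random policy in the same synthetic environment, then check that the learned policy wins. Everything here is hypothetical: the reward function, the action space [-1, 1], and the fixed linear policy standing in for a trained ContinuousNNPolicyLearner. The actual test may compute the values differently.

```python
import random

MIN_ACTION, MAX_ACTION = -1.0, 1.0  # hypothetical action space


def expected_reward(x: float, a: float) -> float:
    # hypothetical ground-truth expected reward of the synthetic environment
    return -((a - x) ** 2)


def policy_value(action_fn, contexts) -> float:
    # ground-truth value: average expected reward of the chosen actions
    return sum(expected_reward(x, action_fn(x)) for x in contexts) / len(contexts)


def learned_policy(x: float) -> float:
    # stand-in for a trained policy's predictions, clipped to the action space
    return min(max(0.9 * x, MIN_ACTION), MAX_ACTION)


rng = random.Random(0)
test_contexts = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]


def uniform_random_policy(x: float) -> float:
    # ignores the context and samples uniformly from the action space
    return rng.uniform(MIN_ACTION, MAX_ACTION)


learned_value = policy_value(learned_policy, test_contexts)
random_value = policy_value(uniform_random_policy, test_contexts)
print(f"learned: {learned_value:.4f}, uniform random: {random_value:.4f}")
assert learned_value > random_value, "the learned policy should beat the random baseline"
```

Analytically, the uniform random policy scores -(Var(a) + Var(x)) = -(1/3 + 1/3) = -2/3 in this toy setting, while a = 0.9x scores -0.01 * E[x^2] = -0.01/3, so the margin is large and the check is not sensitive to sampling noise.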

refactor

add descriptions to test_offline.py and test_offline_learner_performance.py (for clarity only)
fix typos in offline.py


nomuramasahir0 · 2021-08-15T06:08:10Z

After addressing the minor comments above and resolving the conflict, LGTM!

usaito · 2021-08-15T13:30:26Z

@nmasahiro Thanks!

fix typos

1c85d49

usaito changed the base branch from master to continuous-dataset July 7, 2021 07:23

usaito added 8 commits July 7, 2021 21:03

implement ContinuousNNPolicyLearner

a0a1ace

fix typos

0332bb0

add some descriptions

1d5aaf2

add tests of ContinuousNNPolicyLearner

1e04f89

add some check functions

d92a476

update

06ecd89

Merge branch 'continuous-dataset' of github.com:st-tech/zr-obp into c…

3189863

…ontinuous-policy-learner

fix some tests to adjust to the changes of SyntheticContinuousBanditD…

900c4ff

…ataset

usaito changed the title ~~[WIP] Feature: Continuous NN Policy Learner~~ Feature: Implementing Continuous NN Policy Learner Jul 8, 2021

usaito added 2 commits July 8, 2021 20:20

fix docstrings

f404d92

fix docstrings

eee7bc9

nomuramasahir0 reviewed Aug 15, 2021


obp/policy/offline_continuous.py (outdated, resolved)

obp/policy/offline_continuous.py (outdated, resolved)

obp/policy/offline_continuous.py (outdated, resolved)

tests/policy/test_offline_continuous.py (outdated, resolved)

usaito and others added 2 commits August 15, 2021 22:56

update based on review

e7c940a

Merge branch 'continuous-dataset' into continuous-policy-learner

460d8ca

usaito merged commit e52a250 into continuous-dataset Aug 15, 2021
