Feature: Implementing Continuous NN Policy Learner #114

Merged
merged 13 commits into from
Aug 15, 2021

usaito
Contributor

@usaito usaito commented Jul 7, 2021

new features

  • implement BaseContinuousOfflinePolicyLearner in policy/base.py
  • implement ContinuousNNPolicyLearner in policy/offline_continuous.py, which trains a decision-making policy modeled by a neural network on logged bandit data with continuous actions. The class is used as follows.
# define a neural network policy
nn_policy = ContinuousNNPolicyLearner(
    dim_context=10,
    pg_method="dpg",  # pg_method: policy gradient method
    hidden_layer_size=(10, 10, 10),
    activation="relu",
    solver="adam",
)

# train the policy on logged bandit feedback data
nn_policy.fit(
    context=bandit_feedback_train["context"],
    action=bandit_feedback_train["action"],
    reward=bandit_feedback_train["reward"],
    pscore=bandit_feedback_train["pscore"],
)

# predict actions for the test data
predicted_actions = nn_policy.predict(
    context=bandit_feedback_test["context"],
)

Here, bandit_feedback_train and bandit_feedback_test are assumed to be generated by SyntheticContinuousBanditFeedback.
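
For readers who want to see the shape of the inputs passed to fit, here is a minimal NumPy-only sketch of logged bandit data with continuous actions. It is not SyntheticContinuousBanditFeedback: the function name make_toy_continuous_feedback, the Gaussian behavior policy, and the quadratic reward are all assumptions made for illustration. Only the four dictionary keys (context, action, reward, pscore) come from the example above.

```python
import numpy as np


def make_toy_continuous_feedback(n_rounds, dim_context=10, random_state=0):
    """Hypothetical stand-in for a synthetic continuous-action data generator."""
    rng = np.random.default_rng(random_state)
    context = rng.normal(size=(n_rounds, dim_context))

    # Assumed behavior policy: Gaussian centered on a linear function of context.
    beta = rng.normal(size=dim_context) / np.sqrt(dim_context)
    sigma = 1.0
    mean = context @ beta
    action = mean + sigma * rng.normal(size=n_rounds)

    # pscore: density of each logged action under the behavior policy.
    pscore = np.exp(-0.5 * ((action - mean) / sigma) ** 2) / (
        sigma * np.sqrt(2.0 * np.pi)
    )

    # Assumed reward: highest when the action matches a hidden linear target.
    theta = rng.normal(size=dim_context) / np.sqrt(dim_context)
    reward = -((action - context @ theta) ** 2) + 0.1 * rng.normal(size=n_rounds)

    return {"context": context, "action": action, "reward": reward, "pscore": pscore}


bandit_feedback_train = make_toy_continuous_feedback(n_rounds=1000, random_state=0)
bandit_feedback_test = make_toy_continuous_feedback(n_rounds=1000, random_state=1)
```

With continuous actions, pscore is a probability density rather than a probability, so it is positive but not restricted to sum to one over actions.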

reference
Nathan Kallus and Masatoshi Uehara.
"Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies", NeurIPS 2020.

tests

  • test_offline_continuous.py checks input validation for the methods of ContinuousNNPolicyLearner.
  • test_offline_learner_performance.py checks whether ContinuousNNPolicyLearner outperforms the uniform random policy (which samples continuous actions uniformly at random from the action space) in a simple synthetic setting.
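
The kind of performance check described in the second test can be sketched as follows. This is an illustration under stated assumptions, not the contents of test_offline_learner_performance.py: the action bounds, the quadratic reward function, and fit_linear_policy (a simple least-squares stand-in used here in place of ContinuousNNPolicyLearner) are all hypothetical. The point is the comparison itself: because the synthetic reward function is known, the ground-truth value of each policy can be computed and compared directly.

```python
import numpy as np

ACTION_LOW, ACTION_HIGH = -3.0, 3.0  # hypothetical action space bounds


def true_reward(context, action, theta):
    """Assumed expected reward: highest when action equals context @ theta."""
    return -((action - context @ theta) ** 2)


def policy_value(policy, context, theta, rng):
    """Ground-truth value: mean expected reward of the policy's actions."""
    return true_reward(context, policy(context, rng), theta).mean()


def uniform_random_policy(context, rng):
    """Baseline: sample actions uniformly from the action space."""
    return rng.uniform(ACTION_LOW, ACTION_HIGH, size=context.shape[0])


def fit_linear_policy(context, action, reward):
    """Stand-in learner (not ContinuousNNPolicyLearner).

    Under the assumed quadratic reward, and with logged actions that are
    zero-mean and independent of context, regressing reward + action**2
    on action * context recovers approximately 2 * theta.
    """
    features = action[:, None] * context
    target = reward + action**2
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    theta_hat = coef / 2.0

    def policy(ctx, rng):
        # Deterministic policy; rng is accepted only for a uniform interface.
        return np.clip(ctx @ theta_hat, ACTION_LOW, ACTION_HIGH)

    return policy


rng = np.random.default_rng(12345)
dim_context, n_train, n_test = 3, 5000, 5000
theta = rng.normal(size=dim_context) / np.sqrt(dim_context)

# Logged training data: the behavior policy draws actions from N(0, 1).
ctx_train = rng.normal(size=(n_train, dim_context))
act_train = rng.normal(size=n_train)
rew_train = true_reward(ctx_train, act_train, theta) + 0.1 * rng.normal(size=n_train)

learned_policy = fit_linear_policy(ctx_train, act_train, rew_train)

ctx_test = rng.normal(size=(n_test, dim_context))
learned_value = policy_value(learned_policy, ctx_test, theta, rng)
random_value = policy_value(uniform_random_policy, ctx_test, theta, rng)

# The check: the learned policy should beat the uniform random baseline.
assert learned_value > random_value
```

Comparing against a uniform random baseline is a deliberately weak bar: it catches a learner that fails to train at all without making the test sensitive to small changes in optimization.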

refactor

@usaito usaito changed the base branch from master to continuous-dataset July 7, 2021 07:23
@usaito usaito changed the title from "[WIP] Feature: Continuous NN Policy Learner" to "Feature: Implementing Continuous NN Policy Learner" Jul 8, 2021
@nomuramasahir0
Contributor

Once the minor comments above are confirmed and the conflict is resolved, LGTM!

@usaito
Contributor Author

usaito commented Aug 15, 2021

@nmasahiro Thanks!

@usaito usaito merged commit e52a250 into continuous-dataset Aug 15, 2021
2 participants