Commit 288a5c9

Merge pull request #151 from st-tech/update-version
Update version to 0.5.2
2 parents 73e065c + ff82deb commit 288a5c9

Some content is hidden: large commits have some content hidden by default, so only a subset of the changed files appears below.

50 files changed: +2172 −4512 lines changed

examples/README.md

Lines changed: 2 additions & 2 deletions
@@ -1,10 +1,10 @@
 # Open Bandit Pipeline Examples
 
-This page contains a list of example codes written with the Open Bandit Pipeline.
+This page contains a list of examples written with Open Bandit Pipeline.
 
 - [`obd/`](./obd/): example implementations for evaluating standard off-policy estimators with the small sample Open Bandit Dataset.
 - [`synthetic/`](./synthetic/): example implementations for evaluating several off-policy estimators with synthetic bandit datasets.
 - [`multiclass/`](./multiclass/): example implementations for evaluating several off-policy estimators with multi-class classification datasets.
 - [`online/`](./online/): example implementations for evaluating Replay Method with online bandit algorithms.
 - [`opl/`](./opl/): example implementations for comparing the performance of several off-policy learners with synthetic bandit datasets.
-- [`quickstart/`](./quickstart/): some quickstart notebooks to guide the usage of the Open Bandit Pipeline.
+- [`quickstart/`](./quickstart/): some quickstart notebooks to guide the usage of Open Bandit Pipeline.

examples/multiclass/README.md

Lines changed: 32 additions & 24 deletions
@@ -1,14 +1,14 @@
-# Example with Multi-class Classification Data
+# Example Experiment with Multi-class Classification Data
 
 
 ## Description
 
-Here, we use multi-class classification datasets to evaluate OPE estimators.
-Specifically, we evaluate the estimation performances of well-known off-policy estimators using the ground-truth policy value of an evaluation policy calculable with multi-class classification data.
+We use multi-class classification datasets to evaluate OPE estimators. Specifically, we evaluate the estimation performance of some well-known OPE estimators using the ground-truth policy value of an evaluation policy calculable with multi-class classification data.
 
 ## Evaluating Off-Policy Estimators
 
-In the following, we evaluate the estimation performances of
+In the following, we evaluate the estimation performance of
+
 - Direct Method (DM)
 - Inverse Probability Weighting (IPW)
 - Self-Normalized Inverse Probability Weighting (SNIPW)
@@ -17,12 +17,12 @@ In the following, we evaluate the estimation performances of
 - Switch Doubly Robust (Switch-DR)
 - Doubly Robust with Optimistic Shrinkage (DRos)
 
-For Switch-DR and DRos, we try some different values of hyperparameters.
+For Switch-DR and DRos, we tune the built-in hyperparameters using SLOPE (Su et al., 2020; Tucker et al., 2021), a data-driven hyperparameter tuning method for OPE estimators.
 See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
 
 ### Files
 - [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators using multi-class classification data.
-- [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of some machine learning methods used to define regression model.
+- [`./conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of some ML methods used to define regression model.
 
 ### Scripts
 
@@ -50,38 +50,46 @@ python evaluate_off_policy_estimators.py\
 - `$base_model_for_reg_model` specifies the base ML model for defining regression model and should be one of "logistic_regression", "random_forest", or "lightgbm".
 - `$n_jobs` is the maximum number of concurrently running jobs.
 
-For example, the following command compares the estimation performances (relative estimation error; relative-ee) of the OPE estimators using the digits dataset.
+For example, the following command compares the estimation performance (relative estimation error; relative-ee) of the OPE estimators using the digits dataset.
 
 ```bash
 python evaluate_off_policy_estimators.py\
-    --n_runs 20\
+    --n_runs 30\
     --dataset_name digits\
     --eval_size 0.7\
     --base_model_for_behavior_policy logistic_regression\
-    --alpha_b 0.8\
-    --base_model_for_evaluation_policy logistic_regression\
+    --alpha_b 0.4\
+    --base_model_for_evaluation_policy random_forest\
     --alpha_e 0.9\
-    --base_model_for_reg_model logistic_regression\
+    --base_model_for_reg_model lightgbm\
    --n_jobs -1\
    --random_state 12345
 
 # relative-ee of OPE estimators and their standard deviations (lower is better).
-# It appears that the performances of some OPE estimators depend on the choice of their hyperparameters.
 # =============================================
 # random_state=12345
 # ---------------------------------------------
-#                               mean       std
-# dm                        0.093439  0.015391
-# ipw                       0.013286  0.008496
-# snipw                     0.006797  0.004094
-# dr                        0.007780  0.004492
-# sndr                      0.007210  0.004089
-# switch-dr (lambda=1)      0.173282  0.020025
-# switch-dr (lambda=100)    0.007780  0.004492
-# dr-os (lambda=1)          0.079629  0.014008
-# dr-os (lambda=100)        0.008031  0.004634
+#                 mean       std
+# dm          0.436541  0.017629
+# ipw         0.030288  0.024506
+# snipw       0.022764  0.017917
+# dr          0.016156  0.012679
+# sndr        0.022082  0.016865
+# switch-dr   0.034657  0.018575
+# dr-os       0.015868  0.012537
 # =============================================
 ```
 
-The above result can change with different situations.
-You can try the evaluation of OPE with other experimental settings easily.
+The above result can change with different situations. You can try the evaluation of OPE with other experimental settings easily.
+
+
+## References
+
+- Yi Su, Pavithra Srinath, Akshay Krishnamurthy. [Adaptive Estimator Selection for Off-Policy Evaluation](https://arxiv.org/abs/2002.07729), ICML2020.
+- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík. [Doubly Robust Off-policy Evaluation with Shrinkage](https://arxiv.org/abs/1907.09623), ICML2020.
+- George Tucker and Jonathan Lee. [Improved Estimator Selection for Off-Policy Evaluation](https://lyang36.github.io/icml2021_rltheory/camera_ready/79.pdf), Workshop on Reinforcement Learning Theory at ICML2021.
+- Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik. [Optimal and Adaptive Off-policy Evaluation in Contextual Bandits](https://arxiv.org/abs/1612.01205), ICML2017.
+- Miroslav Dudik, John Langford, Lihong Li. [Doubly Robust Policy Evaluation and Learning](https://arxiv.org/abs/1103.4601). ICML2011.
+- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. [Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation](https://arxiv.org/abs/2008.07146). NeurIPS2021 Track on Datasets and Benchmarks.

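The README change above replaces fixed hyperparameter values for Switch-DR and DRos with SLOPE-based tuning. As a minimal sketch of what the tuned estimators look like in use (not part of this commit; it assumes obp >= 0.5.2, a synthetic dataset, a uniform-random evaluation policy, and a logistic-regression reward model, all of which are illustrative choices rather than anything prescribed by the diff):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import (
    DoublyRobustWithShrinkageTuning,
    OffPolicyEvaluation,
    RegressionModel,
    SwitchDoublyRobustTuning,
)

# synthetic logged bandit feedback (stands in for the multi-class reduction used in the example)
dataset = SyntheticBanditDataset(
    n_actions=10,
    dim_context=5,
    reward_function=logistic_reward_function,
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# a deliberately simple evaluation policy: uniform over actions, shape (n_rounds, n_actions, 1)
n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions
action_dist = np.ones((n_rounds, n_actions, 1)) / n_actions

# reward regression model required by the model-dependent estimators
regression_model = RegressionModel(
    n_actions=n_actions,
    base_model=LogisticRegression(max_iter=1000, random_state=12345),
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    random_state=12345,
)

# the tuning variants take a grid of candidate lambdas and select one from the data
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[
        SwitchDoublyRobustTuning(lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]),
        DoublyRobustWithShrinkageTuning(
            lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]
        ),
    ],
)
print(
    ope.estimate_policy_values(
        action_dist=action_dist,
        estimated_rewards_by_reg_model=estimated_rewards,
    )
)
```

Because each tuning estimator picks a single lambda from the candidate grid, the example output above now reports one `switch-dr` and one `dr-os` row instead of the per-lambda rows shown in the old README.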
examples/multiclass/evaluate_off_policy_estimators.py

Lines changed: 17 additions & 16 deletions
@@ -17,13 +17,13 @@
 from obp.dataset import MultiClassToBanditReduction
 from obp.ope import DirectMethod
 from obp.ope import DoublyRobust
-from obp.ope import DoublyRobustWithShrinkage
+from obp.ope import DoublyRobustWithShrinkageTuning
 from obp.ope import InverseProbabilityWeighting
 from obp.ope import OffPolicyEvaluation
 from obp.ope import RegressionModel
 from obp.ope import SelfNormalizedDoublyRobust
 from obp.ope import SelfNormalizedInverseProbabilityWeighting
-from obp.ope import SwitchDoublyRobust
+from obp.ope import SwitchDoublyRobustTuning
 
 
 # hyperparameters of the regression model used in model dependent OPE estimators
@@ -50,10 +50,10 @@
     SelfNormalizedInverseProbabilityWeighting(),
     DoublyRobust(),
     SelfNormalizedDoublyRobust(),
-    SwitchDoublyRobust(lambda_=1.0, estimator_name="switch-dr (lambda=1)"),
-    SwitchDoublyRobust(lambda_=100.0, estimator_name="switch-dr (lambda=100)"),
-    DoublyRobustWithShrinkage(lambda_=1.0, estimator_name="dr-os (lambda=1)"),
-    DoublyRobustWithShrinkage(lambda_=100.0, estimator_name="dr-os (lambda=100)"),
+    SwitchDoublyRobustTuning(lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]),
+    DoublyRobustWithShrinkageTuning(
+        lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]
+    ),
 ]
 
 if __name__ == "__main__":
@@ -161,7 +161,7 @@ def process(i: int):
         ground_truth_policy_value = dataset.calc_ground_truth_policy_value(
             action_dist=action_dist
         )
-        # estimate the mean reward function of the evaluation set of multi-class classification data with ML model
+        # estimate the reward function of the evaluation set of multi-class classification data with ML model
         regression_model = RegressionModel(
             n_actions=dataset.n_actions,
             base_model=base_model_dict[base_model_for_reg_model](
@@ -180,34 +180,35 @@ def process(i: int):
             bandit_feedback=bandit_feedback,
             ope_estimators=ope_estimators,
         )
-        relative_ee_i = ope.evaluate_performance_of_estimators(
+        metric_i = ope.evaluate_performance_of_estimators(
             ground_truth_policy_value=ground_truth_policy_value,
             action_dist=action_dist,
             estimated_rewards_by_reg_model=estimated_rewards_by_reg_model,
+            metric="relative-ee",
         )
 
-        return relative_ee_i
+        return metric_i
 
     processed = Parallel(
         n_jobs=n_jobs,
         verbose=50,
     )([delayed(process)(i) for i in np.arange(n_runs)])
-    relative_ee_dict = {est.estimator_name: dict() for est in ope_estimators}
-    for i, relative_ee_i in enumerate(processed):
+    metric_dict = {est.estimator_name: dict() for est in ope_estimators}
+    for i, metric_i in enumerate(processed):
         for (
             estimator_name,
             relative_ee_,
-        ) in relative_ee_i.items():
-            relative_ee_dict[estimator_name][i] = relative_ee_
-    relative_ee_df = DataFrame(relative_ee_dict).describe().T.round(6)
+        ) in metric_i.items():
+            metric_dict[estimator_name][i] = relative_ee_
+    result_df = DataFrame(metric_dict).describe().T.round(6)
 
     print("=" * 45)
    print(f"random_state={random_state}")
    print("-" * 45)
-    print(relative_ee_df[["mean", "std"]])
+    print(result_df[["mean", "std"]])
     print("=" * 45)
 
     # save results of the evaluation of off-policy estimators in './logs' directory.
     log_path = Path(f"./logs/{dataset_name}")
     log_path.mkdir(exist_ok=True, parents=True)
-    relative_ee_df.to_csv(log_path / "relative_ee_of_ope_estimators.csv")
+    result_df.to_csv(log_path / "evaluation_of_ope_results.csv")

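The new `metric="relative-ee"` argument above makes the evaluation metric explicit. As a reminder of what that metric computes (a hand-rolled helper for illustration only; obp computes this internally inside `evaluate_performance_of_estimators`):

```python
# Relative estimation error (relative-ee) of an OPE estimate V_hat against the
# ground-truth policy value V: |V_hat - V| / |V|. Lower is better.
def relative_estimation_error(v_hat: float, v_true: float) -> float:
    return abs(v_hat - v_true) / abs(v_true)


# illustrative numbers only
print(relative_estimation_error(v_hat=0.0042, v_true=0.0048))  # 0.125
```

In the script, this quantity is returned per estimator for each run, and the runs are aggregated with `DataFrame(...).describe()`, which is where the `mean` and `std` columns in the README output come from.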
examples/obd/README.md

Lines changed: 40 additions & 14 deletions
@@ -1,16 +1,27 @@
-# Example with the Open Bandit Dataset (OBD)
+# Example Experiment with Open Bandit Dataset
 
 ## Description
 
-Here, we use the open bandit dataset and pipeline to implement and evaluate OPE. Specifically, we evaluate the estimation performances of well-known off-policy estimators using the ground-truth policy value of an evaluation policy, which is calculable with our data using on-policy estimation.
+We use Open Bandit Dataset to implement the evaluation of OPE. Specifically, we evaluate the estimation performance of some well-known OPE estimators using the on-policy policy value of an evaluation policy, which is calculable with the dataset.
 
 ## Evaluating Off-Policy Estimators
 
-We evaluate the estimation performances of off-policy estimators, including Direct Method (DM), Inverse Probability Weighting (IPW), and Doubly Robust (DR).
+In the following, we evaluate the estimation performance of
+
+- Direct Method (DM)
+- Inverse Probability Weighting (IPW)
+- Self-Normalized Inverse Probability Weighting (SNIPW)
+- Doubly Robust (DR)
+- Self-Normalized Doubly Robust (SNDR)
+- Switch Doubly Robust (Switch-DR)
+- Doubly Robust with Optimistic Shrinkage (DRos)
+
+For Switch-DR and DRos, we tune the built-in hyperparameters using SLOPE, a data-driven hyperparameter tuning method for OPE estimators.
+See [our documentation](https://zr-obp.readthedocs.io/en/latest/estimators.html) for the details about these estimators.
 
 ### Files
-- [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators.
-- [`.conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of some machine learning models used as the regression model in model dependent estimators (such as DM and DR).
+- [`./evaluate_off_policy_estimators.py`](./evaluate_off_policy_estimators.py) implements the evaluation of OPE estimators using Open Bandit Dataset.
+- [`.conf/hyperparams.yaml`](./conf/hyperparams.yaml) defines hyperparameters of some ML models used as the regression model in model dependent estimators (such as DM and DR).
 
 ### Scripts
 
@@ -34,28 +45,43 @@ They should be either 'bts' or 'random'.
 - `$n_sim_to_compute_action_dist` is the number of monte carlo simulation to compute the action distribution of a given evaluation policy.
 - `$n_jobs` is the maximum number of concurrently running jobs.
 
-For example, the following command compares the estimation performances of the three OPE estimators by using Bernoulli TS as evaluation policy and Random as behavior policy in "All" campaign.
+For example, the following command compares the estimation performance of the three OPE estimators by using Bernoulli TS as evaluation policy and Random as behavior policy in "All" campaign.
 
 ```bash
 python evaluate_off_policy_estimators.py\
-    --n_runs 20\
+    --n_runs 30\
     --base_model logistic_regression\
     --evaluation_policy bts\
    --behavior_policy random\
    --campaign all\
    --n_jobs -1
 
 # relative estimation errors of OPE estimators and their standard deviations.
-# our evaluation of OPE procedure suggests that DM performs best among the three OPE estimators, because it has low variance property.
-# (Note that this result is with the small sample data, and please use the full size data for a more reasonable experiment)
 # ==============================
 # random_state=12345
 # ------------------------------
-#          mean       std
-# dm   0.180269  0.114716
-# ipw  0.333113  0.350425
-# dr   0.304422  0.347866
+#                 mean       std
+# dm          0.156876  0.109898
+# ipw         0.311082  0.311170
+# snipw       0.311795  0.334736
+# dr          0.292464  0.315485
+# sndr        0.302407  0.328434
+# switch-dr   0.258410  0.160598
+# dr-os       0.159520  0.109660
 # ==============================
 ```
 
-Please refer to [this page](https://zr-obp.readthedocs.io/en/latest/evaluation_ope.html) for the evaluation of OPE protocol using our real-world data. Please visit [synthetic](../synthetic/) to try the evaluation of OPE estimators with synthetic bandit datasets. Moreover, in [benchmark/ope](https://github.com/st-tech/zr-obp/tree/master/benchmark/ope), we performed the benchmark experiments on several OPE estimators using the full size Open Bandit Dataset.
+Please refer to [this page](https://zr-obp.readthedocs.io/en/latest/evaluation_ope.html) for the evaluation of OPE protocol using our real-world data. Please visit [synthetic](../synthetic/) to try the evaluation of OPE estimators with synthetic bandit data. Moreover, in [benchmark/ope](https://github.com/st-tech/zr-obp/tree/master/benchmark/ope), we performed the benchmark experiments on several OPE estimators using the full size Open Bandit Dataset.
+
+
+## References
+
+- Yi Su, Pavithra Srinath, Akshay Krishnamurthy. [Adaptive Estimator Selection for Off-Policy Evaluation](https://arxiv.org/abs/2002.07729), ICML2020.
+- Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, Miroslav Dudík. [Doubly Robust Off-policy Evaluation with Shrinkage](https://arxiv.org/abs/1907.09623), ICML2020.
+- George Tucker and Jonathan Lee. [Improved Estimator Selection for Off-Policy Evaluation](https://lyang36.github.io/icml2021_rltheory/camera_ready/79.pdf), Workshop on Reinforcement Learning Theory at ICML2021.
+- Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik. [Optimal and Adaptive Off-policy Evaluation in Contextual Bandits](https://arxiv.org/abs/1612.01205), ICML2017.
+- Miroslav Dudik, John Langford, Lihong Li. [Doubly Robust Policy Evaluation and Learning](https://arxiv.org/abs/1103.4601). ICML2011.
+- Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita. [Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation](https://arxiv.org/abs/2008.07146). NeurIPS2021 Track on Datasets and Benchmarks.

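For readers who want the protocol this README describes without running the full script, the following sketch (not part of the commit) follows the obp quickstart for the small sample of Open Bandit Dataset bundled with obp. The calls not visible in the hunks above, in particular `BernoulliTS.compute_batch_action_dist` and `OpenBanditDataset.calc_on_policy_policy_value_estimate`, are taken from the obp documentation and should be checked against the installed version; the choice of three estimators and `n_sim=100000` is illustrative.

```python
from sklearn.linear_model import LogisticRegression

from obp.dataset import OpenBanditDataset
from obp.ope import DirectMethod, DoublyRobust, InverseProbabilityWeighting
from obp.ope import OffPolicyEvaluation, RegressionModel
from obp.policy import BernoulliTS

# logged feedback collected by the Random policy on the "all" campaign (bundled sample)
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# evaluation policy: Bernoulli TS with the production (ZOZOTOWN) prior
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100000, n_rounds=bandit_feedback["n_rounds"]
)

# regression model for the model-dependent estimators (DM, DR)
regression_model = RegressionModel(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    base_model=LogisticRegression(max_iter=1000, random_state=12345),
)
estimated_rewards = regression_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    position=bandit_feedback["position"],
    random_state=12345,
)

# "ground truth": on-policy value of Bernoulli TS, estimated from its own logged data
ground_truth = OpenBanditDataset.calc_on_policy_policy_value_estimate(
    behavior_policy="bts", campaign="all"
)

# relative estimation error of each estimator against the on-policy value
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[DirectMethod(), InverseProbabilityWeighting(), DoublyRobust()],
)
print(
    ope.evaluate_performance_of_estimators(
        ground_truth_policy_value=ground_truth,
        action_dist=action_dist,
        estimated_rewards_by_reg_model=estimated_rewards,
        metric="relative-ee",
    )
)
```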
examples/obd/evaluate_off_policy_estimators.py

Lines changed: 24 additions & 8 deletions
@@ -13,9 +13,13 @@
 from obp.dataset import OpenBanditDataset
 from obp.ope import DirectMethod
 from obp.ope import DoublyRobust
+from obp.ope import DoublyRobustWithShrinkageTuning
 from obp.ope import InverseProbabilityWeighting
 from obp.ope import OffPolicyEvaluation
 from obp.ope import RegressionModel
+from obp.ope import SelfNormalizedDoublyRobust
+from obp.ope import SelfNormalizedInverseProbabilityWeighting
+from obp.ope import SwitchDoublyRobustTuning
 from obp.policy import BernoulliTS
 from obp.policy import Random
 
@@ -32,8 +36,19 @@
     random_forest=RandomForestClassifier,
 )
 
-# OPE estimators compared
-ope_estimators = [DirectMethod(), InverseProbabilityWeighting(), DoublyRobust()]
+# compared OPE estimators
+ope_estimators = [
+    DirectMethod(),
+    InverseProbabilityWeighting(),
+    SelfNormalizedInverseProbabilityWeighting(),
+    DoublyRobust(),
+    SelfNormalizedDoublyRobust(),
+    SwitchDoublyRobustTuning(lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]),
+    DoublyRobustWithShrinkageTuning(
+        lambdas=[10, 50, 100, 500, 1000, 5000, 10000, np.inf]
+    ),
+]
+
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="evaluate off-policy estimators.")
@@ -123,7 +138,7 @@
     def process(b: int):
         # sample bootstrap from batch logged bandit feedback
         bandit_feedback = obd.sample_bootstrap_bandit_feedback(random_state=b)
-        # estimate the mean reward function with an ML model
+        # estimate the reward function with an ML model
         regression_model = RegressionModel(
             n_actions=obd.n_actions,
             len_list=obd.len_list,
@@ -151,6 +166,7 @@ def process(b: int):
             ground_truth_policy_value=ground_truth_policy_value,
             action_dist=action_dist,
             estimated_rewards_by_reg_model=estimated_rewards_by_reg_model,
+            metric="relative-ee",
         )
 
         return relative_ee_b
@@ -159,22 +175,22 @@ def process(b: int):
         n_jobs=n_jobs,
         verbose=50,
     )([delayed(process)(i) for i in np.arange(n_runs)])
-    relative_ee_dict = {est.estimator_name: dict() for est in ope_estimators}
+    metric_dict = {est.estimator_name: dict() for est in ope_estimators}
     for b, relative_ee_b in enumerate(processed):
         for (
             estimator_name,
             relative_ee_,
         ) in relative_ee_b.items():
-            relative_ee_dict[estimator_name][b] = relative_ee_
-    relative_ee_df = DataFrame(relative_ee_dict).describe().T.round(6)
+            metric_dict[estimator_name][b] = relative_ee_
+    results_df = DataFrame(metric_dict).describe().T.round(6)
 
     print("=" * 30)
     print(f"random_state={random_state}")
     print("-" * 30)
-    print(relative_ee_df[["mean", "std"]])
+    print(results_df[["mean", "std"]])
     print("=" * 30)
 
     # save results of the evaluation of off-policy estimators in './logs' directory.
     log_path = Path("./logs") / behavior_policy / campaign
     log_path.mkdir(exist_ok=True, parents=True)
-    relative_ee_df.to_csv(log_path / "relative_ee_of_ope_estimators.csv")
+    results_df.to_csv(log_path / "evaluation_of_ope_results.csv")

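The script evaluates each estimator across `n_runs` bootstrap replicates of the logged data. A stripped-down version of that outer loop (a sketch, not the committed code; it relies only on `sample_bootstrap_bandit_feedback` as shown in the hunks above, and the three-replicate loop is illustrative):

```python
import numpy as np

from obp.dataset import OpenBanditDataset

# small sample of Open Bandit Dataset bundled with obp (Random policy, "all" campaign)
obd = OpenBanditDataset(behavior_policy="random", campaign="all")

for b in np.arange(3):
    # each run re-samples the logged rounds with replacement, so the mean/std
    # reported by the script reflect sampling variability of the estimators
    bandit_feedback = obd.sample_bootstrap_bandit_feedback(random_state=b)
    print(f"run {b}: n_rounds={bandit_feedback['n_rounds']}")
```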