
Commit 0bf531c

Merge pull request #168 from st-tech/feature/tuning-sort

Automatic candidate hyperparameter sorting for SLOPE

2 parents e1291c0 + 437ab5c

File tree

2 files changed: +39 / -21 lines

obp/ope/estimators.py

Lines changed: 12 additions & 12 deletions
@@ -526,11 +526,11 @@ def _estimate_mse_score(
             (If only a single action is chosen for each data, you can just ignore this argument.)
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

@@ -1207,11 +1207,11 @@ def _estimate_mse_score(
             Estimated expected rewards given context, action, and position, i.e., :math:`\\hat{q}(x_i,a_i)`.
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

@@ -1511,11 +1511,11 @@ def _estimate_mse_score(
             (If only a single action is chosen for each data, you can just ignore this argument.)
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

@@ -1719,11 +1719,11 @@ def _estimate_mse_score(
             Indices to differentiate positions in a recommendation interface where the actions are presented.
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

@@ -1907,11 +1907,11 @@ def _estimate_mse_score(
             Indices to differentiate positions in a recommendation interface where the actions are presented.
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

@@ -2106,11 +2106,11 @@ def _estimate_mse_score(
             Indices to differentiate positions in a recommendation interface where the actions are presented.
 
         use_bias_upper_bound: bool, default=True
-            Whether to use bias upper bound in hyperparameter tuning.
+            Whether to use a bias upper bound in hyperparameter tuning.
             If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
         delta: float, default=0.05
-            A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+            A confidence delta to construct a high probability upper bound used in SLOPE.
 
         Returns
         ----------

obp/ope/estimators_tuning.py

Lines changed: 27 additions & 9 deletions
@@ -41,11 +41,11 @@ class BaseOffPolicyEstimatorTuning:
         which improves the original SLOPE proposed by Su et al.(2020).
 
     use_bias_upper_bound: bool, default=True
-        Whether to use bias upper bound in hyperparameter tuning.
+        Whether to use a bias upper bound in hyperparameter tuning.
         If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
-    delta: float, default=0.05
-        A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+    delta: float, default=0.1
+        A confidence delta to construct a high probability upper bound used in SLOPE.
 
     use_estimated_pscore: bool, default=False.
         If True, `estimated_pscore` is used, otherwise, `pscore` (the true propensity scores) is used.

@@ -70,7 +70,7 @@ class BaseOffPolicyEstimatorTuning:
     lambdas: List[float] = None
     tuning_method: str = "slope"
     use_bias_upper_bound: bool = True
-    delta: float = 0.05
+    delta: float = 0.1
     use_estimated_pscore: bool = False
 
     def __new__(cls, *args, **kwargs):

@@ -151,7 +151,6 @@ def _tune_hyperparam_with_slope(
     ) -> float:
         """Find the best hyperparameter value from the candidate set by SLOPE."""
         C = np.sqrt(6) - 1
-        theta_list, cnf_list = [], []
         theta_list_for_sort, cnf_list_for_sort = [], []
         for hyperparam_ in self.lambdas:
             estimated_round_rewards = self.base_ope_estimator(

@@ -172,6 +171,7 @@ def _tune_hyperparam_with_slope(
             )
             cnf_list_for_sort.append(cnf)
 
+        theta_list, cnf_list = [], []
         sorted_idx_list = np.argsort(cnf_list_for_sort)[::-1]
         for i, idx in enumerate(sorted_idx_list):
             cnf_i = cnf_list_for_sort[idx]
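The selection rule in `_tune_hyperparam_with_slope` can be sketched in isolation. The code below is a hedged, self-contained reading of the hunk above, not the library's API: `thetas` and `cnfs` are hypothetical stand-ins for the per-candidate policy-value estimates and confidence-interval half-widths that the estimator would compute from logged data.

```python
import numpy as np

C = np.sqrt(6) - 1  # same constant as in the hunk above

def tune_with_slope(lambdas, thetas, cnfs):
    """SLOPE selection sketch: scan candidates from the widest confidence
    interval to the narrowest, and stop as soon as the new estimate falls
    outside the inflated intervals of the candidates accepted so far."""
    sorted_idx = np.argsort(cnfs)[::-1]  # widest interval first
    theta_list, cnf_list = [], []
    for i, idx in enumerate(sorted_idx):
        theta_i, cnf_i = thetas[idx], cnfs[idx]
        if theta_list:
            theta_j = np.array(theta_list)
            cnf_j = np.array(cnf_list)
            # overlap check against every candidate accepted so far
            if not (np.abs(theta_j - theta_i) <= cnf_i + C * cnf_j).all():
                # overlap broken: return the last accepted candidate
                return lambdas[sorted_idx[i - 1]]
        theta_list.append(theta_i)
        cnf_list.append(cnf_i)
    return lambdas[sorted_idx[-1]]

# toy candidates: the third estimate drifts away as intervals narrow
best = tune_with_slope(
    lambdas=[100.0, 10.0, 1.0],
    thetas=[0.50, 0.48, 0.20],
    cnfs=[0.30, 0.10, 0.01],
)
print(best)  # 10.0
```

With these toy numbers the third candidate's estimate (0.20) leaves the inflated interval of the second, so the scan stops and returns the previously accepted candidate, 10.0. This is also why the preceding hunk moves `theta_list, cnf_list = [], []` below the first loop: the accepted-candidate lists must start empty when the sorted scan begins.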
@@ -380,6 +380,8 @@ class InverseProbabilityWeightingTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate clipping hyperparameters.
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -388,11 +390,11 @@ class InverseProbabilityWeightingTuning(BaseOffPolicyEstimatorTuning):
         which improves the original SLOPE proposed by Su et al.(2020).
 
     use_bias_upper_bound: bool, default=True
-        Whether to use bias upper bound in hyperparameter tuning.
+        Whether to use a bias upper bound in hyperparameter tuning.
         If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
     delta: float, default=0.05
-        A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+        A confidence delta to construct a high probability upper bound used in SLOPE.
 
     use_estimated_pscore: bool, default=False.
         If True, `estimated_pscore` is used, otherwise, `pscore` (the true propensity scores) is used.

@@ -417,6 +419,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = InverseProbabilityWeighting
         super()._check_lambdas()
         super()._check_init_inputs()
+        self.lambdas.sort(reverse=True)
 
     def estimate_policy_value(
         self,
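The `self.lambdas.sort(...)` line added to each `__post_init__` is the whole mechanism behind the commit title: candidates are ordered once at construction so that SLOPE's widest-interval-first scan sees them in a monotone order. A minimal illustration with hypothetical candidate values, mirroring the directions in this diff (descending for the clipping-style tuners, ascending for the shrinkage and sub-Gaussian ones):

```python
# Hypothetical candidate lists; only the resulting ordering matters here.
clipping_lambdas = [10.0, 1000.0, 100.0]
clipping_lambdas.sort(reverse=True)  # as in InverseProbabilityWeightingTuning
print(clipping_lambdas)   # [1000.0, 100.0, 10.0]

shrinkage_lambdas = [100.0, 1.0, 10.0]
shrinkage_lambdas.sort()             # as in DoublyRobustWithShrinkageTuning
print(shrinkage_lambdas)  # [1.0, 10.0, 100.0]
```

Because the sort happens after `_check_lambdas()` and `_check_init_inputs()`, users may pass candidates in any order without breaking the monotonicity assumption.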
@@ -583,6 +586,8 @@ class DoublyRobustTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate clipping hyperparameters.
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -614,6 +619,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = DoublyRobust
         super()._check_lambdas()
         super()._check_init_inputs()
+        self.lambdas.sort(reverse=True)
 
     def estimate_policy_value(
         self,

@@ -801,6 +807,8 @@ class SwitchDoublyRobustTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate switching hyperparameters.
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -831,6 +839,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = SwitchDoublyRobust
         super()._check_lambdas()
         super()._check_init_inputs()
+        self.lambdas.sort(reverse=True)
 
     def estimate_policy_value(
         self,

@@ -1018,6 +1027,8 @@ class DoublyRobustWithShrinkageTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate shrinkage hyperparameters.
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -1048,6 +1059,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = DoublyRobustWithShrinkage
         super()._check_lambdas()
         super()._check_init_inputs()
+        self.lambdas.sort()
 
     def estimate_policy_value(
         self,

@@ -1234,6 +1246,8 @@ class SubGaussianInverseProbabilityWeightingTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate hyperparameter values, which should be in the range of [0.0, 1.0].
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -1242,11 +1256,11 @@ class SubGaussianInverseProbabilityWeightingTuning(BaseOffPolicyEstimatorTuning):
         which improves the original SLOPE proposed by Su et al.(2020).
 
     use_bias_upper_bound: bool, default=True
-        Whether to use bias upper bound in hyperparameter tuning.
+        Whether to use a bias upper bound in hyperparameter tuning.
         If False, the direct bias estimator is used to estimate the MSE. See Su et al.(2020) for details.
 
     delta: float, default=0.05
-        A confidence delta to construct a high probability upper bound based on Bernstein inequality.
+        A confidence delta to construct a high probability upper bound used in SLOPE.
 
     use_estimated_pscore: bool, default=False.
         If True, `estimated_pscore` is used, otherwise, `pscore` (the true propensity scores) is used.

@@ -1274,6 +1288,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = SubGaussianInverseProbabilityWeighting
         super()._check_lambdas(max_val=1.0)
         super()._check_init_inputs()
+        self.lambdas.sort()
 
     def estimate_policy_value(
         self,

@@ -1437,6 +1452,8 @@ class SubGaussianDoublyRobustTuning(BaseOffPolicyEstimatorTuning):
         A list of candidate hyperparameter values, which should be in the range of [0.0, 1.0].
         The automatic hyperparameter tuning procedure proposed by Su et al.(2020)
         or Tucker and Lee.(2021) will choose the best hyperparameter value from the logged data.
+        The candidate hyperparameter values will be sorted automatically to ensure the monotonicity
+        assumption of SLOPE.
 
     tuning_method: str, default="slope".
         A method used to tune the hyperparameter of an OPE estimator.

@@ -1470,6 +1487,7 @@ def __post_init__(self) -> None:
         self.base_ope_estimator = SubGaussianDoublyRobust
         super()._check_lambdas(max_val=1.0)
         super()._check_init_inputs()
+        self.lambdas.sort()
 
     def estimate_policy_value(
         self,
