feature: faster calc_pscore #101

Merged
merged 3 commits into from
May 29, 2021

@usaito (Contributor) commented May 28, 2021

feature

  • implemented a faster pscore calculation, _calc_pscore_given_policy_logit(), which replaces the previous _calc_pscore_given_action_list()
    • The previous _calc_pscore_given_action_list() calculates the pscore of a single slate given policy_logit_i_ and action_list, so we had to iterate over all rounds and all combinatorial actions (slates) to calculate the pscores.
    • In fact, we can calculate the pscores of all possible slates at once, which makes functions that need them, such as sample_action_and_obtain_pscore() and calc_ground_truth_policy_value(), much faster.
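
Read from the code below, both functions compute the same quantity: a product of softmax probabilities in which each action already placed in the slate is removed from the candidate set before the next position (the Plackett-Luce form). Writing $K$ for len_list and $f(x)_a$ for the logit of action $a$:

```math
\pi(a_1, \dots, a_K \mid x) = \prod_{k=1}^{K} \frac{\exp\left(f(x)_{a_k}\right)}{\sum_{a' \in \mathcal{A} \setminus \{a_1, \dots, a_{k-1}\}} \exp\left(f(x)_{a'}\right)}
```

Assuming every ordered, repetition-free slate is enumerated, there are $|\mathcal{A}|! / (|\mathcal{A}| - K)!$ slates per round, which is why evaluating this product one slate at a time is costly.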

before

from typing import List

import numpy as np
from obp.utils import softmax  # row-wise (axis=1) softmax


def _calc_pscore_given_action_list(
    self, action_list: List[int], policy_logit_i_: np.ndarray
) -> float:
    """Calculate the propensity score given combinatorial set of actions.

    Parameters
    ------------
    action_list: List[int], len=len_list
        List of combinatorial set of slate actions.

    policy_logit_i_: array-like, (1, n_unique_action)
        Logit values given context (:math:`x`), i.e., :math:`f: \\mathcal{X} \\rightarrow \\mathbb{R}^{\\mathcal{A}}`.

    """
    unique_action_set = np.arange(self.n_unique_action)
    pscore_ = 1.0
    for action in action_list:
        # softmax over the actions that have not been placed in the slate yet
        score_ = softmax(policy_logit_i_[:, unique_action_set])[0]
        action_index = np.where(unique_action_set == action)[0][0]
        pscore_ *= score_[action_index]
        # remove the chosen action before moving on to the next position
        unique_action_set = np.delete(
            unique_action_set, unique_action_set == action
        )
    return pscore_
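
To make the cost of the old approach concrete, here is a standalone sketch (not obp code) that mirrors the per-slate loop above and applies it to every ordered slate produced by itertools.permutations. The helper names row_softmax, pscore_of_one_slate and pscores_by_loop are hypothetical; each slate costs one Python-level call, and in the dataset this would be repeated for every round.

```python
import itertools

import numpy as np


def row_softmax(x: np.ndarray) -> np.ndarray:
    """Softmax over axis=1, shifted by the row max for numerical stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pscore_of_one_slate(slate, logit: np.ndarray) -> float:
    """Sequential softmax without replacement for a single slate (loop-based)."""
    remaining = np.arange(len(logit))
    pscore = 1.0
    for action in slate:
        # probability of `action` among the actions still available
        probs = row_softmax(logit[remaining][np.newaxis, :])[0]
        pscore *= probs[np.flatnonzero(remaining == action)[0]]
        # drop the chosen action for the next position
        remaining = remaining[remaining != action]
    return pscore


def pscores_by_loop(logit: np.ndarray, len_list: int) -> np.ndarray:
    """One call of pscore_of_one_slate per ordered slate (hypothetical helper)."""
    slates = itertools.permutations(range(len(logit)), len_list)
    return np.array([pscore_of_one_slate(s, logit) for s in slates])


logit = np.array([0.5, -1.0, 2.0, 0.0])
pscores = pscores_by_loop(logit, len_list=2)
print(len(pscores))  # 12
```

With 4 unique actions and len_list=2 there are 12 ordered slates, and their propensities sum to 1, as a distribution over slates should.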

after

import numpy as np
from obp.utils import softmax  # row-wise (axis=1) softmax


def _calc_pscore_given_policy_logit(
    self, all_slate_actions: np.ndarray, policy_logit_i_: np.ndarray
) -> np.ndarray:
    """Calculate the propensity score of each of the possible slate actions given policy_logit.

    Parameters
    ------------
    all_slate_actions: array-like, (n_action, len_list)
        All possible slate actions.

    policy_logit_i_: array-like, (n_unique_action, )
        Logit values given context (:math:`x`), i.e., :math:`f: \\mathcal{X} \\rightarrow \\mathbb{R}^{\\mathcal{A}}`.

    Returns
    ------------
    pscores: array-like, (n_action, )
        Propensity scores of all the possible slate actions given policy_logit.

    """
    n_actions = len(all_slate_actions)
    # row i holds the actions still available to slate i (all of them at position 0)
    unique_action_set_2d = np.tile(np.arange(self.n_unique_action), (n_actions, 1))
    pscores = np.ones(n_actions)
    for position_ in np.arange(self.len_list):
        # column of each slate's action at this position within its remaining actions
        action_index = np.where(
            unique_action_set_2d == all_slate_actions[:, position_][:, np.newaxis]
        )[1]
        # multiply in the softmax probability of that action among the remaining ones
        pscores *= softmax(policy_logit_i_[unique_action_set_2d])[
            np.arange(n_actions), action_index
        ]
        # delete the chosen actions (not needed after the last position)
        if position_ + 1 != self.len_list:
            mask = np.ones((n_actions, self.n_unique_action - position_))
            mask[np.arange(n_actions), action_index] = 0
            unique_action_set_2d = unique_action_set_2d[mask.astype(bool)].reshape(
                (-1, self.n_unique_action - position_ - 1)
            )
    return pscores
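
A standalone sketch of the same vectorized idea, written as a plain function so it can be run and checked without the dataset class. The names pscores_vectorized and row_softmax are hypothetical, and the slates are assumed to be enumerated with itertools.permutations. It loops over positions rather than slates, so the Python-level loop runs len_list times per round instead of once per slate.

```python
import itertools

import numpy as np


def row_softmax(x: np.ndarray) -> np.ndarray:
    """Softmax over axis=1, shifted by the row max for numerical stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pscores_vectorized(all_slate_actions: np.ndarray, logit: np.ndarray) -> np.ndarray:
    """Plackett-Luce propensity of every slate (rows of all_slate_actions) at once.

    Assumes each row holds len_list distinct action ids in [0, len(logit)).
    """
    n_slates, len_list = all_slate_actions.shape
    rows = np.arange(n_slates)
    # row i: actions still available to slate i, in increasing id order
    remaining = np.tile(np.arange(logit.shape[0]), (n_slates, 1))
    pscores = np.ones(n_slates)
    for pos in range(len_list):
        # exactly one match per row, so np.where(...)[1] gives one column per slate
        col = np.where(remaining == all_slate_actions[:, pos][:, np.newaxis])[1]
        pscores *= row_softmax(logit[remaining])[rows, col]
        if pos + 1 < len_list:
            # drop each slate's chosen action; every row loses exactly one entry
            keep = np.ones(remaining.shape, dtype=bool)
            keep[rows, col] = False
            remaining = remaining[keep].reshape(n_slates, -1)
    return pscores


logit = np.array([0.5, -1.0, 2.0, 0.0])
slates = np.array(list(itertools.permutations(range(len(logit)), 3)))
pscores = pscores_vectorized(slates, logit)
print(pscores.shape)  # (24,)
```

Here every ordered 3-action slate over 4 actions is scored in three vectorized steps; the result should match a per-slate loop slate for slate and sum to 1.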

@usaito merged commit e9333d0 into master on May 29, 2021
@usaito deleted the feature/fast-calc-pscore-slate branch on May 29, 2021 00:43