feature: faster calc_pscore #101

usaito · 2021-05-28T23:53:24Z

feature

Implemented a faster pscore function, _calc_pscore_given_policy_logit, replacing the previous _calc_pscore_given_action_list.
- The previous implementation, _calc_pscore_given_action_list(), calculated the pscore of a single action_list for a given policy_logit_i_, so we had to iterate over all rounds and all combinatorial actions to obtain the pscores.
- In fact, we can calculate the pscores of all possible slates at once, which makes functions such as sample_action_and_obtain_pscore() and calc_ground_truth_policy_value() much faster.
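
For reference, the quantity both versions compute can be written as a product of softmax probabilities over the actions not yet placed in the slate. The notation below is an assumption inferred from the code in this PR, not taken from the library documentation: $s = (a_1, \dots, a_L)$ is a slate with $L$ = len_list, $\mathcal{A}$ is the set of unique actions, and $f(x, a)$ is the logit of action $a$ given context $x$.

```latex
\[
  \mathrm{pscore}(s \mid x)
  = \prod_{k=1}^{L}
    \frac{\exp\bigl(f(x, a_k)\bigr)}
         {\sum_{a \in \mathcal{A} \setminus \{a_1, \dots, a_{k-1}\}} \exp\bigl(f(x, a)\bigr)}
\]
```

The old function evaluates this product for one slate per call, while the new one evaluates the $k$-th factor for every slate at once.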

before

Lines 324 to 347 in 6dc904c

    
    def _calc_pscore_given_action_list(
        self, action_list: List[int], policy_logit_i_: np.ndarray
    ) -> float:
        """Calculate the propensity score given combinatorial set of actions.

        Parameters
        ------------
        action_list: List[int], len=len_list
            List of combinatorial set of slate actions.

        policy_logit_i_: array-like, (n_unique_action, )
            Logit values given context (:math:`x`), i.e., :math:`\\f: \\mathcal{X} \\rightarrow \\mathbb{R}^{\\mathcal{A}}`.

        """
        unique_action_set = np.arange(self.n_unique_action)
        pscore_ = 1.0
        for action in action_list:
            score_ = softmax(policy_logit_i_[:, unique_action_set])[0]
            action_index = np.where(unique_action_set == action)[0][0]
            pscore_ *= score_[action_index]
            unique_action_set = np.delete(
                unique_action_set, unique_action_set == action
            )
        return pscore_
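
To make the per-slate computation concrete, here is a minimal standalone sketch of the same idea. It is not the library code: the function name pscore_of_slate is hypothetical, the logits are assumed to be a 1-D array, and the softmax is written inline instead of using the library helper.

```python
import numpy as np


def pscore_of_slate(action_list, policy_logit):
    """Probability of drawing `action_list` in order, without replacement.

    Hypothetical helper for illustration; `policy_logit` is assumed to be
    a 1-D array of shape (n_unique_action,).
    """
    remaining = np.arange(len(policy_logit))
    pscore = 1.0
    for action in action_list:
        # Softmax restricted to the actions that are still available.
        logits = policy_logit[remaining]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        pscore *= probs[remaining == action][0]
        # Remove the chosen action before moving to the next position.
        remaining = remaining[remaining != action]
    return float(pscore)


# With uniform logits over 3 actions, a slate of length 2 has
# probability 1/3 * 1/2.
uniform = pscore_of_slate([2, 0], np.zeros(3))
```

Calling such a function once per slate and per round is the cost that the vectorized version avoids.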

after

zr-obp/obp/dataset/synthetic_slate.py

Lines 324 to 361 in 1187a89

    
    def _calc_pscore_given_policy_logit(
        self, all_slate_actions: np.ndarray, policy_logit_i_: np.ndarray
    ) -> np.ndarray:
        """Calculate the propensity score of each of the possible slate actions given policy_logit.

        Parameters
        ------------
        all_slate_actions: array-like, (n_action, len_list)
            All possible slate actions.

        policy_logit_i_: array-like, (n_unique_action, )
            Logit values given context (:math:`x`), i.e., :math:`\\f: \\mathcal{X} \\rightarrow \\mathbb{R}^{\\mathcal{A}}`.

        Returns
        ------------
        pscores: array-like, (n_action, )
            Propensity scores of all the possible slate actions given policy_logit.

        """
        n_actions = len(all_slate_actions)
        unique_action_set_2d = np.tile(np.arange(self.n_unique_action), (n_actions, 1))
        pscores = np.ones(n_actions)
        for position_ in np.arange(self.len_list):
            action_index = np.where(
                unique_action_set_2d == all_slate_actions[:, position_][:, np.newaxis]
            )[1]
            pscores *= softmax(policy_logit_i_[unique_action_set_2d])[
                np.arange(n_actions), action_index
            ]
            # delete actions
            if position_ + 1 != self.len_list:
                mask = np.ones((n_actions, self.n_unique_action - position_))
                mask[np.arange(n_actions), action_index] = 0
                unique_action_set_2d = unique_action_set_2d[mask.astype(bool)].reshape(
                    (-1, self.n_unique_action - position_ - 1)
                )
        return pscores
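
The vectorized approach can be tried outside the class with the sketch below. It is an illustration under stated assumptions, not the merged code: pscores_of_all_slates and row_softmax are hypothetical names, the softmax is assumed to normalize each row independently, and the logits are assumed to be 1-D.

```python
from itertools import permutations

import numpy as np


def row_softmax(x):
    """Softmax applied independently to each row (assumed behaviour)."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def pscores_of_all_slates(all_slate_actions, policy_logit):
    """Propensity score of every slate (one slate per row) in one pass."""
    n_slates, len_list = all_slate_actions.shape
    rows = np.arange(n_slates)
    # Row i holds the actions still available to slate i.
    remaining = np.tile(np.arange(len(policy_logit)), (n_slates, 1))
    pscores = np.ones(n_slates)
    for position in range(len_list):
        chosen = all_slate_actions[:, position][:, np.newaxis]
        # Column of the chosen action within each row of `remaining`.
        col = np.where(remaining == chosen)[1]
        pscores *= row_softmax(policy_logit[remaining])[rows, col]
        # Drop the chosen action from every row.
        keep = remaining != chosen
        remaining = remaining[keep].reshape(n_slates, remaining.shape[1] - 1)
    return pscores


policy_logit = np.array([0.5, -1.0, 2.0, 0.0])
# All ordered slates of length 2 drawn from 4 unique actions.
all_slates = np.array(list(permutations(range(4), 2)))
pscores = pscores_of_all_slates(all_slates, policy_logit)
```

Because the rows of all_slates enumerate every ordered slate, the resulting pscores form a probability distribution over slates, which gives a quick sanity check on the implementation.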

usaito added 3 commits May 29, 2021 08:52

make calc_pscore faster

60c3152

fix bug

2624b91

fix bug

1187a89

usaito merged commit e9333d0 into master May 29, 2021

usaito deleted the feature/fast-calc-pscore-slate branch May 29, 2021 00:43
