Description
Question 1:
How can I group a dataset the way pandas' groupby does, take the maximum value within each group, and build a pipeline from which sklearn2pmml can generate a PMML file? The reference code below implements this logic, but it fails when the PMML file is generated. I am still investigating the cause and looking for alternatives. My guess is that JPMML has no equivalent function, so the step cannot be converted. Is that correct?
Reference code for Question 1:
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn_pandas import DataFrameMapper

class MaxIncomeTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, groupby_column, target_column, output_columns=None):
        self.groupby_column = groupby_column
        self.target_column = target_column
        self.output_columns = output_columns

    def fit(self, X, y=None):
        # Stateless: nothing is learned during fit
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            # Note: a NumPy input gets integer column labels here,
            # so the string column names would not be found
            X = pd.DataFrame(X)
        # Find the index of the maximum target value within each group
        idx = X.groupby(self.groupby_column)[self.target_column].idxmax()
        # One row per group, not one row per input record
        result = X.loc[idx].reset_index(drop=True)
        # If output_columns is given, rename the columns and return the second one
        if self.output_columns is not None:
            result.columns = self.output_columns
            return result[self.output_columns[1]]
        return result[self.target_column]

# score1 pipeline
fraud_final_cols = ['msisdn', 'score1']
mapper_final_fraud = DataFrameMapper([
    (['msisdn', 'score1'],
     [MaxIncomeTransformer(groupby_column='msisdn', target_column='score1',
                           output_columns=['msisdn', 'score1'])],
     {'alias': 'score1'})
], input_df=True, df_out=True)
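To make the shape change concrete, here is a minimal pandas-only sketch that runs the same groupby/idxmax logic outside any pipeline, on made-up sample data (the values are hypothetical). It shows that the step aggregates records and returns fewer rows than it receives, whereas scikit-learn transformers are normally expected to return one output row per input row. The groupby().transform("max") line is included only for contrast, as a row-preserving variant.

```python
import pandas as pd

# Hypothetical sample data: several records per msisdn
df = pd.DataFrame({
    "msisdn": ["A", "A", "B", "B", "B"],
    "score1": [0.2, 0.7, 0.4, 0.1, 0.9],
})

# Same logic as MaxIncomeTransformer.transform: keep the row with the
# highest score1 within each msisdn group
idx = df.groupby("msisdn")["score1"].idxmax()
per_group = df.loc[idx].reset_index(drop=True)
print(per_group)  # one row per group: 5 input rows become 2

# Row-preserving contrast: broadcast each group's max back to every record
df["score1_group_max"] = df.groupby("msisdn")["score1"].transform("max")
print(df)  # still 5 rows
```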
Question 2:
How can I call a function such as random.uniform(0.1, 0.2) inside ExpressionTransformer to generate random numbers within a given range?
The goal is to add a random perturbation to the result so that its values are uniformly distributed over a given interval.
Reference code for Question 2:
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.preprocessing import ExpressionTransformer

mapper_fea2 = DataFrameMapper([
    (['score_1_wld', 'score_1_zljr', 'score_1_zlhqd'],
     [ExpressionTransformer("random.uniform(0.1, 0.2) if X[0]==1 or X[1]==1 or X[2]==1 else 0")],
     {'alias': 'score_1'}),
], input_df=True, df_out=True)
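For reference, here is a minimal NumPy sketch of the intended rule evaluated in plain Python, outside ExpressionTransformer and PMML; the sample flags and the seed are hypothetical. It only pins down the expected semantics (one uniform draw per matching row, 0 otherwise) and says nothing about whether sklearn2pmml can translate a random-number call.

```python
import numpy as np

# Hypothetical seeded generator so the sketch is reproducible
rng = np.random.default_rng(seed=42)

# Hypothetical input: one row per record, columns in the same order as
# ['score_1_wld', 'score_1_zljr', 'score_1_zlhqd']
flags = np.array([
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
])

# Intended rule: if any of the three flags equals 1, draw a value
# uniformly from [0.1, 0.2); otherwise the score is 0
hit = (flags == 1).any(axis=1)
score_1 = np.where(hit, rng.uniform(0.1, 0.2, size=len(flags)), 0.0)
print(score_1)
```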