Description
Dear `ydf` authors,
I'm running `ydf` version 0.8.0 (the latest available at the moment) on Windows 11 and am having trouble enabling discretization of numerical features in local (non-distributed) training.
The "How to train a model faster" page suggests that automatic discretization can be turned on for all features with `discretize_numerical_columns=True`. But when I pass it as an argument to `GradientBoostedTreesLearner`, I see no change in either training speed or model performance, even if I set `num_discretized_numerical_bins=2`. All the features in the `ydf` logs are also reported as `NUMERICAL`, and none as `DISCRETIZED_NUMERICAL`.
The "How to define model features" page also suggests that `ydf.Semantic.DISCRETIZED_NUMERICAL` can be used to force discretization. However, if I pass a (feature name, semantic) tuple to the `features` option of `GradientBoostedTreesLearner`, I get the following error:
ValueError: Cannot import column 'XXX' with semantic=Semantic.DISCRETIZED_NUMERICAL, type=numpy's array of 'float64' and content=array(XXX)
What is the correct way to turn on the on-the-fly discretization?
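To make the question concrete, here is a minimal plain-Python sketch of what I understand discretization into two bins to mean. The helper names `quantile_boundaries` and `discretize` are hypothetical, and the quantile-based boundaries are my assumption, not necessarily how `ydf` chooses them.

```python
from bisect import bisect_right


def quantile_boundaries(values, num_bins):
    """Return num_bins - 1 cut points taken at evenly spaced ranks of the sorted values.

    Assumption: boundaries are picked from quantiles; ydf may do this differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def discretize(values, boundaries):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(boundaries, v) for v in values]


# A toy float64-like feature column.
values = [0.1, 2.5, 3.7, 0.4, 9.9, 5.2]

# With num_bins=2 there is a single boundary, so only one split point remains.
boundaries = quantile_boundaries(values, num_bins=2)
bins = discretize(values, boundaries)

print(boundaries)
print(bins)
```

With only two bins each feature keeps a single candidate split point, which is why I expected a visible change in model performance once discretization is actually applied.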
Also, the `GradientBoostedTreesLearner` API reference mentions a `columns` parameter, e.g. in the description of `features`: "If include_all_columns=True, all the columns are imported as features and only the semantic of the columns NOT in `columns` is determined automatically", but this parameter is not described anywhere. Could you please explain what this parameter is and what it is used for?
Best regards,
Aleksandr