Enabling numeric feature discretization

Dear `ydf` authors,

I'm running `ydf` version 0.8.0 (the latest available at the moment) on Windows 11 and am having trouble enabling discretization of numerical features in local (non-distributed) training.
The "How to train a model faster" page [suggests](https://ydf.readthedocs.io/en/latest/guide_how_to_improve_learner/#approximated-splits) that automatic discretization can be turned on for all features with `discretize_numerical_columns=True`. But when I pass it as an argument to `GradientBoostedTreesLearner`, I see no change in either training speed or model performance, even if I set `num_discretized_numerical_bins=2`. The `ydf` logs also report every feature as `NUMERICAL` and none as `DISCRETIZED_NUMERICAL`.
The "How to define model features" page also [suggests](https://ydf.readthedocs.io/en/stable/guide_feature_semantics/#ydfsemanticdiscretized_numerical) that `ydf.Semantic.DISCRETIZED_NUMERICAL` can be used to force discretization. However, if I pass a (feature name, semantic) tuple to the `features` option of `GradientBoostedTreesLearner`, I get the following error:
```
ValueError: Cannot import column 'XXX' with semantic=Semantic.DISCRETIZED_NUMERICAL, type=numpy's array of 'float64' and content=array(XXX)
```

What is the correct way to turn on the on-the-fly discretization?
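
To make the expectation behind `num_discretized_numerical_bins=2` concrete: with only two bins, every numerical feature should collapse to a single threshold, so both training speed and model quality ought to change visibly. The sketch below illustrates that idea in plain Python. It is not `ydf`'s actual binning code; `quantile_boundaries` and `discretize` are hypothetical helper names, and the quantile-based boundary choice is an assumption made for illustration.

```python
from bisect import bisect_right


def quantile_boundaries(values, num_bins):
    """Return num_bins - 1 boundaries splitting the values into roughly equal-sized bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def discretize(values, boundaries):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(boundaries, v) for v in values]


# Toy feature column (made-up values, for illustration only).
feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.2, 0.6, 0.7]

boundaries = quantile_boundaries(feature, num_bins=2)
binned = discretize(feature, boundaries)

print(boundaries)  # a single threshold
print(binned)      # only two distinct bin indices remain
```

With two bins, the eight distinct values reduce to two bin indices, which is why an unchanged model suggests the option is not being applied.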

Also, the [`GradientBoostedTreesLearner`](https://ydf.readthedocs.io/en/latest/py_api/GradientBoostedTreesLearner/) API reference mentions a `columns` parameter, e.g. in the description of `features`:
> "If include_all_columns=True, all the columns are imported as features and only the semantic of the columns NOT in `columns` is determined automatically"

but this parameter is not described anywhere. Could you please explain what this parameter is and what it is used for?

Best regards,
Aleksandr
