Align hyperparameter tuning GBT models #835

Open
fritshermans opened this issue May 18, 2025 · 1 comment

@fritshermans
Contributor

In the section 'Hyperparameter tuning by randomized search', different hyperparameters are tuned for histogram gradient-boosting decision trees than in the section 'Hyperparameter tuning with ensemble models'. The former tunes l2_regularization and max_bins but the latter does not; the latter tunes max_depth but the former does not. My proposal would be to:

  • remove tuning of max_bins; this argument only sets the granularity of the optimal-split search in the trees, so I don't think it affects the complexity of the model or its ability to generalize
  • add a line on how l2_regularization works for GBT, as it is not explained, or remove it
  • add tuning of max_depth in the former section

Please let me know what you think of this. I would be happy to create a PR.

@ArturoAmorQ
Collaborator

Hi @fritshermans, sorry for taking so long to answer!

I would say that I don't really see a problem in not having consistent hyperparameters between those two notebooks; otherwise we might end up being redundant. The Hyperparameter tuning by randomized search notebook presents some hyperparameters, but the focus is mostly on how one can pass distributions to RandomizedSearchCV rather than on giving a detailed description of what they do, whereas in the Hyperparameter tuning with ensemble models notebook the emphasis is on the interactions between learning_rate and both max_iter and max_leaf_nodes.

About your proposals:

  • max_bins is used and interpreted in the Analysis of hyperparameter search results section, so we need it there
  • I do agree that the very shallow explanation "l2_regularization: it corresponds to the strength of the regularization" (in the randomized search notebook) is not very informative; maybe we can compare this hyperparameter with alpha in Ridge and 1/C in LogisticRegression
  • We don't really tune max_depth in either notebook, but maybe we can make it explicit that we demo the interactions using max_leaf_nodes only to keep the discussion simple, and encourage students to modify the code and experiment with what would happen when using max_depth instead

Then I would possibly modify the param_distributions in the Hyperparameter tuning with ensemble models notebook to try fixed values of learning_rate, e.g. [0.01, 0.03, 0.1, 0.3], and then add a parallel plot after the Caution message and before the interpretation.
