Align hyperparameter tuning GBT models #835

Open
fritshermans opened this issue May 18, 2025 · 1 comment

@fritshermans
Contributor

In the section 'Hyperparameter tuning by randomized search', different hyperparameters are tuned for histogram gradient-boosting decision trees than in the section 'Hyperparameter tuning with ensemble models'. The former tunes l2_regularization and max_bins but the latter does not; the latter tunes max_depth but the former does not. My proposal would be to:

  • remove tuning of max_bins; this argument only sets the granularity of the optimal-split search in the trees, so I don't think it affects the complexity of the model or its ability to generalize
  • add a line on how l2_regularization works for GBT, as it is not explained, or remove it
  • add tuning of max_depth in the former section

Please let me know what you think of this. I would be happy to create a PR.

@ArturoAmorQ
Collaborator

Hi @fritshermans, sorry for taking so long to answer!

I would say that I don't really see a problem in not having consistent hyperparameters between those two notebooks; otherwise we might end up being redundant. The Hyperparameter tuning by randomized search notebook presents some hyperparameters, but the focus is mostly on how one can pass distributions to RandomizedSearchCV rather than on giving a detailed description of what they do, whereas in the Hyperparameter tuning with ensemble models notebook the emphasis is on the interactions between learning_rate and both max_iter and max_leaf_nodes.

About your proposals:

  • max_bins is used and interpreted in the Analysis of hyperparameter search results section, so we need it there
  • I do agree that the very shallow explanation "l2_regularization: it corresponds to the strength of the regularization" (in the randomized search notebook) is not very informative; maybe we can compare this hyperparameter with alpha in Ridge and 1/C in LogisticRegression
  • We don't really tune max_depth in either notebook, but maybe we can make it explicit that we demo the interactions using max_leaf_nodes only to keep the discussion simple, and encourage students to modify the code and experiment with what would happen when using max_depth instead

Then I would possibly modify the param_distributions in the Hyperparameter tuning with ensemble models notebook to try fixed values of learning_rate, e.g. [0.01, 0.03, 0.1, 0.3], and then add a parallel plot after the Caution message and before the interpretation.
