For both the gradient-boosting and random forest models, create a validation
curve using the training set to assess the impact of the number of trees on
the performance of each model. Evaluate the list of parameters
`param_range=np.array([1, 2, 5, 10, 20, 50, 100, 200])` and score it using
`neg_mean_absolute_error`. Remember to set `negate_score=True` to recover the
right sign of the Mean Absolute Error.
```python
# Write your code here.
```
<spanclass="n">xlabel</span><spanclass="o">=</span><spanclass="s2">"Number of trees in the gradient boosting model"</span><spanclass="p">,</span>
We see that the number of trees used is far below 1000 with the current
dataset. Training the gradient boosting model with all 1000 trees would have
been detrimental.
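For reference, the 1000-tree budget above comes from fitting the GBDT with
early stopping earlier in the notebook; a minimal sketch of that pattern (the
exact hyperparameter values here are an assumption, not the notebook's):

```python
# A minimal sketch of early stopping: the model may grow up to 1000 trees
# but stops once its internal validation score no longer improves.
from sklearn.ensemble import GradientBoostingRegressor

gbdt = GradientBoostingRegressor(
    n_estimators=1_000,   # upper bound on the number of boosting stages
    n_iter_no_change=5,   # stop after 5 stages without improvement
)
gbdt.fit(data_train, target_train)
print(f"Number of trees actually fitted: {gbdt.n_estimators_}")
```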
Please note that one should not tune the number of estimators as a
hyperparameter, for either the random forest or the gradient boosting model.
In this exercise we only show model performance with varying `n_estimators`
for educational purposes.
Estimate the generalization performance of this model again using the
`sklearn.metrics.mean_absolute_error` metric, but this time using the test set
that we held out at the beginning of the notebook. Compare the resulting value
to the values observed in the validation curve.
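A minimal sketch of this final check, assuming the held-out split is named
`data_test`/`target_test` and `gbdt` is the gradient boosting model refit on
the training set (these names are assumptions):

```python
# A minimal sketch: score the refit GBDT on the held-out test set.
from sklearn.metrics import mean_absolute_error

error = mean_absolute_error(target_test, gbdt.predict(data_test))
print(f"On average, our GBDT regressor makes an error of {error:.2f} k$")
```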
<divclass="output stream highlight-myst-ansi notranslate"><divclass="highlight"><pre><span></span>On average, our GBDT regressor makes an error of 34.93 k$
864
+
<divclass="output stream highlight-myst-ansi notranslate"><divclass="highlight"><pre><span></span>On average, our GBDT regressor makes an error of 36.93 k$