Commit 7cef83d

author
Github Actions
committed
Eddie Bergman: Dataset size reduction fixed, updated TargetValidator to match signatures (#1250)
1 parent 610e437 commit 7cef83d

86 files changed, +3596 -3278 lines changed

development/.buildinfo

+1 -1
@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 19b39b196a4ce26d6f98b3eb2c061df5
+config: 8a26f7fbaa1576935d6b4916c5b79de9
 tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.
Binary file not shown.

development/_modules/autosklearn/estimators.html

+87 -67
@@ -63,7 +63,6 @@
 <li><a href="../../index.html">Start</a></li>
 <li><a href="../../releases.html">Releases</a></li>
 <li><a href="../../installation.html">Installation</a></li>
-<li><a href="../../manual.html">Manual</a></li>
 <li><a href="../../examples/index.html">Examples</a></li>
 <li><a href="../../api.html">API</a></li>
 <li><a href="../../extending.html">Extending</a></li>
@@ -269,39 +268,58 @@ <h1>Source code for autosklearn.estimators</h1><div class="highlight"><pre>
 <span class="sd"> &#39;feature_preprocessor&#39;: [&quot;no_preprocessing&quot;]</span>
 <span class="sd"> }</span>

-<span class="sd"> resampling_strategy : str | BaseCrossValidator | _RepeatedSplits | BaseShuffleSplit = &quot;holdout&quot;</span>
+<span class="sd"> resampling_strategy : Union[str, BaseCrossValidator, _RepeatedSplits, BaseShuffleSplit] = &quot;holdout&quot;</span>
 <span class="sd"> How to to handle overfitting, might need to use ``resampling_strategy_arguments``</span>
 <span class="sd"> if using ``&quot;cv&quot;`` based method or a Splitter object.</span>

-<span class="sd"> * **Options**</span>
-<span class="sd"> * ``&quot;holdout&quot;`` - Use a 67:33 (train:test) split</span>
-<span class="sd"> * ``&quot;cv&quot;``: perform cross validation, requires &quot;folds&quot; in ``resampling_strategy_arguments``</span>
-<span class="sd"> * ``&quot;holdout-iterative-fit&quot;`` - Same as &quot;holdout&quot; but iterative fit where possible</span>
-<span class="sd"> * ``&quot;cv-iterative-fit&quot;``: Same as &quot;cv&quot; but iterative fit where possible</span>
-<span class="sd"> * ``&quot;partial-cv&quot;``: Same as &quot;cv&quot; but uses intensification.</span>
-<span class="sd"> * ``BaseCrossValidator`` - any BaseCrossValidator subclass (found in scikit-learn model_selection module)</span>
-<span class="sd"> * ``_RepeatedSplits`` - any _RepeatedSplits subclass (found in scikit-learn model_selection module)</span>
-<span class="sd"> * ``BaseShuffleSplit`` - any BaseShuffleSplit subclass (found in scikit-learn model_selection module)</span>
-
 <span class="sd"> If using a Splitter object that relies on the dataset retaining it&#39;s current</span>
 <span class="sd"> size and order, you will need to look at the ``dataset_compression`` argument</span>
 <span class="sd"> and ensure that ``&quot;subsample&quot;`` is not included in the applied compression</span>
 <span class="sd"> ``&quot;methods&quot;`` or disable it entirely with ``False``.</span>

-<span class="sd"> resampling_strategy_arguments : Optional[Dict]</span>
-<span class="sd"> Additional arguments for ``resampling_strategy``, this is required if</span>
-<span class="sd"> using a ``cv`` based strategy:</span>
-
-<span class="sd"> .. code-block:: python</span>
-
-<span class="sd"> {</span>
-<span class="sd"> &quot;train_size&quot;: 0.67, # The size of the training set</span>
-<span class="sd"> &quot;shuffle&quot;: True, # Whether to shuffle before splitting data</span>
-<span class="sd"> &quot;folds&quot;: 5 # Used in &#39;cv&#39; based resampling strategies</span>
-<span class="sd"> }</span>
-
-<span class="sd"> If using a custom splitter class, which takes ``n_splits`` such as</span>
-<span class="sd"> `PredefinedSplit &lt;https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn-model-selection-kfold&gt;`_, the value of ``&quot;folds&quot;`` will be used.</span>
+<span class="sd"> **Options**</span>
+
+<span class="sd"> * ``&quot;holdout&quot;``:</span>
+<span class="sd"> 67:33 (train:test) split</span>
+<span class="sd"> * ``&quot;holdout-iterative-fit&quot;``:</span>
+<span class="sd"> 67:33 (train:test) split, iterative fit where possible</span>
+<span class="sd"> * ``&quot;cv&quot;``:</span>
+<span class="sd"> crossvalidation,</span>
+<span class="sd"> requires ``&quot;folds&quot;`` in ``resampling_strategy_arguments``</span>
+<span class="sd"> * ``&quot;cv-iterative-fit&quot;``:</span>
+<span class="sd"> crossvalidation,</span>
+<span class="sd"> calls iterative fit where possible,</span>
+<span class="sd"> requires ``&quot;folds&quot;`` in ``resampling_strategy_arguments``</span>
+<span class="sd"> * &#39;partial-cv&#39;:</span>
+<span class="sd"> crossvalidation with intensification,</span>
+<span class="sd"> requires ``&quot;folds&quot;`` in ``resampling_strategy_arguments``</span>
+<span class="sd"> * ``BaseCrossValidator`` subclass:</span>
+<span class="sd"> any BaseCrossValidator subclass (found in scikit-learn model_selection module)</span>
+<span class="sd"> * ``_RepeatedSplits`` subclass:</span>
+<span class="sd"> any _RepeatedSplits subclass (found in scikit-learn model_selection module)</span>
+<span class="sd"> * ``BaseShuffleSplit`` subclass:</span>
+<span class="sd"> any BaseShuffleSplit subclass (found in scikit-learn model_selection module)</span>
+
+<span class="sd"> resampling_strategy_arguments : dict, optional if &#39;holdout&#39; (train_size default=0.67)</span>
+<span class="sd"> Additional arguments for resampling_strategy:</span>
+
+<span class="sd"> * ``train_size`` should be between 0.0 and 1.0 and represent the</span>
+<span class="sd"> proportion of the dataset to include in the train split.</span>
+<span class="sd"> * ``shuffle`` determines whether the data is shuffled prior to</span>
+<span class="sd"> splitting it into train and validation.</span>
+
+<span class="sd"> Available arguments:</span>
+
+<span class="sd"> * &#39;holdout&#39;: {&#39;train_size&#39;: float}</span>
+<span class="sd"> * &#39;holdout-iterative-fit&#39;: {&#39;train_size&#39;: float}</span>
+<span class="sd"> * &#39;cv&#39;: {&#39;folds&#39;: int}</span>
+<span class="sd"> * &#39;cv-iterative-fit&#39;: {&#39;folds&#39;: int}</span>
+<span class="sd"> * &#39;partial-cv&#39;: {&#39;folds&#39;: int, &#39;shuffle&#39;: bool}</span>
+<span class="sd"> * BaseCrossValidator or _RepeatedSplits or BaseShuffleSplit object: all arguments</span>
+<span class="sd"> required by chosen class as specified in scikit-learn documentation.</span>
+<span class="sd"> If arguments are not provided, scikit-learn defaults are used.</span>
+<span class="sd"> If no defaults are available, an exception is raised.</span>
+<span class="sd"> Refer to the &#39;n_splits&#39; argument as &#39;folds&#39;.</span>

 <span class="sd"> tmp_folder : string, optional (None)</span>
 <span class="sd"> folder to store configuration output and log files, if ``None``</span>
@@ -313,12 +331,12 @@ <h1>Source code for autosklearn.estimators</h1><div class="highlight"><pre>

 <span class="sd"> n_jobs : int, optional, experimental</span>
 <span class="sd"> The number of jobs to run in parallel for ``fit()``. ``-1`` means</span>
-<span class="sd"> using all processors.</span>
-
-<span class="sd"> **Important notes**:</span>
-
-<span class="sd"> * By default, Auto-sklearn uses one core.</span>
-<span class="sd"> * Ensemble building is not affected by ``n_jobs`` but can be controlled by the number</span>
+<span class="sd"> using all processors. </span>
+<span class="sd"> </span>
+<span class="sd"> **Important notes**: </span>
+<span class="sd"> </span>
+<span class="sd"> * By default, Auto-sklearn uses one core. </span>
+<span class="sd"> * Ensemble building is not affected by ``n_jobs`` but can be controlled by the number </span>
 <span class="sd"> of models in the ensemble.</span>
 <span class="sd"> * ``predict()`` is not affected by ``n_jobs`` (in contrast to most scikit-learn models)</span>
 <span class="sd"> * If ``dask_client`` is ``None``, a new dask client is created.</span>
@@ -382,14 +400,16 @@ <h1>Source code for autosklearn.estimators</h1><div class="highlight"><pre>

 <span class="sd"> dataset_compression: Union[bool, Mapping[str, Any]] = True</span>
 <span class="sd"> We compress datasets so that they fit into some predefined amount of memory.</span>
-<span class="sd"> Currently this does not apply to dataframes or sparse arrays, only to raw</span>
-<span class="sd"> numpy arrays.</span>
+<span class="sd"> Currently this does not apply to dataframes or sparse arrays, only to raw numpy arrays.</span>

-<span class="sd"> **NOTE** - If using a custom ``resampling_strategy`` that relies on specific</span>
+<span class="sd"> **NOTE**</span>
+
+<span class="sd"> If using a custom ``resampling_strategy`` that relies on specific</span>
 <span class="sd"> size or ordering of data, this must be disabled to preserve these properties.</span>

-<span class="sd"> You can disable this entirely by passing ``False`` or leave as the default</span>
-<span class="sd"> ``True`` for configuration below.</span>
+<span class="sd"> You can disable this entirely by passing ``False``.</span>
+
+<span class="sd"> Default configuration when left as ``True``:</span>

 <span class="sd"> .. code-block:: python</span>

@@ -403,36 +423,36 @@ <h1>Source code for autosklearn.estimators</h1><div class="highlight"><pre>

 <span class="sd"> The available options are described here:</span>

-<span class="sd"> * **memory_allocation**</span>
-<span class="sd"> By default, we attempt to fit the dataset into ``0.1 * memory_limit``.</span>
-<span class="sd"> This float value can be set with ``&quot;memory_allocation&quot;: 0.1``.</span>
-<span class="sd"> We also allow for specifying absolute memory in MB, e.g. 10MB is</span>
-<span class="sd"> ``&quot;memory_allocation&quot;: 10``.</span>
-
-<span class="sd"> The memory used by the dataset is checked after each reduction method is</span>
-<span class="sd"> performed. If the dataset fits into the allocated memory, any further</span>
-<span class="sd"> methods listed in ``&quot;methods&quot;`` will not be performed.</span>
-
-<span class="sd"> For example, if ``methods: [&quot;precision&quot;, &quot;subsample&quot;]`` and the</span>
-<span class="sd"> ``&quot;precision&quot;`` reduction step was enough to make the dataset fit into</span>
-<span class="sd"> memory, then the ``&quot;subsample&quot;`` reduction step will not be performed.</span>
-
-<span class="sd"> * **methods**</span>
-<span class="sd"> We provide the following methods for reducing the dataset size.</span>
-<span class="sd"> These can be provided in a list and are performed in the order as given.</span>
-
-<span class="sd"> * ``&quot;precision&quot;`` - We reduce floating point precision as follows:</span>
-<span class="sd"> * ``np.float128 -&gt; np.float64``</span>
-<span class="sd"> * ``np.float96 -&gt; np.float64``</span>
-<span class="sd"> * ``np.float64 -&gt; np.float32``</span>
-
-<span class="sd"> * ``subsample`` - We subsample data such that it **fits directly into</span>
-<span class="sd"> the memory allocation** ``memory_allocation * memory_limit``.</span>
-<span class="sd"> Therefore, this should likely be the last method listed in</span>
-<span class="sd"> ``&quot;methods&quot;``.</span>
-<span class="sd"> Subsampling takes into account classification labels and stratifies</span>
-<span class="sd"> accordingly. We guarantee that at least one occurrence of each</span>
-<span class="sd"> label is included in the sampled set.</span>
+<span class="sd"> **memory_allocation**</span>
+
+<span class="sd"> By default, we attempt to fit the dataset into ``0.1 * memory_limit``. This</span>
+<span class="sd"> float value can be set with ``&quot;memory_allocation&quot;: 0.1``. We also allow for</span>
+<span class="sd"> specifying absolute memory in MB, e.g. 10MB is ``&quot;memory_allocation&quot;: 10``.</span>
+
+<span class="sd"> The memory used by the dataset is checked after each reduction method is</span>
+<span class="sd"> performed. If the dataset fits into the allocated memory, any further methods</span>
+<span class="sd"> listed in ``&quot;methods&quot;`` will not be performed.</span>
+
+<span class="sd"> For example, if ``methods: [&quot;precision&quot;, &quot;subsample&quot;]`` and the</span>
+<span class="sd"> ``&quot;precision&quot;`` reduction step was enough to make the dataset fit into memory,</span>
+<span class="sd"> then the ``&quot;subsample&quot;`` reduction step will not be performed.</span>
+
+<span class="sd"> **methods**</span>
+
+<span class="sd"> We currently provide the following methods for reducing the dataset size.</span>
+<span class="sd"> These can be provided in a list and are performed in the order as given.</span>
+
+<span class="sd"> * ``&quot;precision&quot;`` - We reduce floating point precision as follows:</span>
+<span class="sd"> * ``np.float128 -&gt; np.float64``</span>
+<span class="sd"> * ``np.float96 -&gt; np.float64``</span>
+<span class="sd"> * ``np.float64 -&gt; np.float32``</span>
+
+<span class="sd"> * ``subsample`` - We subsample data such that it **fits directly into the</span>
+<span class="sd"> memory allocation** ``memory_allocation * memory_limit``. Therefore, this</span>
+<span class="sd"> should likely be the last method listed in ``&quot;methods&quot;``.</span>
+<span class="sd"> Subsampling takes into account classification labels and stratifies</span>
+<span class="sd"> accordingly. We guarantee that at least one occurrence of each label is</span>
+<span class="sd"> included in the sampled set.</span>

 <span class="sd"> Attributes</span>
 <span class="sd"> ----------</span>
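
Note on the `dataset_compression` docstring in this hunk: the rules it describes (a float `memory_allocation` is a fraction of `memory_limit`, an int is absolute MB, and reduction methods run in order until the dataset fits) can be illustrated with a small pure-Python sketch. This is not auto-sklearn's implementation. The names `resolve_allocation_mb`, `reduce_until_fits` and `TOY_REDUCERS` are hypothetical, the reducers only model dataset sizes in MB rather than touching real arrays, and the range checks on `memory_allocation` are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple, Union

# A reducer maps (current size in MB, allocation in MB) to the new size in MB.
Reducer = Callable[[float, float], float]


def resolve_allocation_mb(
    memory_allocation: Union[int, float], memory_limit_mb: float
) -> float:
    """Turn ``memory_allocation`` into an absolute budget in MB.

    A float is read as a fraction of ``memory_limit`` and an int as absolute
    MB, as the docstring describes. Requiring the float to lie in (0, 1) and
    the int to be positive is an assumption of this sketch.
    """
    if isinstance(memory_allocation, bool):
        raise TypeError("memory_allocation must be an int or a float")
    if isinstance(memory_allocation, float):
        if not 0.0 < memory_allocation < 1.0:
            raise ValueError("a float memory_allocation must lie in (0, 1)")
        return memory_allocation * memory_limit_mb
    if isinstance(memory_allocation, int):
        if memory_allocation <= 0:
            raise ValueError("an int memory_allocation must be positive")
        return float(memory_allocation)
    raise TypeError("memory_allocation must be an int or a float")


def reduce_until_fits(
    size_mb: float,
    methods: Sequence[str],
    allocation_mb: float,
    reducers: Dict[str, Reducer],
) -> Tuple[float, List[str]]:
    """Apply ``methods`` in order, stopping once the size fits the budget."""
    applied: List[str] = []
    for name in methods:
        if size_mb <= allocation_mb:
            break  # already fits, so the remaining methods are skipped
        size_mb = reducers[name](size_mb, allocation_mb)
        applied.append(name)
    return size_mb, applied


# Toy size models (assumptions): "precision" halves the size, as a
# float64 -> float32 cast would; "subsample" shrinks straight to the budget.
TOY_REDUCERS: Dict[str, Reducer] = {
    "precision": lambda size_mb, allocation_mb: size_mb / 2,
    "subsample": lambda size_mb, allocation_mb: min(size_mb, allocation_mb),
}

budget = resolve_allocation_mb(0.1, memory_limit_mb=1000.0)
small = reduce_until_fits(150.0, ["precision", "subsample"], budget, TOY_REDUCERS)
large = reduce_until_fits(500.0, ["precision", "subsample"], budget, TOY_REDUCERS)
print(budget, small, large)
```

With these toy numbers, the 150 MB dataset fits after `"precision"` alone, so `"subsample"` is never applied and the row count and order are untouched. The 500 MB dataset still exceeds the budget after `"precision"` and is subsampled. This is why the docstring suggests listing `"subsample"` last, and why a splitter that depends on dataset size or order needs `"subsample"` left out.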

development/_modules/autosklearn/experimental/askl2.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../../index.html">Start</a></li>
 <li><a href="../../../releases.html">Releases</a></li>
 <li><a href="../../../installation.html">Installation</a></li>
-<li><a href="../../../manual.html">Manual</a></li>
 <li><a href="../../../examples/index.html">Examples</a></li>
 <li><a href="../../../api.html">API</a></li>
 <li><a href="../../../extending.html">Extending</a></li>

development/_modules/autosklearn/metrics.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../index.html">Start</a></li>
 <li><a href="../../releases.html">Releases</a></li>
 <li><a href="../../installation.html">Installation</a></li>
-<li><a href="../../manual.html">Manual</a></li>
 <li><a href="../../examples/index.html">Examples</a></li>
 <li><a href="../../api.html">API</a></li>
 <li><a href="../../extending.html">Extending</a></li>

development/_modules/autosklearn/pipeline/components/base.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../../../index.html">Start</a></li>
 <li><a href="../../../../releases.html">Releases</a></li>
 <li><a href="../../../../installation.html">Installation</a></li>
-<li><a href="../../../../manual.html">Manual</a></li>
 <li><a href="../../../../examples/index.html">Examples</a></li>
 <li><a href="../../../../api.html">API</a></li>
 <li><a href="../../../../extending.html">Extending</a></li>

development/_modules/autosklearn/pipeline/components/classification.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../../../index.html">Start</a></li>
 <li><a href="../../../../releases.html">Releases</a></li>
 <li><a href="../../../../installation.html">Installation</a></li>
-<li><a href="../../../../manual.html">Manual</a></li>
 <li><a href="../../../../examples/index.html">Examples</a></li>
 <li><a href="../../../../api.html">API</a></li>
 <li><a href="../../../../extending.html">Extending</a></li>

development/_modules/autosklearn/pipeline/components/feature_preprocessing.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../../../index.html">Start</a></li>
 <li><a href="../../../../releases.html">Releases</a></li>
 <li><a href="../../../../installation.html">Installation</a></li>
-<li><a href="../../../../manual.html">Manual</a></li>
 <li><a href="../../../../examples/index.html">Examples</a></li>
 <li><a href="../../../../api.html">API</a></li>
 <li><a href="../../../../extending.html">Extending</a></li>

development/_modules/autosklearn/pipeline/components/regression.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../../../../index.html">Start</a></li>
 <li><a href="../../../../releases.html">Releases</a></li>
 <li><a href="../../../../installation.html">Installation</a></li>
-<li><a href="../../../../manual.html">Manual</a></li>
 <li><a href="../../../../examples/index.html">Examples</a></li>
 <li><a href="../../../../api.html">API</a></li>
 <li><a href="../../../../extending.html">Extending</a></li>

development/_modules/index.html

-1
@@ -63,7 +63,6 @@
 <li><a href="../index.html">Start</a></li>
 <li><a href="../releases.html">Releases</a></li>
 <li><a href="../installation.html">Installation</a></li>
-<li><a href="../manual.html">Manual</a></li>
 <li><a href="../examples/index.html">Examples</a></li>
 <li><a href="../api.html">API</a></li>
 <li><a href="../extending.html">Extending</a></li>
