Describe the bug
When the dataset has outliers and is large enough to be subsampled, the predicted probability matrix can have fewer columns than there are classes in the training data.
Expected behavior
The number of columns in the probability matrix should match the number of classes in the training data:
(60000017, 19)
Alternatively, there should be a way to tell which column belongs to which class, and for which classes no predictions have been made.
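The alternative asked for above amounts to aligning the returned columns with the full label set. The following is a minimal sketch with NumPy, not auto-sklearn's API: it assumes the fitted model reports the classes it actually saw, in column order (scikit-learn estimators expose this as `classes_`). The helper name `expand_proba` and its arguments are hypothetical.

```python
import numpy as np


def expand_proba(proba, seen_classes, all_classes):
    """Align a probability matrix to the full label set (hypothetical helper).

    Columns for classes the model never saw are filled with zeros.
    """
    proba = np.asarray(proba, dtype=float)
    seen = np.asarray(seen_classes).tolist()
    full_labels = np.asarray(all_classes).tolist()
    if proba.shape[1] != len(seen):
        raise ValueError("proba must have one column per seen class")

    # Map each label of the full set to its target column.
    column_of = {label: j for j, label in enumerate(full_labels)}
    unknown = [label for label in seen if label not in column_of]
    if unknown:
        raise ValueError(f"labels not in all_classes: {unknown}")

    full = np.zeros((proba.shape[0], len(full_labels)))
    full[:, [column_of[label] for label in seen]] = proba
    return full


all_classes = [0, 1, 2, 3]
seen_classes = [0, 2, 3]  # assumption: class 1 was lost during subsampling
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

full = expand_proba(proba, seen_classes, all_classes)
missing = sorted(set(all_classes) - set(seen_classes))
print(full.shape)  # (2, 4)
print(missing)     # [1]
```

The zero columns make the matrix shape predictable, and `missing` answers the second half of the request: which classes received no predictions.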
Actual behavior, stacktrace or logfile
(venv) root@486c0ae472af:/bench# python mwe.py
[WARNING] [2021-07-27 16:19:41,000:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Dataset too large for memory limit 10000MB, reducing the precision from float64 to <class 'numpy.float32'>
[WARNING] [2021-07-27 16:19:42,210:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Dataset too large for memory limit 10000MB, reducing number of samples from 60000017 to 13107200.
[WARNING] [2021-07-27 16:19:45,795:Client-AutoML(1):6d574018-eef6-11eb-9953-0242ac110004] Could not sample dataset in stratified manner, resorting to random sampling
Traceback (most recent call last):
  File "/bench/frameworks/autosklearn/lib/auto-sklearn/autosklearn/automl.py", line 940, in subsample_if_too_large
    stratify=y,
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1387, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
/bench/frameworks/autosklearn/venv/lib/python3.7/site-packages/smac/intensification/parallel_scheduling.py:152: UserWarning: SuccessiveHalving is intended to be used with more than 1 worker but num_workers=1
  num_workers
(60000017, 5)
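The log shows the stratified split failing because one class has a single member, followed by a fall back to random sampling. The sketch below illustrates, under assumptions, how those two facts can be checked on a label vector: which classes are too small to stratify, and which classes are absent from a given subsample. Both function names and the toy data are hypothetical, and only NumPy is used.

```python
import numpy as np


def classes_too_small_to_stratify(y, min_members=2):
    """Labels with fewer than min_members samples (hypothetical helper)."""
    labels, counts = np.unique(y, return_counts=True)
    return labels[counts < min_members].tolist()


def classes_lost(y, sample_idx):
    """Labels present in y but absent from the subsample y[sample_idx]."""
    y = np.asarray(y)
    kept = set(np.unique(y[sample_idx]).tolist())
    return sorted(set(np.unique(y).tolist()) - kept)


# Toy labels: class 2 is an outlier with exactly one member (index 11).
y = np.array([0] * 6 + [1] * 5 + [2])
too_small = classes_too_small_to_stratify(y)

# Assumption: a random subsample that happens to skip index 11.
sample_idx = np.array([0, 1, 2, 6, 7])
lost = classes_lost(y, sample_idx)

print(too_small)  # [2]
print(lost)       # [2]
```

A model fitted on such a subsample never sees the lost class, which is consistent with a probability matrix that has fewer columns than the training data has classes.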
Environment and installation:
Please give details about your installation:
OS: Debian 10 in docker hosted by Windows 10
virtual environment
Python version: 3.7.11
Auto-sklearn version: development (11afae22b8c9a6309d2b6fcf7cfb9a947711cd1e)
Just letting you know this is addressed in PR #1218, and your error log was very helpful in diagnosing it. It also sheds light on some other potential areas of concern regarding outliers.
To Reproduce
Alternatively, and much more slowly, the bug can be reproduced with the automl benchmark on KDDCup.