[MRG] Subsampling transformer #259
skada/transformers.py
Outdated
    ),
)

idx = self.rng_.choice(X.shape[0], self.n_subsample, replace=False)
In practical scenarios, do we need the sampling to be done per domain? 🤔 As written, the sampler's output might omit domains with few(er) entries.
In this case, do we create a DomainStratifiedSampleTransformer or do we add a parameter? In any case, we should probably use sklearn or skada splitters here.
I'm always for having separate classes if the use case is clear. It's much easier to just open the list of things and spot what you're looking for! Having a separate class DomainStratifiedSampleTransformer sounds great.
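To make the per-domain idea concrete, here is a minimal NumPy sketch of domain-stratified index selection. It is not the implementation from this PR: the function name `domain_stratified_indices` and its parameters are hypothetical, and it assumes each sample carries an integer domain label in an array called `sample_domain`. Each domain gets a share of `n_subsample` proportional to its size, with at least one sample kept per domain, so the total can differ slightly from `n_subsample`.

```python
import numpy as np


def domain_stratified_indices(sample_domain, n_subsample, seed=None):
    """Pick row indices so that every domain keeps at least one sample.

    Hypothetical helper for illustration only, not part of skada.
    """
    rng = np.random.default_rng(seed)
    sample_domain = np.asarray(sample_domain)
    n_total = sample_domain.shape[0]
    picked = []
    for domain in np.unique(sample_domain):
        # Row indices of all samples belonging to this domain.
        members = np.flatnonzero(sample_domain == domain)
        # Proportional share of the budget, but never fewer than one sample
        # and never more than the domain actually contains.
        n_keep = max(1, int(round(n_subsample * members.size / n_total)))
        n_keep = min(n_keep, members.size)
        picked.append(rng.choice(members, size=n_keep, replace=False))
    return np.sort(np.concatenate(picked))


# Example: one large domain (label 1) and one small domain (label -2).
example_domains = np.array([1] * 95 + [-2] * 5)
example_idx = domain_stratified_indices(example_domains, n_subsample=10, seed=0)
```

A uniform draw of 10 rows out of these 100 would skip the small domain fairly often; the stratified version always keeps at least one of its rows.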
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #259   +/-   ##
=======================================
  Coverage   97.15%   97.16%
=======================================
  Files          57       59    +2
  Lines        6049     6135   +86
=======================================
+ Hits         5877     5961   +84
- Misses        172      174    +2
The main idea is to provide a subsampler that can be used inside a pipeline to speed up methods that do not scale well with the number of samples.
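As a rough illustration of that motivation, the sketch below subsamples the training set before fitting an estimator whose cost grows quickly with the number of samples. It is an assumption-laden stand-in, not the transformer added in this PR: `SubsampledClassifier` is a hypothetical wrapper built only on scikit-learn, and it subsamples at fit time because a plain scikit-learn transformer cannot drop rows of `X` and `y` together.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.svm import SVC
from sklearn.utils import check_random_state


class SubsampledClassifier(ClassifierMixin, BaseEstimator):
    """Fit the wrapped estimator on a random subset of the training rows.

    Hypothetical wrapper for illustration only, not part of skada.
    """

    def __init__(self, estimator, n_subsample=100, random_state=None):
        self.estimator = estimator
        self.n_subsample = n_subsample
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = check_random_state(self.random_state)
        # Never ask for more rows than are available.
        n_keep = min(self.n_subsample, X.shape[0])
        idx = rng.choice(X.shape[0], n_keep, replace=False)
        # Keep X and y aligned by indexing both with the same rows.
        self.estimator_ = clone(self.estimator).fit(X[idx], y[idx])
        self.n_fit_samples_ = n_keep
        self.classes_ = self.estimator_.classes_
        return self

    def predict(self, X):
        # Prediction uses all rows; only fitting is subsampled.
        return self.estimator_.predict(X)


# Example: kernel SVC training cost grows faster than linearly with n.
data_rng = np.random.default_rng(0)
X_demo = data_rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
clf_demo = SubsampledClassifier(SVC(), n_subsample=50, random_state=0)
clf_demo.fit(X_demo, y_demo)
```

The wrapper trades some accuracy for speed; how many samples to keep depends on the estimator and the data.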