Submission: pylaundry (Python)

Submitting Author: Name (@cgostic, @zanderhinton, @amank90, @arunmarria)  
Package Name: pylaundry
One-Line Description of Package: Perform standard preprocessing of a dataframe to be used in machine learning algorithms in a streamlined workflow.
Repository Link:  https://github.com/UBC-MDS/pylaundry
Version submitted:   1.0.8
Editor: @kvarada 
Reviewer 1: @SamEdwardes 
Reviewer 2: @robilizando 
Archive: TBD  
Version accepted: TBD   

---

## Description

The `pylaundry` package performs many standard preprocessing techniques on Pandas dataframes before they are used in statistical analysis and machine learning. Its functionality includes categorizing column types, handling missing data through imputation, transforming and standardizing columns, and feature selection. The `pylaundry` package aims to remove much of the grunt work from the typical data science workflow, leaving the analyst more time and energy to devote to modelling.

## Scope 
- Please indicate which [category or categories](https://www.pyopensci.org/dev_guide/peer_review/aims_scope.html) this package falls under:
	- [ ] Data retrieval
	- [ ] Data extraction
	- [x] Data munging
	- [ ] Data deposition
	- [ ] Reproducibility
	- [ ] Geospatial
	- [ ] Education
	- [ ] Data visualization*

\* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see [this section](https://www.pyopensci.org/dev_guide/peer_review/aims_scope.html#notes-on-categories) of our guidebook.

- Explain how and why the package falls under these categories (briefly, 1-2 sentences):

All functions in this package take a Pandas DataFrame as an input. Two of the functions return transformed DataFrames (filled missing values, encoded and scaled), and the other two functions return information about the data gleaned from the DataFrame itself (column types, and most important features).

-   Who is the target audience and what are scientific applications of this package?  

Pylaundry is made for data scientists, or anyone applying statistical methods or machine learning algorithms to their data. It transforms a dataset into a format that is ready to be passed into a `.fit` method, with all NAs imputed, categorical columns encoded, numerical columns scaled, and important features identified.

-   Are there other Python packages that accomplish the same thing? If so, how does yours differ?

[sklearn.Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) offers similar functionality to the `fill_missing` and `transform_columns` functions: comparable steps can be wrapped in a Pipeline and carried out sequentially. However, building a pipeline is a multi-step process whose length depends on how many types of transformation need to be performed. `pylaundry` can accomplish numerical and categorical transformations in one step.
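For comparison, here is a minimal sketch of what the multi-step scikit-learn route can look like for imputing, scaling, and encoding a small dataframe. The column names and toy data are hypothetical, and the sketch shows only the scikit-learn side; it does not illustrate the `pylaundry` API.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical column,
# each containing a missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": pd.Series(["Vancouver", "Toronto", np.nan, "Vancouver"], dtype=object),
})

# Step 1: a pipeline for numeric columns (impute, then scale).
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Step 2: a pipeline for categorical columns (impute, then one-hot encode).
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Step 3: route each column to the matching pipeline.
preprocessor = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", categorical_steps, ["city"]),
])

# Step 4: fit and transform the dataframe.
X = preprocessor.fit_transform(df)
```

Each kind of transformation needs its own pipeline plus a routing step, which is the overhead the paragraph above refers to.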

There are many feature selection packages and functions, for instance [sklearn.feature_selection](https://scikit-learn.org/stable/modules/feature_selection.html), which carry out similar functionality to our `feature_selector` function. Pylaundry simplifies feature selection by linear and logistic regression into a single function.
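As a point of reference, regression-based feature selection in scikit-learn can be sketched with recursive feature elimination around a linear model. This is an illustration of the scikit-learn approach on hypothetical synthetic data, not a description of how `feature_selector` is implemented.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: the target depends only on x0 and x2.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=["x0", "x1", "x2", "x3"])
y = 5.0 * X["x0"] - 4.0 * X["x2"]

# Repeatedly fit the model and drop the feature with the smallest
# coefficient until two features remain.
selector = RFE(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)

# Names of the features that survived elimination.
selected = list(X.columns[selector.support_])
print(selected)
```

For a classification target, the same pattern applies with `LogisticRegression` in place of `LinearRegression`.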

As far as we know, there are no similar packages for categorizing columns. `pylaundry` is the first package we are aware of that abstracts away the full dataframe preprocessing workflow behind a unified and simple API.

-   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted:

## Technical checks

For details about the pyOpenSci packaging requirements, see our [packaging guide](https://www.pyopensci.org/dev_guide/packaging/packaging_guide.html). Confirm each of the following by checking the box.  This package:

- [x] does not violate the Terms of Service of any service it interacts with. 
- [x] has an [OSI approved license](https://opensource.org/licenses)
- [x] contains a README with instructions for installing the development version. 
- [x] includes documentation with examples for all functions.
- [x] contains a vignette with examples of its essential functions and uses.
- [x] has a test suite.
- [x] has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

## Publication options

- [ ] Do you wish to automatically submit to the [Journal of Open Source Software](http://joss.theoj.org/)? If so:

<details>
 <summary>JOSS Checks</summary>  

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. 
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI: 

*Note: Do not submit your package separately to JOSS*
  
</details>

## Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

- [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

## Code of conduct

- [x] I agree to abide by [pyOpenSci's Code of Conduct](https://www.pyopensci.org/dev_guide/peer_review/coc.html) during the review process and in maintaining my package should it be accepted.

