Submission: nurser(R) #21

Open
11 of 27 tasks
merveshin opened this issue Mar 16, 2020 · 4 comments

@merveshin

name: Submit Software for Review
about: Use to submit your Python package for peer review
title: ''
labels: 1/editor-checks, New Submission!
assignees: ''


Submitting Author: Group 24 (@merveshin, @evhend, @elliott-ribner)
Package Name: nurser
One-Line Description of Package: An R package for streamlining the front end of the machine learning workflow.
Repository Link: https://github.com/UBC-MDS/nurser
Version submitted: v2.1.0
Editor: @kvarada
Reviewer 1: @evelynmoorhouse
Reviewer 2: @MrThomasPin
Archive: TBD
Version accepted: TBD


Description

  • nurser aims to streamline the front end of the machine learning pipeline by generating descriptive summary tables and figures, various feature imputation summaries, and automating preprocessing. Automated preprocessing detection has been implemented to minimize time and optimize the processing methods used. The functions in nurser were developed to provide useful and informative metrics that are applicable to a wide array of datasets.

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

nurser automates the plotting process and the summary statistics while conducting Exploratory Data Analysis tasks. It will handle the NaN values and preprocess the data including one-hot encoding, scaling, and label encoding.

  • Who is the target audience and what are scientific applications of this package?

Any person who is interested in analyzing and preprocessing data before running machine learning models.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Other R packages provide some similar individual functions (summary, ggplot), but nurser combines those functions in an elegant way so that much of this analysis can be carried out easily.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you intend for this package to go on Bioconductor?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@MrThomasPin MrThomasPin self-assigned this Mar 17, 2020
@MrThomasPin

MrThomasPin commented Mar 17, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2 hours

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • When I call ?eda, I would like the description to spell out Exploratory Data Analysis (EDA), because someone who is less data science savvy might not know what it stands for. Furthermore, the first Google search result for eda is “Electricity Distributors Association”.

  • When I tried:

result <- eda(iris)
result 

The plot is blank because the last column in the iris dataset is of character data type. A warning message stating that the plot does not work with character data would be helpful. Note that if I call result$stat[[5]], it works fine.

  • The nurser package loads rlang 0.4.4, and this prevents me from loading tidyverse because it requires rlang >= 0.4.5. I had to restart my console, load tidyverse first, and then load nurser.

  • impute_summary() works very well. It does everything it promises it would do.

  • preproc() needs more in its description. In the README, it states preproc will “preprocess features”. However, I was unsure what preprocessing meant. After digging into your function, I realized it was just normalizing the numerical columns. I would like to see a little more clarity added to the description.

  • Your package passes all your tests when I call devtools::test(), and your coverage is at 100%.

  • When I call devtools::check() it wants you to declare ‘magrittr’ in your vignettes.
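
The warning requested above for eda() could look something like the sketch below. This is only an illustration: warn_non_numeric is a hypothetical helper name, not a function in nurser, and it assumes that the histogram step handles numeric columns only. Note that in base R, iris$Species is stored as a factor, so the check flags every non-numeric column rather than character columns alone.

```r
# Hypothetical helper, not part of nurser: warn about columns that a
# numeric-only histogram step would leave blank.
warn_non_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  skipped <- names(df)[!is_num]
  if (length(skipped) > 0) {
    warning(
      "Non-numeric columns cannot be plotted as histograms: ",
      paste(skipped, collapse = ", "),
      call. = FALSE
    )
  }
  # Return the skipped column names invisibly so callers can inspect them
  invisible(skipped)
}

# Warning suppressed here only to keep the example output quiet
skipped <- suppressWarnings(warn_non_numeric(iris))
```

Calling such a check at the top of a plotting function would tell the user which columns were left out instead of returning a blank plot silently.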

@evelynmoorhouse

evelynmoorhouse commented Mar 21, 2020

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 1 hour

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

  • The eda help docs work well, but when I try to run the example in the help doc I don't get a rendered histogram. It might be good to have the output be a histogram rather than a variable saved as a histogram.

  • There are also 10 rows of red warnings every time I run the eda function from the example in the docstring. I'm not sure what they imply, but since they are unclear, it would be good either to make them useful warnings or to remove them. They are shown below.

image

  • In the help docs for eda(), equal signs are used instead of arrows, and since this package is supposed to work with the tidyverse in the R ecosystem, the example should probably be changed to use arrows. This is shown below:

result <- eda(mtcars)
hist_mpg <- result$histograms[[1]]
stats_mpg <- result$stats$mpg

  • Overall, impute_summary was very well done. The example ran well and all the tables generated were good.

  • The function example for preproc is redundant. Renaming result to processed_X is not necessary. If you wanted, it could be reduced to a single line: processed_X <- preproc(mtcars).

  • The description in preproc is also not useful, and it is not clear exactly what the function does.

  • Also, once again, the example in preproc should use arrows instead of equal signs, since the package fits into the tidyverse ecosystem.
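
Both reviews ask for a clearer description of preproc. According to the first review, the function normalizes the numerical columns. A minimal sketch of that kind of step is given below, using base R's scale(). The helper name scale_numeric is hypothetical, and the code is an assumption about the behaviour described in the reviews, not nurser's implementation.

```r
# Hypothetical stand-in for a numeric-only preprocessing step; this is an
# assumption based on the reviews, not nurser's actual code.
scale_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  # scale() centers each column to mean 0 and rescales it to standard deviation 1
  df[is_num] <- lapply(df[is_num], function(x) as.numeric(scale(x)))
  df
}

processed_X <- scale_numeric(mtcars)
```

A function description that names the transformation (for example "centers and scales numeric columns; other columns are left unchanged") would answer the question both reviewers raise.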

@evhend

evhend commented Mar 26, 2020

Thank you for your feedback @evelynmoorhouse and in response to your comments:

Addressed

| Note | Response |
| --- | --- |
| eda warnings | this has been addressed |
| eda correct syntax (<- instead of =) | this has been addressed |
| preproc general changes | Function has been modified |

PR incorporating changes:

New Release with changes: v3.0.0

Not Addressed

| Note | Response |
| --- | --- |
| eda histogram output | will be addressed in future iterations |
| better function descriptions | descriptions will be updated on an ongoing basis |

@evhend

evhend commented Mar 26, 2020

Thank you for your feedback @MrThomasPin and in response to your comments:

Addressed

| Note | Response |
| --- | --- |
| ?eda clarification | Taken into consideration, although anyone using the function would have read the README and would understand what an EDA is |
| eda blank character data output | Function now works with character data |
| preproc function | Function has been modified |

PR incorporating changes:

New Release with changes: v3.0.0

Not Addressed

| Note | Response |
| --- | --- |
| rlang version | to be addressed in future iterations |
| magrittr declaration | this is just a warning |
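
The rlang conflict reported in the first review comes down to a version comparison: 0.4.4 was loaded while tidyverse required at least 0.4.5. The sketch below shows that comparison with base R's package_version(). The helper name meets_minimum is hypothetical and is not part of nurser.

```r
# Hypothetical helper: does an installed version satisfy a required minimum?
meets_minimum <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

# The versions reported in the review
loaded_ok <- meets_minimum("0.4.4", "0.4.5")   # FALSE: 0.4.4 is too old
updated_ok <- meets_minimum("0.4.5", "0.4.5")  # TRUE: the minimum is met
```

In an interactive session, the installed version can be read with utils::packageVersion("rlang") and passed in as the first argument after converting it with as.character().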
