[REVIEW]: pyCeterisParibus: explaining Machine Learning models with Ceteris Paribus Profiles in Python #1389

Closed
36 tasks done
whedon opened this issue Apr 16, 2019 · 54 comments
Labels
accepted published Papers published in JOSS recommend-accept Papers recommended for acceptance in JOSS. review

Comments

@whedon

whedon commented Apr 16, 2019

Submitting author: @kmichael08 (Michał Kuźba)
Repository: https://github.com/ModelOriented/pyCeterisParibus
Version: v0.5.2
Editor: @katyhuff
Reviewer: @janfreyberg, @JustinShenk
Archive: 10.5281/zenodo.2667756

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92"><img src="http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92/status.svg)](http://joss.theoj.org/papers/aad9a21c61c01adebe11bc5bc1ceca92)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@janfreyberg & @JustinShenk, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @katyhuff know.

Please try and complete your review in the next two weeks

Review checklist for @janfreyberg

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: v0.5.2
  • Authorship: Has the submitting author (@kmichael08) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @JustinShenk

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: v0.5.2
  • Authorship: Has the submitting author (@kmichael08) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
@whedon
Author

whedon commented Apr 16, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @janfreyberg, it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

@whedon
Author

whedon commented Apr 16, 2019

Attempting PDF compilation. Reticulating splines etc...

@JustinShenk

Overview: pyCeterisParibus is a library for explaining machine learning models with ceteris paribus profiles. These are useful for adding to visual story telling and supporting model interpretability. The idea is great, the implementation is clean, and I may use it in some of my projects. Some minor improvements are suggested below.
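For readers unfamiliar with the term, the idea behind a ceteris paribus profile can be sketched in a few lines: vary one feature of a single observation over a grid while holding every other feature fixed, and record the model's prediction at each grid value. This is only an illustrative sketch, not pyCeterisParibus code; the function name, the toy model, and the passenger data are all hypothetical.

```python
def ceteris_paribus_profile(predict, observation, feature, grid):
    """Return (value, prediction) pairs for one feature varied over a grid.

    All other features of the observation are held fixed.
    """
    profile = []
    for value in grid:
        modified = dict(observation)  # copy, so the original is untouched
        modified[feature] = value
        profile.append((value, predict(modified)))
    return profile


def toy_model(passenger):
    # Hypothetical stand-in for a trained model's predict function.
    base = 0.9 if passenger["sex"] == "female" else 0.3
    return max(0.0, base - 0.005 * passenger["age"])


passenger = {"age": 30, "sex": "male"}
profile = ceteris_paribus_profile(toy_model, passenger, "age", [10, 30, 50])
```

Plotting the pairs in `profile` gives the what-if curve for that one passenger.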

Installation: Installed without issues via pip and local copy of the source code.

Functionality: I was able to run the example on my Mac, but was not able to load the plot, due to an issue with how file paths are handled on Macs. I have opened a pull request at ModelOriented/pyCeterisParibus#24 fixing this issue on my machine. After this is accepted or otherwise addressed I will consider it completed.
I opened an issue (ModelOriented/pyCeterisParibus#23) regarding the scrollbars obscuring the data. This could be fixed by adding additional padding to the bottom of the frame.

Performance: No measure of performance is given, but the model loaded fast on the Titanic dataset.

Documentation: The explanation of how the model works could be improved. For example, in the paper the authors write "For this purpose, methods for sampling and selecting neighbouring observations are implemented along with the Gower's distance [@gower] function. A more detailed description might be found in the package documentation." I was not able to find a description of Gower's distance in the linked Read the Docs documentation. Adding details of how the model works would be helpful for people who are not familiar with Gower's distance or how it applies to machine learning models.
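As a rough illustration of what such a description might cover, here is a minimal sketch of Gower's distance for mixed numeric and categorical data. It is not the package's implementation; the function signature, the dict-based observations, and the example values are assumptions made for illustration. Numeric features contribute their absolute difference scaled by the feature's range, categorical features contribute 0 on a match and 1 otherwise, and the result is the mean over features.

```python
def gower_distance(x, y, ranges):
    """Gower's distance between two observations given as dicts.

    `ranges` maps each numeric feature to its observed range (max - min);
    features missing from `ranges` are treated as categorical.
    """
    total = 0.0
    for name in x:
        if name in ranges:
            # Numeric feature: absolute difference scaled by the range.
            span = ranges[name]
            total += abs(x[name] - y[name]) / span if span else 0.0
        else:
            # Categorical feature: 0 on a match, 1 otherwise.
            total += 0.0 if x[name] == y[name] else 1.0
    return total / len(x)


# Hypothetical Titanic-style observations.
passenger_a = {"age": 20.0, "fare": 10.0, "sex": "male"}
passenger_b = {"age": 60.0, "fare": 10.0, "sex": "female"}
ranges = {"age": 80.0, "fare": 500.0}
distance = gower_distance(passenger_a, passenger_b, ranges)
```

Because every per-feature term lies in [0, 1], the distance does too, which makes it usable for selecting neighbouring observations across mixed feature types.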

Software Paper: The software paper has a few minor typos or questionable stylistic choices for an academic paper:

  • "He died on [the] Titanic"
  • "1. [rather first or 1st] class"
  • BUT. Not really a typo, but stylistically questionable to have a period after "BUT". Would be more fitting of an academic journal without the word.
  • Local [I]nterpretable [M]odel-agnostic [E]xplanations (LIME)
  • Capitalization of "machine learning"
  • "(e.g.[,] a bank)"
  • "black-boxes" [no space needed]
  • "leads to the next industrial revolution" - unverified claim

Example Usage: The notebooks and example scripts ran without problems.

References: Every reference mentioned in the paper is documented as a BibTeX entry.

@kmichael08

@JustinShenk Many thanks for all these valuable remarks! I merged your pull request. I also applied your comments on the paper and added a description of Gower's distance to the documentation. I'll fix the scrollbar problem (ModelOriented/pyCeterisParibus#23) as soon as possible.

@kmichael08

@whedon generate pdf

@whedon
Author

whedon commented Apr 17, 2019

Attempting PDF compilation. Reticulating splines etc...

@JustinShenk

@kmichael08 Thanks for the quick response and edits.

@katyhuff My review (#1389 (comment)) is now complete.

@janfreyberg

Installation

Installing the package in a fresh docker alpine image leads to dependencies being installed that I don't think need to be, for example sphinx, m2r, codecov, etc.

There are a few ways around this so I haven't made a PR but I think you can do the following:

Split requirements into actual requirements (what's needed to run the package), documentation requirements, and test requirements (e.g. pytest). You can add these as additional requirements using the extras_require key in setup.py, or simply install them from txt files wherever you need them.
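The split suggested above could look roughly like the following sketch. The `extras_require` and `install_requires` keys are real setuptools options, but the helper function, the group names, and the package lists are assumptions for illustration (the runtime list in particular is a placeholder, not the project's actual dependency set).

```python
def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    lines = (line.split("#", 1)[0].strip() for line in text.splitlines())
    return [line for line in lines if line]


# Hypothetical file contents; in a real setup.py these would be read
# from requirements.txt and similar files on disk.
RUNTIME = """
numpy   # placeholder runtime dependency
pandas  # placeholder runtime dependency
"""
DOCS = "sphinx\nm2r\n"
TEST = "pytest\ncodecov\n"
EXAMPLES = "xgboost\nscikit-learn\n"

setup_kwargs = {
    "install_requires": parse_requirements(RUNTIME),
    "extras_require": {
        "docs": parse_requirements(DOCS),
        "test": parse_requirements(TEST),
        "examples": parse_requirements(EXAMPLES),
    },
}

# In setup.py this would be passed on as:
#     from setuptools import setup
#     setup(name="your-package", **setup_kwargs)
```

With extras declared this way, a plain `pip install your-package` pulls in only the runtime dependencies, while something like `pip install "your-package[test]"` adds the test tools on request.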

Additionally, as far as I can tell tensorflow is never imported and so should be removed from the requirements.

I would even go so far as to say XGBoost and sklearn should not be in the requirements, even though you use them in the paper and documentation, because they're not essential to the functioning of the package. Instead, you could make a note that people should install them to run the examples.

Otherwise, installation works great.

Functionality / Performance

This all worked great for me.

Documentation

I think the docs can be improved:

  • the index page should be clearer; I would remove the automated sphinx content and add more of a "landing" page that contains an introduction
  • I would consider adding the jupyter notebooks to the docs using a tool like sphinx-jupyter

But that's just a recommendation.

Paper

The paper is very good. Only point: the R package CeterisParibus is not included in the references.

@katyhuff
Member

katyhuff commented May 2, 2019

Thanks for the speedy reviews, @janfreyberg @JustinShenk . And, thanks for responding quickly to the suggestions @kmichael08 .

@kmichael08 , there are a few items in @janfreyberg's review that will need to be handled before we should move forward with acceptance:

  • Package installation instructions should be cleaned up to remove tensorflow if it's not used anywhere.
  • Though I'm not sure the best citation for it, I do agree with @janfreyberg that the paper would benefit from a citation to the CeterisParibus R package.

The rest of the comments from @janfreyberg would certainly clean things up, but aren't explicitly needed for our JOSS requirements, so I'll just recommend that you consider the recommendation from @janfreyberg: "Split requirements into actual requirements (what's needed to run the package), documentation requirements, and test requirements (e.g. pytest). You can add these as additional requirements using the extras_require key in setup.py, or simply install them from txt files wherever you need them.... I would even go so far as to say XGBoost and sklearn should not be in the requirements, even though you use it in the paper and documentation, because it's not essential to the functioning of the package. Instead, you could make a note that people should install them to run the examples."

I have looked over the package and have found it installs pretty easily. Once you've seen this message @kmichael08 and implemented the two above changes, please ping me and we'll move on with next steps.

@kmichael08

@whedon generate pdf

@whedon
Author

whedon commented May 3, 2019

Attempting PDF compilation. Reticulating splines etc...

@kmichael08

Thanks a lot @janfreyberg and @katyhuff!

  • I added the citation to the R package
  • You're right about the requirements. I kept only the essential dependencies in requirements.txt and moved the others to requirements-dev.txt, if that's OK. As for tensorflow, it is required although not directly imported: there is a test using keras, which needs a backend deep learning library (tensorflow here).

So, as long as these two changes sound OK to you, we can move on. I'll definitely enhance the docs soon and use sphinx-jupyter. Thanks for that!

@katyhuff
Member

katyhuff commented May 3, 2019

@whedon generate pdf

@whedon
Author

whedon commented May 3, 2019

Attempting PDF compilation. Reticulating splines etc...

@katyhuff
Member

katyhuff commented May 3, 2019

@whedon check references

@whedon
Author

whedon commented May 3, 2019

Attempting to check references...

@whedon
Author

whedon commented May 3, 2019


OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@katyhuff
Member

katyhuff commented May 3, 2019

@kmichael08 I'm going through some of the final checks (first up, the bibliography):

@arfon
Member

arfon commented May 3, 2019

Weird. If I change the bib file field to:

howpublished = {\url{https://www.openrightsgroup.org/blog/2018/machine-learning-and-the-right-to-explanation-in-gdpr}},

Then it seems to compile OK. Changing the flag to breaklinks=true doesn't seem to fix anything.

@kmichael08

@whedon generate pdf

@whedon
Author

whedon commented May 3, 2019

Attempting PDF compilation. Reticulating splines etc...

@kmichael08

@katyhuff I updated the version of the paper to the one you mentioned (it's OK) and added the DOI. As for the URL breaking, I switched to the workaround that @arfon applied above. Let me know if that's OK.

@kmichael08

@katyhuff I updated the repository to v0.5.2 and archived it in Zenodo.
DOI: 10.5281/zenodo.2667756

@kyleniemeyer

@whedon set 10.5281/zenodo.2667756 as archive

@whedon
Author

whedon commented May 6, 2019

OK. 10.5281/zenodo.2667756 is the archive.

@kyleniemeyer

@whedon accept

@whedon
Author

whedon commented May 6, 2019

Attempting dry run of processing paper acceptance...

@whedon
Author

whedon commented May 6, 2019

PDF failed to compile for issue #1389 with the following error:

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 13 0 13 0 0 158 0 --:--:-- --:--:-- --:--:-- 160
pandoc: 10.21105.joss.01389.crossref.xml: openFile: does not exist (No such file or directory)
Looks like we failed to compile the Crossref XML

@whedon
Author

whedon commented May 6, 2019


OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.1214/aos/1013203451 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@arfon
Member

arfon commented May 6, 2019

@whedon accept

@whedon
Author

whedon commented May 6, 2019

Attempting dry run of processing paper acceptance...

@whedon
Author

whedon commented May 6, 2019


OK DOIs

- 10.1145/2939672.2939778 is OK
- 10.1080/10618600.2014.907095 is OK
- 10.1214/aos/1013203451 is OK
- 10.5281/zenodo.1198885 is OK
- 10.2307/2528823 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@whedon
Author

whedon commented May 6, 2019

Check final proof 👉 openjournals/joss-papers#662

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#662, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

@arfon
Member

arfon commented May 6, 2019

@whedon accept deposit=true

@whedon whedon added the accepted label May 6, 2019
@whedon
Author

whedon commented May 6, 2019

Doing it live! Attempting automated processing of paper acceptance...

@whedon
Author

whedon commented May 6, 2019

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.01389 joss-papers#663
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01389
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? notify your editorial technical team...

@katyhuff
Member

katyhuff commented May 6, 2019

@kyleniemeyer @arfon Thanks for jumping forward with the submission. That said, I didn't get a chance to execute the whedon set version command in time to beat you to that accept function!

Usually, that's part of my task list at this stage -- do we need to fix and re-accept? That is, the submission was v0.5, but, at my request, the author updated the version when creating the archive release, to reflect the version that includes joss-related changes. The new version, to be incorporated in the JOSS publication, is v0.5.2, so I would usually have run whedon set version before whedon accept. Can you confirm whether this is going to be an issue?

@arfon
Member

arfon commented May 6, 2019

Usually, that's part of my task list at this stage -- do we need to fix and re-accept? That is, the submission was v0.5, but, at my request, the author updated the version when creating the archive release, to reflect the version that includes joss-related changes. The new version, to be incorporated in the JOSS publication, is v0.5.2, so I would usually have run whedon set version before whedon accept. Can you confirm whether this is going to be an issue?

Sorry my/our bad - looks like we got ahead of ourselves here. The version isn't actually captured in the paper so please go ahead and update that here.

@arfon
Member

arfon commented May 6, 2019

Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01389

Also, please note, Crossref is still having some issues so this DOI doesn't resolve yet.

@katyhuff
Member

katyhuff commented May 6, 2019

@whedon set v0.5.2 as version

@whedon
Author

whedon commented May 6, 2019

OK. v0.5.2 is the version.

@katyhuff
Member

katyhuff commented May 6, 2019

So (@arfon @kyleniemeyer ) do we just run accept again?

@arfon
Member

arfon commented May 6, 2019

So (@arfon @kyleniemeyer ) do we just run accept again?

There's no need to because the version isn't captured anywhere other than here. The archive DOI is correct right? (This is linked to in the paper)

@katyhuff
Member

katyhuff commented May 6, 2019

#fancy .

@kyleniemeyer

@katyhuff sorry for rushing your process... just trying to help get this one wrapped up!

@arfon
Member

arfon commented May 7, 2019

Also, please note, Crossref is still having some issues so this DOI doesn't resolve yet.

Just to let you know, the DOI is now resolving properly.

@kmichael08

Wonderful! Thanks to everyone engaged in the process: reviews, editing and the super fast finish.

@labarba labarba closed this as completed May 9, 2019
@whedon
Author

whedon commented May 9, 2019

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01389/status.svg)](https://doi.org/10.21105/joss.01389)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01389">
  <img src="http://joss.theoj.org/papers/10.21105/joss.01389/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.01389/status.svg
   :target: https://doi.org/10.21105/joss.01389

This is how it will look in your documentation:

DOI

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the following:

@whedon whedon added published Papers published in JOSS recommend-accept Papers recommended for acceptance in JOSS. labels Mar 2, 2020

8 participants