[PRE REVIEW]: Balsa: A Fast C++ Random Forest Classifier with Commandline and Python Interface #7599

Open
editorialbot opened this issue Dec 17, 2024 · 29 comments
Labels: C++, CMake, pre-review, TeX, Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

editorialbot commented Dec 17, 2024

Submitting author: @tobiasborsdorff (Tobias Borsdorff)
Repository: https://github.com/SRON-Earth/Balsa
Branch with paper.md (empty if default branch):
Version: v1.0.0
Editor: @HaoZeke
Reviewers: Pending
Managing EiC: Chris Vernon

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f324b8495db8e2983e97cb9692817b48"><img src="https://joss.theoj.org/papers/f324b8495db8e2983e97cb9692817b48/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f324b8495db8e2983e97cb9692817b48/status.svg)](https://joss.theoj.org/papers/f324b8495db8e2983e97cb9692817b48)

Author instructions

Thanks for submitting your paper to JOSS @tobiasborsdorff. Currently, there isn't a JOSS editor assigned to your paper.

@tobiasborsdorff if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands
editorialbot added the pre-review and Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning labels on Dec 17, 2024
@editorialbot
Collaborator Author

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.90  T=0.03 s (1658.5 files/s, 282255.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C/C++ Header                    24            702           1133           2515
C++                             16            347            231           1854
Markdown                         2            343              0            725
TeX                              1             12              0             82
Bourne Shell                     1             13             15             70
YAML                             1              0              0             55
CMake                            3             20              7             45
-------------------------------------------------------------------------------
SUM:                            48           1437           1386           5346
-------------------------------------------------------------------------------

Commit count by author:

   136	Joris van Zwieten
   104	Denis de Leeuw Duarte
     5	Tobias Borsdorff

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.3390/rs16071208 is OK
- 10.5281/zenodo.14186320 is OK
- 10.5281/zenodo.14186406 is OK
- 10.1023/A:1010933404324 is OK
- 10.5194/amt-14-665-2021 is OK
- 10.5194/amt-16-1597-2023 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Balsa: A Fast C++ Random Forest Classifier
- No DOI given, and none found for title: Scikit-learn: Machine Learning in Python

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 861

✅ The paper includes a Statement of need section

@editorialbot
Collaborator Author

License info:

✅ License found: BSD 3-Clause "New" or "Revised" License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

ADaPT-ML: A Data Programming Template for Machine Learning
Submitting author: @nulberry
Handling editor: @jmschrei (Active)
Reviewers: @aaronpeikert, @wincowgerDEV
Similarity score: 0.7037

ASCENDS: Advanced data SCiENce toolkit for Non-Data Scientists
Submitting author: @ornlpmcp
Handling editor: @terrytangyuan (Retired)
Reviewers: @zhampel, @jrbourbeau
Similarity score: 0.6984

AutoClassWrapper: a Python wrapper for AutoClass C classification
Submitting author: @pierrepo
Handling editor: @trallard (Retired)
Reviewers: @rpetit3, @lowandrew
Similarity score: 0.6955

rFBP: Replicated Focusing Belief Propagation algorithm
Submitting author: @Nico-Curti
Handling editor: @arokem (Retired)
Reviewers: @justusschock, @DanielLenz
Similarity score: 0.6913

CRATE: A Python package to perform fast material simulations
Submitting author: @BernardoFerreira
Handling editor: @Kevin-Mattheus-Moerman (Active)
Reviewers: @RahulSundar, @atzberg, @Extraweich, @kingyin3613
Similarity score: 0.6906

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@crvernon

crvernon commented Jan 3, 2025

@editorialbot invite @HaoZeke as editor

👋 can you take this one on @HaoZeke?

@editorialbot
Collaborator Author

Invitation to edit this submission sent!

@tobiasborsdorff

> @editorialbot invite @HaoZeke as editor
>
> 👋 can you take this one on @HaoZeke?

Dear @crvernon,

I hope this message finds you well. I’m writing to kindly follow up on the paper I submitted over a month ago. It seems that the editor assignment has not yet been initiated. Could you please let me know when this step might take place?

I truly appreciate your time and assistance and look forward to your response.

Best regards, Tobias Borsdorff

@crvernon

👋 @tobiasborsdorff - we have a large backlog of submissions right now so it may take a little longer than usual to get you set up with an editor. Thanks for your patience!

@HaoZeke - are you able to take this one on?

@crvernon

@HaoZeke just checking back in on this one.

@HaoZeke
Member

HaoZeke commented Jan 29, 2025

@editorialbot assign @HaoZeke as editor

Thanks for the invite @crvernon

@editorialbot
Collaborator Author

Assigned! @HaoZeke is now the editor

@HaoZeke
Member

HaoZeke commented Feb 10, 2025

Hi @dostuffthatmatters 👋 would you be interested in and available to review this JOSS submission? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: joss.readthedocs.io/en/latest/review_criteria.html

@HaoZeke
Member

HaoZeke commented Feb 10, 2025

Hi @cpellet 👋 would you be interested in and available to review this JOSS submission? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: joss.readthedocs.io/en/latest/review_criteria.html

@HaoZeke
Member

HaoZeke commented Feb 10, 2025

Hi @bcjaeger 👋 would you be interested in and available to review this JOSS submission? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: joss.readthedocs.io/en/latest/review_criteria.html

@dostuffthatmatters

Hi @HaoZeke,

Thank you for asking. I am busy with another review right now, so I have to kindly decline.

A general remark:

The main statement of need for this tool is about performance, but the paper contains no performance evaluation against existing tools. The scikit-learn implementations of decision trees and random forests are extremely efficient, and there are many ML libraries like scikit-learn with interfaces for different programming languages.

Maybe there is a good reason for the existence of Balsa, but I could not tell that from the given material. Good luck with the review!

@bcjaeger

> Hi @bcjaeger 👋 would you be interested in and available to review this JOSS submission? We carry out our checklist-driven reviews here in GitHub issues and follow these guidelines: joss.readthedocs.io/en/latest/review_criteria.html

Hello! 👋 This looks very interesting, but I don't have enough availability to review at the moment. I agree that pointing to a formal benchmark of computational efficiency in the article would be a great addition.

@tobiasborsdorff

Dear @HaoZeke,

I hope you're doing well. I wanted to follow up regarding the review process for my manuscript, as it has not yet started. If possible, could you kindly check whether potential reviewers are available?

I appreciate your time and assistance.

Best regards,
Tobias Borsdorff

@HaoZeke
Member

HaoZeke commented Apr 3, 2025

My apologies, @tobiasborsdorff. Have there been any updates addressing the comments of @dostuffthatmatters?

@tobiasborsdorff

tobiasborsdorff commented Apr 7, 2025

Dear @HaoZeke,

Thank you for your feedback. We have already conducted a performance analysis of the Balsa algorithm, which is documented in Section 6.1 of the ATBD (https://zenodo.org/records/14186320), which is also referenced in our paper.

As illustrated by the figures in that chapter, Balsa demonstrates clear advantages in both memory usage and runtime compared to the scikit-learn implementation as well as ranger, a C++ implementation of the random forest algorithm.

We would be happy to include a brief summary of these results in the manuscript to enhance clarity. However, we kindly ask that this addition be addressed during the official review process. While @dostuffthatmatters raised a valuable point, it is important to note that the suggestion was made outside the formal review framework and the manuscript is still not under review.

Best regards, Tobias Borsdorff

@tobiasborsdorff

Dear @HaoZeke and @dostuffthatmatters,

I hope this message finds you well. I wanted to inform you that I have updated the manuscript to include a full paragraph, with figures, discussing the runtime and memory usage of the Balsa implementation in comparison with the scikit-learn (Python) and ranger (C++) implementations. This addition highlights the performance advantages of Balsa, particularly in memory efficiency and prediction speed, which make the software suitable for large-scale operations.

With this update, I believe the manuscript is now complete and ready for the next steps. As the manuscript has been awaiting review since December last year, I would greatly appreciate it if you could kindly restart the review process and initiate the search for reviewers at your earliest convenience.

Thank you for your time and consideration.

Best regards, Tobias

@HaoZeke
Member

HaoZeke commented Apr 9, 2025

Thanks @tobiasborsdorff, if you have any suggested reviewers please let me know without the @ here.

@dostuffthatmatters

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@editorialbot
Collaborator Author

Five most similar historical JOSS papers:

None
Submitting author: None
Handling editor: None (None)
Reviewers: None
Similarity score: 0.6816

None
Submitting author: None
Handling editor: None (None)
Reviewers: None
Similarity score: 0.6762

MetObs - a Python toolkit for using non-traditional meteorological observations
Submitting author: @vergauwenthomas
Handling editor: @hugoledoux (Active)
Reviewers: @ashwinvis, @Zeitsperre
Similarity score: 0.6761

quantile-forest: A Python Package for Quantile Regression Forests
Submitting author: @reidjohnson
Handling editor: @jbytecode (Active)
Reviewers: @jncraton, @oparisot
Similarity score: 0.6750

rFBP: Replicated Focusing Belief Propagation algorithm
Submitting author: @Nico-Curti
Handling editor: @arokem (Retired)
Reviewers: @justusschock, @DanielLenz
Similarity score: 0.6710

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

@dostuffthatmatters

@tobiasborsdorff great, thanks - looks promising! My other review is still ongoing though. Best of luck!

@tobiasborsdorff

> Thanks @tobiasborsdorff, if you have any suggested reviewers please let me know without the @ here.

Dear @HaoZeke, I don’t know many people who review for JOSS, but a colleague of mine suggested these two: https://github.com/mkhorton and https://github.com/dgasmith. Maybe they’re a good fit! It’d be great if you could also reach out to a few people you think might be interested. Regards, Tobias

6 participants