Make more clear to R-users what the value of this package is #261


Closed
RMHogervorst opened this issue Nov 1, 2020 · 0 comments


RMHogervorst commented Nov 1, 2020

As a reviewer, I'm looking at this package as if I were an intermediate R user: a competent professional who is used to working with R. When I evaluate this package for my needs, I can't see why I would want to use it. And that is a shame, because I think it has enormous potential!

To evaluate this package I go through the README, and what it tells me is:

  • this package gives me access to several datasets
  • to get that data I have to install Python first, then some other software parts

That seems like a lot of work just to download files. When I was still a researcher, adding Python to my computer as well seemed like a lot of work. (Your steps are super nice and work flawlessly. The installation itself is not the problem; the problem is the communication about what this extra effort will deliver. Because the trade-off is very nice!)

Then there is the claim:

The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and loads them or stores them in variety of databases or flat file formats. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

Although that describes the package succinctly, I think that as a user I would miss some crucial points and therefore not see its value:

  • Many of the datasets are raw data that need to be processed according to specific steps
  • the datasets live all over the web in different repositories
  • some data is periodically updated
  • some data is spread over several files that need to be combined in a specific way
  • some data is so large it needs a different representation, like a database

I think I would be more hooked if you explicitly mentioned that these raw datasets are hard to work with without domain knowledge, and that with this package I no longer have to worry about that.

Something like:

Rdataretriever gives you access to cleaned versions of over 200 commonly used datasets in ecology and environmental sciences. The datasets live in different locations on the web, and many of them have specific sets of cleaning rules that you have to execute in a specific order. The dataretriever package has all this functionality built in. All datasets are precisely described in an international standard, and the data preparation logic is captured in Python scripts. By using the dataretriever package, everyone who uses these datasets has the same cleaned data, whether they work in Python or R. With this R package you can run all those steps and retrieve the cleaned data in your R session, in a flat file such as CSV, or in a database. The Rdataretriever package allows you to focus on inference and visualisation.
