Make more clear to R-users what the value of this package is #261

RMHogervorst · 2020-11-01T14:25:58Z

As a reviewer I'm looking at this package as if I'm an intermediate R user: a competent professional who is used to working with R. When I evaluate this package for my needs, I completely miss why I would want to use it. And that is a shame, because I think it has enormous potential!

To evaluate this package I go through the readme and what that tells me is:

- this package gives me access to several datasets
- to get that data I have to install Python first, then some other software parts

That seems like a lot of work just to download files. When I was still a researcher, adding Python to my computer as well seemed like a lot of work. (Your steps are super nice and work flawlessly. It is not the installation that I think is the problem, but the communication about what this extra effort will deliver. Because the trade-off is very nice!)

Then there is the claim:

The Data Retriever automates the tasks of finding, downloading, and cleaning up publicly available data, and loads them or stores them in variety of databases or flat file formats. This lets data analysts spend less time cleaning up and managing data, and more time analyzing it.

Although that is succinctly described above, I think as a user I would miss some crucial points and therefore not see the value of this package:

- many of the datasets are raw data that need to be processed according to specific steps
- the datasets live all over the web in different repositories
- some data is periodically updated
- some data is spread over several files that need to be combined in a specific way
- some data is so large it needs a different representation, like a database

I think I would be more hooked if you explicitly mentioned that these raw datasets are hard to work with without domain knowledge, and that by using this package I do not have to worry about that anymore.

Something like:

Rdataretriever gives you access to cleaned versions of over 200 commonly used datasets in ecology and environmental sciences. These datasets live in different locations on the web, and many of them have specific cleaning rules that have to be executed in a specific order. The dataretriever package has all this functionality built in. All datasets are precisely described in an international standard and the data preparation logic is captured in Python scripts. By using the dataretriever package, everyone who uses these datasets has the same cleaned data, whether they work in Python or R. With this R package you can run all those steps and retrieve the cleaned data into your R session, a flat file such as CSV, or a database. The Rdataretriever package allows you to focus on inference and visualisation.


RMHogervorst mentioned this issue Nov 1, 2020

[REVIEW]: rdataretriever: An R package for downloading, cleaning, and installing publicly available datasets openjournals/joss-reviews#2800

Closed

40 tasks

ethanwhite closed this as completed in d2a5f29 Nov 20, 2020
