Description
An assignment workflow I'd like to be able to support:
- In `preprocess.arr`:

  ```
  include csv

  raw-table = load-table: ...
    source: csv-table-url("...")
  end

  processed-table = ...

  table-to-csv-file(processed-table, "processed.csv")
  ```

- In `analyze.arr`:

  ```
  include csv

  processed-table = load-table: ...
    source: csv-table-file("processed.csv")
  end

  ...
  ```
The motivation is to let students start from actual data. For example, I have a 76MB property assessment database for Boston that I'd like to use. Loading it directly works, but it is slow: in our tests it took about 20-30 seconds on a fairly new laptop, which is too slow to load and filter on every run.
Alternatively, rather than `processed.csv`, it could be `test-data.csv`. The idea is that students first identify a subset of the data to use for testing, use that subset while developing `analyze.arr`, and then, once everything works, replace `test-data.csv` with the raw URL.
Obviously, we could do the `preprocess.arr` steps offline and just give students `processed.csv`. But they already know all the skills needed to implement it, and in the VSCode/GitHub setting having multiple files is not a challenge. This minor change would make the assignment much more realistic (and make it easier for them to, e.g., adapt it to other scenarios).
(I know this workflow could probably be supported even better by the Jupyter-esque UI you have in development, but I think that's too much of a change for me right now.)