This is an overview of how to use Seta, Spore, and Forage to run a standalone data engineering program that scales from complete beginner to pretty challenging.
The basic goal: learners must get a dashboard showing, in as real-time as possible, some calculations and metadata about a bunch of high-throughput application databases.
The dashboard (Forage) contains several charts, which start simple, and get more complex in a variety of open-ended ways.
Each learner can have their own dashboard on Forage, so can have their progress tracked individually.
Learners are given three levels of challenge:
- To get Forage chart(s) displaying correct summary data no more than an hour old.
- The same – but no more than a minute old.
- The same – but up to the second.
Learners can tackle charts them in any order, tackle them partially, spend the whole time making one single chart work every second, or split the responsibilities for tackling them among a team.
It's expected that learners will have to:
- Do a data audit on the country databases, figuring out what all the columns mean (with the help of the resources), and how to handle the slightly messy data.
- Retro-engineer the data structures needed to make the Forage charts work, with the help of the specifications.
- Construct an appropriate analytical database structure to supply the data, and make it available via a single, poll-able endpoint.
- Figure out how to batch- or stream-process data from the application databases into their relevant analytical databases.
There are three main applications involved:
- Seta, which is a tool that sets up the application databases and deploys a configurable traffic simulator,
- Spore. You can use Spore to turn up the traffic speed, slow it down, empty databases, and so on.
- Forage is a dashboard system. Each learner has an account, and simply has to supply 6 URLs to the 6 charts on their dashboard. These URLs are polled every second, displaying data in an interesting way.
At the beginning of the module:
- Use Seta to set up application databases and a Spore.
- Set each learner up with an account on Forage. (Even if they're working in a team).
- Share the database URIs and Forage account details with the learners.
- Set the learners off with an informative kickoff making the goal clear, and helping them get started with the first graph.
- Seta: Set up a bunch of application databases with the right tables. ✔️
- Harvest: Play with the remote data. ✔️
- Spore: Start an application which pumps a configurable stream of data into the databases, simulating survey submissions from around the world. ✔️ And allows you to manage the contents of those databases. ✔️
- Forage: A dashboard with some charts that expect data in a certain format and will poll a given endpoint for it. ✔️