Commit 44933f8

Document PerseusItem parser code (#1877)
Issue: none

## Test plan:
Read the README. It should make sense.

Author: benchristel
Reviewers: jeremywiebe, anakaren-rojas, nishasy
Required Reviewers:
Approved By: jeremywiebe
Checks: ✅ Publish npm snapshot (ubuntu-latest, 20.x), ✅ Cypress (ubuntu-latest, 20.x), ✅ Check builds for changes in size (ubuntu-latest, 20.x), ✅ Lint, Typecheck, Format, and Test (ubuntu-latest, 20.x), ✅ Check for .changeset entries for all changed files (ubuntu-latest, 20.x), ✅ Publish Storybook to Chromatic (ubuntu-latest, 20.x), ✅ gerald
Pull Request URL: #1877
1 parent 5003151 commit 44933f8

3 files changed: +52 -0 lines changed

.changeset/modern-boats-wash.md

+5
@@ -0,0 +1,5 @@
+---
+"@khanacademy/perseus": patch
+---
+
+Internal: Add README.md for packages/perseus/src/util/parse-perseus-json
@@ -0,0 +1,39 @@
+# Perseus JSON Parsers
+
+The code in this directory takes raw Perseus JSON and parses it into a
+`PerseusItem` object. If the parse succeeds, the resulting object is guaranteed
+to conform to the `PerseusItem` TypeScript type.
+
+The parser gracefully handles old data formats that don't conform to the TS
+types. It does this by defaulting missing fields and migrating ones that have
+been renamed or restructured.
+
+## Regression testing against old data
+
+The tests in the `regression-tests` directory ensure that the parsing code can
+handle old data formats. **Understand that if you change existing regression
+tests, you risk breaking compatibility with old data.** The regression tests
+were generated from a snapshot of Khan Academy content taken in November 2024.
+
+## Exhaustive testing
+
+You can run an exhaustive test of the parser (testing against every single
+content item) by following the steps documented in
+`exhaustive-test-tool/index.ts`. This test takes about 4 hours to run and
+requires downloading many gigabytes of data, so it does not run as part of our
+normal CI builds. Run this test only if you suspect that the parser has somehow
+drifted out of sync with the production data.
+
+## Architecture
+
+See [ADR #773] for context. [ADR #776] describes why we chose to write our own
+runtime typechecking code (in `general-purpose-parsers/`) rather than use
+a third-party library.
+
+[ADR #773]: https://khanacademy.atlassian.net/wiki/spaces/ENG/pages/3318349891/ADR+773+Validate+widget+data+on+input+in+Perseus
+[ADR #776]: https://khanacademy.atlassian.net/wiki/spaces/ENG/pages/3328147539/ADR+776+Write+our+own+code+to+typecheck+Perseus+data+at+runtime
+
+A good place to start reading this code is `parser-types.ts` and `result.ts`.
+Then you should skim the parsers in `general-purpose-parsers/` to get a sense
+of what's available. The Perseus-specific parsers are all in `perseus-parsers/`.
+The public API is in `index.ts`.
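
The README says the parser defaults missing fields and migrates renamed ones, and that parsing either succeeds with a well-typed value or fails. A minimal sketch of that idea follows. It is not taken from the Perseus code: `Result`, `Parser`, `defaulted`, `ExampleWidget`, and the `title` to `label` rename are all hypothetical names invented for illustration, and the real `parser-types.ts` and `result.ts` may be shaped quite differently.

```typescript
// Hypothetical sketch; none of these names come from the Perseus codebase.
type Success<T> = {type: "success"; value: T};
type Failure = {type: "failure"; message: string};
type Result<T> = Success<T> | Failure;

// A parser takes untyped input and either produces a typed value or fails.
type Parser<T> = (raw: unknown) => Result<T>;

function success<T>(value: T): Success<T> {
    return {type: "success", value};
}

function failure(message: string): Failure {
    return {type: "failure", message};
}

const parseString: Parser<string> = (raw) =>
    typeof raw === "string"
        ? success(raw)
        : failure(`expected string, got ${typeof raw}`);

const parseBoolean: Parser<boolean> = (raw) =>
    typeof raw === "boolean"
        ? success(raw)
        : failure(`expected boolean, got ${typeof raw}`);

// Wraps a parser so that a missing (undefined or null) input yields a default.
function defaulted<T>(parser: Parser<T>, fallback: () => T): Parser<T> {
    return (raw) =>
        raw === undefined || raw === null ? success(fallback()) : parser(raw);
}

// An invented target type standing in for a real widget's options.
type ExampleWidget = {label: string; static: boolean};

function parseExampleWidget(raw: unknown): Result<ExampleWidget> {
    if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
        return failure("expected object");
    }
    const obj = raw as Record<string, unknown>;

    // Migration (assumed): older data stored the label under `title`.
    const label = parseString(obj.label ?? obj.title);
    if (label.type === "failure") {
        return failure(`label: ${label.message}`);
    }

    // Defaulting (assumed): older data may omit `static` entirely.
    const isStatic = defaulted(parseBoolean, () => false)(obj.static);
    if (isStatic.type === "failure") {
        return failure(`static: ${isStatic.message}`);
    }

    return success({label: label.value, static: isStatic.value});
}
```

Under these assumptions, `parseExampleWidget({title: "old"})` succeeds with `static` defaulted to `false`, while `parseExampleWidget({label: 42})` fails with a message naming the offending field. The caller only receives an `ExampleWidget` after checking `type === "success"`, which is what lets a successful parse guarantee conformance to the TypeScript type.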

packages/perseus/src/util/parse-perseus-json/exhaustive-test-tool/index.ts

+8
@@ -12,6 +12,14 @@
 // Then, run the test tool over the content like so:
 //
 // find ~/Desktop/content/*/* -type d | xargs -n1 packages/perseus/src/util/parse-perseus-json/exhaustive-test-tool/index.ts ~/Desktop/test-results
+//
+// Output will be written to ~/Desktop/test-results. The output format is:
+//
+// - one directory per unique parse error, named after the hash of the error
+//   message. Each directory will contain:
+//   - mismatch.txt: a description of the parse error
+//   - item.json: the shortest assessmentItem (in number of JSON bytes) that
+//     produced that parse error.
 
 import {createHash} from "crypto";
 import * as fs from "fs/promises";
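
The output format described in the added comment (one directory per unique error, keyed by a hash of the message, keeping only the shortest offending item) can be sketched as in-memory bookkeeping. This is an illustration only, not the tool's code: `directoryNameFor`, `record`, and `Entry` are hypothetical names, and SHA-256 with hex encoding is an assumption, since the diff does not show which hash the tool uses.

```typescript
import {createHash} from "crypto";

// Hypothetical record of what one output directory would hold.
type Entry = {
    mismatch: string; // would become mismatch.txt
    itemJson: string; // would become item.json
};

// Assumption: SHA-256 in hex. The real tool's hash algorithm is not shown.
function directoryNameFor(errorMessage: string): string {
    return createHash("sha256").update(errorMessage).digest("hex");
}

// Keeps, for each unique error message, the item with the fewest JSON bytes.
function record(
    results: Map<string, Entry>,
    errorMessage: string,
    itemJson: string,
): void {
    const key = directoryNameFor(errorMessage);
    const existing = results.get(key);
    if (
        existing === undefined ||
        Buffer.byteLength(itemJson) < Buffer.byteLength(existing.itemJson)
    ) {
        results.set(key, {mismatch: errorMessage, itemJson});
    }
}
```

Keying on the hash of the message means that many items failing for the same reason collapse into a single entry, and keeping the shortest item gives the smallest reproduction for each distinct failure. Writing each entry to `<output>/<key>/mismatch.txt` and `<output>/<key>/item.json` would be the remaining step, omitted here.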
