Document PerseusItem parser code (#1877)

benchristel · web-flow · commit 44933f88e90c · 2024-11-19T08:48:13.000-08:00
Issue: none

## Test plan:

Read the README. It should make sense.

Author: benchristel

Reviewers: jeremywiebe, anakaren-rojas, nishasy

Required Reviewers:

Approved By: jeremywiebe

Checks: ✅ Publish npm snapshot (ubuntu-latest, 20.x), ✅ Cypress (ubuntu-latest, 20.x), ✅ Check builds for changes in size (ubuntu-latest, 20.x), ✅ Lint, Typecheck, Format, and Test (ubuntu-latest, 20.x), ✅ Check for .changeset entries for all changed files (ubuntu-latest, 20.x), ✅ Publish Storybook to Chromatic (ubuntu-latest, 20.x), ✅ gerald

Pull Request URL: #1877
diff --git a/.changeset/modern-boats-wash.md b/.changeset/modern-boats-wash.md
@@ -0,0 +1,5 @@
+---
+"@khanacademy/perseus": patch
+---
+
+Internal: Add README.md for packages/perseus/src/util/parse-perseus-json
diff --git a/packages/perseus/src/util/parse-perseus-json/README.md b/packages/perseus/src/util/parse-perseus-json/README.md
@@ -0,0 +1,39 @@
+# Perseus JSON Parsers
+
+The code in this directory takes raw Perseus JSON and parses it into a
+`PerseusItem` object. If the parse succeeds, the resulting object is guaranteed
+to conform to the `PerseusItem` TypeScript type.
+
+The parser gracefully handles old data formats that don't conform to the TS
+types. It does this by defaulting missing fields and migrating ones that have
+been renamed or restructured.
+
+## Regression testing against old data
+
+The tests in the `regression-tests` directory ensure that the parsing code can
+handle old data formats. **Understand that if you change existing regression
+tests, you risk breaking compatibility with old data.** The regression tests
+were generated from a snapshot of Khan Academy content taken in November 2024.
+
+## Exhaustive testing
+
+You can run an exhaustive test of the parser (testing against every single
+content item) by following the steps documented in
+`exhaustive-test-tool/index.ts`. This test takes about 4 hours to run and
+requires downloading many gigabytes of data, so it does not run as part of our
+normal CI builds. Run this test only if you suspect that the parser has somehow
+drifted out of sync with the production data.
+
+## Architecture
+
+See [ADR #773] for context. [ADR #776] describes why we chose to write our own
+runtime typechecking code (in `general-purpose-parsers/`) rather than use
+a third-party library.
+
+[ADR #773]: https://khanacademy.atlassian.net/wiki/spaces/ENG/pages/3318349891/ADR+773+Validate+widget+data+on+input+in+Perseus
+[ADR #776]: https://khanacademy.atlassian.net/wiki/spaces/ENG/pages/3328147539/ADR+776+Write+our+own+code+to+typecheck+Perseus+data+at+runtime
+
+Good places to start reading this code are `parser-types.ts` and `result.ts`.
+Then you should skim the parsers in `general-purpose-parsers/` to get a sense
+of what's available. The Perseus-specific parsers are all in `perseus-parsers/`.
+The public API is in `index.ts`.
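The parsing style the README describes can be sketched in a few lines of TypeScript. In this style, parsers return a success/failure result instead of throwing, missing fields are defaulted, and renamed fields are migrated. This is a minimal illustration under stated assumptions. The names `Result`, `Parser`, `str`, `num`, `defaulted`, `object`, and `renamedField` are hypothetical and are not the actual exports of `parser-types.ts`, `result.ts`, or `general-purpose-parsers/`. The widget-options shape is also invented for the example.

```typescript
// HYPOTHETICAL sketch: these names are illustrative only and are NOT the
// real exports of parser-types.ts, result.ts, or general-purpose-parsers/.

type Failure = {type: "failure"; message: string};
type Result<T> = {type: "success"; value: T} | Failure;

// A parser inspects an unknown JSON value and either returns a typed value
// or a failure describing what was wrong.
type Parser<T> = (raw: unknown) => Result<T>;

const success = <T>(value: T): Result<T> => ({type: "success", value});
const failure = (message: string): Failure => ({type: "failure", message});

// Primitive parsers check the runtime type of a raw JSON value.
const str: Parser<string> = (raw) =>
    typeof raw === "string"
        ? success(raw)
        : failure(`expected string, got ${typeof raw}`);

const num: Parser<number> = (raw) =>
    typeof raw === "number"
        ? success(raw)
        : failure(`expected number, got ${typeof raw}`);

// Defaulting: a missing (null or undefined) field becomes a fallback value.
function defaulted<T>(parser: Parser<T>, fallback: () => T): Parser<T> {
    return (raw) => (raw == null ? success(fallback()) : parser(raw));
}

// Object parser: parses each listed field with its own parser.
// Fields not listed in `shape` are dropped from the output.
type Shape = Record<string, Parser<unknown>>;
type Parsed<S extends Shape> = {
    [K in keyof S]: S[K] extends Parser<infer T> ? T : never;
};

function object<S extends Shape>(shape: S): Parser<Parsed<S>> {
    return (raw) => {
        if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
            return failure("expected object");
        }
        const input = raw as Record<string, unknown>;
        const parsers: Shape = shape;
        const out: Record<string, unknown> = {};
        for (const key of Object.keys(parsers)) {
            const result = parsers[key](input[key]);
            if (result.type === "failure") {
                return failure(`at ${key}: ${result.message}`);
            }
            out[key] = result.value;
        }
        // Every key in `shape` was filled above, so this cast is sound.
        return success(out as unknown as Parsed<S>);
    };
}

// Migration: if only the legacy key is present, rename it before parsing.
function renamedField<T>(
    oldKey: string,
    newKey: string,
    parser: Parser<T>,
): Parser<T> {
    return (raw) => {
        if (typeof raw === "object" && raw !== null && !Array.isArray(raw)) {
            const obj = raw as Record<string, unknown>;
            if (!(newKey in obj) && oldKey in obj) {
                const {[oldKey]: legacy, ...rest} = obj;
                return parser({...rest, [newKey]: legacy});
            }
        }
        return parser(raw);
    };
}

// Hypothetical widget options: old data used "label" instead of "title",
// and some old items omit "size" entirely.
const parseOptions = renamedField(
    "label",
    "title",
    object({title: str, size: defaulted(num, () => 1)}),
);

const migrated = parseOptions({label: "Hi"});
// migrated is {type: "success", value: {title: "Hi", size: 1}}

const bad = parseOptions({title: "Hi", size: "big"});
// bad is {type: "failure", message: "at size: expected number, got string"}
```

Because each parser returns a `Result` rather than throwing, callers can report exactly which field failed. Defaulting and migration are ordinary parsers that compose with the rest.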
diff --git a/packages/perseus/src/util/parse-perseus-json/exhaustive-test-tool/index.ts b/packages/perseus/src/util/parse-perseus-json/exhaustive-test-tool/index.ts
@@ -12,6 +12,14 @@
 // Then, run the test tool over the content like so:
 //
 //     find ~/Desktop/content/*/* -type d | xargs -n1 packages/perseus/src/util/parse-perseus-json/exhaustive-test-tool/index.ts  ~/Desktop/test-results
+//
+// Output will be written to ~/Desktop/test-results. The output format is:
+//
+// - one directory per unique parse error, named after the hash of the error
+//   message. Each directory will contain:
+//     - mismatch.txt: a description of the parse error
+//     - item.json: the shortest assessmentItem (in number of JSON bytes) that
+//       produced that parse error.
 
 import {createHash} from "crypto";
 import * as fs from "fs/promises";
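The output format described in the exhaustive-test-tool comment above can be sketched as a small helper. This is a hypothetical illustration, not code from `exhaustive-test-tool/index.ts`. The function name `recordParseError`, its parameters, and the choice of SHA-256 are assumptions. The comment only says each directory is named after "the hash of the error message".

```typescript
import {createHash} from "crypto";
import * as fs from "fs/promises";
import * as path from "path";

// HYPOTHETICAL helper: the real tool's function names and hash algorithm
// may differ. SHA-256 is an assumption made for this sketch.
async function recordParseError(
    outputDir: string,
    errorMessage: string,
    itemJson: string,
): Promise<void> {
    // One directory per unique parse error, named after a hash of the message.
    const hash = createHash("sha256").update(errorMessage).digest("hex");
    const dir = path.join(outputDir, hash);
    await fs.mkdir(dir, {recursive: true});
    await fs.writeFile(path.join(dir, "mismatch.txt"), errorMessage);

    // Keep only the shortest item (in JSON bytes) that produced this error.
    const itemPath = path.join(dir, "item.json");
    const existing = await fs.readFile(itemPath, "utf8").catch(() => null);
    if (
        existing === null ||
        Buffer.byteLength(itemJson, "utf8") < Buffer.byteLength(existing, "utf8")
    ) {
        await fs.writeFile(itemPath, itemJson);
    }
}
```

The read-compare-write step is not atomic. That is acceptable with the `xargs -n1` invocation shown above, because `xargs` runs one command at a time unless `-P` is given.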