Reduce duplication of PDF viewer resources and tools across repositories #4397

Open
robertknight opened this issue Apr 7, 2022 · 0 comments

robertknight commented Apr 7, 2022

Hypothesis currently includes builds of the PDF.js library and viewer application in four separate repositories:

  1. client, for use in the dev-server
  2. via
  3. browser-extension
  4. pdf.js-hypothesis

In each of these repositories there are duplicated or near-duplicated resources:

  1. An update-pdfjs script that fetches the latest version of PDF.js, kept in the repo's directory for miscellaneous scripts (e.g. tools/, scripts/ or bin/)
  2. A vendored copy of PDF.js and its viewer application with some resources stripped out and minor modifications to the viewer's HTML
  3. A pdfjs-init.js script that loads Hypothesis into the viewer once the viewer has fully initialized
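
To illustrate the third item, here is a minimal sketch of what such an initialization script does, written so the logic can be exercised outside a browser. It assumes the viewer application exposes an `initializedPromise` (as the PDF.js viewer's `PDFViewerApplication` global does). The `ViewerApp` interface, the `loadClientWhenReady` helper and its parameters are hypothetical names for illustration, not the contents of the actual pdfjs-init.js.

```typescript
// Minimal shape of the viewer application that this sketch relies on.
interface ViewerApp {
  // Resolves once the viewer has finished initializing.
  initializedPromise: Promise<void>;
}

// Hypothetical helper: wait for the viewer to initialize, then ask the
// caller to inject the Hypothesis client's embed script.
function loadClientWhenReady(
  app: ViewerApp,
  injectScript: (src: string) => void,
  embedUrl: string,
): Promise<void> {
  return app.initializedPromise.then(() => injectScript(embedUrl));
}
```

In a browser, `app` would be the viewer's global application object and `injectScript` would create a `<script>` element with the given `src` and append it to the document. Keeping those two pieces as parameters is only a convenience for testing the ordering.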

The process of updating PDF.js involves creating a branch, running the update-pdfjs script (or make update-pdfjs in the case of Via), testing the changes and creating a PR with the results.

This status quo has some downsides:

  1. The process of updating the PDF viewer differs from the process for other dependencies, so it is less obvious how to do it
  2. Changes to the PDF.js initialization script (pdfjs-init.js) and the tools that update PDF.js (update-pdfjs and the viewer HTML generator) have to be applied separately in each of the repositories, and there is a risk that one of them gets missed
  3. There isn't an obvious central place to document things related to how the Hypothesis-augmented PDF.js viewer works
  4. The update-pdfjs script currently fetches whatever the latest PDF.js build is from PDF.js' GitHub Pages site, which means that when it is run separately in each of the repos it can end up fetching slightly different versions, if changes are happening upstream
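
One way to address the fourth point is to pin an exact PDF.js version instead of fetching "latest". The sketch below builds a download URL from a pinned version. The version number is only an example, and the URL pattern is an assumption about how PDF.js release archives are named on GitHub. `PINNED_PDFJS_VERSION` and `pdfjsReleaseUrl` are hypothetical names and are not part of the existing update-pdfjs script.

```typescript
// Example pinned version (illustrative). In practice this would be recorded
// in one place so that every repository fetches the same build.
const PINNED_PDFJS_VERSION = "2.13.216";

// Build the download URL for an exact PDF.js release.
// Assumption: release archives follow the "pdfjs-<version>-dist.zip" naming.
function pdfjsReleaseUrl(version: string): string {
  // Reject floating tags such as "latest" so runs are reproducible.
  if (!/^\d+\.\d+\.\d+$/.test(version)) {
    throw new Error(`Expected an exact x.y.z version, got "${version}"`);
  }
  return `https://github.com/mozilla/pdf.js/releases/download/v${version}/pdfjs-${version}-dist.zip`;
}
```

With a pinned version, running the update in two repositories on different days would request the same archive, regardless of what has changed upstream in between.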

It would be worth looking into ways to reduce this duplication. One possibility is to make pdf.js-hypothesis the canonical repo for the Hypothesis-augmented PDF.js viewer and build an npm package from that repo containing a PDF.js build along with the pdfjs-init.js script. Other repositories could then consume the npm package as they do for other dependencies. A variation on this would be to create a new repository for a packaged PDF viewer, which is then consumed by all the other repositories.
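
As a rough sketch of the consuming side, a repository could copy the packaged viewer out of node_modules into its own static directory at build time. Everything named here is an assumption for illustration: the package name `hypothetical-pdf-viewer`, the file layout inside it, and the `planViewerCopy` helper. No such package exists yet.

```typescript
// One file to copy from the installed package into the consuming repo.
interface CopyStep {
  from: string;
  to: string;
}

// Hypothetical package name and layout (assumptions, not an existing package).
const PACKAGE_NAME = "hypothetical-pdf-viewer";
const PACKAGED_PATHS = ["web/viewer.html", "web/pdfjs-init.js", "build/pdf.js"];

// Compute which files a consuming repo would copy, and where to.
function planViewerCopy(destDir: string): CopyStep[] {
  const dest = destDir.replace(/\/+$/, ""); // drop any trailing slashes
  return PACKAGED_PATHS.map((p) => ({
    from: `node_modules/${PACKAGE_NAME}/${p}`,
    to: `${dest}/${p}`,
  }));
}
```

Under this arrangement, updating the viewer in a consuming repository becomes a normal dependency version bump, and pdfjs-init.js is maintained in a single place.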
