Acroforms: Storing entered values for when the page is destroyed when it is not visible #11951

bpetty-formfast · 2020-06-01T17:21:32Z

Related: #7613
This issue is to discuss in detail the implementation of this feature: Storing entered values for when the page is destroyed when it is not visible

Overview:
Currently, hidden pages are destroyed in order to free up resources as new pages get are rendered. Any annotation information is lost when the page is cleared out. There are a number of other issues that this causes.

Events associated with the fields are also lost (for those adding their own acroscript-like routines)
The worker has to re-parse the annotations for a given page for the viewer to re-render, consuming CPU

Proposed implementation:
I am proposing that as the PDF loads, we generate all annotations layers for each page, if annotation support is enabled. These layers would never be destroyed. This has a number of advantages.

It achieves the goal of never losing any annotation data, or attached events.
While the initial load may be slightly slower on extremely large forms, rendering each page will be slightly faster as the annotation layer will already be there.
It simplifies triggering calculations on a form that requires form data from a different page.
This might allow Printing to work as intended.
This also allows one to easily pre-fill the form with external default values, saved state, etc.

While the dom would contain more elements, having all of the elements present should not cause an undue burden.

What doesn't work:
I was able to implement this differently, by creating an annotation value array. On "render" in annotation_layer_builder I was injecting stored values (also needed for form prefill). On "destroy" in pdf_page_viewer.js I was saving off field values back into the array. This ensured my form data was never lost.

There were a number of problems with this approach, chief among them was that if I Submit this form data, my array was not actually up to date as I was only saving back values when the page was destroyed... not as they were entered. Pages that were never rendered also would never have an entry in this structure (needed for processing all form fields). This also adds yet another additional delay to the system as it tries to process each page.

Changes:
This would require two changes:

That we never remove the annotation layer when the page is destroyed. The div would have to remain, and be hidden. I am not sure if keepAnnotations on reset by itself would do this or not.
We load annotation layers for all pages on load, instead of when the page is rendered.

I am working on this now. I am looking for feedback on this approach. This is my first time working with this library and I see little documentation for it. Any guidance would be helpful if you would like a clean pull request. I am also curious as to how much work would need to be done to get annotation value Printing working using this approach. I have not seen the code for it, but this could make it easier (and faster).

Snuffleupagus · 2020-06-01T19:07:59Z

I am proposing that as the PDF loads, we generate all annotations layers for each page, if annotation support is enabled.

Please note that if by that you mean actually loading all pages initially (i.e. by calling PDFDocumentProxy.getPage for every page in a document), that's seems basically like a non-starter since it'd would essentially render the ranged/streamed loading of data that the PDF.js library implements completely meaningless by always forcing the entire document to be loaded before rendering can start.

There's also the general question of how something like this would work in the default viewer, since in order to save resources (such as memory) only a handful of PDFPageView instances (by default 10) are currently active at any one time.

timvandermeij · 2020-06-01T21:18:37Z

Given that the form elements now all have an ID, I think storing any values on enter outside of the page views in a separate storage should work. That way we can still destroy pages, but when they are returned into view the annotation layer could be re-populated with the entered values. I don't really understand why not having entries in that storage would be a problem because if there is no entry we know that the user did not interact with the form element yet.

bpetty-formfast · 2020-06-02T03:58:22Z

@timvandermeij, the company I am working for is hoping to use PDFJS to capture form data. To do that it can't lose that data. We are also doing things that are out of scope for PDFJS, like being able to have acroscript functionality (which means hooking input elements) and being able to get all information stored in PDF fields.

That said, simply saving the state can be done with a few lines of code as detailed above. I have tested it and it works. The problem is that such a simple implementation makes it incredibly difficult to actually to use PDFJS. In order to get all of the form data, for example, I would somehow have to "flush" all of the page fields that are visible and about to be destroyed if using my technique above. Even then, if i wrote an event hook for a field, I would have to inject it every time the annotation is created, and could only access it with some SDK call (probably also requiring a flush) instead of simply getting the value by element name.

If loading all of the annotation layers is a non-starter, @Snuffleupagus, is there a better way of saving the page state other than waiting for it to be destroyed? If i wanted to get all fields in the form, would one modify getAnnotations to take into account the actual value stored in the annotation fields? What would that look like? We would also want a way to initialize these fields with data when viewing the PDF as well (defaults, saved state, etc).

bpetty-formfast · 2020-06-03T17:06:04Z

I have experimented with my original proposed technique, sans initializing all annotation layers.
By always setting keepAnnotations = true in pdf_page_view.js's reset function, all annotation layers are retained. This means that there is no ranged/streamed loading issue as mentioned by @Snuffleupagus, but the elements themselves act as a natural data store. The memory difference between keeping them as dom objects vs some javascript array is negligible.

However, I noticed something that could lead to a larger performance improvement. Right now when we call render(...) on the annotationLayer for the page, it uses the getAnnotations(...) call. This is fairly costly to make. Instead, if the annotation layer is already there it can skip that call and just call update(...) in annotation_layer.js. This function would have to be refactored so that instead of taking in an array of annotation objects it simply iterates through all dom annotations elements for that layer (which it is already kind of doing). All it is doing is a style transform change, I assume for resizing. This would undoubtedly improve performance since you don't have to reprocess that part of the pdf in the worker and bubble the results back up... just to change the styling on elements that are already there.

Would this variation be a non-starter as well?

Snuffleupagus · 2020-08-11T12:36:35Z

Given that the issue as filed, i.e. Acroforms: Storing entered values for when the page is destroyed when it is not visible, has been fixed by PR #12106 (see also Form fill) I believe that this can now be closed as fixed. /cc @timvandermeij

timvandermeij · 2020-08-11T20:40:17Z

Yes, indeed.

- basic print works now. Form data will be lost, possible help: mozilla/pdf.js#11951

bpetty-formfast changed the title ~~Storing entered values for when the page is destroyed when it is not visible~~ Acroforms: Storing entered values for when the page is destroyed when it is not visible Jun 1, 2020

timvandermeij added annotations form-acroform labels Jun 1, 2020

timvandermeij closed this as completed Aug 11, 2020

blizzardengle added a commit to caboodle-tech/pdf.js-mini-viewers that referenced this issue Jun 20, 2021

print working

b5efb1d

- basic print works now. Form data will be lost, possible help: mozilla/pdf.js#11951

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Acroforms: Storing entered values for when the page is destroyed when it is not visible #11951

Acroforms: Storing entered values for when the page is destroyed when it is not visible #11951

bpetty-formfast commented Jun 1, 2020

Snuffleupagus commented Jun 1, 2020 •

edited

Loading

timvandermeij commented Jun 1, 2020

bpetty-formfast commented Jun 2, 2020 •

edited

Loading

bpetty-formfast commented Jun 3, 2020

Snuffleupagus commented Aug 11, 2020

timvandermeij commented Aug 11, 2020

Acroforms: Storing entered values for when the page is destroyed when it is not visible #11951

Acroforms: Storing entered values for when the page is destroyed when it is not visible #11951

Comments

bpetty-formfast commented Jun 1, 2020

Snuffleupagus commented Jun 1, 2020 • edited Loading

timvandermeij commented Jun 1, 2020

bpetty-formfast commented Jun 2, 2020 • edited Loading

bpetty-formfast commented Jun 3, 2020

Snuffleupagus commented Aug 11, 2020

timvandermeij commented Aug 11, 2020

Snuffleupagus commented Jun 1, 2020 •

edited

Loading

bpetty-formfast commented Jun 2, 2020 •

edited

Loading