-
Notifications
You must be signed in to change notification settings - Fork 339
MAINT: Broaden coverage of accessibility tests with Playwright #1378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For the execution libs, we don't have control over how jupyterlites is displayed so I expect a11y failure. |
Adding a note for visibility. @gabalafou will be working on this in the upcoming weeks. |
Working on this issue has raised fresh questions in my head about accessibility testing and this repo. The following are some of my thoughts about using Lighthouse versus Playwright. When I worked on #1260 - Accessibility test Kitchen Sink with Playwright, I went for Playwright because I was already familiar with using it for automated accessibility tests. It also has a Pytest plugin and a Python API, which I thought would make it more friendly for this repo. However, now that I am revisiting these themes, I wonder if I was too quick to abandon Lighthouse. In my mind the guiding questions should be:
I want to stress that Lighthouse's reporting is a big factor in my head. But I guess it depends on how we imagine the tests being used in the future. I don't think that our current Playwright + Axe accessibility test output is particularly friendly. Thinking towards the future, after we have fixed all of the current Axe-detectable accessibility issues, who is going to have to deal with future Axe failures and will they be inclined to do so from the CI system's outputs, or will they find the outputs overwhelming or too difficult to bother trying to understand them? Lighthouse has some very nice reporting features, for example it will take a screenshot of the part of the page that fails a particular accessibility test. Another potential Lighthouse advantage is that it bundles other kinds of testing that might be relevant to PyData Sphinx Theme—namely, performance and best practices. That would benefit the participants in this repo by reducing the number of different tools and technologies that need to be learned and maintained. Not to mention the Lighthouse ecosystem already has GitHub Actions that you can use, so you can more easily integrate it with GitHub (such as displaying on a PR if the PR broke the accessibility of a certain page), and you don't have to think about configuring the runner to make sure it has all the needed dependencies and appropriate caching and such (on the other hand, between Tania and myself, we already have a pretty good amount of knowledge on how to do this). Lighthouse also provides a server (LHCI Server), so if we could get some cloud credits, we could run a Lighthouse CI server in the cloud and collect test results over time. With history, you can check if performance or accessibility improved over time. More importantly, you can also easily discover if a change was made that decreased performance or accessibility. On the other hand, I don't think Lighthouse is very easy to customize. As far as I understand, when it comes to accessibility testing, all you can do is give Lighthouse a list of URLs and it will scan each of those URLs with the Axe-core library, and then collect and report the results. If we want more fine-grained control—for example, the ability to scan only parts of pages, or even just to configure which rules get run from the Axe-core library—then I think we have to roll our own. So it's all the standard trade-offs between choosing something off-the-shelf versus building your own solution. |
To summarize, here are the pros and cons of using Lighthouse, as I see it. Lighthouse pros:
Lighthouse cons:
|
Personally I don't mind viewing CI logs to get the relevant failure information --- it's what we have to do for ordinary pytest failures too. Also of note is that I already added a non-a11y test that uses playwright (to confirm that version switcher JSON data was correctly loading and getting styled properly). This is not to say that I'm strongly leaning one way or the other --- I can see the benefits of Lighthouse's out-of-the-box solution and reporting features. Would it be a nightmare to do both? Lighthouse for the "first pass" and custom Playwright tests for anything that we know that Lighthouse won't catch? |
Thanks @gabalafou, thoughts below
Yes, that is correct, but also there is #585 where there was already the proposal to use Playwright for testing (as there are several testing gaps), and more recently, Dan worked on adding some tests for the version dropdown.
Reporting does need to be improved - this is one of the items I have been meaning to work on, but the many spinning plates are tricky sometimes. Considering that the test primary users/consumers are the maintainers of this theme (and not end-users) and that we already have to check CI logs regularly, this would not be an annoyance. However, as I said before, the reporting format could be improved.
I have already shared some thoughts on Lighthouse reporting with @gabalafou separately. Still, while it does have some nice features like screenshotting, and its reports are visually nice, the way the accessibility reporting is done in terms of an absolute percentage is / can be grossly misleading. What does a 90 vs. an 80 score look like regarding issues found/severity? Because not all accessibility violations are weighted the same.
Here is where I see the same data granularity issue. Since the report is an absolute score (1-100%), how confident can we have that an 80% now vs. 80% in three months does not imply regressions on some already fixed accessibility issues but improvements in other areas? I find the current reports challenging to navigate, perhaps because we are consuming them through GitHub artifacts (more details below). I think here is where Playwright + axe core is much better suited, along with the customization and extensibility (write tests specific to the theme.), which goes in hand with the:
I would undoubtedly prefer writing more accessibility tests to include tab trapping, focus, and potentially other cases I am missing. The theme is also evolving, and new components might be added or layouts so having that extensibility and customization options is good from that point of view. We are early in our accessibility journey with the theme. While I cannot say for sure what type of tests we will need to add in the future, this, with the very limited options of Lighthouse customization, can be a significant blocker or limiting factor for us in the long run.
Yes, the rendered HTML is nice, but not very user-friendly or intuitive. This may be the current configuration we have. Still, since the only way to interact is by downloading the reports from the GitHub action artifacts, we end up with many HTML files (that, at first glance not easy to identify what each one is for - i.e., admonitions, images, etc.) This is how the files look: Is this customizable? or is this one of those things that would be too hard to do for little gain? Also, we'd need to know if we can run the accessibility tests on the dark theme because it seems to be only running on the light theme.
This may be something I am not too worried about; I spend a lot of time on automation and CI/CD spaces (and I am biased as a consequence). But if the only pro here is that there is a GitHub action that handles the setup and dependencies, I would not class this as a pro for Lighthouse but rather someone in the community took the time and care to create an action. Overall we can keep both - at least; we said initially we would do that anyway. And the concerns you've raised can be handled in one way or another. |
Thanks so much for this detailed feedback to questions. This allows me to move forward confidently with Playwright. One possibility this discussion sparked for me that I hadn't considered before is perhaps that I develop down both directions (Lighthouse AND Playwright) simultaneously for a while until we hit a point where a clear winner emerges and then we axe the other one. Is that something we think would be worthwhile? If not, I'm happy to just move forward with Playwright development. All of the feedback and considerations taken together make Playwright seem like a solid choice to move forward on. |
Could we resolve the "TBD" part of the PR description? Have we made a decision about whether those pages need to be added to the accessibility tests? If not, should we move that discussion into a separate PR? That way we can merge #1408 and allow it to close this issue, knowing exactly what it accomplished (one issue, one PR). |
#1378 (comment) addresses the execution libraries I think? Namely that we shouldn't test them here. IDK about ablog... I guess if we're customizing the styling of ablog elements then we should probably test to make sure our customizations aren't violating WCAG. @trallard does that make sense to you? |
Right now Ablog should inherit from our styling. I do not think we are doing any custom modification (I do not remember having to do any overwrites when doing the design system colour work) and a quick glance did not flag anything. So we should be safe skipping this too. On the library side of things. Let's focus on Playwright and keep the Lighthouse as is to avoid making changes on both and ending with tho kind of working solutions and instead focus on a robust single approach (with Lighthouse as a fallback on the side). |
closed by #1408 |
So if I understand correctly, we're leaving Lighthouse in place just in case we decide we need to backtrack, or is there some other reason? Should we create a task to eventually remove Lighthouse if all goes well down the Playwright route? |
I would say "in case something goes wrong with playwright AND lighthouse gets better"
Probably not? Lighthouse does some stuff that playwright will never do (the performance testing) so I don't think it will ever be 100% redundant. And anyway it shows up in our CI list on every PR so it will be hard to forget that it's there --> it should be easy enough to revisit the "do we need lighthouse" question at any point in the future. |
Right, of course. The performance testing. I guess I meant removing the accessibility portion of the Lighthouse tests, but there's not really a lot to do there, in terms of code. But I do think the docs are quite confusing in their current state, since they reference both Lighthouse and Playwright accessibility tests without explaining the redundancy. Besides that, then, the only downside I see is that it will make CI slower than it could be otherwise, since we'll be running accessibility tests in both Lighthouse and Playwright. |
That is something we could improve, for sure
I think they're concurrent, not serial? And if not, we could see about making them concurrent. |
The current set of accessibility tests with Playwright and axe-core only check for the Kitchen Sink pages.
We need to extend these to cover the following:
Edit: these are skipped
The text was updated successfully, but these errors were encountered: