
Identify DAP version for more self-hosting sites #1323

Open
sanason opened this issue Jan 28, 2025 · 2 comments
Labels
data engineering stakeholders relates to customers or consumers of Site Scanning data

Comments

sanason commented Jan 28, 2025

Hi site scanning team, from the DAP team!

I've been looking at the site scanning data, trying to assess how promptly our DAP self-hosting customers upgrade to newly released versions of DAP. I'm having a problem with sites that report dap=true but don't report a version for DAP. I see that you already have several GitHub issues related to this problem, so I know I'm not telling you anything you don't know. I hope you don't mind that I created a new issue - I wasn't sure which existing issue to comment on.

I'm specifically interested in the problem of self-hosted DAP installations where the DAP file is not named Universal-Federated-Analytics-Min.js. Looking at the site scanning code, I can see how it would have trouble identifying the request for the DAP script in such a situation.

I have a couple of suggestions for different ways of identifying DAP and extracting its configuration:

Option 1: Search the page for a script tag with id="_fed_an_ua_tag".

If the site is following the DAP installation instructions from our wiki, this tag should be the one that loads the DAP JavaScript, whether or not the site is self-hosting. The id is functional, not just cosmetic - if the site doesn't include that id, then they're still loading DAP but it's not going to be configured correctly. FWIW, I spot-checked a few sites that are self-hosting DAP with a different file name and they all had the _fed_an_ua_tag id.
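
As a rough illustration of Option 1, here is a minimal Python sketch that looks for the script tag by id and returns the URL it loads. The names `DapTagFinder` and `find_dap_script_src` are hypothetical and are not taken from the site scanning code. The sketch parses static HTML with the standard library only, so it assumes the tag is present in the markup the scanner already has.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# The id that the DAP installation instructions put on the loader tag.
DAP_TAG_ID = "_fed_an_ua_tag"


class DapTagFinder(HTMLParser):
    """Records the src of the first <script> tag whose id is DAP_TAG_ID."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "script" or self.found:
            return
        attr_map = dict(attrs)  # valueless attributes such as "async" map to None
        if attr_map.get("id") == DAP_TAG_ID:
            self.found = True
            self.src = attr_map.get("src")


def find_dap_script_src(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the DAP script, or None if the tag is absent."""
    finder = DapTagFinder()
    finder.feed(html)
    if not finder.found or not finder.src:
        return None
    # Self-hosted installs often use a relative path, so resolve it against the page URL.
    return urljoin(page_url, finder.src)
```

Because the match is on the id rather than the file name, this would find a self-hosted script under any name. The returned URL could then be matched against the page's outgoing requests. How to read the version out of the script itself is not covered here.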

Option 2: Search the outgoing requests for calls to https://www.google-analytics.com/g/collect?tid=G-CSLL4ZEK4L.

At least one of these requests should have a bunch of additional query parameters that describe the configuration of DAP (version, agency, subagency, etc.). For example, look at the ep.* parameters in this URL:

https://www.google-analytics.com/g/collect?v=2&tid=G-CSLL4ZEK4L&gtm=45je51r0h2v9131934939z8813539606za200zb813539606&_p=1738081172573&gcd=13l3l3l3l1l1&npa=0&dma=0&tag_exp=102067808~102081485~102123608~102308675&cid=1654668632.1737647156&ul=en-us&sr=1512x982&uaa=arm&uab=64&uafvl=Not%2520A(Brand%3B8.0.0.0%7CChromium%3B132.0.6834.84%7CGoogle%2520Chrome%3B132.0.6834.84&uamb=0&uam=&uap=macOS&uapv=14.7.2&uaw=0&are=1&frm=0&pscdl=noapi&_s=3&dl=https%3A%2F%2Ftouchpoints.digital.gov%2F&dt=Touchpoints&sid=1738081168&sct=8&seg=1&en=dap_event&_c=1&ep.agency=GSA&ep.subagency=TTS&ep.site_topic=unspecified%3Atouchpoints.digital.gov&ep.site_platform=unspecified%3Atouchpoints.digital.gov&ep.script_source=https%3A%2F%2Fdap.digitalgov.gov%2Funiversal-federated-analytics-min.js&ep.version=20241218%20v8.5%20-%20ga4&ep.protocol=https%3A&ep.using_parallel_tracker=no&_et=43&tfd=1209

The pro of this approach is that it should work no matter how the DAP code is loaded.

The con is that it depends on Google Analytics not changing the format of this URL. I don't think GA considers the https://www.google-analytics.com/g/collect endpoint to be a public API, since it isn't documented anywhere, so I assume they'd feel free to change it.
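
To make Option 2 concrete, here is a minimal Python sketch that pulls the ep.* parameters out of a captured request URL. The function names `extract_dap_config` and `find_dap_config` are hypothetical. The sketch only inspects the query string, and it assumes the URL format shown in the example above, which is undocumented and could change.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qs, urlsplit

DAP_MEASUREMENT_ID = "G-CSLL4ZEK4L"  # the tid value from the example URL above
COLLECT_HOST = "www.google-analytics.com"
COLLECT_PATH = "/g/collect"


def extract_dap_config(request_url: str) -> Dict[str, str]:
    """Return the ep.* parameters of a DAP collect request, or {} if it is not one."""
    parts = urlsplit(request_url)
    if parts.hostname != COLLECT_HOST or parts.path != COLLECT_PATH:
        return {}
    params = parse_qs(parts.query)  # percent-decodes values; each value is a list
    if DAP_MEASUREMENT_ID not in params.get("tid", []):
        return {}
    # Strip the "ep." prefix so keys read as agency, subagency, version, ...
    return {key[3:]: values[0] for key, values in params.items() if key.startswith("ep.")}


def find_dap_config(request_urls: Iterable[str]) -> Optional[Dict[str, str]]:
    """Return the first DAP configuration that carries a version, if any."""
    for url in request_urls:
        config = extract_dap_config(url)
        if "version" in config:
            return config
    return None
```

Applied to the example URL above, the `version` key would hold the decoded string `20241218 v8.5 - ga4`, and `agency` and `subagency` would hold `GSA` and `TTS`.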

@gbinal gbinal added this to the Sprint 198 (1/23-1/29) milestone Jan 28, 2025
@laurenancona

@sanason Thank you so much! Looking for the id="_fed_an_ua_tag" in particular is super helpful. We've added it to our queue, and it will likely land in the next week or two.

We'll ping you again when it gets pushed to prod. Cheers!

@laurenancona laurenancona added data engineering stakeholders relates to customers or consumers of Site Scanning data labels Jan 28, 2025
@gbinal gbinal assigned gbinal and luke-at-flexion and unassigned gbinal Jan 29, 2025
Collaborator

gbinal commented Feb 4, 2025

We're working on this now (Option 1 first) and will update you when we have news.
