report(third party): filter out third party urls #6351

Merged
merged 53 commits into from
Apr 30, 2019
28cc931
core(jsdom): upgrade jsdom to support closest
wardpeet Oct 26, 2018
d99057e
remove lh extension
wardpeet Oct 26, 2018
9885704
fix eslint
wardpeet Oct 27, 2018
0c7d01b
remove jsdom from lighthouse-viewer
wardpeet Oct 27, 2018
9dd3047
add filter 3p logic to report
wardpeet Oct 21, 2018
bc9b131
Show 3p table
wardpeet Oct 26, 2018
7e23d46
fix eslint
wardpeet Oct 26, 2018
1dac41b
Merge branch 'master' into feat/report-3p
wardpeet Oct 31, 2018
52253b3
update 3rd party
wardpeet Oct 31, 2018
350bf5d
fix unit test
wardpeet Oct 31, 2018
787f649
fix eslint
wardpeet Oct 31, 2018
6ccbc88
fix typescript
wardpeet Nov 1, 2018
32f0832
fix eslint
wardpeet Nov 5, 2018
745b4cf
Merge branch 'master' into feat/report-3p
wardpeet Dec 18, 2018
be2627e
Merge branch 'master' into feat/report-3p
wardpeet Jan 21, 2019
60c1081
review changes
wardpeet Jan 21, 2019
7a17378
fix urlshim & reviews
wardpeet Jan 22, 2019
fa071b1
fix lint
wardpeet Jan 28, 2019
274bc47
fix type-check..
wardpeet Jan 29, 2019
7d4cb53
review changes
wardpeet Feb 9, 2019
be6d8fb
Update lighthouse-core/lib/url-shim.js
paulirish Apr 2, 2019
2149757
Apply suggestions from code review
paulirish Apr 2, 2019
e20dae9
move getrootdomain to util from report render
wardpeet Apr 2, 2019
bfb3b56
Enable UIstrings in report for third party label
wardpeet Apr 3, 2019
960139f
review changes
wardpeet Apr 3, 2019
78403fc
Merge branch 'master' into feat/report-3p
wardpeet Apr 3, 2019
f9605d4
fix tests
wardpeet Apr 3, 2019
4b117b1
fix eslint
wardpeet Apr 3, 2019
8b3b814
fix tsc
wardpeet Apr 4, 2019
3823b51
update golden lhr
wardpeet Apr 4, 2019
bf66efe
fix tests & lint
wardpeet Apr 4, 2019
1888642
AAAAAAhhhh please work
wardpeet Apr 4, 2019
9c018f8
fix proto test
wardpeet Apr 4, 2019
045eb9d
Merge branch 'master' into feat/report-3p
wardpeet Apr 15, 2019
9d17141
update description of label message
wardpeet Apr 16, 2019
f2d8fb5
add localhost test
wardpeet Apr 16, 2019
9a3f617
fix i18n lhr check
wardpeet Apr 16, 2019
a410185
Fix third paryt check for domains with port
wardpeet Apr 16, 2019
2380a04
fix eslint
wardpeet Apr 16, 2019
34f1790
Merge branch 'master' into feat/report-3p
connorjclark Apr 23, 2019
d8c7ba0
fix merge
connorjclark Apr 23, 2019
75c6ef5
URL cleanup
brendankenny Apr 23, 2019
5b8e948
comment
connorjclark Apr 25, 2019
87d5e14
comment
connorjclark Apr 25, 2019
b52ab91
Merge remote-tracking branch 'origin/master' into feat/report-3p
connorjclark Apr 25, 2019
3ff478e
empty commit to trigger CI
connorjclark Apr 27, 2019
341bd5d
add render test
connorjclark Apr 29, 2019
99b0b13
fix edge case
connorjclark Apr 29, 2019
c9a1b72
use dom.find
connorjclark Apr 29, 2019
2b90b66
empty commit to trigger CI
connorjclark Apr 29, 2019
a29cd61
Merge remote-tracking branch 'origin/master' into feat/report-3p
connorjclark Apr 29, 2019
a99422a
updates
brendankenny Apr 29, 2019
56e5551
strings
paulirish Apr 29, 2019
15 changes: 1 addition & 14 deletions lighthouse-core/audits/seo/canonical.js
@@ -35,19 +35,6 @@ const UIStrings = {

const str_ = i18n.createMessageInstanceIdFn(__filename, UIStrings);

/**
* Returns a primary domain for provided URL (e.g. http://www.example.com -> example.com).
* Note that it does not take second-level domains into account (.co.uk).
* @param {URL} url
* @returns {string}
*/
function getPrimaryDomain(url) {
return url.hostname
.split('.')
.slice(-2)
.join('.');
}

/**
* @typedef CanonicalURLData
* @property {Set<string>} uniqueCanonicalURLs
@@ -173,7 +160,7 @@ class Canonical extends Audit {

// bing and yahoo don't allow canonical URLs pointing to different domains, it's also
// a common mistake to publish a page with canonical pointing to e.g. a test domain or localhost
if (getPrimaryDomain(canonicalURL) !== getPrimaryDomain(baseURL)) {
if (!URL.rootDomainsMatch(canonicalURL, baseURL)) {
return {
score: 0,
explanation: str_(UIStrings.explanationDifferentDomain, {url: canonicalURL}),
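
The doc comment on the removed `getPrimaryDomain` notes that it ignores second-level domains such as `.co.uk`. A minimal sketch of what that means in practice, using a standalone copy of the removed helper and hypothetical example URLs (not taken from the Lighthouse tests):

```javascript
// Standalone copy of the removed helper, for illustration only.
function getPrimaryDomain(url) {
  return url.hostname.split('.').slice(-2).join('.');
}

// Hypothetical URLs chosen for the example.
const plain = getPrimaryDomain(new URL('http://www.example.com/page'));
const ukShop = getPrimaryDomain(new URL('https://shop.example.co.uk/'));
const ukOther = getPrimaryDomain(new URL('https://other.co.uk/'));

console.log(plain); // example.com
console.log(ukShop); // co.uk
// Two unrelated sites under .co.uk compare as the same "primary domain".
console.log(ukShop === ukOther); // true
```

With this helper, a canonical URL pointing at an unrelated `.co.uk` site would pass the same-domain check, which is presumably one reason the audit now calls `URL.rootDomainsMatch` instead.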
4 changes: 4 additions & 0 deletions lighthouse-core/lib/i18n/en-US.json
@@ -1343,6 +1343,10 @@
"message": "Expand snippet",
"description": "Label for button that shows all lines of the snippet when clicked"
},
"lighthouse-core/report/html/renderer/util.js | thirdPartyResourcesLabel": {
"message": "Show 3rd-party resources",
"description": "This label is for a checkbox above a table of items loaded by a web page. The checkbox is used to show or hide third-party (or \"3rd-party\") resources in the table, where \"third-party resources\" refers to items loaded by a web page from URLs that aren't controlled by the owner of the web page."
},
"lighthouse-core/report/html/renderer/util.js | toplevelWarningsMessage": {
"message": "There were issues affecting this run of Lighthouse:",
"description": "Label shown preceding any important warnings that may have invalidated the entire report. For example, if the user has Chrome extensions installed, they may add enough performance overhead that Lighthouse's performance metrics are unreliable. If shown, this will be displayed at the top of the report UI."
47 changes: 7 additions & 40 deletions lighthouse-core/lib/url-shim.js
@@ -9,22 +9,10 @@
* URL shim so we keep our code DRY
*/

/* global self */
/* global URL */

const Util = require('../report/html/renderer/util.js');

// Type cast so tsc sees window.URL and require('url').URL as sufficiently equivalent.
const URL = /** @type {!Window["URL"]} */ (typeof self !== 'undefined' && self.URL) ||
require('url').URL;

// 25 most used tld plus one domains (aka public suffixes) from http archive.
// @see https://github.com/GoogleChrome/lighthouse/pull/5065#discussion_r191926212
// The canonical list is https://publicsuffix.org/learn/ but we're only using subset to conserve bytes
const listOfTlds = [
'com', 'co', 'gov', 'edu', 'ac', 'org', 'go', 'gob', 'or', 'net', 'in', 'ne', 'nic', 'gouv',
'web', 'spb', 'blog', 'jus', 'kiev', 'mil', 'wi', 'qc', 'ca', 'bel', 'on',
];

const allowedProtocols = [
'https:', 'http:', 'chrome:', 'chrome-extension:',
];
@@ -99,34 +87,18 @@ class URLShim extends URL {
}
}

/**
* Gets the tld of a domain
*
* @param {string} hostname
* @return {string} tld
*/
static getTld(hostname) {
const tlds = hostname.split('.').slice(-2);

if (!listOfTlds.includes(tlds[0])) {
return `.${tlds[tlds.length - 1]}`;
}

return `.${tlds.join('.')}`;
}

/**
* Check whether the root domains of two URLs match
*
* @param {string} urlA
* @param {string} urlB
* @param {string|URL} urlA
* @param {string|URL} urlB
*/
static rootDomainsMatch(urlA, urlB) {
let urlAInfo;
let urlBInfo;
try {
urlAInfo = new URL(urlA);
urlBInfo = new URL(urlB);
urlAInfo = Util.createOrReturnURL(urlA);
urlBInfo = Util.createOrReturnURL(urlB);
} catch (err) {
return false;
}
@@ -135,14 +107,9 @@ class URLShim extends URL {
return false;
}

const tldA = URLShim.getTld(urlAInfo.hostname);
const tldB = URLShim.getTld(urlBInfo.hostname);

// get the string before the tld
const urlARootDomain = urlAInfo.hostname.replace(new RegExp(`${tldA}$`), '')
.split('.').splice(-1)[0];
const urlBRootDomain = urlBInfo.hostname.replace(new RegExp(`${tldB}$`), '')
.split('.').splice(-1)[0];
const urlARootDomain = Util.getRootDomain(urlAInfo);
const urlBRootDomain = Util.getRootDomain(urlBInfo);

return urlARootDomain === urlBRootDomain;
}
6 changes: 5 additions & 1 deletion lighthouse-core/report/html/renderer/details-renderer.js
@@ -124,7 +124,11 @@ class DetailsRenderer {
element.appendChild(hostElem);
}

if (title) element.title = url;
if (title) {
element.title = url;
// set the url on the element's dataset which we use to check 3rd party origins
element.dataset.url = url;
}
return element;
}

99 changes: 98 additions & 1 deletion lighthouse-core/report/html/renderer/report-ui-features.js
@@ -16,15 +16,25 @@
*/
'use strict';

/* eslint-env browser */

/**
* @fileoverview Adds export button, print, and other dynamic functionality to
* the report.
*/

/* globals self URL Blob CustomEvent getFilenamePrefix window */
/* globals getFilenamePrefix Util */

/** @typedef {import('./dom.js')} DOM */

/**
* @param {HTMLTableElement} tableEl
* @return {Array<HTMLTableRowElement>}
*/
function getTableRows(tableEl) {
return Array.from(tableEl.tBodies[0].rows);
}

class ReportUIFeatures {
/**
* @param {DOM} dom
@@ -73,6 +83,7 @@ class ReportUIFeatures {
this.json = report;
this._setupMediaQueryListeners();
this._setupExportButton();
this._setupThirdPartyFilter();
this._setupStickyHeaderElements();
this._setUpCollapseDetailsAfterPrinting();
this._resetUIState();
@@ -119,6 +130,92 @@ class ReportUIFeatures {
dropdown.addEventListener('click', this.onExport);
}

_setupThirdPartyFilter() {
// Some audits should not display the third party filter option.
const thirdPartyFilterAuditExclusions = [
// This audit deals explicitly with third party resources.
'uses-rel-preconnect',
];

// Get all tables with a text url column.
/** @type {Array<HTMLTableElement>} */
const tables = Array.from(this._document.querySelectorAll('.lh-table'));
const tablesWithUrls = tables
.filter(el => el.querySelector('td.lh-table-column--url'))
.filter(el => {
const containingAudit = el.closest('.lh-audit');
if (!containingAudit) throw new Error('.lh-table not within audit');
return !thirdPartyFilterAuditExclusions.includes(containingAudit.id);
});

tablesWithUrls.forEach((tableEl, index) => {
const thirdPartyRows = this._getThirdPartyRows(tableEl, this.json.finalUrl);
// No 3rd parties, no checkbox!
if (!thirdPartyRows.size) return;

// create input box
const filterTemplate = this._dom.cloneTemplate('#tmpl-lh-3p-filter', this._document);
const filterInput = this._dom.find('input', filterTemplate);
const id = `lh-3p-filter-label--${index}`;

filterInput.id = id;
filterInput.addEventListener('change', e => {
// Remove rows from the dom and keep track of them to readd on uncheck.
// Why removing instead of hiding? To keep nth-child(even) background-colors working.
if (e.target instanceof HTMLInputElement && !e.target.checked) {
for (const row of thirdPartyRows.values()) {
row.remove();
}
} else {
// Add row elements back to original positions.
for (const [position, row] of thirdPartyRows.entries()) {
const childrenArr = getTableRows(tableEl);
tableEl.tBodies[0].insertBefore(row, childrenArr[position]);
}
}
});

this._dom.find('label', filterTemplate).setAttribute('for', id);
this._dom.find('.lh-3p-filter-count', filterTemplate).textContent =
`${thirdPartyRows.size}`;
this._dom.find('.lh-3p-ui-string', filterTemplate).textContent =
Util.UIStrings.thirdPartyResourcesLabel;

// Finally, add checkbox to the DOM.
if (!tableEl.parentNode) return; // Keep tsc happy.
tableEl.parentNode.insertBefore(filterTemplate, tableEl);
});
}

/**
* From a table with URL entries, finds the rows containing third-party URLs
* and returns a Map of those rows, mapping from row index to row Element.
* @param {HTMLTableElement} el
* @param {string} finalUrl
* @return {Map<number, HTMLTableRowElement>}
*/
_getThirdPartyRows(el, finalUrl) {
const urlItems = this._dom.findAll('.lh-text__url', el);
const finalUrlRootDomain = Util.getRootDomain(finalUrl);

/** @type {Map<number, HTMLTableRowElement>} */
const thirdPartyRows = new Map();
for (const urlItem of urlItems) {
const datasetUrl = urlItem.dataset.url;
if (!datasetUrl) continue;
const isThirdParty = Util.getRootDomain(datasetUrl) !== finalUrlRootDomain;
if (!isThirdParty) continue;

const urlRowEl = urlItem.closest('tr');
if (urlRowEl) {
const rowPosition = getTableRows(el).indexOf(urlRowEl);
thirdPartyRows.set(rowPosition, urlRowEl);
}
}

return thirdPartyRows;
}

_setupStickyHeaderElements() {
this.topbarEl = this._dom.find('.lh-topbar', this._document);
this.scoreScaleEl = this._dom.find('.lh-scorescale', this._document);
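
The remove-and-restore bookkeeping in `_setupThirdPartyFilter` can be sketched without a DOM. In this simplified model (an assumption for illustration, not the report code), plain strings stand in for `<tr>` elements, an array stands in for `tBodies[0].rows`, and the `(3p)` suffix is a hypothetical marker for third-party rows:

```javascript
// Plain strings stand in for table rows; the "(3p)" suffix is a made-up marker.
const rows = ['a.js (1p)', 'cdn.js (3p)', 'b.js (1p)', 'ads.js (3p)'];
const isThirdParty = row => row.endsWith('(3p)');

// Map from original row index to row, filled in document order,
// mirroring the Map returned by _getThirdPartyRows.
const thirdPartyRows = new Map();
rows.forEach((row, index) => {
  if (isThirdParty(row)) thirdPartyRows.set(index, row);
});

// Checkbox unchecked: take the third-party rows out of the table.
function hideThirdParty(tableRows) {
  for (const row of thirdPartyRows.values()) {
    tableRows.splice(tableRows.indexOf(row), 1);
  }
}

// Checkbox checked: put each row back at its recorded position.
// A Map iterates in insertion order, so positions come back ascending and
// every earlier row is already in place when the next index is used.
function showThirdParty(tableRows) {
  for (const [position, row] of thirdPartyRows.entries()) {
    tableRows.splice(position, 0, row);
  }
}

const table = rows.slice();
hideThirdParty(table);
const hidden = table.slice();
showThirdParty(table);

console.log(hidden); // [ 'a.js (1p)', 'b.js (1p)' ]
console.log(table); // same order as `rows`
```

Restoring in ascending order is what makes the recorded indices valid again. Re-inserting in any other order could place a row at an index that does not yet exist in the partially restored table.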
58 changes: 57 additions & 1 deletion lighthouse-core/report/html/renderer/util.js
@@ -16,7 +16,7 @@
*/
'use strict';

/* globals self URL */
/* globals self, URL */

const ELLIPSIS = '\u2026';
const NBSP = '\xa0';
@@ -29,6 +29,14 @@ const RATINGS = {
ERROR: {label: 'error'},
};

// 25 most used tld plus one domains (aka public suffixes) from http archive.
// @see https://github.com/GoogleChrome/lighthouse/pull/5065#discussion_r191926212
// The canonical list is https://publicsuffix.org/learn/ but we're only using subset to conserve bytes
const listOfTlds = [
'com', 'co', 'gov', 'edu', 'ac', 'org', 'go', 'gob', 'or', 'net', 'in', 'ne', 'nic', 'gouv',
'web', 'spb', 'blog', 'jus', 'kiev', 'mil', 'wi', 'qc', 'ca', 'bel', 'on',
];

class Util {
static get PASS_THRESHOLD() {
return PASS_THRESHOLD;
@@ -336,6 +344,51 @@ class Util {
};
}

/**
* @param {string|URL} value
* @return {URL}
*/
static createOrReturnURL(value) {
if (value instanceof URL) {
return value;
}

return new URL(value);
}

/**
* Gets the tld of a domain
*
* @param {string} hostname
* @return {string} tld
*/
static getTld(hostname) {
const tlds = hostname.split('.').slice(-2);

if (!listOfTlds.includes(tlds[0])) {
return `.${tlds[tlds.length - 1]}`;
}

return `.${tlds.join('.')}`;
}

/**
* Returns the root domain for the provided URL (e.g. http://www.example.com -> example.com).
* @param {string|URL} url URL string or URL object
* @returns {string}
*/
static getRootDomain(url) {
const hostname = Util.createOrReturnURL(url).hostname;
const tld = Util.getTld(hostname);

// tld is .com or .co.uk, so splitting it on '.' yields a length that is 1 too big:
// .com => 2 & .co.uk => 3
const splitTld = tld.split('.');

// get TLD + root domain
return hostname.split('.').slice(-splitTld.length).join('.');
}

/**
* @param {LH.Config.Settings} settings
* @return {Array<{name: string, description: string}>}
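
To see how `getTld` and `getRootDomain` behave together, here is a standalone sketch that copies the two functions from this diff as plain functions (the `Util` class and `createOrReturnURL` are inlined, and the sample URLs are hypothetical):

```javascript
// Same subset of public suffixes as in the diff.
const listOfTlds = [
  'com', 'co', 'gov', 'edu', 'ac', 'org', 'go', 'gob', 'or', 'net', 'in', 'ne', 'nic', 'gouv',
  'web', 'spb', 'blog', 'jus', 'kiev', 'mil', 'wi', 'qc', 'ca', 'bel', 'on',
];

function getTld(hostname) {
  const tlds = hostname.split('.').slice(-2);
  // If the second-to-last label is not a known suffix part, the tld is one label.
  if (!listOfTlds.includes(tlds[0])) {
    return `.${tlds[tlds.length - 1]}`;
  }
  return `.${tlds.join('.')}`;
}

function getRootDomain(url) {
  const hostname = (url instanceof URL ? url : new URL(url)).hostname;
  const tld = getTld(hostname);
  // The leading dot adds an empty first item, so the split length is
  // "number of tld labels + 1", i.e. tld plus one more label.
  const splitTld = tld.split('.');
  return hostname.split('.').slice(-splitTld.length).join('.');
}

console.log(getRootDomain('http://www.example.com/')); // example.com
console.log(getRootDomain('https://shop.example.co.uk/x')); // example.co.uk
// hostname excludes the port, so ports do not affect the comparison.
console.log(getRootDomain('http://localhost:8080/')); // localhost
// Caveat of the short list: 'blog' is treated as a suffix part here.
console.log(getRootDomain('https://www.blog.com/')); // www.blog.com
```

The last case suggests that a site whose own name appears in `listOfTlds` may have its subdomains classified as separate root domains. That appears to be the trade-off of shipping a 25-entry subset rather than the full public suffix list.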
@@ -524,6 +577,9 @@ Util.UIStrings = {
lsPerformanceCategoryDescription: '[Lighthouse](https://developers.google.com/web/tools/lighthouse/) analysis of the current page on an emulated mobile network. Values are estimated and may vary.',
/** Title of the lab data section of the Performance category. Within this section are various speed metrics which quantify the pageload performance into values presented in seconds and milliseconds. "Lab" is an abbreviated form of "laboratory", and refers to the fact that the data is from a controlled test of a website, not measurements from real users visiting that site. */
labDataTitle: 'Lab Data',

/** This label is for a checkbox above a table of items loaded by a web page. The checkbox is used to show or hide third-party (or "3rd-party") resources in the table, where "third-party resources" refers to items loaded by a web page from URLs that aren't controlled by the owner of the web page. */
thirdPartyResourcesLabel: 'Show 3rd-party resources',
};

if (typeof module !== 'undefined' && module.exports) {