Commit baf490d

hadiamjad, tunetheweb, github-actions[bot], dependabot[bot], and ksakae1216 authored
Privacy Sql Tracking Detection Using Easylist Adservers (#3730)
* Add GA4 fields to match documentation (#3679)
* Add standard GA4 web-vital fields
* Add value
* Update Timestamps (#3680)
  Co-authored-by: tunetheweb <[email protected]>
* Bump web-vitals from 4.1.0 to 4.1.1 in /src (#3681)
  Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.0 to 4.1.1. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.1.0...v4.1.1)
  --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump puppeteer from 22.10.0 to 22.10.1 in /src (#3682)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.0 to 22.10.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.10.0...puppeteer-v22.10.1)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump prettier from 3.3.1 to 3.3.2 in /src (#3683)
  Bumps [prettier](https://github.com/prettier/prettier) from 3.3.1 to 3.3.2. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.3.1...3.3.2)
  --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump puppeteer from 22.10.1 to 22.11.0 in /src (#3684)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.1 to 22.11.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.10.1...puppeteer-v22.11.0)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Translation of security chapter to Japanese (#3685)
* Bump puppeteer from 22.11.0 to 22.11.2 in /src (#3688)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.0 to 22.11.2. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.11.0...puppeteer-v22.11.2)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump web-vitals from 4.1.1 to 4.2.0 in /src (#3690)
  Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.1 to 4.2.0. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.1.1...v4.2.0)
  --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump puppeteer from 22.11.2 to 22.12.0 in /src (#3689)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.2 to 22.12.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.11.2...puppeteer-v22.12.0)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update Timestamps (#3691)
  Co-authored-by: tunetheweb <[email protected]>
* Remove deploy.zip step of deployment (#3692)
* Remove deploy.zip
* Remove from ignore files
* Bump puppeteer from 22.12.0 to 22.12.1 in /src (#3694)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.0 to 22.12.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.12.0...puppeteer-v22.12.1)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0 (#3693)
* Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0
  Bumps [treosh/lighthouse-ci-action](https://github.com/treosh/lighthouse-ci-action) from 11.4.0 to 12.1.0. - [Release notes](https://github.com/treosh/lighthouse-ci-action/releases) - [Commits](treosh/lighthouse-ci-action@11.4.0...12.1.0)
  --- updated-dependencies: - dependency-name: treosh/lighthouse-ci-action dependency-type: direct:production update-type: version-update:semver-major ...
  Signed-off-by: dependabot[bot] <[email protected]>
* Upgrade to Node 20
  ---------
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Barry Pollard <[email protected]>
* Bump web-vitals from 4.2.0 to 4.2.1 in /src (#3695)
  Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.0 to 4.2.1. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.0...v4.2.1)
  --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump actions/setup-python from 5.1.0 to 5.1.1 (#3699)
  Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.1.0 to 5.1.1. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5.1.0...v5.1.1)
  --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump puppeteer from 22.12.1 to 22.13.0 in /src (#3698)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.1 to 22.13.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.12.1...puppeteer-v22.13.0)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Translation of mobile-web chapter to Japanese (#3700)
* Bump puppeteer from 22.13.0 to 22.15.0 in /src (#3711)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.13.0 to 22.15.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.13.0...puppeteer-v22.15.0)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump jsdom from 24.1.0 to 24.1.1 in /src (#3707)
  Bumps [jsdom](https://github.com/jsdom/jsdom) from 24.1.0 to 24.1.1. - [Release notes](https://github.com/jsdom/jsdom/releases) - [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md) - [Commits](jsdom/jsdom@24.1.0...24.1.1)
  --- updated-dependencies: - dependency-name: jsdom dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump web-vitals from 4.2.1 to 4.2.2 in /src (#3706)
  Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.1 to 4.2.2. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.1...v4.2.2)
  --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump prettier from 3.3.2 to 3.3.3 in /src (#3702)
  Bumps [prettier](https://github.com/prettier/prettier) from 3.3.2 to 3.3.3. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.3.2...3.3.3)
  --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump web-vitals from 4.2.2 to 4.2.3 in /src (#3715)
  Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.2 to 4.2.3. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.2...v4.2.3)
  --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update Timestamps (#3716)
  Co-authored-by: rviscomi <[email protected]>
* tracking detection using easylist adservers
* easylist_adserver tracking detection and query
* 2022 cdn portuguese (#3725)
* add file to translation
* done translation cdn.md
  Makes progress on #505
* Bump puppeteer from 22.15.0 to 23.0.2 in /src (#3719)
  Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.15.0 to 23.0.2. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.15.0...puppeteer-v23.0.2)
  --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-major ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update Timestamps (#3726)
  Co-authored-by: tunetheweb <[email protected]>
* Replace `<object>` with `<iframe>` for embedded SVG (#3727)
* Replace object with iframe for embedded SVG
* Translations
* auto upload easylist data to table
* Fix the build to ignore 2024 chapters (for now) (#3728)
* Fix the build to ignore 2024 chapters (for now)
* Remove test line
* Update Timestamps (#3729)
  Co-authored-by: tunetheweb <[email protected]>
* liniting
* liniting
* linting
* linting
* linting
* linting
* fixes of Simplified Chinese translation for 2020 Performance (#3734)

---------
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Barry Pollard <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tunetheweb <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sakae Kotaro <[email protected]>
Co-authored-by: rviscomi <[email protected]>
Co-authored-by: Hadi Amjad <[email protected]>
Co-authored-by: William Constantinov <[email protected]>
Co-authored-by: Zuckjet <[email protected]>
Co-authored-by: Max Ostapenko <[email protected]>
1 parent a239c25 commit baf490d

2 files changed: +114 -0 lines changed

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+CREATE TEMP FUNCTION
+CheckDomainInURL(url STRING, domain STRING)
+RETURNS INT64
+LANGUAGE js AS """
+  return url.includes(domain) ? 1 : 0;
+""";
+
+-- We need to use the `easylist_adservers.csv` to populate the table to get the list of domains to block
+-- https://github.com/easylist/easylist/blob/master/easylist/easylist_adservers.txt
+WITH easylist_data AS (
+  SELECT string_field_0
+  FROM `httparchive.almanac.easylist_adservers`
+),
+requests_data AS (
+  SELECT url
+  FROM `httparchive.all.requests`
+  WHERE
+    date = '2024-06-01' AND
+    is_root_page = TRUE
+),
+block_status AS (
+  SELECT
+    r.url,
+    MAX(
+      CASE
+        WHEN CheckDomainInURL(r.url, e.string_field_0) = 1 THEN 1
+        ELSE 0
+      END
+    ) AS should_block
+  FROM requests_data r
+  LEFT JOIN easylist_data e
+    ON CheckDomainInURL(r.url, e.string_field_0) = 1
+  GROUP BY r.url
+)
+SELECT
+  COUNT(0) AS blocked_url_count
+FROM block_status
+WHERE should_block = 1;
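
Note on the matching step: `CheckDomainInURL` is a substring test (`url.includes(domain)`), so a listed domain counts as a match wherever it appears in the URL, including the path or query string. The sketch below, written in Python purely for illustration, contrasts that behaviour with a hostname-based check. `host_matches_domain` and `substring_match` are hypothetical helper names that are not part of this commit, and the example URLs are invented.

```python
from urllib.parse import urlsplit


def substring_match(url: str, domain: str) -> bool:
    """Mirror the UDF's check: the domain may appear anywhere in the URL."""
    return domain in url


def host_matches_domain(url: str, domain: str) -> bool:
    """Return True only if the URL's hostname is `domain` or a subdomain of it.

    Hypothetical alternative to the substring test, not the commit's logic.
    """
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().strip(".")
    if not host or not domain:
        return False
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    listed = "example.com"  # stand-in for one row of the adserver table
    for candidate in (
        "https://ads.example.com/pixel.gif",
        "https://site.test/?ref=example.com",
        "https://notexample.com/",
    ):
        print(candidate, substring_match(candidate, listed),
              host_matches_domain(candidate, listed))
```

Whether the stricter check is wanted depends on how the count is meant to be interpreted; the substring version is simpler but may count requests that only mention a listed domain.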
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+# pylint: disable=import-error
+import requests
+import pandas as pd
+from google.cloud import bigquery
+
+
+def extract_domains_from_file(file_path):
+    domains = []
+    try:
+        with open(file_path, "r") as file:
+            for line in file:
+                # Remove the '||' prefix and '^' suffix
+                domain = line.strip().lstrip("||").rstrip("^")
+                if domain:  # Ensure the line is not empty
+                    domains.append(domain)
+    except FileNotFoundError:
+        print(f"Error: The file {file_path} does not exist.")
+    except Exception as e:
+        print(f"An error occurred: {e}")
+    return domains
+
+
+def save_domains_to_csv(domains, csv_file_path):
+    try:
+        # Create a DataFrame from the list of domains
+        df = pd.DataFrame(domains, columns=["Domain"])
+        # Save the DataFrame to a CSV file
+        df.to_csv(csv_file_path, index=False)
+    except Exception as e:
+        print(f"An error occurred while writing to CSV: {e}")
+
+
+def upload_csv_to_bigquery(csv_file_path):
+    # this needs the GOOGLE_APPLICATION_CREDENTIALS env variable to be set
+    client = bigquery.Client()
+
+    # Configure the job
+    job_config = bigquery.LoadJobConfig(
+        source_format=bigquery.SourceFormat.CSV,
+        skip_leading_rows=1,  # Adjust if your CSV doesn't have a header row
+        autodetect=True,  # Automatically infer schema
+    )
+
+    # Load data from the CSV file
+    with open(csv_file_path, "rb") as source_file:
+        load_job = client.load_table_from_file(
+            source_file, "httparchive.almanac.easylist_adservers",
+            job_config=job_config
+        )
+
+    # Wait for the job to complete
+    load_job.result()
+
+
+# URL to the text file containing the regex patterns
+url = "https://raw.githubusercontent.com/easylist/easylist/master/" \
+    "easylist/easylist_adservers.txt"
+file_path = "easylist_adservers.txt"
+# Path to the output CSV file
+csv_file_path = "easylist_adservers.csv"
+
+# Download the file and save it locally
+response = requests.get(url)
+with open(file_path, "wb") as file:
+    file.write(response.content)
+
+# Extract domains
+domains = extract_domains_from_file(file_path)
+
+# Save domains to CSV
+save_domains_to_csv(domains, csv_file_path)
+
+# upload domains to BQ
+upload_csv_to_bigquery(csv_file_path)
+
+print(f"Domains have been saved to {csv_file_path}")
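
Note on the parsing step: `lstrip("||")` treats its argument as a set of characters, so it removes every leading `|` rather than a literal `||` prefix, and `rstrip("^")` only helps when `^` is the final character. Lines such as `!` comments, or rules carrying `$` options after the `^`, would therefore pass through largely unchanged. The sketch below is one possible stricter parser, assuming the list uses Adblock Plus filter syntax; `parse_adserver_rule` and `extract_domains` are hypothetical names and the sample rules are invented, not taken from the real list.

```python
from typing import Iterable, List, Optional


def parse_adserver_rule(line: str) -> Optional[str]:
    """Return the domain from a `||domain^` rule, or None for any other line."""
    rule = line.strip()
    # Skip blank lines, `!` comments, and a `[Adblock Plus ...]` style header.
    if not rule or rule.startswith("!") or rule.startswith("["):
        return None
    # Only domain-anchored rules are handled; everything else is ignored here.
    if not rule.startswith("||"):
        return None
    rule = rule[2:]
    # Drop filter options such as `$third-party`, then the `^` separator.
    rule = rule.split("$", 1)[0]
    rule = rule.split("^", 1)[0]
    return rule.lower() or None


def extract_domains(lines: Iterable[str]) -> List[str]:
    """Collect the domains from all lines that parse as `||domain^` rules."""
    domains = []
    for line in lines:
        domain = parse_adserver_rule(line)
        if domain is not None:
            domains.append(domain)
    return domains


if __name__ == "__main__":
    sample = [
        "! a comment line",
        "||example-ads.test^",
        "||tracker.test^$third-party",
    ]
    print(extract_domains(sample))
```

Ignoring rules that are not domain-anchored is a design choice made for this sketch; a fuller parser would need to decide how to treat exception rules and path-specific rules.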
