Commit 6787f48

Dockerize scraper, upgrade to puppeteer 3.0.0
1 parent 24f8eb1 commit 6787f48

7 files changed: +249 -158 lines changed

.dockerignore

+1
@@ -5,4 +5,5 @@
 docker-compose.yml
 docker-compose.*.yml
 node_modules
+npm-debug.log
 data/*

Dockerfile

+6
@@ -0,0 +1,6 @@
+FROM schliflo/docker-puppeteer:3.0.0
+WORKDIR /app
+COPY package*.json ./
+RUN npm install
+COPY . .
+CMD ["node", "index.js"]

README.md

+28 -24
@@ -3,42 +3,42 @@
 Proof of concept about tracking contacts in WhatsApp.
 Check out my [blog entry](https://jorislacance.fr/blog/2020/04/01/whatsapp-tracking) for in-depth info.
 
-![grafana](https://i.imgur.com/MMq8q4u.png)
+![grafana](https://i.imgur.com/MMq8q4u.png)
 
 ## architecture
 
 The POC is composed of:
 
-* The Node.js WhatsApp scraper robot - Gather the data
-* The InfluxDB 2.0 service - Store the data
-* The Grafana 6.7 service - Visualize the data
+- The Node.js WhatsApp scraper robot - Gather the data
+- The InfluxDB 2.0 service - Store the data
+- The Grafana 6.7 service - Visualize the data
 
 ## setup
 
 ```bash
 # run the InfluxDB and Grafana services
-docker-compose up -d
+docker-compose up -d influxdb grafana
 ```
 
 ### influxDB setup
 
-* Go to [http://localhost:9999](http://localhost:9999), setup an admin account
-* Name the initial organization like you want, `initial-org` for instance
-* Name the initial bucket anything, like `yolo`, we won't use the initial one because there will be sample data in it
-* Create a new bucket `whatsapp-tracking`
-* Generate a token `whatsapp-tracking-scraper` with write permission to `whatsapp-tracking` bucket
-* Generate a token `grafana` with 'all access'
+- Go to [http://localhost:9999](http://localhost:9999), setup an admin account
+- Name the initial organization like you want, `initial-org` for instance
+- Name the initial bucket anything, like `yolo`, we won't use the initial one because there will be sample data in it
+- Create a new bucket `whatsapp-tracking`
+- Generate a token `whatsapp-tracking-scraper` with write permission to `whatsapp-tracking` bucket
+- Generate a token `grafana` with 'all access'
 
 ### grafana setup
 
-* Go to [http://localhost:3000](http://localhost:3000), setup an admin account
-* Add the data source using the plugin `Flux (InfluxDB) [BETA]` (for InfluxDB 2.0)
-  * URL: `http://influxdb:9999`
-  * with credentials: `true`
-  * Organization: `initial-org`
-  * Default Bucket: `whatsapp-tracking`
-  * Token: the grafana token
-* Import the dashboard (file `grafana-dashboard.json`)
+- Go to [http://localhost:3000](http://localhost:3000), setup an admin account
+- Add the data source using the plugin `Flux (InfluxDB) [BETA]` (for InfluxDB 2.0)
+  - URL: `http://influxdb:9999`
+  - with credentials: `true`
+  - Organization: `initial-org`
+  - Default Bucket: `whatsapp-tracking`
+  - Token: the grafana token
+- Import the dashboard (file `grafana-dashboard.json`)
 
 ### scraper setup
 
@@ -51,19 +51,23 @@ INFLUXDB_ORG=initial-org
 INFLUXDB_BUCKET=whatsapp-tracking
 ```
 
-## scraper usage
+## scraper usage Windows
 
-```bash
+```powershell
 # init the robot
 npm install
 # run the robot
 node index.js
 ```
 
-## limitation
+## scraper usage docker
 
-Not sure if the scraper runs on Linux...
+```bash
+docker-compose up scraper
+# Look at `./data/screenshots/` to get the QR code peering.
+```
 
 ## todo
 
-* Dockerize the scraper too ? Hardly with the peering procedure that requires us to scan the QR code.
+- [x] Dockerize the scraper too ? Hardly with the peering procedure that requires us to scan the QR code. -> screenshot through file system
+- [ ] Track several contacts by rotating the contact tracked every hour or so (we will lose the precise `online` state tracking but gather more `last seen` data)
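
The open todo item (rotating the tracked contact roughly every hour) could be approached with a simple time-slot rotation. The sketch below is only an illustration and is not part of this commit: `pickContact`, the `contacts` array, and the one-hour default slot length are all assumptions.

```javascript
// Hypothetical helper, not part of the commit: choose which contact to
// track from the current time slot, cycling through the list in order.
function pickContact(contacts, nowMs, slotMs = 60 * 60 * 1000) {
  if (!Array.isArray(contacts) || contacts.length === 0) {
    throw new Error('contacts must be a non-empty array');
  }
  // Index of the current slot since the Unix epoch.
  const slot = Math.floor(nowMs / slotMs);
  // Wrap around so every contact gets a turn.
  return contacts[slot % contacts.length];
}

// Placeholder names; a real list might come from an environment variable.
const contacts = ['Alice', 'Bob', 'Carol'];
console.log(`Tracking ${pickContact(contacts, Date.now())} for this slot`);
```

Deriving the slot from the clock rather than from a counter means a restarted container would resume on the same contact without any persisted state.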

docker-compose.yml

+32 -12
@@ -15,15 +15,35 @@ services:
     restart: unless-stopped
 
   grafana:
-    image: grafana/grafana:6.7.2
-    ports:
-      - "3000:3000"
-    volumes:
-      - ./data/grafana:/var/lib/grafana
-    environment:
-      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource,grafana-influxdb-flux-datasource,natel-discrete-panel,petrslavotinek-carpetplot-panel,vonage-status-panel,flant-statusmap-panel,neocat-cal-heatmap-panel
-    logging:
-      options:
-        max-size: "50m"
-        max-file: "10"
-    restart: unless-stopped
+    image: grafana/grafana:6.7.2
+    ports:
+      - "3000:3000"
+    volumes:
+      - ./data/grafana:/var/lib/grafana
+    environment:
+      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource,grafana-influxdb-flux-datasource,natel-discrete-panel,petrslavotinek-carpetplot-panel,vonage-status-panel,flant-statusmap-panel,neocat-cal-heatmap-panel
+    logging:
+      options:
+        max-size: "50m"
+        max-file: "10"
+    restart: unless-stopped
+
+  scraper:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    shm_size: "1gb"
+    volumes:
+      - ./data/userdatadocker:/usr/data/userdata
+      - ./data/screenshots:/usr/data/screenshots
+    environment:
+      - CONTACT_TARGET=${CONTACT_TARGET}
+      - INFLUXDB_TOKEN=${INFLUXDB_TOKEN}
+      - INFLUXDB_ORG=${INFLUXDB_ORG}
+      - INFLUXDB_BUCKET=${INFLUXDB_BUCKET}
+      - ISDOCKER=TRUE
+    logging:
+      options:
+        max-size: "50m"
+        max-file: "10"
+    restart: unless-stopped
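
The `scraper` service forwards four variables from the host environment. Docker Compose substitutes an empty string for any of them that is unset, so the container can start with blank values. A pre-flight check along the following lines might catch that early. This is a sketch and not part of the commit: `REQUIRED_VARS` and `missingEnvVars` are hypothetical names.

```javascript
// Hypothetical pre-flight check, not part of the commit.
// The variable names are the ones forwarded by the scraper service.
const REQUIRED_VARS = [
  'CONTACT_TARGET',
  'INFLUXDB_TOKEN',
  'INFLUXDB_ORG',
  'INFLUXDB_BUCKET',
];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env) {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

const missing = missingEnvVars(process.env);
if (missing.length > 0) {
  // A real scraper might stop here instead of only logging.
  console.error(`Missing environment variables: ${missing.join(', ')}`);
}
```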

index.js

+18 -6
@@ -1,30 +1,42 @@
 const puppeteer = require('puppeteer');
 const chrono = require('chrono-node');
-const { InfluxDB, FluxTableMetaData } = require('@influxdata/influxdb-client')
+const { InfluxDB } = require('@influxdata/influxdb-client')
 require('dotenv').config()
 console.log(`CONTACT_TARGET=${process.env.CONTACT_TARGET}`);
 console.log(`INFLUXDB_TOKEN=${process.env.INFLUXDB_TOKEN}`);
 console.log(`INFLUXDB_ORG=${process.env.INFLUXDB_ORG}`);
 console.log(`INFLUXDB_BUCKET=${process.env.INFLUXDB_BUCKET}`);
+console.log(`ISDOCKER=${process.env.ISDOCKER}`);
 
 // The contact name to track (mind the case).
 const contactTarget = process.env.CONTACT_TARGET;
 
-let client = new InfluxDB({ url: 'http://localhost:9999', token: process.env.INFLUXDB_TOKEN });
+let docker = process.env.ISDOCKER === 'TRUE';
+let influxUrl = docker ? `http://influxdb:9999` : 'http://localhost:9999';
+let userDataDir = docker ? `/usr/data/userdata` : 'data/userdata';
+let args = docker ? ['--no-sandbox', '--disable-setuid-sandbox'] : [];
+let screenshotPath = docker ? '/usr/data/screenshots/screenshot.png' : 'data/screenshots/screenshot.png';
+
+let client = new InfluxDB({ url: influxUrl, token: process.env.INFLUXDB_TOKEN });
 const writeApi = client.getWriteApi(process.env.INFLUXDB_ORG, process.env.INFLUXDB_BUCKET);
 
 (async () => {
   const browser = await puppeteer.launch({
-    headless: false, // No headless to scan the QR code.
-    userDataDir: 'data/userdata' // Persist the session.
+    args: args,
+    headless: true,
+    userDataDir: userDataDir // Persist the session.
   });
 
   const page = await browser.newPage();
+  page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0');
+
   await page.goto('https://web.whatsapp.com/');
-  await page.waitFor(5000);
+  await page.waitFor(10000);
+
+  await page.screenshot({ path: screenshotPath });
 
   console.log('Awaiting/Checking peering with WhatsApp phone');
-  await page.waitFor('#side', { timeout: 60000 }).then(() => { // Scan the QR code within the next minute.
+  await page.waitFor('#side', { timeout: 120000 }).then(() => { // Scan the QR code within the next 2 minutes.
     console.log('Connected !');
   }).catch((res) => {
     console.log('Not connected !', res);
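
The Docker/local switch in this diff is spread over five separate ternaries keyed on `ISDOCKER`. One possible way to make that switch easier to test is to gather it into a single pure function, as sketched below. The values are copied from the diff, but `resolveRuntimeConfig`, `dataRoot`, and `launchArgs` are hypothetical names and not part of the commit.

```javascript
// Hypothetical refactoring sketch, not part of the commit: derive all
// environment-dependent settings from one place.
function resolveRuntimeConfig(env) {
  // Same test as the diff: only the exact string 'TRUE' enables Docker mode.
  const docker = env.ISDOCKER === 'TRUE';
  // Inside the container the data directories are mounted under /usr/data.
  const dataRoot = docker ? '/usr/data' : 'data';
  return {
    docker,
    influxUrl: docker ? 'http://influxdb:9999' : 'http://localhost:9999',
    userDataDir: `${dataRoot}/userdata`,
    screenshotPath: `${dataRoot}/screenshots/screenshot.png`,
    // Chromium sandbox flags are only disabled inside the container.
    launchArgs: docker ? ['--no-sandbox', '--disable-setuid-sandbox'] : [],
  };
}

console.log(resolveRuntimeConfig({ ISDOCKER: 'TRUE' }));
console.log(resolveRuntimeConfig({}));
```

With this shape, the call sites would read `config.influxUrl`, `config.userDataDir`, and so on, and both modes can be checked without launching a browser.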
