Commit 56a9dc9

keu and eugene-kulak authored
Source FB Marketing: add notes and findings after #8385
Co-authored-by: Eugene Kulak <[email protected]>
1 parent 9e22a55 commit 56a9dc9

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/README.md

Lines changed: 86 additions & 1 deletion
@@ -1,4 +1,4 @@
-## Structure
+# Structure

- api.py - everything related to FB API, error handling, throttle, call rate
- source.py - mainly check and discovery logic
@@ -11,3 +11,88 @@
- async_job.py - logic about asynchronous jobs
- async_job_manager.py - you will find everything about managing groups of async jobs here
- common.py - some utils

# FB findings

## API

The FB Marketing API provides three ways to interact:
- single request
- batch request
- async request

FB provides the `facebook_business` library, which is auto-generated from their API spec.
We use it because it provides:
- nice error handling
- batch request helpers
- automatic serialization/deserialization of responses to FB objects
- transparent iteration over paginated responses

## Single request
This is the most common way to request something.
We use a two-step strategy to read most of the data:
1. the first request gets a list of IDs (filtered by cursor if supported)
2. we loop over the list of IDs and request details for each ID; this step sometimes uses a batch request

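The two-step strategy above can be sketched roughly as follows. This is only an illustration: `read_records`, `fake_list_ids`, `fake_get_details` and the `FAKE_ADS` data are hypothetical names standing in for real API calls, and the cursor is assumed to be an `updated_time` string.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def read_records(
    list_ids: Callable[[Optional[str]], Iterable[str]],
    get_details: Callable[[str], Dict],
    cursor: Optional[str] = None,
) -> Iterator[Dict]:
    """Step 1: list IDs (filtered by cursor); step 2: fetch details per ID."""
    for object_id in list_ids(cursor):
        yield get_details(object_id)


# In-memory stand-in for the API (hypothetical data, not real FB objects).
FAKE_ADS = {
    "1": {"id": "1", "updated_time": "2021-01-01"},
    "2": {"id": "2", "updated_time": "2021-02-01"},
}


def fake_list_ids(cursor: Optional[str]) -> Iterable[str]:
    # Assumption: the cursor filters on updated_time.
    return [
        object_id
        for object_id, ad in FAKE_ADS.items()
        if cursor is None or ad["updated_time"] >= cursor
    ]


def fake_get_details(object_id: str) -> Dict:
    return FAKE_ADS[object_id]


records = list(read_records(fake_list_ids, fake_get_details, cursor="2021-01-15"))
```

With the cursor set to `2021-01-15`, only the second ad is requested in step 2.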
## Batch request
A batch request is a batch of requests serialized in the body of a single request.
The response to such a request is a list of responses, one for each individual request (body, headers, etc.).
The FB lib uses a callback interface: the batch object calls the corresponding callback (success or failure) for each response.
The FB lib also catches fatal errors from the API (500, …) and, instead of calling the `on_failure` callback, returns a new batch object with the list of failed requests.
The FB API limits the number of requests in a single batch to 50.

**Important note**:

The batch object doesn't perform pagination of individual responses,
so you may lose data if a response is paginated.

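To make the callback and retry behaviour described above more concrete, here is a minimal sketch. `FakeBatch` is a hypothetical stand-in that only mimics that description; it is not the `facebook_business` batch class, and `execute_in_batches`, `flaky_transport` and the status codes are assumptions made for the example.

```python
from typing import Callable, Iterator, List, Optional, Sequence

MAX_BATCH_SIZE = 50  # batch limit stated above


def chunks(items: Sequence, size: int = MAX_BATCH_SIZE) -> Iterator[Sequence]:
    """Split items into slices of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class FakeBatch:
    """Hypothetical batch: callbacks per response, fatal errors returned as a new batch."""

    def __init__(self, transport: Callable[[int], int]):
        self._transport = transport  # maps a request to an HTTP-like status code
        self._calls: List[tuple] = []

    def add(self, request, on_success, on_failure) -> None:
        if len(self._calls) >= MAX_BATCH_SIZE:
            raise ValueError("batch is full")
        self._calls.append((request, on_success, on_failure))

    def execute(self) -> Optional["FakeBatch"]:
        retry = FakeBatch(self._transport)
        for request, on_success, on_failure in self._calls:
            status = self._transport(request)
            if status >= 500:  # fatal error: no callback, goes to the retry batch
                retry.add(request, on_success, on_failure)
            elif status >= 400:
                on_failure(request)
            else:
                on_success(request)
        return retry if retry._calls else None


def execute_in_batches(requests, transport, on_success, on_failure, max_rounds=5):
    for chunk in chunks(requests):
        batch: Optional[FakeBatch] = FakeBatch(transport)
        for request in chunk:
            batch.add(request, on_success, on_failure)
        rounds = 0
        while batch is not None and rounds < max_rounds:
            batch = batch.execute()  # returns the failed requests, if any
            rounds += 1


attempts = {}


def flaky_transport(request: int) -> int:
    attempts[request] = attempts.get(request, 0) + 1
    if request == 3:
        return 400  # regular failure: on_failure is called
    if request == 7 and attempts[request] == 1:
        return 500  # fatal on the first attempt only
    return 200


succeeded, failed = [], []
execute_in_batches(list(range(120)), flaky_transport, succeeded.append, failed.append)
```

Like the real batch object, this sketch does nothing about pagination of individual responses.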
## Async Request
FB recommends using Async Requests when common requests begin to time out.
An Async Request is a 3-step process:
- create the async request
- check its status (in a loop)
- fetch the response when the status is done

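The three steps above could look roughly like this. The function name, the callables and the status strings (`"running"`, `"done"`, `"failed"`) are assumptions for illustration, not values taken from the FB API.

```python
import time
from typing import Any, Callable


def run_async_request(
    create_job: Callable[[], str],
    get_status: Callable[[str], str],
    fetch_result: Callable[[str], Any],
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 1.0,
) -> Any:
    job_id = create_job()                # 1. create the async request
    while True:
        status = get_status(job_id)      # 2. check its status in a loop
        if status == "done":
            return fetch_result(job_id)  # 3. fetch the response
        if status == "failed":
            raise RuntimeError(f"async job {job_id} failed")
        sleep(poll_interval)


# Fake job that needs two polls before it is done (hypothetical statuses).
_statuses = iter(["running", "running", "done"])
result = run_async_request(
    create_job=lambda: "job-1",
    get_status=lambda job_id: next(_statuses),
    fetch_result=lambda job_id: [{"job": job_id}],
    sleep=lambda seconds: None,  # do not actually wait in the example
)
```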
### Combination with batch
Unfortunately, all attempts to create multiple async requests in a single batch failed: `ObjectParser` from the FB lib doesn't know how to parse an `AdReportRun` response.
Instead, we use batch requests to check the status of multiple async jobs at once (respecting the batch limit of 50).

### Insights
We use Async Requests to read Insights; the FB API for this is called `AdReportRun`.
Insights are reports based on ad performance. You can think of one as an SQL query:

```sql
select <fields> from <edge_object> where <filter> group by <level>, <breakdowns>;
```

Our insights by default look like this:

```sql
select <all possible fields> from AdAccount(me) where start_date = … and end_date = … group by ad, <breakdown>
```

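As a rough illustration of how the SQL analogy maps onto request parameters, the sketch below builds a parameter dictionary for one day. `build_insights_params` is a hypothetical helper; the keys (`level`, `fields`, `breakdowns`, `time_range`) are assumed to match the Insights API parameters and should be checked against the FB docs, and the field and breakdown values are placeholders.

```python
from datetime import date
from typing import Any, Dict, List


def build_insights_params(
    day: date,
    fields: List[str],
    breakdowns: List[str],
    level: str = "ad",
) -> Dict[str, Any]:
    return {
        "level": level,            # group by <level>
        "fields": fields,          # select <fields>
        "breakdowns": breakdowns,  # group by ..., <breakdowns>
        # where start_date = day and end_date = day
        "time_range": {"since": day.isoformat(), "until": day.isoformat()},
    }


params = build_insights_params(
    day=date(2021, 11, 1),
    fields=["unique_clicks", "unique_actions"],  # placeholder field list
    breakdowns=["country"],                      # placeholder breakdown
)
```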
FB performs calculations on its backend with varying complexity depending on the fields we ask for; the heaviest fields are unique metrics: `unique_clicks`, `unique_actions`, etc.

Additionally, Insights has fields that show stats for the last N days, the so-called attribution window. It can be `1d`, `7d`, or `28d`; by default we use all of them.
According to the FB docs, insights data can change up to 28 days after it has been published.
That's why we re-read the last 28 days each time we sync the insights stream.

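The 28-day re-read can be expressed as simple date arithmetic. The sketch below is an assumption about how a saved cursor could be combined with the lookback window; `sync_start_date` and its arguments are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional

INSIGHTS_LOOKBACK = timedelta(days=28)  # data can change up to 28 days after publishing


def sync_start_date(cursor: Optional[date], configured_start: date, today: date) -> date:
    """Pick the first day to read so that the last 28 days are always refreshed."""
    if cursor is None:
        return configured_start  # first sync: read everything
    refresh_from = today - INSIGHTS_LOOKBACK
    # Rewind a recent cursor to the lookback boundary, but never before the configured start.
    return max(configured_start, min(cursor, refresh_from))


recent = sync_start_date(date(2021, 12, 1), date(2021, 1, 1), today=date(2021, 12, 10))
stale = sync_start_date(date(2021, 10, 1), date(2021, 1, 1), today=date(2021, 12, 10))
```

Here a cursor from nine days ago is rewound to 28 days before `today`, while an older cursor is used as is.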
When the amount of data and computation is too big for FB servers to handle, the jobs start failing. Throttle and call rate metrics don't reflect this problem and can't be used to monitor it.
Instead, we use the following technique.
Taking into account that we group by ad, we can safely change our `from` table to a smaller dataset/edge_object (campaign, adset, ad).
Empirically, we figured out that account-level insights contain data for all campaigns from the last 28 days and, very rarely, campaigns that haven't even started yet.
To solve this mismatch, at least partially, we get the list of campaigns for the last 28 days, counted from the insight start date.
The current algorithm looks like this:

```
create async job for account-level insight for day A
    if async job failed:
        restart it
    if async job failed again:
        get list of campaigns for the last 28 days
        create async job for each campaign and day A
```
If a campaign-level async job fails a second time, we split it by `AdSets` or `Ads`.

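A minimal runnable sketch of this retry-then-split idea follows. It is not the connector's actual code: `run_with_split`, `fake_run_job`, `fake_list_children` and the fixed account → campaign → adset → ad chain are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumed split order; the text above only says campaigns are split by AdSets or Ads.
CHILD_LEVEL = {"account": "campaign", "campaign": "adset", "adset": "ad"}
MAX_ATTEMPTS = 2  # first run plus one restart


def run_with_split(
    level: str,
    object_id: str,
    day: str,
    run_job: Callable[[str, str, str], bool],
    list_children: Callable[[str, str, str], List[str]],
) -> List[Tuple[str, str]]:
    """Return the (level, id) pairs whose jobs finally completed."""
    for _ in range(MAX_ATTEMPTS):
        if run_job(level, object_id, day):
            return [(level, object_id)]
    child_level = CHILD_LEVEL.get(level)
    if child_level is None:
        raise RuntimeError(f"job for {level} {object_id} failed and cannot be split")
    completed: List[Tuple[str, str]] = []
    for child_id in list_children(level, object_id, child_level):
        completed += run_with_split(child_level, child_id, day, run_job, list_children)
    return completed


# Fake hierarchy: the account and campaign c2 are "too heavy" and always fail.
TREE = {("account", "act_1"): ["c1", "c2"], ("campaign", "c2"): ["s1", "s2"]}
TOO_HEAVY = {("account", "act_1"), ("campaign", "c2")}
calls: List[Tuple[str, str]] = []


def fake_run_job(level: str, object_id: str, day: str) -> bool:
    calls.append((level, object_id))
    return (level, object_id) not in TOO_HEAVY


def fake_list_children(level: str, object_id: str, child_level: str) -> List[str]:
    return TREE[(level, object_id)]


completed = run_with_split("account", "act_1", "2021-11-01", fake_run_job, fake_list_children)
```

In this run the account job is tried twice, then split by campaign; campaign `c2` is in turn tried twice and split by ad set.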
Reports from users show that an async job can sometimes get stuck for a very long time (hours+),
and because FB doesn't provide any cancellation API, we start another job after 1 hour of waiting.
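
The restart-after-timeout behaviour might be sketched like this. The names, the poll interval and the restart limit are assumptions; the stuck job is simply abandoned because there is nothing to cancel it with.

```python
import time
from typing import Callable

JOB_TIMEOUT_SECONDS = 60 * 60  # 1 hour of waiting, as described above


def wait_or_restart(
    start_job: Callable[[], str],
    is_done: Callable[[str], bool],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    poll_interval: float = 30.0,
    max_restarts: int = 3,
) -> str:
    """Wait for a job; if it is still not done after the timeout, start a new one."""
    for _ in range(max_restarts + 1):
        job_id = start_job()
        started_at = now()
        while now() - started_at < JOB_TIMEOUT_SECONDS:
            if is_done(job_id):
                return job_id
            sleep(poll_interval)
        # Timed out: leave the old job running and start a replacement.
    raise TimeoutError("all job attempts timed out")


# Fake clock so the example finishes instantly; job-1 is stuck, job-2 completes.
clock = [0.0]


def fake_sleep(seconds: float) -> None:
    clock[0] += seconds


job_ids = iter(["job-1", "job-2"])
finished = wait_or_restart(
    start_job=lambda: next(job_ids),
    is_done=lambda job_id: job_id == "job-2",
    now=lambda: clock[0],
    sleep=fake_sleep,
)
```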
