-
Notifications
You must be signed in to change notification settings - Fork 4.6k
Ticket: 8331: Google Ads Report Data is Malformed for complex object #45852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ticket: 8331: Google Ads Report Data is Malformed for complex object #45852
Conversation
@dmykhailov is attempting to deploy a commit to the Airbyte Growth Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Dmytro. Thank you for the contribution!
I've added a couple of comments to integrate type checking more comprehensively into the work.
And although the change is straightforward, we should still be adding new unit tests to ensure that your changes have the intended outcomes. My suggestion is to add a new test to test_google_ads.py
. There are some good examples to look from like test_get_field_value()
, and I can also provide an example that I wrote:
def test_get_field_value_object():
expected_response = {}
field = "ad_group_ad.ad.responsive_search_ad.headlines"
ads_row = GoogleAdsRow(
ad_group_ad={
"ad": {
"responsive_search_ad": {
"headlines": [
{
"text": "An exciting headline",
"policy_summary_info": {
"review_status": "REVIEWED",
"approval_status": "APPROVED"
}
},
{
"text": "second"
}
]
}
}
}
)
response = GoogleAds.get_field_value(ads_row, field, {})
assert response == expected_response
I think the main issue I noticed while I was reviewing the code is that we're not properly parsing the incoming RepeatedComposite
. Using the above code we'll end up with AttributeError: Unknown field for AdTextAsset: pb
. It's possible that my test case isn't testing exactly the issue you've highlighted, but let's try to figure out what the ideal test case is and then we can figure out how to implement it
airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
Outdated
Show resolved
Hide resolved
@@ -211,7 +225,7 @@ def get_field_value(field_value: GoogleAdsRow, field: str, schema_type: Mapping[ | |||
if isinstance(field_value, Enum): | |||
field_value = field_value.name | |||
elif isinstance(field_value, (Repeated, RepeatedComposite)): | |||
field_value = [str(value) for value in field_value] | |||
field_value = [GoogleAds.serialize_protobuf_message(value) for value in field_value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line may not have the right value you're expecting as input to serialize_protobuf_message()
. This list comprehension will turn Repeated
/RepeatedComposite
into an underlying value within the object. So for this example Repeated
:
[
{ "text": "An exciting headline", "policy_summary_info": { "approval_status": "APPROVED" } },
{ "text": "Two", "policy_summary_info": { "approval_status": "APPROVED" } }
]
What we end up passing to the method is the first AdTextAsset
. So in that regard I'm not sure either the method signature, or what we pass your new method is right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each individual element of the Repeated
or RepeatedCoimposite
is indeed passed to the serialization. I'l explore if the whole object serialization will pass the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just tried to serialize Repeated
value via the same method using the public 'pb' and got an error that the object does not have "DESCRIPTOR" attribute. If I got it right, it means that the object is not protobuf message compatible with the serializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. It feels very weird that we would not end up with a protobuf message at all. What we may want to do instead of relying solely on the try/except block, we should add an explicit type check isinstance()
and if it not of the correct type, then we defer to str()
. We can still leave the try catch, but I think proactively checking the type and using str() when its the wrong type is a better pattern
one more question around this. have you been able to test the input for your specific use case? Our tests are semi-stripped down mocks of a real AdRow, before we do final testing and merge, it would be great to verify that given inputs similar to what you have, that we do end up serializing this correctly
@brianjlai thank you for your review and feedback. I've added two tests inspired by the example that you provided. Also, added a bit of comments and type hints where needed. |
@brianjlai I've updated the mentioned files to the best of my knowledge. Please review and let me know if I missing something. |
I see you've done that, and it's probably easier if you did. We want the version increments to be in the same PR that the change is in for better keeping track of things. Thanks for doing that. Let's see about getting a new CI run of this kicked off |
Okay, the last CI run has a few docs formatting failures, but it's unrelated to your change. The change looks good to me and all set to merge. Thanks for the contribution @dmykhailov @natikgadzhi we have the unrelated failing QA check due to docs formatting. We can fix that on our own time and it should not need to block @dmykhailov 's contribution. Can you run the approve override to release this change when you have a moment. And I'll monitor things as the connector get's published |
Sure! I'll publish in the morning so you don't have to stress at 9pm ;-) |
@brianjlai @natikgadzhi I've just resolved the merge conflict and bumped the version of this update accordingly. |
What
Working with google ads I found an issue with one of the reports. We have report that is being generated from the following GAQL (number of selected columns is much bigger, but I have removed the most because they are not relevant):
The problem that I am facing is with ad_group_ad.ad.responsive_search_ad.headlines column.Here is one sample of the values in the column:
So, it is list of strings and each string contains a representation of data which is not really JSON or yaml. that looks very similar to an string produced by protobuf toString() method.The problem is that it is quite hard to parse this values.
The type of the data from Google Ads GRPC response is "MESSAGE" which corresponds to the hypothesis of "toString()" method call.
How
As a solution I would looks for converting GRPC message to JSON and storing JSON values for ease of integration and further parsing.
Review guide
That's a very simple fix that does not fail existing tests.
User Impact
People who already relied on the text based parsing might get their code broken.
Can this PR be safely reverted and rolled back?
Help that I need
I would like to have some help with the testing approach. At this moment integration and unit tests are based on python object and the tests do not mimic GRPC messages, so, any edge cases from GRPC conversion world are not really covered.