Skip to content

Ticket: 8331: Google Ads Report Data is Malformed for complex object #45852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dmykhailov
Copy link
Contributor

What

Working with google ads I found an issue with one of the reports. We have report that is being generated from the following GAQL (number of selected columns is much bigger, but I have removed the most because they are not relevant):

SELECT
[ad_group_ad.ad.id](http://ad_group_ad.ad.id/),
ad_group_ad.resource_name,
ad_group_ad.ad.responsive_search_ad.headlines
FROM ad_group_ad

The problem that I am facing is with ad_group_ad.ad.responsive_search_ad.headlines column.Here is one sample of the values in the column:

[
"text: \"{KeyWord:Miro}\"\npinned_field: HEADLINE_1\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Comece uma retrospectiva\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Modelo de retrospectiva grátis\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Faça retrospectivas eficazes\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Faça retrospectivas na Miro\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Ouça o que o time tem a dizer\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Discuta e planeje melhorias\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Grátis, fácil, compartilhável\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Colabore em tempo real\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Mais de 300 modelos editáveis\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Mais de {CUSTOMIZER.User Amount} milhões de usuários\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n",
"text: \"Acesse em qualquer dispositivo\"\nasset_performance_label: PENDING\npolicy_summary_info {\n review_status: REVIEWED\n approval_status: APPROVED\n}\n"
]

So, it is list of strings and each string contains a representation of data which is not really JSON or yaml. that looks very similar to an string produced by protobuf toString() method.The problem is that it is quite hard to parse this values.

The type of the data from Google Ads GRPC response is "MESSAGE" which corresponds to the hypothesis of "toString()" method call.

How

As a solution I would looks for converting GRPC message to JSON and storing JSON values for ease of integration and further parsing.

Review guide

That's a very simple fix that does not fail existing tests.

User Impact

People who already relied on the text based parsing might get their code broken.

Can this PR be safely reverted and rolled back?

  • YES 💚

Help that I need

I would like to have some help with the testing approach. At this moment integration and unit tests are based on python object and the tests do not mimic GRPC messages, so, any edge cases from GRPC conversion world are not really covered.

Copy link

vercel bot commented Sep 23, 2024

@dmykhailov is attempting to deploy a commit to the Airbyte Growth Team on Vercel.

A member of the Team first needs to authorize it.

@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Sep 23, 2024
@dmykhailov dmykhailov changed the title Google Ads Report Data is Malformed for complex object Ticket: 8331: Google Ads Report Data is Malformed for complex object Sep 23, 2024
Copy link
Contributor

@brianjlai brianjlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dmytro. Thank you for the contribution!

I've added a couple of comments to integrate type checking more comprehensively into the work.

And although the change is straightforward, we should still be adding new unit tests to ensure that your changes have the intended outcomes. My suggestion is to add a new test to test_google_ads.py. There are some good examples to look from like test_get_field_value(), and I can also provide an example that I wrote:

def test_get_field_value_object():
    expected_response = {}
    field = "ad_group_ad.ad.responsive_search_ad.headlines"
    ads_row = GoogleAdsRow(
        ad_group_ad={
            "ad": {
                "responsive_search_ad": {
                    "headlines": [
                        {
                            "text": "An exciting headline",
                            "policy_summary_info": {
                                "review_status": "REVIEWED",
                                "approval_status": "APPROVED"
                            }
                        },
                        {
                            "text": "second"
                        }
                    ]
                }
            }
        }
    )

    response = GoogleAds.get_field_value(ads_row, field, {})
    assert response == expected_response

I think the main issue I noticed while I was reviewing the code is that we're not properly parsing the incoming RepeatedComposite. Using the above code we'll end up with AttributeError: Unknown field for AdTextAsset: pb. It's possible that my test case isn't testing exactly the issue you've highlighted, but let's try to figure out what the ideal test case is and then we can figure out how to implement it

@@ -211,7 +225,7 @@ def get_field_value(field_value: GoogleAdsRow, field: str, schema_type: Mapping[
if isinstance(field_value, Enum):
field_value = field_value.name
elif isinstance(field_value, (Repeated, RepeatedComposite)):
field_value = [str(value) for value in field_value]
field_value = [GoogleAds.serialize_protobuf_message(value) for value in field_value]
Copy link
Contributor

@brianjlai brianjlai Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line may not have the right value you're expecting as input to serialize_protobuf_message(). This list comprehension will turn Repeated/RepeatedComposite into an underlying value within the object. So for this example Repeated:

[
  { "text": "An exciting headline", "policy_summary_info": { "approval_status": "APPROVED" } },
  { "text": "Two", "policy_summary_info": { "approval_status": "APPROVED" } }
]

What we end up passing to the method is the first AdTextAsset. So in that regard I'm not sure either the method signature, or what we pass your new method is right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each individual element of the Repeated or RepeatedCoimposite is indeed passed to the serialization. I'l explore if the whole object serialization will pass the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tried to serialize Repeated value via the same method using the public 'pb' and got an error that the object does not have "DESCRIPTOR" attribute. If I got it right, it means that the object is not protobuf message compatible with the serializer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see. It feels very weird that we would not end up with a protobuf message at all. What we may want to do instead of relying solely on the try/except block, we should add an explicit type check isinstance() and if it not of the correct type, then we defer to str(). We can still leave the try catch, but I think proactively checking the type and using str() when its the wrong type is a better pattern

one more question around this. have you been able to test the input for your specific use case? Our tests are semi-stripped down mocks of a real AdRow, before we do final testing and merge, it would be great to verify that given inputs similar to what you have, that we do end up serializing this correctly

@dmykhailov
Copy link
Contributor Author

@brianjlai thank you for your review and feedback. I've added two tests inspired by the example that you provided. Also, added a bit of comments and type hints where needed.

@dmykhailov
Copy link
Contributor Author

@brianjlai I've updated the mentioned files to the best of my knowledge. Please review and let me know if I missing something.

@brianjlai
Copy link
Contributor

Should I do the version bump or you'll do that?

I see you've done that, and it's probably easier if you did. We want the version increments to be in the same PR that the change is in for better keeping track of things. Thanks for doing that.

Let's see about getting a new CI run of this kicked off

@brianjlai
Copy link
Contributor

Okay, the last CI run has a few docs formatting failures, but it's unrelated to your change. The change looks good to me and all set to merge. Thanks for the contribution @dmykhailov

@natikgadzhi we have the unrelated failing QA check due to docs formatting. We can fix that on our own time and it should not need to block @dmykhailov 's contribution. Can you run the approve override to release this change when you have a moment. And I'll monitor things as the connector get's published

@natikgadzhi
Copy link
Contributor

Sure! I'll publish in the morning so you don't have to stress at 9pm ;-)

@dmykhailov
Copy link
Contributor Author

@brianjlai @natikgadzhi I've just resolved the merge conflict and bumped the version of this update accordingly.

@natikgadzhi natikgadzhi enabled auto-merge (squash) October 9, 2024 18:25
@natikgadzhi natikgadzhi merged commit 1178f53 into airbytehq:master Oct 9, 2024
34 checks passed
@dmykhailov dmykhailov deleted the dmykhailov/bug-google-ads-objects-serialization branch October 10, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community connectors/source/google-ads
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants