This repository was archived by the owner on Nov 11, 2022. It is now read-only.

Google Cloud Dataflow removing accents and special chars with '??' #647

Open
@turboT4

Description


This is going to be a bit of a hit-or-miss question, as I don't really know which context or piece of code to give you: it's an "it works locally" situation, and it really does!

I have several services, and at one step messages are published to a Pub/Sub topic, where they wait for a Dataflow consumer to handle them and save them as .parquet files (I also have another consumer that sends the payload to an HTTP endpoint).

The message looks correct in that service before it is sent to the Pub/Sub topic: the Stackdriver logs show all the characters as they should be.

However, when I check the final output, either in the .parquet files or at the HTTP endpoint, I see, for example, h?? instead of hí. That seems pretty weird, because the output is correct when I run everything locally.
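For reference, the h?? symptom can be reproduced outside Dataflow. The following is only a minimal sketch, assuming the payload is UTF-8 and that some step decodes it as US-ASCII; the class and method names are hypothetical and do not come from the actual pipeline.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from the pipeline in question.
final class QuestionMarkRepro {

    // Decode raw bytes with an explicitly chosen charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Show how a string looks once it is written through an ASCII-only encoder:
    // every character the encoder cannot map is replaced with '?'.
    static String asciiView(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "h\u00ed" is "hí"; in UTF-8 the í takes two bytes (0xC3 0xAD).
        byte[] payload = "h\u00ed".getBytes(StandardCharsets.UTF_8);

        // Wrong charset on decode: each of the two bytes becomes U+FFFD,
        // which an ASCII encoder later writes as '?'.
        System.out.println(asciiView(decodeAs(payload, StandardCharsets.US_ASCII))); // h??

        // Right charset on decode, ASCII only on output: a single '?'.
        System.out.println(asciiView(decodeAs(payload, StandardCharsets.UTF_8))); // h?

        // Right charset on decode: the original text is recovered.
        System.out.println(decodeAs(payload, StandardCharsets.UTF_8).equals("h\u00ed")); // true
    }
}
```

Two question marks for a single accented character would be consistent with UTF-8 bytes being decoded with a single-byte charset, whereas one question mark would point at the encoding step instead. This narrows down where to look, but it does not prove which step in the pipeline is responsible.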

The only thing I can think of is a server-side encoding difference that shows up when the pipeline is deployed as a Dataflow job rather than run locally, or an encoding problem in one of the other services.
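One way to test the encoding hypothesis is to log the JVM default charset from inside the job and compare it with the local value, and to pass the charset explicitly wherever bytes are turned into strings. This is a sketch under the assumption that the payload is UTF-8; the class and method names are hypothetical, and where exactly the pipeline decodes the bytes is not known from the description above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class CharsetCheck {

    // Report what the current JVM uses when no charset is given,
    // e.g. in new String(bytes) or text.getBytes().
    static String describeDefaultCharset() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    // Decode with an explicit charset so the result does not depend
    // on the default charset of the machine running the code.
    static String decodeUtf8(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Log this once locally and once from a worker, then compare.
        System.out.println(describeDefaultCharset());

        byte[] payload = {0x68, (byte) 0xC3, (byte) 0xAD}; // UTF-8 bytes of "hí"
        System.out.println(decodeUtf8(payload).equals("h\u00ed")); // true
    }
}
```

If the value logged by the deployed job differs from the local one, any call that relies on the default charset is a candidate for the corruption, and switching those calls to an explicit `StandardCharsets.UTF_8` would be the thing to try first.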

I hope someone can shed some light on something this abstract.

We're running SDK 2.9.0 (Beam 2.9.0), in case that's relevant.
