Encoding is ISO-8859-1 #12552
/test connector=connectors/source-salesforce
def test():
    assert b"0\xe5".decode(SalesforceStream.encoding) == "0å"
Confirmed this fails if SalesforceStream.encoding is 'utf-8'.
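A minimal sketch of why the assertion above fails under UTF-8, using only the byte string from the test and Python's built-in codecs (the variable names are illustrative and not part of the connector):

```python
raw = b"0\xe5"  # the bytes used in the unit test above

# ISO-8859-1 maps every byte to exactly one character, so decoding never fails.
latin1_text = raw.decode("ISO-8859-1")

# In UTF-8, 0xE5 is the lead byte of a three-byte sequence. No continuation
# bytes follow it here, so a strict decode raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(latin1_text, utf8_failed)
```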
➕ for the unit test for encoding
/publish connector=connectors/source-salesforce
Codecov Report
@@ Coverage Diff @@
## master #12552 +/- ##
=========================================
Coverage ? 88.07%
=========================================
Files ? 7
Lines ? 562
Branches ? 0
=========================================
Hits ? 495
Misses ? 67
Partials ? 0
@@ -34,6 +34,7 @@
 class SalesforceStream(HttpStream, ABC):
     page_size = 2000
     transformer = TypeTransformer(TransformConfig.DefaultSchemaNormalization)
+    encoding = "ISO-8859-1"
Is there a link to the Salesforce docs that explains or mentions this encoding?
thanks @girarda
@girarda Different sources, including your links, indicate that most Salesforce instances use UTF-8, except for some instances.
While your fix prevents the problem from happening by switching to a non-variable-length encoding, characters are now incorrectly decoded. For example, I added '间单的说 🪐' to an opportunity's description. The Airbyte source now converts this to 'éÂ�´åÂ�Â�çÂ�Â�说 ðÂ�ªÂ�'.
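The effect described above can be reproduced with Python's built-in codecs alone. This is only a sketch under the assumption that the API returned UTF-8 bytes; the variable names are hypothetical, and the exact garbled string reported above may differ because it may have passed through additional processing:

```python
original = "间单的说 🪐"

# Assumption: the server sent the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 never raises, because every byte maps to
# one character, but each multi-byte sequence is split into several characters.
garbled = utf8_bytes.decode("ISO-8859-1")

# The mis-decode is lossless, so it can be reversed while the string is intact.
recovered = garbled.encode("ISO-8859-1").decode("utf-8")

print(len(original), len(garbled), recovered == original)
```

Because ISO-8859-1 decoding cannot fail, this kind of corruption passes silently instead of raising an error, which is the trade-off the comment above points out.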
* Encoding is ISO-8859-1
* rename test
* bump
* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
What
How