Encoding is ISO-8859-1 #12552


Merged: 4 commits merged May 4, 2022

@girarda (Contributor) commented May 3, 2022

What

How

  • Change the expected encoding from UTF-8 to ISO-8859-1, based on two threads that seem to indicate the encoding should be ISO-8859-1
  • Add a unit test to confirm we can decode the byte we failed to decode in production
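A minimal sketch of why that byte fails under UTF-8 but not under ISO-8859-1: in UTF-8, 0xE5 (binary 11100101) is the lead byte of a three-byte sequence, so a lone 0xE5 at the end of the input cannot be decoded, while ISO-8859-1 maps every byte value to exactly one character (0xE5 is "å"). The sample bytes come from the unit test in this PR; the helper name `try_decode` is hypothetical.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded string, or the UnicodeDecodeError if decoding fails (hypothetical helper)."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err


raw = b"0\xe5"  # the byte sequence exercised by the unit test in this PR

# 0xE5 >> 4 == 0b1110: in UTF-8 this marks the start of a 3-byte sequence that never completes.
utf8_result = try_decode(raw, "utf-8")
# ISO-8859-1 is single-byte: 0xE5 maps directly to U+00E5 ("å").
latin1_result = try_decode(raw, "ISO-8859-1")

print(repr(utf8_result))
print(repr(latin1_result))  # '0å'
```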

@github-actions bot added the area/connectors label (Connector related issues) May 3, 2022
@girarda (Contributor Author) commented May 3, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/2266902416
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/2266902416
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/utils/common.py                  80     17    79%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/tests/test_core.py              285    106    63%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/tests/test_incremental.py        69     38    45%
------------------------------------------------------------------------
TOTAL                                                  896    259    71%
Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/__init__.py            2      0   100%
source_salesforce/exceptions.py          8      1    88%
source_salesforce/api.py               150     19    87%
source_salesforce/streams.py           295     68    77%
source_salesforce/rate_limiting.py      22      6    73%
source_salesforce/source.py             77     33    57%
source_salesforce/utils.py               8      7    12%
--------------------------------------------------------
TOTAL                                  562    134    76%
Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/utils.py               8      0   100%
source_salesforce/__init__.py            2      0   100%
source_salesforce/source.py             77      6    92%
source_salesforce/api.py               150     14    91%
source_salesforce/exceptions.py          8      1    88%
source_salesforce/rate_limiting.py      22      3    86%
source_salesforce/streams.py           295     43    85%
--------------------------------------------------------
TOTAL                                  562     67    88%



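from source_salesforce.streams import SalesforceStream  # assumed import path for the class changed in this PR

def test():
    assert b"0\xe5".decode(SalesforceStream.encoding) == "0å"  # 0xE5 is "å" in ISO-8859-1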
@girarda (Contributor Author):

Confirmed this fails if SalesforceStream.encoding is 'utf-8'.

@github-actions bot added the area/documentation label (Improvements or additions to documentation) May 3, 2022
@girarda marked this pull request as ready for review May 3, 2022 23:40
@girarda requested review from brianjlai and marcosmarxm May 3, 2022 23:41
@brianjlai (Contributor) left a comment:

➕ for the unit test for encoding

@girarda (Contributor Author) commented May 3, 2022

/publish connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/2266983181
🚀 Successfully published connectors/source-salesforce
🚀 Auto-bumped version for connectors/source-salesforce
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/2266983181

@octavia-squidington-iii temporarily deployed to more-secrets May 4, 2022 00:12
@codecov bot commented May 4, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@075bec3).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #12552   +/-   ##
=========================================
  Coverage          ?   88.07%           
=========================================
  Files             ?        7           
  Lines             ?      562           
  Branches          ?        0           
=========================================
  Hits              ?      495           
  Misses            ?       67           
  Partials          ?        0           

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 075bec3...233ee6e.

@@ -34,6 +34,7 @@
class SalesforceStream(HttpStream, ABC):
    page_size = 2000
    transformer = TypeTransformer(TransformConfig.DefaultSchemaNormalization)
    encoding = "ISO-8859-1"
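To illustrate what a class-level `encoding` attribute controls, here is a minimal sketch, assuming a stream decodes a downloaded CSV body with its own `encoding` attribute. `BaseCsvStream`, `Latin1CsvStream`, `parse_csv` and the sample payload are hypothetical stand-ins, not the connector's or the Airbyte CDK's actual code.

```python
import csv
import io


class BaseCsvStream:
    # Hypothetical stand-in for a stream base class; subclasses override `encoding`.
    encoding = "utf-8"

    def parse_csv(self, body: bytes) -> list:
        # Decode the raw bytes with the class-level encoding, then parse the CSV rows.
        text = io.TextIOWrapper(io.BytesIO(body), encoding=self.encoding, newline="")
        return list(csv.DictReader(text))


class Latin1CsvStream(BaseCsvStream):
    # Mirrors the shape of the one-line change in this PR: only `encoding` differs.
    encoding = "ISO-8859-1"


payload = b"Id,Name\r\n1,0\xe5\r\n"  # hypothetical CSV body containing the 0xE5 byte
rows = Latin1CsvStream().parse_csv(payload)
print(rows)  # [{'Id': '1', 'Name': '0å'}]
```

With the base class's UTF-8 setting, the same payload raises `UnicodeDecodeError` while the rows are read, which is the kind of failure this PR addresses.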
Member:

Is there a link to the Salesforce docs that explains or mentions this encoding?


@marcosmarxm (Member) left a comment:

thanks @girarda

@girarda merged commit 90162b6 into master May 4, 2022
@girarda deleted the alex/salesforceEncoding branch May 4, 2022 02:23
@wissevrowl (Contributor) commented:

@girarda Different sources, including your links, indicate that most Salesforce instances use UTF-8, except for some instances:

If your org logs in to ssl.salesforce.com, your encoding is ISO-8859-1. All other instances use UTF-8.

While your fix prevents the decode error by switching to a fixed-width, single-byte encoding, characters are now decoded incorrectly.

For example, I added '间单的说 🪐' to an opportunity's description. The Airbyte source now converts this to 'éÂ�´åÂ�Â�çÂ�Â�说 ðÂ�ªÂ�'.
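The mechanism behind this can be reproduced in a few lines. A minimal sketch, assuming the API returns UTF-8 bytes while the connector decodes them as ISO-8859-1: because ISO-8859-1 assigns a character to each of the 256 byte values, decoding never raises, but every multi-byte UTF-8 sequence becomes several unrelated Latin-1 characters. The exact garbled string quoted above is not reproduced here; only the mechanism is illustrated.

```python
original = "间单的说 🪐"  # sample text from the comment above

utf8_bytes = original.encode("utf-8")  # what a UTF-8 API response would contain

# ISO-8859-1 maps every byte 0x00-0xFF to a character, so this never raises...
mojibake = utf8_bytes.decode("ISO-8859-1")
print(mojibake == original)  # False: one character per byte, not per code point
print(len(mojibake) == len(utf8_bytes))  # True

# ...and because the mapping is one-to-one, the text is recoverable if nothing else altered it.
recovered = mojibake.encode("ISO-8859-1").decode("utf-8")
print(recovered == original)  # True

# Every possible byte value decodes under ISO-8859-1, unlike UTF-8.
print(len(bytes(range(256)).decode("ISO-8859-1")))  # 256
```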

suhomud pushed a commit that referenced this pull request May 23, 2022
* Encoding is ISO-8859-1

* rename test

* bump

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
5 participants