-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(python-cdk): add user friendly message for encoding errors #44438
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@@ -54,7 +55,7 @@ def state(self, value: MutableMapping[str, Any]) -> None: | |||
"""State setter, accept state serialized by state getter.""" | |||
self._cursor.set_initial_state(value) | |||
|
|||
@property | |||
@property # type: ignore # no suitable parent cursor type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it not suitable / why can't the typing error be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at several of the Cursor
offered by the IDE, and they didn't seem right. The only Cursor object that fixed the problem was the GCS Cursor, but this is a more generic file as it belongs to the CDK and not the specific GCS source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the error exactly? Is the GCS cursor not an AbstractFileBasedCursor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that GCS cursor is not generic, as this file belongs to the CDK. Error is error: Signature of "cursor" incompatible with supertype "Stream" [override]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the expected type from the parent class? Is it really a mismatch? If not, can the comment be more clear about why this is a mypy problem, not with the type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a mismatch. gcs.Cursor
inherits from DefualtFileBasedCursor
, which inherits from AbstractFileBasedCursor
. Improved comment.
airbyte-cdk/python/airbyte_cdk/sources/file_based/exceptions.py
Outdated
Show resolved
Hide resolved
airbyte-cdk/python/airbyte_cdk/sources/file_based/file_based_source.py
Outdated
Show resolved
Hide resolved
…rce.py Removed test file to be added in a second PR
…sek/gcs_decode_error_issue_8952`) Certainly! To optimize the code, one can focus on multiple aspects like reducing the overhead of unnecessary checks, avoiding redundant operations, efficient use of data structures, and minimizing log string concatenations. Here is the optimized version of your given code. ### Optimization Highlights. 1. **Concise Manual Validation**: Combined `quote_char`, `escape_char` validation into a single validator. 2. **Helper Functions**: Utilized a generalized helper function to avoid repetition and handle specific validations concisely. 3. **Import Statements Simplified**: Removed unnecessary imports and rearranged for better readability. 4. **Combined Error Logging**: Concatenated log messages in a single operation to minimize the overhead. 5. **Proper Use of Pythonic Constructs**: Optimized type casting while skipping unnecessary checks and directly handling edge cases. This optimized version should maintain the functionality while potentially improving performance due to fewer and more efficient operations.
⚡️ Codeflash found optimizations for this PR📄
|
with pytest.raises(expected_exc) as exc: | ||
check(capsys, tmp_path, scenario) | ||
|
||
if expected_msg: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block was never executed because previous block exited with the exception. Also, output was None, since the code never returned. Tests should be more thorough now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great thanks @strosek 🚢
/approve-regression-tests
|
@@ -520,6 +520,17 @@ def test_parse_field_size_larger_than_default_python_maximum(self) -> None: | |||
data_generator = self._read_data() | |||
assert list(data_generator) == [{"header1": "1", "header2": long_string}] | |||
|
|||
def test_read_data_with_encoding_error(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test ATE raised.
What
Make it clearer when a CSV file has problems in discover/read because of a file encoding mismatch.
This also improves general error display in the UI thanks to catching
AirbyteTracedExceptions
and building a readable tracemessage
with the concatenation of each ATE's message.How
Catching the UnicodeError and throwing an easier to read exception message.
Review guide
csv_parser.py
- added try-except for creating a DecodingError from the UnicodeError. Also fixed a mypy error.exceptions.py
- added friendlier error message and DecodingError alias.default_file_based_stream.py
- fix exception throwing chain, clean duplicate log entriesdefault_file_based_availability_strategy.py
- fix exception throwing chain, raise ATE.User Impact
The user will have more clarity on why a decoding error happens, and how to fix it.
Can this PR be safely reverted and rolled back?