-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: refactor read_csv and tests based on bigquery vs. pandas behavior comparison #1595
Conversation
623b794
to
e3201f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clean up!
) | ||
return self._read_pandas(pandas_df, api_name="read_csv", write_engine=write_engine) # type: ignore | ||
|
||
def _read_csv_w_pandas_engines( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nit: "_read_csv_w_pandas_engine" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pandas.read_csv
support multiple engines, such as "c", "python", "pyarrow", "python-fwf". Added docstring in the method for easier read.
if header is None: | ||
job_config.skip_leading_rows = 0 | ||
elif header > 0: | ||
job_config.skip_leading_rows = header + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-rhetorical question: why "header + 1" here when the original version is just "header"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New refactoring tests compare BigQuery and Pandas behavior and catch the bug mentioned here. However, despite the fix, column naming mismatches still exist, as reported in internal issue 409070192.
tests/system/small/test_session.py
Outdated
@@ -39,6 +39,33 @@ | |||
from tests.system import utils | |||
|
|||
|
|||
@pytest.fixture(scope="module") | |||
def write_df_to_local_csv_file(scalars_df_index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nit: maybe "df_and_local_csv" is better? We are using a lot of noun phrases for fixtures after all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/system/small/test_session.py
Outdated
|
||
|
||
@pytest.fixture(scope="module") | ||
def write_df_to_gcs_csv_file(scalars_df_index, gcs_folder): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nit "df_and_gcs_csv"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/system/small/test_session.py
Outdated
index_col=False, | ||
) | ||
assert df.shape[0] == scalars_df_index.shape[0] | ||
if index_col is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should separate this test method into two. "if-else" branches based on test input are usually a code smell.
go/unit-testing-practices?polyglot=python#logic:
"Tests written without operators or control structures are clearer since the reader doesn't have to do any mental computations to understand them, and are more likely to be correct since it's harder to have bugs in code without these constructs."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. Done.
e3201f8
to
b222b22
Compare
No description provided.