-
Notifications
You must be signed in to change notification settings - Fork 490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability for Vectorized Scanner in write_pandas #2164
base: main
Are you sure you want to change the base?
Add ability for Vectorized Scanner in write_pandas #2164
Conversation
All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
dfaada9
to
24ff852
Compare
Hey @sfc-gh-dszmolka I was wondering how long does an initial review typically take? |
i really cannot comment on it, as i do not own the resources who are responsible for reviewing the PRs. I'm very sorry to hear it's not fast enough, but also don't have any advice at this point besides hoping the team eventually gets there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code/feature looks good to me, but I no longer own the Python connector. I'll add some reviewers to move the review forward though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the added test. It's on one hand an integration test using a real database connection, and on the other hand it mocks all its subsequent queries. Maybe there's a chance it can be converted into a pure unit test?
(False, "FILE_FORMAT=(TYPE=PARQUET COMPRESSION=auto USE_VECTORIZED_SCANNER=FALSE)"), | ||
], | ||
) | ||
def test_write_pandas_use_vectorized_scanner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it that way that this test makes some assumptions about the internals of write_pandas
implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I have updated the test to a pure unit test for this vectorized scanner functionality.
cur = SnowflakeCursor(cnx) | ||
cur._result = iter([]) | ||
return cur |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How sure we are that write_pandas will always tolerate this result of execute
and some future unrelated changes won't break this test as a side effect?
if len(args) >= 1 and args[0].startswith("COPY INTO"): | ||
assert expected_file_format in args[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will write_pandas
always make just one COPY INTO
query?
469aa95
to
0d2df84
Compare
Yeah I was mostly just copying the existing way of the integration tests, I was using |
e17c05e
to
ccaf75f
Compare
@sfc-gh-mmishchenko Would you be able to take a look? I updated to a pure unit test |
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1903333: Add ability for USE_VECTORIZED_SCANNER in write_pandas #2157
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
USE_VECTORIZED_SCANNER
parameter in the functionwrite_pandas
when running the SQL commandCOPY INTO