Add ability for Vectorized Scanner in write_pandas #2164

culpgrant · 2025-02-03T04:00:27Z

Please answer these questions before submitting your pull requests. Thanks!

What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes SNOW-1903333: Add ability for USE_VECTORIZED_SCANNER in write_pandas #2157
Fill out the following pre-review checklist:
- I am adding a new automated test(s) to verify correctness of my new code
- I am adding new logging messages
- I am adding a new telemetry message
- I am modifying authorization mechanisms
- I am adding new credentials
- I am modifying OCSP code
- I am adding a new dependency
Please describe how your code solves the related issue.

Give the user the ability to specify the USE_VECTORIZED_SCANNER parameter in the write_pandas function when it runs the COPY INTO SQL command.
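A minimal sketch of the idea, not the connector's actual code. The helper name `build_file_format_clause` and its parameters are hypothetical. The `FALSE` clause text matches the expected string in this PR's test; the `TRUE` form is assumed by symmetry.

```python
def build_file_format_clause(
    compression: str = "auto", use_vectorized_scanner: bool = False
) -> str:
    """Build the FILE_FORMAT clause of a Parquet COPY INTO statement.

    Hypothetical helper for illustration; write_pandas may assemble
    the clause differently.
    """
    # Snowflake boolean options are written as TRUE / FALSE in SQL.
    scanner = "TRUE" if use_vectorized_scanner else "FALSE"
    return (
        f"FILE_FORMAT=(TYPE=PARQUET COMPRESSION={compression} "
        f"USE_VECTORIZED_SCANNER={scanner})"
    )


print(build_file_format_clause(use_vectorized_scanner=False))
```

Keeping the flag as a plain boolean argument means the SQL text is derived in one place, which is what the test in this PR checks.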

(Optional) PR for stored-proc connector:

github-actions · 2025-02-03T04:00:40Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

culpgrant · 2025-02-03T04:04:15Z

I have read the CLA Document and I hereby sign the CLA

culpgrant · 2025-02-11T13:44:04Z

Hey @sfc-gh-dszmolka, I was wondering how long an initial review typically takes?

sfc-gh-dszmolka · 2025-02-11T13:48:24Z

I really cannot comment on it, as I do not own the resources responsible for reviewing the PRs. I'm very sorry to hear it's not fast enough, but I also don't have any advice at this point besides hoping the team eventually gets there.

sfc-gh-mkeller

Code/feature looks good to me, but I no longer own the Python connector. I'll add some reviewers to move the review forward though

test/integ/pandas/test_pandas_tools.py

sfc-gh-mmishchenko

Not sure about the added test. It's on one hand an integration test using a real database connection, and on the other hand it mocks all its subsequent queries. Maybe there's a chance it can be converted into a pure unit test?

sfc-gh-mmishchenko · 2025-02-20T16:25:23Z

test/integ/pandas/test_pandas_tools.py

+        (False, "FILE_FORMAT=(TYPE=PARQUET COMPRESSION=auto USE_VECTORIZED_SCANNER=FALSE)"),
+    ],
+)
+def test_write_pandas_use_vectorized_scanner(


Doesn't this test make some assumptions about the internals of the write_pandas implementation?

culpgrant · 2025-02-25

Yeah, I have updated the test to a pure unit test for this vectorized scanner functionality.

sfc-gh-mmishchenko · 2025-02-20T16:28:31Z

test/integ/pandas/test_pandas_tools.py

+            cur = SnowflakeCursor(cnx)
+            cur._result = iter([])
+            return cur


How sure are we that write_pandas will always tolerate this result of execute, and that some future unrelated change won't break this test as a side effect?

sfc-gh-mmishchenko · 2025-02-20T16:31:27Z

test/integ/pandas/test_pandas_tools.py

+            if len(args) >= 1 and args[0].startswith("COPY INTO"):
+                assert expected_file_format in args[0]


Will write_pandas always make just one COPY INTO query?
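One way to make that assumption explicit is to collect every recorded COPY INTO statement and assert on the count, rather than asserting inside the mocked `execute`. The sketch below is only an illustration: `copy_into_statements` is a hypothetical helper, the SQL strings are made up, and it assumes the SQL is passed as the first positional argument, as `args[0]` in the test suggests.

```python
from typing import List
from unittest.mock import MagicMock


def copy_into_statements(mock_execute: MagicMock) -> List[str]:
    """Return the SQL of every recorded execute() call that starts with COPY INTO."""
    statements = []
    for call in mock_execute.call_args_list:
        # Only the first positional argument is inspected, mirroring args[0] in the test.
        if call.args and isinstance(call.args[0], str) and call.args[0].startswith("COPY INTO"):
            statements.append(call.args[0])
    return statements


# Illustrative calls; these SQL strings are invented for the sketch.
execute = MagicMock()
execute("CREATE TEMPORARY STAGE my_stage")
execute(
    "COPY INTO my_table FROM @my_stage "
    "FILE_FORMAT=(TYPE=PARQUET COMPRESSION=auto USE_VECTORIZED_SCANNER=FALSE)"
)

copies = copy_into_statements(execute)
assert len(copies) == 1, f"expected exactly one COPY INTO, got {len(copies)}"
assert "USE_VECTORIZED_SCANNER=FALSE" in copies[0]
```

If write_pandas ever issues zero or several COPY INTO statements, the count assertion fails loudly instead of the test passing without checking anything.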

culpgrant · 2025-02-25T03:43:47Z

> Not sure about the added test. It's on one hand an integration test using a real database connection, and on the other hand it mocks all its subsequent queries. Maybe there's a chance it can be converted into a pure unit test?

Yeah, I was mostly just copying the existing style of the integration tests, using test_table_location_building as a guideline. I am about to push a new change with a fully mocked pure unit test, because that is probably the better approach for this.
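For readers unfamiliar with the pattern, a fully mocked test of this kind could look roughly like the sketch below. It is not the test from this PR: `copy_parquet` is a toy stand-in for the code under test (not the real write_pandas), and the table and stage names are invented. The connection is a `MagicMock`, so no database is involved.

```python
import unittest
from unittest.mock import MagicMock


def copy_parquet(conn, table: str, stage: str, use_vectorized_scanner: bool = False) -> None:
    """Toy stand-in for the code under test; NOT the real write_pandas."""
    scanner = "TRUE" if use_vectorized_scanner else "FALSE"
    sql = (
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT=(TYPE=PARQUET COMPRESSION=auto USE_VECTORIZED_SCANNER={scanner})"
    )
    conn.cursor().execute(sql)


class CopyParquetTest(unittest.TestCase):
    def test_scanner_flag_reaches_sql(self):
        cases = [
            (True, "USE_VECTORIZED_SCANNER=TRUE"),
            (False, "USE_VECTORIZED_SCANNER=FALSE"),
        ]
        for flag, expected in cases:
            with self.subTest(flag=flag):
                conn = MagicMock()  # replaces the real connection entirely
                copy_parquet(conn, "my_table", "my_stage", use_vectorized_scanner=flag)
                # conn.cursor() always returns the same child mock.
                execute = conn.cursor.return_value.execute
                execute.assert_called_once()
                self.assertIn(expected, execute.call_args.args[0])
```

Because the cursor is a plain mock, the test does not depend on private attributes such as `_result`, at the cost of not exercising any real cursor behavior.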

culpgrant · 2025-03-10T11:54:03Z

@sfc-gh-mmishchenko Would you be able to take a look? I updated it to a pure unit test.

culpgrant force-pushed the feature/pandas_tools_vectorized_scanner branch from dfaada9 to 24ff852 Compare February 3, 2025 04:12

sfc-gh-dszmolka requested review from a team February 3, 2025 09:54

sfc-gh-mkeller reviewed Feb 18, 2025

sfc-gh-mkeller requested review from sfc-gh-yixie, sfc-gh-aalam, sfc-gh-mmishchenko, sfc-gh-jszczerbinski and sfc-gh-mhofman February 18, 2025 20:05

sfc-gh-mmishchenko requested changes Feb 20, 2025

culpgrant force-pushed the feature/pandas_tools_vectorized_scanner branch from 469aa95 to 0d2df84 Compare February 20, 2025 17:27

culpgrant added 4 commits February 24, 2025 22:04

add ability to specify vertorized_scanner for write_pandas

aecd176

remove print

097c956

move to a pure unit test with full mocking

2b48cbd

update test_pandas_tools with precommit changes

ccaf75f

culpgrant force-pushed the feature/pandas_tools_vectorized_scanner branch from e17c05e to ccaf75f Compare February 25, 2025 04:05

culpgrant requested a review from sfc-gh-mmishchenko February 25, 2025 04:05
