SNOW-1989239: Segmentation fault while parsing response with invalid datetime #2216

Open
inkoit opened this issue Mar 17, 2025 · 4 comments
Assignees
Labels
bug status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. status-triage_done Initial triage done, will be further handled by the driver team

Comments

@inkoit

inkoit commented Mar 17, 2025

Python version

3.8, 3.11

Operating system and processor architecture

macOS-15.2-arm64-arm-64bit, Linux-6.12.5-linuxkit-x86_64-with-glibc2.36

Installed packages

asn1crypto==1.5.1
backports.zoneinfo==0.2.1
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
coverage==7.6.1
cryptography==44.0.2
Cython==3.0.12
exceptiongroup==1.2.2
execnet==2.1.1
filelock==3.16.1
idna==3.10
importlib_resources==6.4.5
iniconfig==2.0.0
more-itertools==10.5.0
numpy==1.24.4
packaging==24.2
pendulum==3.0.0
pexpect==4.9.0
platformdirs==4.3.6
pluggy==1.5.0
ptyprocess==0.7.0
pycparser==2.22
PyJWT==2.9.0
pyOpenSSL==25.0.0
pytest==7.4.4
pytest-cov==5.0.0
pytest-rerunfailures==14.0
pytest-timeout==2.3.1
pytest-xdist==3.6.1
python-dateutil==2.9.0.post0
pytz==2025.1
pytzdata==2020.1
requests==2.32.3
six==1.17.0
snowflake-connector-python @ file:///Users/nsubbotin/code/snowflake-connector-python/.tox/.tmp/package/1/snowflake_connector_python-3.14.0-0.editable-cp38-cp38-macosx_14_0_arm64.whl#sha256=902936d8d2ea3f4d730abe0be0786a51d9dd5afabc6166bf693d7ced3eb7958c
sortedcontainers==2.4.0
time-machine==2.15.0
tomli==2.2.1
tomlkit==0.13.2
typing_extensions==4.12.2
tzdata==2025.1
urllib3==1.26.20
zipp==3.20.2

What did you do?

Given the following query.sql content:

SELECT 
    TO_TIMESTAMP('57168-12-10 13:59:58.000') AS a0,
    TO_TIMESTAMP('2025-03-04 14:58:35.000') AS a1,
    6367339 as a2,
    24 as a3,
    65 as a4,
    'kawabunga' as a5,
    'kawabunga' as a6,
    21293492 as a7,
    8 as a8,
    8 as a9,
    15 as a10,
    12 as a11,
    231122 as a12,
    9 as a13,
    253 as a14,
    'kawabunga' as a15,
    'kawabunga' as a16,
    21293492 as a17,
    12 as a18,
    25 as a19,
    'kawabunga' as a20,
    'kawabunga' as a21,
    8580496 as a22,
    0 as a23,
    445 as a24,
    'kawabunga' as a25,
    '' as a26,
    9060691 as a27,
    5 as a28,
    5 as a29,
    'kawabunga' as a30,
    'kawabunga' as a31,
    16366214 as a32,
    10 as a33,
    10 as a34,
    TO_DATE('2025-03-13') as a35

The code below results in segmentation fault:

import os
import snowflake.connector

conn = snowflake.connector.connect(
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
)
cur = conn.cursor()
with open("query.sql") as f:
    sql = f.read()
cur.execute(sql)
rows = cur.fetchall()

print(rows)  # this prints the rows without iterating them through Python; note the `<NULL>, <NULL>` values: probably real C NULLs!
rows[0][0]  # this segfaults the process

We have a couple of such cases at the moment, and all of them share these features:

  • they have an invalid datetime (year 57168)
  • they are quite long, and have many fields after the invalid datetime
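
For context, Python's `datetime` type only supports years 1 through 9999 (`datetime.MAXYEAR`), so a timestamp in year 57168 has no valid Python representation at all; a quick demonstration:

```python
import datetime

# Python's datetime supports years 1..9999 only (datetime.MAXYEAR == 9999),
# so the year-57168 value from the query cannot be represented.
print(datetime.MAXYEAR)

# Constructing a datetime beyond MAXYEAR raises ValueError:
try:
    datetime.datetime(57168, 12, 10, 13, 59, 58)
except ValueError as exc:
    print("rejected:", exc)
```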

I've managed to trace the error: it happens while parsing the output from Snowflake. Here's the content of the payload.b64 that I intercepted from the query above, and an example to reproduce it:

from snowflake.connector.nanoarrow_arrow_iterator import PyArrowRowIterator
from snowflake.connector.arrow_context import ArrowConverterContext
from base64 import b64decode

with open("payload.b64") as f:
    data = b64decode(f.read().strip())

it = PyArrowRowIterator(None, data, ArrowConverterContext(), False, False, False)
row = next(it)

print(row)
row[0]

I suspect the problem is somewhere in CArrowChunkIterator.

It reproduces on the current main branch. Here's a PR that adds these test cases: #2217

What did you expect to see?

A Python exception, or at least None, but not a segmentation fault.
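
For illustration, here is a minimal sketch (not the connector's actual code; `safe_timestamp` is a hypothetical helper) of the defensive conversion one would expect: an out-of-range epoch yields None instead of crashing the process.

```python
import datetime

def safe_timestamp(epoch_seconds):
    """Convert epoch seconds to a UTC datetime, or None if out of range."""
    try:
        return datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
    except (OverflowError, OSError, ValueError):
        return None

# Year 57168 corresponds to roughly 1.74e12 epoch seconds, far beyond
# datetime.MAXYEAR (9999), so conversion fails gracefully instead of crashing:
print(safe_timestamp(1_742_000_000_000))  # None
print(safe_timestamp(1_741_100_315))      # the valid 2025-03-04 14:58:35 UTC
```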

Can you set logging to DEBUG and collect the logs?

2025-03-17 14:54:03,037 - MainThread CArrowIterator.cpp:120 - CArrowIterator() - DEBUG - Arrow BatchSize: 1
2025-03-17 14:54:03,037 - MainThread CArrowChunkIterator.cpp:46 - CArrowChunkIterator() - DEBUG - Arrow chunk info: batchCount 1, columnCount 36, use_numpy: 0
2025-03-17 14:54:03,037 - MainThread nanoarrow_arrow_iterator.cpython-310-darwin.so:0 - __cinit__() - DEBUG - Batches read: 0
2025-03-17 14:54:03,037 - MainThread CArrowChunkIterator.cpp:70 - next() - DEBUG - Current batch index: 0, rows in current batch: 1
(<NULL>, <NULL>, 6367339, 24, 65, 'kawabunga', 'kawabunga', 21293492, 8, 8, 15, 12, 231122, 9, 253, 'kawabunga', 'kawabunga', 21293492, 12, 25, 'kawabunga', 'kawabunga', 8580496, 0, 445, 'kawabunga', '', 9060691, 5, 5, 'kawabunga', 'kawabunga', 16366214, 10, 10, datetime.date(2025, 3, 13))
[1]    65519 segmentation fault  python segfault_parse.py
@github-actions github-actions bot changed the title Segmentation fault while parsing response with invalid datetime SNOW-1989239: Segmentation fault while parsing response with invalid datetime Mar 17, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Mar 17, 2025
@sfc-gh-dszmolka
Contributor

hi there - thanks for reporting this issue, and much appreciated for all these details and the sharp repro! 💯
this doesn't look too good; we're going to take a look.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed needs triage labels Mar 17, 2025
@inkoit
Author

inkoit commented Mar 17, 2025

> this doesn't look too good

Not good indeed. Thank you!

@sfc-gh-dszmolka
Contributor

PR #2227 is open for remediation.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-fixed_awaiting_release The issue has been fixed, its PR merged, and now awaiting the next release cycle of the connector. and removed status-pr_pending_merge A PR is made and is under review labels Mar 28, 2025
@sfc-gh-dszmolka
Contributor

PR is now merged and awaiting release.

A temporary check_arrow_conversion_error_on_every_column flag (default: True) has also been introduced, in case someone somehow depends on the old behaviour; it can be set to False to revert to it.

We'll remove this flag in the future and keep only the new behaviour, where errors are surfaced immediately.
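
Conceptually, the flag toggles whether a conversion error is surfaced as soon as a column fails, or the row is returned with placeholder values. A rough Python sketch of that difference (purely illustrative; the real logic lives in the connector's C++ iterator, and `convert_row` is a hypothetical name):

```python
def convert_row(values, converter, check_every_column=True):
    """Convert one row of raw values, optionally failing fast on the first bad one."""
    out = []
    first_error = None
    for value in values:
        try:
            out.append(converter(value))
        except ValueError as exc:
            if check_every_column:
                raise  # new behaviour: surface a Python exception immediately
            if first_error is None:
                first_error = exc
            out.append(None)  # old behaviour risked leaving slots in a bad state

    return out, first_error

# With the check enabled, a bad value raises instead of poisoning the row:
try:
    convert_row(["1", "not-a-number", "3"], int)
except ValueError:
    print("error surfaced immediately")
```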
