Skip to content

bug: NVARCHAR(MAX) column fails on mssql connector #11191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 task done
akanz1 opened this issue May 7, 2025 · 7 comments · May be fixed by #11202
Open
1 task done

bug: NVARCHAR(MAX) column fails on mssql connector #11191

akanz1 opened this issue May 7, 2025 · 7 comments · May be fixed by #11202
Labels
bug Incorrect behavior inside of ibis mssql The Microsoft SQL Server backend

Comments

@akanz1
Copy link
Contributor

akanz1 commented May 7, 2025

What happened?

After upgrading from 10.2.0 to 10.5.0 the dtype NVARCHAR(MAX) fails on mssql (likely also others are affected). previously this got resolved to a String() type.

The recently added handling of this type leads to a failure since in sql/datatypes.py:219 length.this.this yields 'MAX' which can not be places into an int() statement.

    @classmethod
    def _from_sqlglot_VARCHAR(
        cls, length: sge.DataTypeParam | None = None, nullable: bool | None = None
    ) -> dt.String:


        return dt.String(
            length=int(length.this.this) if length is not None else None,
            nullable=nullable,
        )

to make it work i've forked and fixed it like so:

    @classmethod
    def _from_sqlglot_VARCHAR(
        cls, length: sge.DataTypeParam | None = None, nullable: bool | None = None
    ) -> dt.String:
        if length is not None and length.this.this == "MAX":
            length = None
        return dt.String(
            length=int(length.this.this) if length is not None else None,
            nullable=nullable,
        )

i guess the following is even a bit nicer as it avoids the redundancy around None checks

@classmethod
def _from_sqlglot_VARCHAR(
    cls, length: sge.DataTypeParam | None = None, nullable: bool | None = None
) -> dt.String:
    length_value = None
    if length is not None and length.this.this != "MAX":
        length_value = int(length.this.this)
    return dt.String(
        length=length_value,
        nullable=nullable,
    )

What version of ibis are you using?

Ibis 10.5.0
python 3.11.11

What backend(s) are you using, if any?

No response

Relevant log output

File ".venv/lib/python3.11/site-packages/ibis/backends/sql/__init__.py", line 179, in sql
    schema = self._get_schema_using_query(query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/mssql/__init__.py", line 369, in _get_schema_using_query
    newtyp = self.compiler.type_mapper.from_string(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 195, in from_string
    return cls.to_ibis(sgtype, nullable=nullable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 161, in to_ibis
    dtype = method(*typ.expressions, nullable=nullable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 219, in _from_sqlglot_VARCHAR
    length=int(length.this.this) if length is not None else None,
           ^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'MAX'

Code of Conduct

  • I agree to follow this project's Code of Conduct
@akanz1 akanz1 added the bug Incorrect behavior inside of ibis label May 7, 2025
@NickCrews NickCrews added the mssql The Microsoft SQL Server backend label May 9, 2025
@NickCrews
Copy link
Contributor

@akanz1 can you provide a more complete reproducer? The MSSSQL datatype converter already has the right logic, it just is the base converter type that has the bad implementation (as you point out). I am wondering if this bug is already fixed on main, so I want to use your reproducer to verify the bug is still present before I keep diving in.

For example, this does NOT raise an error on main:

import ibis
from ibis.expr import datatypes as dt
from ibis.backends.sql.datatypes import MSSQLType

a = dt.String(length=None)
sgt = MSSQLType.from_ibis(a)
b = MSSQLType.to_ibis(sgt)
assert a == b

@NickCrews
Copy link
Contributor

I don't think we need the entire schema, I think if you can print out the string argument passed to

File ".venv/lib/python3.11/site-packages/ibis/backends/mssql/__init__.py", line 369, in _get_schema_using_query
    newtyp = self.compiler.type_mapper.from_string(

then that would be sufficent.

I am trying to do

from ibis.backends.mssql import Backend as MSSQLBackend

MSSQLBackend.compiler.type_mapper.from_string("NVARCHAR(MAX)")

and that isn't giving me the error that you show.

@akanz1
Copy link
Contributor Author

akanz1 commented May 12, 2025

@NickCrews thank you for looking into this.

Not sure i understand correctly but heres the context at this location on main (rev=main#75c88fdbfcb750ef9f6c06ea6f92f05c02b482a8L.351):

query = "\n        SELECT\n          name,\n          is_nullable,\n          system_type_name,\n          precision,\n          scale,\n          error_number,\n          error_message\n        FROM sys.dm_exec_describe_first_result_set(N'WITH [sample_data] AS (SELECT TOP 1000000 * FROM [dbo].[my_table]) SELECT COUNT_BIG(*) FROM [sample_data]', NULL, 0)\n        ORDER BY column_ordinal\n        "

system_type_name = 'nvarchar(max)'
nullable = True

tsql = "'WITH [sample_data] AS (SELECT TOP 1000000 * FROM [dbo].[my_table]) SELECT COUNT_BIG(*) FROM [sample_data]'"
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/__init__.py", line 179, in sql
    schema = self._get_schema_using_query(query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/mssql/__init__.py", line 351, in _get_schema_using_query
    newtyp = self.compiler.type_mapper.from_string(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 195, in from_string
    return cls.to_ibis(sgtype, nullable=nullable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 161, in to_ibis
    dtype = method(*typ.expressions, nullable=nullable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 219, in _from_sqlglot_VARCHAR
    length=int(length.this.this) if length is not None else None,
           ^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'MAX'

the problem still happens in the same place from what i can see where

length.this.this = 'MAX'

@NickCrews
Copy link
Contributor

Like what does your user code look like? You are calling my_msssql_connection.table("some_table_with_a_nvarcharmax_column")?

@akanz1
Copy link
Contributor Author

akanz1 commented May 12, 2025

Yes, just like that but using .sql()

sql_statement = 'SELECT TOP 1000000 * FROM [dbo].[my_table]'  # my_table has a column of type nvarchar(max), other nvarchar(some number) columns work

self.con = self.connect()  # is of type: ibis.backends.mssql.Backend
self.con.sql(sql_statement)

yields

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/__init__.py", line 179, in sql
    schema = self._get_schema_using_query(query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/mssql/__init__.py", line 351, in _get_schema_using_query
    newtyp = self.compiler.type_mapper.from_string(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 195, in from_string
    return cls.to_ibis(sgtype, nullable=nullable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 161, in to_ibis
    dtype = method(*typ.expressions, nullable=nullable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/ibis/backends/sql/datatypes.py", line 219, in _from_sqlglot_VARCHAR
    length=int(length.this.this) if length is not None else None,
           ^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'MAX'

@NickCrews
Copy link
Contributor

NickCrews commented May 12, 2025

Ah, I see. In SqlGlotType, we do

    @classmethod
    def _from_sqlglot_VARCHAR(
        cls, length: sge.DataTypeParam | None = None, nullable: bool | None = None
    ) -> dt.String:
        return dt.String(
            length=int(length.this.this) if length is not None else None,
            nullable=nullable,
        )

    _from_sqlglot_NVARCHAR = _from_sqlglot_NCHAR = _from_sqlglot_CHAR = (
        _from_sqlglot_FIXEDSTRING
    ) = _from_sqlglot_VARCHAR

which does too early of binding of the NVARCHAR converter. Even though the MSSQLType._from_sqlglot_NVARCHAR is called, this resolved to SqlGlotType._from_sqlglot_VARCHAR, not MSSQLType._from_sqlglot_VARCHAR. Fix incoming.

@akanz1
Copy link
Contributor Author

akanz1 commented May 28, 2025

Any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis mssql The Microsoft SQL Server backend
Projects
Status: backlog
Development

Successfully merging a pull request may close this issue.

2 participants