MSSQL support #588

zacqed · 2023-04-15T04:27:49Z

I was reading the documentation and open issues to check on MSSQL implementation. Let me know if there is a plan in place to support it.

Thanks

lidavidm · 2023-04-15T04:50:08Z

As far as I am aware, nobody is actively working on it, but SQL Server support would be welcome. The catch I think would be that FreeTDS is GPL licensed and so development of an ADBC driver using it wouldn't be able to happen here.

lidavidm · 2023-11-02T13:49:50Z

Now that we can develop drivers in Go and use them in Python/R/C++/etc, we could use https://github.com/microsoft/go-mssqldb (the official Microsoft MSSQL client) which is BSD licensed.

zacqed · 2023-11-02T14:27:34Z

That ia good to hear.

…

On Thu, Nov 2, 2023, 7:20 PM David Li ***@***.***> wrote: Now that we can develop drivers in Go and use them in Python/R/C++/etc, we could use https://github.com/microsoft/go-mssqldb (the official Microsoft MSSQL client) which is BSD licensed. — Reply to this email directly, view it on GitHub <#588 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAMZCR5DJCCHJI6L3J7ESUDYCOQITAVCNFSM6AAAAAAW7FDIJKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJQG43TKNZYGU> . You are receiving this because you authored the thread.Message ID: ***@***.***>

davlee1972 · 2024-01-19T20:28:38Z

Not to hijack this, but could we also work on Sybase support at the same time?
https://github.com/SAP/go-ase

Both products support FreeTDS.

The initial version of MS-SQL was a Unix port of Sybase for Windows..

SQL Server 6.0 is released in 1995, marking the end of collaboration with Sybase; Sybase would continue developing their own variant of SQL Server, Sybase Adaptive Server Enterprise, independently of Microsoft.

SQL Server 2005, released in 2005, finishes the complete revision of the old Sybase code into Microsoft code.

SQL Server 2017, released in 2017, adds Linux support for these Linux platforms: Red Hat Enterprise Linux, SUSE Linux Enterprise Server, Ubuntu & Docker Engine.[4]

sdebruyn · 2025-04-11T15:17:38Z

Since the community of Microsoft database users would probably be more comfortable in C#, I'd wonder if we could also write the driver in C#.

Currently, new drivers can be written in C#, C/C++, Go, and Java. A driver written in C/C++ or Go can be used from either of those languages, as well as C#, Python, R, and Ruby. (C# can experimentally export drivers to the same set of languges as well.) The Rust API definitions for ADBC are still under development, but we plan for them to be on par with C#, C/C++, and Go in this respect.

It is so far preferable to write new drivers in Go. C/C++ have had issues with dependencies and in particular some not uncommon dependencies in that ecosystem tend to cause conflicts when loaded into Python processes and elsewhere. (For example, the ADBC Flight SQL driver was originally written in C++ but would have conflicted with the grpcio and pyarrow Python packages.) It also tends to be easier for us to package and distribute additional Go libraries than it is for C/C++.

As with Go, a native C# client already exists: https://www.nuget.org/packages/microsoft.data.sqlclient / https://github.com/dotnet/SqlClient

It is not fully clear to me what the consequences would be of the language choice.

lidavidm · 2025-04-12T01:41:11Z

@CurtHagenlocher correct me if I'm wrong but in theory we can package C#/.NET drivers as native DLLs/SOs/DYLIBs too so that could work (with possibly the same caveats as Go, where on some platforms the runtime may conflict if there are multiple drivers using .NET). (I'm not looking forward to setting up that packaging build, but...)

zeroshade · 2025-04-12T01:48:02Z

My preference would probably be to use the Go library as a base since we already have the packaging set up and so on.

But that's also personal preference haha. I think the hardest part will probably be having some sort of integration testing because I don't know of any option other than actually having an MSSQL DB instance to test against.

lidavidm · 2025-04-12T01:49:04Z

MSSQL is easy to test! There's a docker container and a free tier.

zeroshade · 2025-04-12T01:50:13Z

Oooo! Didn't know that! Then that seems like a good candidate...

CurtHagenlocher · 2025-04-13T15:45:46Z

Apologies in advance; this reply is going to be all over the place...

It's definitely possible in principle to take C# source and use "AOT" compilation to produce a native dynamic library. The main blocker for doing this with ADBC is a gap in the C# Arrow libraries. This gap is addressed by apache/arrow#40992, which I haven't yet had time to analyze. Once that's done, the C# implementations of Spark and Databricks ADBC drivers can be AOT-compiled and consumed using the C APIs.

But this underscores the point that not all C# code is compatible with AOT compilation. The Google code we're using in the C# BigQuery implementation, for instance, uses Reflection in a way that is not. Now I'd vaguely expect the latest C# SqlClient implementation to be AOT-compatible due to its relative importance in the ecosystem -- but pretty much the only way to determine this conclusive is to try it.

From a perhaps overly-selfish perspective, the only personal benefit I would see from an MSSQL ADBC driver is if it offered better performance than using SqlClient via its ADO.NET API. But because there's no way to build an ADBC driver on top of SqlClient without using its ADO.NET API, there's not actually a way to achieve better performance. And in fact, if the ultimate consumer is more .NET code then it will likely be somewhat worse in that the strings would be converted from UTF-16 to UTF-8 for the Arrow API and then back to UTF-16 for the .NET consumer. (It's possible that this would be an inefficiency no matter how it's implemented if the TDS protocol specifies that "wide" strings are UTF-16.)

Now just because I don't get any value out of this doesn't mean it's not worthwhile to have it. After all, one of the goals of this project is to establish ADBC as a more widely-used standard, and that requires drivers. But if the driver is strictly worse than what you get with other technologies, then at best it's a stopgap to help bootstrap an ecosystem rather than something valuable in its own right.

To that end, I think it would be more interesting to build a generic ADO.NET-to-ADBC wrapper -- just like we have the ADBC-to-ADO.NET wrapper already in the codebase. Such a thing could be used not only with SqlClient but with Oracle's ADO.NET driver, Teradata's ADO.NET driver, the MySql ADO.NET driver, npgsql, etc. It would be the C# equivalent of an ADBC wrapper on top of ODBC, but without all that sketchy C code making people nervous ;). And if the underlying driver happened to support AOT compilation, then so would the combination of the driver and the wrapper.

This isn't to say that I'm not interested in an MSSQL ADBC driver; far from it! Like the other row-oriented database protocols, ADO.NET is pretty inefficient in the inner loop. To this, it often adds overhead in terms of boxing and other taxes. But of course, you can't fix that by building on the outside, only by changing the internals. I'd previously considered forking the SqlClient code and changing it so that we can fill Arrow structures directly instead of going through an extra layer. I'd also considered investigating the use of the "tiberius" crate to build a Rust-based ADBC driver. (I both prefer the Rust language to Go and appreciate its lack of a runtime for use in interop scenarios.) But I've also got several lifetimes of side projects I'd like to do and something of a problem staying focused on any one of them, so ... .

It's also occurred to me that my "real" goal isn't so much "ADBC everywhere" as it is "Arrow everywhere", and ADBC is more of a means to an end. From that perspective, one could argue that it's sufficient (and probably lower-cost) to modify existing database APIs to support Arrow instead of using a new API. So perhaps the C# Arrow project could define the following:

interface IDbArrowCommand
{
    Task<IArrowArrayStream> ExecuteReaderAsync(CancellationToken cancellationToken);
    Task<IArrowArrayStreams> ExecuteMultipleReaderAsync(CancellationToken cancellationToken);
}

interface IArrowArrayStreams
{
    Task<IArrowArrayStream> GetNextResult(CancellationToken cancellationToken);
}

And then SqlClient or any other ADO.NET provider could be modified such that the DbCommand implementation also implements IDbArrowCommand, and a consumer could check for that support by doing a cast. And at that point, an ADBC wrapper arguably becomes more trivial to implement.

lidavidm · 2025-04-13T23:35:02Z

To that end, I think it would be more interesting to build a generic ADO.NET-to-ADBC wrapper -- just like we have the ADBC-to-ADO.NET wrapper already in the codebase.

That's reasonable to me.

I'd also considered investigating the use of the "tiberius" crate to build a Rust-based ADBC driver. (I both prefer the Rust language to Go and appreciate its lack of a runtime for use in interop scenarios.)

I also slightly prefer this but we do have the infra already set up for Go-based drivers. On the other hand this and/or an ODBC wrapper might be a good excuse to set up the same for Rust. (I suppose it depends on whether tiberius or go-mssqldb is more maintained/complete, though I just noticed that go-mssqldb has gone 2 years without development...)

It's also occurred to me that my "real" goal isn't so much "ADBC everywhere" as it is "Arrow everywhere", and ADBC is more of a means to an end. From that perspective, one could argue that it's sufficient (and probably lower-cost) to modify existing database APIs to support Arrow instead of using a new API.

That would also be desirable, and one of the things I'd like to do someday is refactor the Dremio JDBC driver to support (1) working with ADBC more generally, not just Flight SQL and (2) having a way to get the Arrow data even while using the JDBC APIs

sdebruyn · 2025-05-05T19:49:51Z

But this underscores the point that not all C# code is compatible with AOT compilation. The Google code we're using in the C# BigQuery implementation, for instance, uses Reflection in a way that is not. Now I'd vaguely expect the latest C# SqlClient implementation to be AOT-compatible due to its relative importance in the ecosystem -- but pretty much the only way to determine this conclusive is to try it.

So maybe the go driver is a better place to start from, since it will always be able to produce AOT compiled binaries?

From a perhaps overly-selfish perspective, the only personal benefit I would see from an MSSQL ADBC driver is if it offered better performance than using SqlClient via its ADO.NET API.

There are tools which exclusively support ADBC drivers. So even to have it available for compatibility reasons could be good enough.

And if the underlying driver happened to support AOT compilation, then so would the combination of the driver and the wrapper.

That's a lot of ifs :)

Is the reason why you'd not consider the go driver pure personal preference?

CurtHagenlocher · 2025-05-05T20:10:30Z

Is the reason why you'd not consider the go driver pure personal preference?

The cost of having the Go runtime in the same process as the CLR doesn't outweigh (for me) the benefit of a single API. (And it wouldn't even be a single API because there would still be other connectors based on ADO.NET.) There's also the problem described in #1841 to consider.

paleolimbot · 2025-05-05T20:47:25Z

There's also the problem described in #1841 to consider.

I do think that continuing to build Go-based drivers on undefined behaviour should be considered.

I both prefer the Rust language to Go and appreciate its lack of a runtime for use in interop scenarios.

Just a note that I'm happy to review Rust code and that I'm hoping to upstream some of our recent work in the next month or two. There are some sorely needed helpers that need writing (e.g., for options and catalogs) but the packaging is dirt easy and arrow-rs has great helpers to make the tests easy.

lidavidm · 2025-05-05T22:08:49Z

I did give Rust a shot recently FWIW, but given the comments in #2694 I couldn't figure out a way to make the lifetimes more relaxed to match what Felipe was requesting there (tiberius ties the result set's lifetime to the connection's and I couldn't figure out a way to defer that check to runtime; tiberius essentially forces you into doing the same)

lidavidm · 2025-05-05T22:10:09Z

Rust also has its own runtime issues to contend with (right now the Datafusion driver for instance unilaterally embeds its own copy of Tokio and it's not clear to me if Rust wouldn't end up with the same symbol clash/runtime clash issues as Go and C++ in the end)

paleolimbot · 2025-05-06T05:22:35Z

I couldn't figure out a way to make the lifetimes more relaxed to match what Felipe was requesting there

This probably requires some unsafe {} or unsafe impl or maybe a separate method that some drivers can choose not to implement until we figure it out. Feel free to put up a PR and I can try to fight with the compiler to make something work 🙂

it's not clear to me if Rust wouldn't end up with the same symbol clash/runtime clash issues as Go and C++ in the end

There's quite a bit of Rust Python packaging as prior art with respect to symbol clashes (I'm not aware of this being a problem)...as far as I know multiple tokio runtimes are not undefined behaviour (more like...supported, but creating more than one incurs some overhead).

(None of this should dissuade anybody from writing any Go! Just an excited Rust user over here 🙂 )

CurtHagenlocher · 2025-05-06T14:07:42Z

Mixing environments and interacting over a C API is always going to have elements of inefficiency and/or danger: multiple allocators, multiple thread pools, conflicting imported symbols, etc., each unaware that the others exist. The one I worry about the most -- perhaps without sufficient grounds -- is having multiple garbage-collected heaps in the same process. This is because the one I have the most experience with (in the .NET CLR), the signals used to drive garbage collection haven't typically "played well with others". I don't know how Go's GC works. If it's more like Python where the primary signal for deallocation is a refcount going to zero and the formal GC is largely "just" to clean up cycles, then my concern may be less well founded.

zeroshade · 2025-05-06T22:51:26Z

Go's garbage collector is a concurrent mark-and-sweep. Rather than maintaining refcounts, the GC goroutine will periodically look at the current heap size (with various settings configurable via env vars) and then trace the live heap objects to mark things that are reachable, and then do a sweep to clean up anything that is unreachable.

You can see it all explained here. Rather than being "signaled" there's just a goroutine that concurrently runs (and is scheduled by the runtime) doing GC work when there is work to do (or isn't scheduled/is off when there's no GC-related work to do). The cost of this is based on Go-owned heap memory. While the Go drivers, when building the shared-library objects, utilize an allocator that utilizes C-owned memory to pass data through the C interface (i.e. the go runtime doesn't know anything about the Arrow data or anything we construct with the Allocator, and therefore the Go GC doesn't scan it).

This is to ensure it's safe to pass the memory through the FFI boundaries without fighting the Go garbage collector, or building up a large memory base that it has to manage (as the lifetime of the memory will be managed by the consumer calling the release callback.

While there's definitely some questions surrounding multiple Go-based drivers in the same process that need to be addressed (the same issue would arise with any drivers that require protobuf/gRPC, or Rust async runtimes), I don't see any larger issue beyond this.

paleolimbot · 2025-05-07T00:08:54Z

I don't see any larger issue beyond this.

I think my main concern is the undefined behaviour bit...it happens to work today but I don't believe there's any guarantee it will continue to work?

davlee1972 mentioned this issue May 14, 2024

ADBC GO Driver Support microsoft/go-mssqldb#191

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MSSQL support #588

MSSQL support #588

zacqed commented Apr 15, 2023

lidavidm commented Apr 15, 2023

lidavidm commented Nov 2, 2023

zacqed commented Nov 2, 2023 via email

davlee1972 commented Jan 19, 2024 •

edited

Loading

sdebruyn commented Apr 11, 2025 •

edited

Loading

lidavidm commented Apr 12, 2025

zeroshade commented Apr 12, 2025

lidavidm commented Apr 12, 2025

zeroshade commented Apr 12, 2025

CurtHagenlocher commented Apr 13, 2025

lidavidm commented Apr 13, 2025

sdebruyn commented May 5, 2025

CurtHagenlocher commented May 5, 2025

paleolimbot commented May 5, 2025

lidavidm commented May 5, 2025

lidavidm commented May 5, 2025 •

edited

Loading

paleolimbot commented May 6, 2025

CurtHagenlocher commented May 6, 2025

zeroshade commented May 6, 2025

paleolimbot commented May 7, 2025

MSSQL support #588

MSSQL support #588

Comments

zacqed commented Apr 15, 2023

lidavidm commented Apr 15, 2023

lidavidm commented Nov 2, 2023

zacqed commented Nov 2, 2023 via email

davlee1972 commented Jan 19, 2024 • edited Loading

sdebruyn commented Apr 11, 2025 • edited Loading

lidavidm commented Apr 12, 2025

zeroshade commented Apr 12, 2025

lidavidm commented Apr 12, 2025

zeroshade commented Apr 12, 2025

CurtHagenlocher commented Apr 13, 2025

lidavidm commented Apr 13, 2025

sdebruyn commented May 5, 2025

CurtHagenlocher commented May 5, 2025

paleolimbot commented May 5, 2025

lidavidm commented May 5, 2025

lidavidm commented May 5, 2025 • edited Loading

paleolimbot commented May 6, 2025

CurtHagenlocher commented May 6, 2025

zeroshade commented May 6, 2025

paleolimbot commented May 7, 2025

davlee1972 commented Jan 19, 2024 •

edited

Loading

sdebruyn commented Apr 11, 2025 •

edited

Loading

lidavidm commented May 5, 2025 •

edited

Loading