Commit d113091

0.8.0 Release (#400)
* initial new JSON spiking
* Basic Variant type reads
* Checkpoint for JSON data type reads
* Checkpoint for JSON data type reads
* Checkpoint for JSON data type reads
* Some lint and test cleanup
* Exclude Python 3.13 testing
* Clean up TLS test configuration
* Fix tls test lint
* Cloud test fixes
* Fix lint
* Add HTTP streaming buffer
* Improve variant type handling
* Add new mechanism for DateTime64 binding
* Add tls_mode client parameter, update changelog
1 parent b90cdf9 commit d113091


63 files changed: +1045 −530 lines

.docker/clickhouse/single_node/config.xml

Lines changed: 0 additions & 19 deletions

@@ -37,24 +37,5 @@
         <table>session_log</table>
     </session_log>

-    <http_options_response>
-        <header>
-            <name>Access-Control-Allow-Origin</name>
-            <value>*</value>
-        </header>
-        <header>
-            <name>Access-Control-Allow-Headers</name>
-            <value>accept, origin, x-requested-with, content-type, authorization</value>
-        </header>
-        <header>
-            <name>Access-Control-Allow-Methods</name>
-            <value>POST, GET, OPTIONS</value>
-        </header>
-        <header>
-            <name>Access-Control-Max-Age</name>
-            <value>86400</value>
-        </header>
-    </http_options_response>
-
     <custom_settings_prefixes>SQL_</custom_settings_prefixes>
 </clickhouse>

.docker/clickhouse/single_node_tls/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-FROM clickhouse/clickhouse-server:24.3-alpine
+FROM clickhouse/clickhouse-server:24.8-alpine
 COPY .docker/clickhouse/single_node_tls/certificates /etc/clickhouse-server/certs
 RUN chown clickhouse:clickhouse -R /etc/clickhouse-server/certs \
     && chmod 600 /etc/clickhouse-server/certs/* \

.docker/clickhouse/single_node_tls/config.xml

Lines changed: 2 additions & 0 deletions

@@ -43,4 +43,6 @@
         <partition_by>toYYYYMM(event_date)</partition_by>
         <flush_interval_milliseconds>1000</flush_interval_milliseconds>
     </query_log>
+
+    <custom_settings_prefixes>SQL_</custom_settings_prefixes>
 </clickhouse>

.github/workflows/clickhouse_ci.yml

Lines changed: 1 addition & 16 deletions

@@ -28,7 +28,7 @@ jobs:
       - name: "Add distribution info" # This lets SQLAlchemy find entry points
        run: python setup.py develop

-      - name: run ClickHouse Cloud SMT tests
+      - name: run ClickHouse Cloud tests
        env:
          CLICKHOUSE_CONNECT_TEST_PORT: 8443
          CLICKHOUSE_CONNECT_TEST_CLOUD: 'True'
@@ -42,18 +42,3 @@ jobs:
        run: pytest tests/integration_tests
      - name: remove latest container
        run: docker compose down -v
-
-      - name: run ClickHouse Cloud tests
-        env:
-          CLICKHOUSE_CONNECT_TEST_PORT: 8443
-          CLICKHOUSE_CONNECT_TEST_INSERT_QUORUM: 3
-          CLICKHOUSE_CONNECT_TEST_HOST: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_HOST }}
-          CLICKHOUSE_CONNECT_TEST_PASSWORD: ${{ secrets.INTEGRATIONS_TEAM_TESTS_CLOUD_PASSWORD }}
-        run: pytest tests/integration_tests
-
-      - name: Run ClickHouse Container (HEAD)
-        run: CLICKHOUSE_VERSION=head docker compose up -d clickhouse
-      - name: Run HEAD tests
-        run: pytest tests/integration_tests
-      - name: remove head container
-        run: docker compose down -v

.github/workflows/on_push.yml

Lines changed: 4 additions & 9 deletions

@@ -58,17 +58,15 @@ jobs:
     strategy:
       matrix:
        python-version:
-          - '3.8'
          - '3.9'
          - '3.10'
          - '3.11'
          - '3.12'
        clickhouse-version:
-          - '23.8'
-          - '23.12'
-          - '24.1'
-          - '24.2'
          - '24.3'
+          - '24.6'
+          - '24.7'
+          - '24.8'
          - latest

    name: Local Tests Py=${{ matrix.python-version }} CH=${{ matrix.clickhouse-version }}
@@ -99,14 +97,11 @@ jobs:
        sudo echo "127.0.0.1 server1.clickhouse.test" | sudo tee -a /etc/hosts
      - name: Run tests
        env:
+          CLICKHOUSE_CONNECT_TEST_TLS: 1
          CLICKHOUSE_CONNECT_TEST_DOCKER: 'False'
          CLICKHOUSE_CONNECT_TEST_FUZZ: 50
          SQLALCHEMY_SILENCE_UBER_WARNING: 1
        run: pytest tests
-      - name: Run TLS tests
-        env:
-          CLICKHOUSE_CONNECT_TEST_TLS: 1
-        run: pytest tests/tls

  check-secret:
    runs-on: ubuntu-latest

CHANGELOG.md

Lines changed: 66 additions & 1 deletion

@@ -3,10 +3,75 @@
 ### WARNING -- Impending Breaking Change - Server Settings in DSN
 When creating a DBAPI Connection method using the Connection constructor or a SQLAlchemy DSN, the library currently
 converts any unrecognized keyword argument/query parameter to a ClickHouse server setting. Starting in the next minor
-release (0.8.0), unrecognized arguments/keywords for these methods of creating a DBAPI connection will raise an exception
+release (0.9.0), unrecognized arguments/keywords for these methods of creating a DBAPI connection will raise an exception
 instead of being passed as ClickHouse server settings. This is in conjunction with some refactoring in Client construction.
 The supported method of passing ClickHouse server settings is to prefix such arguments/query parameters with `ch_`.

+## 0.8.0, 2024-09-26
+### Experimental Feature - "New" JSON/Dynamic/Variant DataTypes
+#### Usage Notes
+- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other
+forms of JSON data are not supported.
+- Valid formats for the JSON type are 'native', which returns a Python dictionary, or 'string', which returns a JSON string.
+- Any value can be inserted into a Variant column, and ClickHouse will try to determine the correct Variant
+type for the value, based on its String representation.
+- More complete documentation for the new types will be provided in the future.
+
+#### Known limitations:
+- Each of these types must be enabled in the ClickHouse settings before use. The "new" JSON type is available starting
+with the 24.8 release.
+- Returned JSON objects will only return the `max_dynamic_paths` number of elements (which defaults to 1024). This
+will be fixed in a future release.
+- Inserts into `Dynamic` columns will always be the String representation of the Python value. This will be fixed
+in a future release.
+- The implementation for the new types has not been optimized in C code, so performance may be somewhat slower than for
+simpler, established data types.
+
+This is the first time that a new `clickhouse_connect` feature has been labeled "experimental", but these new
+datatypes are complex and still experimental in ClickHouse server. Current test coverage for these types is also
+quite limited. Please don't hesitate to report issues with the new types.
+
+### Bug Fixes
+- When operating ClickHouse Server in `strict` TLS mode, HTTPS connections [require](https://github.com/ClickHouse/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/Context.h#L84-L89) a client certificate even if that
+certificate is not used for authentication. A new client parameter `tls_mode='strict'` can be used in this situation where
+username/password authentication is being used with client certificates. Other valid values for the new `tls_mode` setting
+are `'proxy'`, when TLS termination occurs at a proxy, and `'mutual'`, to specify that mutual TLS authentication is used by
+the ClickHouse server. If `tls_mode` is not set, and a client certificate and key are provided, `mutual` is assumed.
+- The server timezone was not being used for parameter binding if parameters were sent as a list instead of a dictionary.
+This should fully fix the reopened https://github.com/ClickHouse/clickhouse-connect/issues/377.
+- String port numbers (such as from environment variables) are now correctly interpreted to determine the correct interface/protocol.
+Fixes https://github.com/ClickHouse/clickhouse-connect/issues/395.
+- Insert commands with a `SELECT FROM ... LIMIT 0` will no longer raise an exception. Closes https://github.com/ClickHouse/clickhouse-connect/issues/389.
+
+### Improvements
+- Some low level errors for problems with Native format inserts and queries now include the relevant column name in the
+error message. Thanks to [Angus Holder](https://github.com/angusholder) for the PR!
+- There is a new intermediate buffer for HTTP streaming/chunked queries. The buffer stores raw data from the HTTP response
+until it is actually requested in a stream. This allows some lag between reading the data from ClickHouse and processing
+the same data. Previously, if processing the data stream fell 30 seconds behind the ClickHouse HTTP writes to the stream,
+the ClickHouse server would close the connection, aborting the query and stream processing. This is now mitigated by
+storing the data stream in the new intermediate buffer. By default, this buffer is set to 10 megabytes, but for slow
+processing of large queries where memory is not an issue, the buffer size can be increased using the new `common` setting
+`http_buffer_size`. This fixes some cases of https://github.com/ClickHouse/clickhouse-connect/issues/399, but note that
+slow processing of large queries will still cause connection and processing failures if the data cannot be buffered.
+- It is now possible to correctly bind `DateTime64` type parameters when calling Client `query` methods through one of two approaches:
+  - Wrap the Python `datetime.datetime` value in the new `DT64Param` class, e.g.
+  ```python
+  query = 'SELECT {p1:DateTime64(3)}'  # Server side binding with dictionary
+  parameters = {'p1': DT64Param(dt_value)}
+
+  query = 'SELECT %s as string, toDateTime64(%s,6) as dateTime'  # Client side binding with list
+  parameters = ['a string', DT64Param(datetime.now())]
+  ```
+  - If using a dictionary of parameter values, append the string `_64` to the parameter name
+  ```python
+  query = 'SELECT {p1:DateTime64(3)}, {a1:Array(DateTime64(3))}'  # Server side binding with dictionary
+
+  parameters = {'p1_64': dt_value, 'a1_64': [dt_value1, dt_value2]}
+  ```
+  This closes https://github.com/ClickHouse/clickhouse-connect/issues/396, see also the similar issue https://github.com/ClickHouse/clickhouse-connect/issues/212.
+
 ## 0.7.19, 2024-08-23
 ### Bug Fix
 - Insertion of large strings was triggering an exception. This has been fixed.
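The JSON insert rule noted in the changelog above (a Python dictionary or a string containing a JSON object, and nothing else) can be sketched as a small validation helper. This is an illustrative sketch, not part of the `clickhouse_connect` API; the function name `normalize_json_value` is hypothetical.

```python
import json
from typing import Any, Dict


def normalize_json_value(value: Any) -> Dict[str, Any]:
    """Accept a dict or a JSON string containing a JSON object; reject other JSON forms."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        parsed = json.loads(value)
        if isinstance(parsed, dict):
            return parsed
    # Arrays, scalars, and non-string/non-dict inputs are not valid JSON column values
    raise ValueError('JSON column values must be a dict or a JSON object string')


print(normalize_json_value('{"a": 1}'))  # {'a': 1}
```

A JSON array string such as `'[1, 2]'` parses successfully but is rejected, mirroring the "JSON object `{}` only" restriction.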

clickhouse_connect/__version__.py

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-version = '0.7.19'
+version = '0.8.0'

clickhouse_connect/cc_sqlalchemy/datatypes/base.py

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@

 from clickhouse_connect.datatypes.base import ClickHouseType, TypeDef, EMPTY_TYPE_DEF
 from clickhouse_connect.datatypes.registry import parse_name, type_map
-from clickhouse_connect.driver.query import str_query_value
+from clickhouse_connect.driver.binding import str_query_value

 logger = logging.getLogger(__name__)

clickhouse_connect/cc_sqlalchemy/ddl/custom.py

Lines changed: 2 additions & 2 deletions

@@ -1,7 +1,7 @@
 from sqlalchemy.sql.ddl import DDL
 from sqlalchemy.exc import ArgumentError

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


 # pylint: disable=too-many-ancestors,abstract-method
@@ -31,7 +31,7 @@ def __init__(self, name: str, engine: str = None, zoo_path: str = None, shard_na
         super().__init__(stmt)


-# pylint: disable=too-many-ancestors,abstract-method
+# pylint: disable=too-many-ancestors,abstract-method
 class DropDatabase(DDL):
     """
     Alternative DDL statement for built in SqlAlchemy DropSchema DDL class

clickhouse_connect/cc_sqlalchemy/dialect.py

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@
 from clickhouse_connect.cc_sqlalchemy.sql.ddlcompiler import ChDDLCompiler
 from clickhouse_connect.cc_sqlalchemy import ischema_names, dialect_name
 from clickhouse_connect.cc_sqlalchemy.sql.preparer import ChIdentifierPreparer
-from clickhouse_connect.driver.query import quote_identifier, format_str
+from clickhouse_connect.driver.binding import quote_identifier, format_str


 # pylint: disable=too-many-public-methods,no-self-use,unused-argument

clickhouse_connect/cc_sqlalchemy/sql/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@

 from sqlalchemy import Table

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


 def full_table(table_name: str, schema: Optional[str] = None) -> str:

clickhouse_connect/cc_sqlalchemy/sql/ddlcompiler.py

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 from sqlalchemy.sql.compiler import DDLCompiler

 from clickhouse_connect.cc_sqlalchemy.sql import format_table
-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


 class ChDDLCompiler(DDLCompiler):

clickhouse_connect/cc_sqlalchemy/sql/preparer.py

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 from sqlalchemy.sql.compiler import IdentifierPreparer

-from clickhouse_connect.driver.query import quote_identifier
+from clickhouse_connect.driver.binding import quote_identifier


 class ChIdentifierPreparer(IdentifierPreparer):

clickhouse_connect/common.py

Lines changed: 3 additions & 0 deletions

@@ -81,3 +81,6 @@ def _init_common(name: str, options: Sequence[Any], default: Any):
 _init_common('use_protocol_version', (True, False), True)

 _init_common('max_error_size', (), 1024)
+
+# HTTP raw data buffer for streaming queries. This should not be reduced below 64KB to ensure compatibility with LZ4 compression
+_init_common('http_buffer_size', (), 10 * 1024 * 1024)
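The `http_buffer_size` setting added above caps an intermediate buffer that decouples reading the HTTP response from consuming the stream. A rough, hypothetical sketch of such a bounded chunk buffer (the class is illustrative only, not the library's actual implementation):

```python
from collections import deque


class BoundedBuffer:
    """Queue raw byte chunks up to a byte limit; refuse writes beyond it."""

    def __init__(self, max_bytes: int = 10 * 1024 * 1024):
        self.max_bytes = max_bytes
        self.size = 0
        self.chunks = deque()

    def write(self, chunk: bytes) -> bool:
        # Reject the chunk when the cap would be exceeded; the caller must
        # then slow the producer or fail the stream, as the changelog warns.
        if self.size + len(chunk) > self.max_bytes:
            return False
        self.chunks.append(chunk)
        self.size += len(chunk)
        return True

    def read(self) -> bytes:
        if not self.chunks:
            return b''
        chunk = self.chunks.popleft()
        self.size -= len(chunk)
        return chunk


buf = BoundedBuffer(max_bytes=8)
assert buf.write(b'abcd') and buf.write(b'efgh')
assert not buf.write(b'x')       # over the cap: consumer has fallen behind
assert buf.read() == b'abcd'     # draining frees capacity for new writes
```

The real buffer sits between the HTTP response and the row-decoding stream, so a consumer lagging behind ClickHouse's writes no longer causes the server to close the connection until the cap itself is exhausted.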

clickhouse_connect/datatypes/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -4,4 +4,6 @@
 import clickhouse_connect.datatypes.special
 import clickhouse_connect.datatypes.string
 import clickhouse_connect.datatypes.temporal
+import clickhouse_connect.datatypes.dynamic
 import clickhouse_connect.datatypes.registry
+import clickhouse_connect.datatypes.postinit

clickhouse_connect/datatypes/base.py

Lines changed: 14 additions & 14 deletions

@@ -3,7 +3,7 @@

 from abc import ABC
 from math import log
-from typing import NamedTuple, Dict, Type, Any, Sequence, MutableSequence, Optional, Union, Collection
+from typing import NamedTuple, Dict, Type, Any, Sequence, MutableSequence, Union, Collection

 from clickhouse_connect.driver.common import array_type, int_size, write_array, write_uint64, low_card_version
 from clickhouse_connect.driver.context import BaseQueryContext
@@ -94,6 +94,10 @@ def name(self):
             name = f'{wrapper}({name})'
         return name

+    @property
+    def insert_name(self):
+        return self.name
+
     def data_size(self, sample: Sequence) -> int:
         if self.low_card:
             values = set(sample)
@@ -104,10 +108,13 @@ def data_size(self, sample: Sequence) -> int:
                 d_size += 1
         return d_size

-    def _data_size(self, _sample: Collection) -> int:
+    def _data_size(self, sample: Collection) -> int:
         if self.byte_size:
             return self.byte_size
-        return 0
+        total = 0
+        for x in sample:
+            total += len(str(x))
+        return total / len(sample) + 1

     def write_column_prefix(self, dest: bytearray):
         """
@@ -119,7 +126,7 @@ def write_column_prefix(self, dest: bytearray):
         if self.low_card:
             write_uint64(low_card_version, dest)

-    def read_column_prefix(self, source: ByteSource):
+    def read_column_prefix(self, source: ByteSource, _ctx: QueryContext):
         """
         Read the low cardinality version. Like the write method, this has to happen immediately for container classes
         :param source: The native protocol binary read buffer
@@ -139,7 +146,7 @@ def read_column(self, source: ByteSource, num_rows: int, ctx: QueryContext) -> S
         :param ctx: QueryContext for query specific settings
         :return: The decoded column data as a sequence and the updated location pointer
         """
-        self.read_column_prefix(source)
+        self.read_column_prefix(source, ctx)
         return self.read_column_data(source, num_rows, ctx)

     def read_column_data(self, source: ByteSource, num_rows: int, ctx: QueryContext) -> Sequence:
@@ -274,18 +281,11 @@ def _write_column_low_card(self, column: Sequence, dest: bytearray, ctx: InsertC
         write_uint64(len(index), dest)
         self._write_column_binary(index, dest, ctx)
         write_uint64(len(keys), dest)
-        write_array(array_type(1 << ix_type, False), keys, dest, ctx)
+        write_array(array_type(1 << ix_type, False), keys, dest, ctx.column_name)

     def _active_null(self, _ctx: QueryContext) -> Any:
         return None

-    def _first_value(self, column: Sequence) -> Optional[Any]:
-        if self.nullable:
-            return next((x for x in column if x is not None), None)
-        if len(column):
-            return column[0]
-        return None
-

 EMPTY_TYPE_DEF = TypeDef()
 NULLABLE_TYPE_DEF = TypeDef(wrappers=('Nullable',))
@@ -338,7 +338,7 @@ def _finalize_column(self, column: Sequence, ctx: QueryContext) -> Sequence:
     def _write_column_binary(self, column: Union[Sequence, MutableSequence], dest: bytearray, ctx: InsertContext):
         if len(column) and self.nullable:
             column = [0 if x is None else x for x in column]
-        write_array(self._array_type, column, dest, ctx)
+        write_array(self._array_type, column, dest, ctx.column_name)

     def _active_null(self, ctx: QueryContext):
         if ctx.as_pandas and ctx.use_extended_dtypes:
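The revised `_data_size` fallback in the diff above replaces a hardcoded `0` with a per-value size estimate: the fixed byte size when the type declares one, otherwise the average string length of the sample plus one. A standalone sketch of that logic (the free function form is for illustration; in the library it is a method on `ClickHouseType`):

```python
from typing import Collection


def estimate_data_size(sample: Collection, byte_size: int = 0) -> float:
    """Fixed byte size if the type declares one; else average string length plus one."""
    if byte_size:
        return byte_size
    # Fallback mirrors the diff: sum of str() lengths averaged over the sample
    total = sum(len(str(x)) for x in sample)
    return total / len(sample) + 1


print(estimate_data_size(['ab', 'cdef']))  # (2 + 4) / 2 + 1 = 4.0
```

Note that, as in the diff, the fallback assumes a non-empty sample; an empty collection would divide by zero.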
