
Commit bac3a10

Merge branch 'main' into string-arguments-for-codecs
2 parents 7a5bc66 + 0dd797f commit bac3a10

Some content is hidden: large commits collapse part of the diff by default, so not every one of the 41 changed files is shown below.

41 files changed, +232 -95 lines changed

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 16 additions & 1 deletion
@@ -57,7 +57,22 @@ body:
     id: reproduce
     attributes:
       label: Steps to reproduce
-      description: Minimal, reproducible code sample, a copy-pastable example if possible.
+      description: Minimal, reproducible code sample. Must list dependencies in [inline script metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#example). When put in a file named `issue.py` calling `uv run issue.py` should show the issue.
+      value: |
+        ```python
+        # /// script
+        # requires-python = ">=3.11"
+        # dependencies = [
+        # "zarr@git+https://github.com/zarr-developers/zarr-python.git@main",
+        # ]
+        # ///
+        #
+        # This script automatically imports the development branch of zarr to check for issues
+
+        import zarr
+        # your reproducer code
+        # zarr.print_debug_info()
+        ```
     validations:
       required: true
   - type: textarea

changes/2950.bufgix.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

changes/2962.fix.rst

Lines changed: 0 additions & 1 deletion
This file was deleted.

changes/3021.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Implemented ``move`` for ``LocalStore`` and ``ZipStore``. This allows users to move the store to a different root path.

changes/3068.bugfix.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Trying to open an array with ``mode='r'`` when the store is not read-only now raises an error.
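The entry states the rule but not the mechanics. A toy version of the check follows, using a hypothetical `ToyStore` whose only attribute is a `read_only` flag; raising `ValueError` is an assumption, since the entry only says "an error".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStore:
    """Hypothetical stand-in for a store; only the read_only flag matters here."""

    read_only: bool


def check_open_mode(store: ToyStore, mode: str) -> None:
    # mode='r' promises no writes, so a writable store is treated as a mismatch.
    if mode == "r" and not store.read_only:
        raise ValueError("mode='r' was requested, but the store is not read-only")
```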

changes/3081.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Adds ``fill_value`` to the list of attributes displayed in the output of the ``AsyncArray.info()`` method.

changes/3082.feature.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Use :py:func:`numpy.zeros` instead of :py:func:`np.full` for a performance speedup when creating a `zarr.core.buffer.NDBuffer` with `fill_value=0`.

docs/release-notes.rst

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ Bugfixes
   To reproduce the behaviour in previous zarr-python versions when ``compressor=None`` was passed, pass ``compressor='auto'`` instead. (:issue:`3039`)
 - Fixed the typing of ``dimension_names`` arguments throughout so that it now accepts iterables that contain `None` alongside `str`. (:issue:`3045`)
 - Using various functions to open data with ``mode='a'`` no longer deletes existing data in the store. (:issue:`3062`)
+- Internally use `typesize` constructor parameter for :class:`numcodecs.blosc.Blosc` to improve compression ratios back to the v2-package levels. (:issue:`2962`)
+- Specifying the memory order of Zarr format 2 arrays using the ``order`` keyword argument has been fixed. (:issue:`2950`)
 
 
 Misc

docs/user-guide/arrays.rst

Lines changed: 4 additions & 0 deletions
@@ -183,6 +183,7 @@ which can be used to print useful diagnostics, e.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.int32
+    Fill value : 0
     Shape : (10000, 10000)
     Chunk shape : (1000, 1000)
     Order : C
@@ -200,6 +201,7 @@ prints additional diagnostics, e.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.int32
+    Fill value : 0
     Shape : (10000, 10000)
     Chunk shape : (1000, 1000)
     Order : C
@@ -287,6 +289,7 @@ Here is an example using a delta filter with the Blosc compressor::
     Type : Array
     Zarr format : 3
     Data type : DataType.int32
+    Fill value : 0
     Shape : (10000, 10000)
     Chunk shape : (1000, 1000)
     Order : C
@@ -601,6 +604,7 @@ Sharded arrays can be created by providing the ``shards`` parameter to :func:`za
     Type : Array
     Zarr format : 3
     Data type : DataType.uint8
+    Fill value : 0
     Shape : (10000, 10000)
     Shard shape : (1000, 1000)
     Chunk shape : (100, 100)

docs/user-guide/groups.rst

Lines changed: 2 additions & 0 deletions
@@ -129,6 +129,7 @@ property. E.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.int64
+    Fill value : 0
     Shape : (1000000,)
     Chunk shape : (100000,)
     Order : C
@@ -145,6 +146,7 @@ property. E.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.float32
+    Fill value : 0.0
     Shape : (1000, 1000)
     Chunk shape : (100, 100)
     Order : C

docs/user-guide/performance.rst

Lines changed: 3 additions & 0 deletions
@@ -92,6 +92,7 @@ To use sharding, you need to specify the ``shards`` parameter when creating the
     Type : Array
     Zarr format : 3
     Data type : DataType.uint8
+    Fill value : 0
     Shape : (10000, 10000, 1000)
     Shard shape : (1000, 1000, 1000)
     Chunk shape : (100, 100, 100)
@@ -122,6 +123,7 @@ ratios, depending on the correlation structure within the data. E.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.int32
+    Fill value : 0
     Shape : (10000, 10000)
     Chunk shape : (1000, 1000)
     Order : C
@@ -141,6 +143,7 @@ ratios, depending on the correlation structure within the data. E.g.::
     Type : Array
     Zarr format : 3
     Data type : DataType.int32
+    Fill value : 0
     Shape : (10000, 10000)
     Chunk shape : (1000, 1000)
     Order : F

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -291,8 +291,8 @@ extend-exclude = [
 extend-select = [
     "ANN", # flake8-annotations
     "B", # flake8-bugbear
-    "EXE", # flake8-executable
     "C4", # flake8-comprehensions
+    "EXE", # flake8-executable
     "FA", # flake8-future-annotations
     "FLY", # flynt
     "FURB", # refurb
@@ -364,14 +364,14 @@ module = [
     "tests.test_store.test_local",
     "tests.test_store.test_fsspec",
     "tests.test_store.test_memory",
+    "tests.test_codecs.test_codecs",
 ]
 strict = false
 
 # TODO: Move the next modules up to the strict = false section
 # and fix the errors
 [[tool.mypy.overrides]]
 module = [
-    "tests.test_codecs.test_codecs",
     "tests.test_metadata.*",
     "tests.test_store.test_core",
     "tests.test_store.test_logging",

src/zarr/api/asynchronous.py

Lines changed: 1 addition & 1 deletion
@@ -329,7 +329,7 @@ async def open(
     try:
         metadata_dict = await get_array_metadata(store_path, zarr_format=zarr_format)
         # TODO: remove this cast when we fix typing for array metadata dicts
-        _metadata_dict = cast(ArrayMetadataDict, metadata_dict)
+        _metadata_dict = cast("ArrayMetadataDict", metadata_dict)
         # for v2, the above would already have raised an exception if not an array
         zarr_format = _metadata_dict["zarr_format"]
         is_v3_array = zarr_format == 3 and _metadata_dict.get("node_type") == "array"
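Most of the Python hunks in this merge only change `cast(T, x)` to `cast("T", x)`. The commit does not say why; one likely motivation (some linters, ruff for instance, can flag unquoted `cast` types) is that `typing.cast` does nothing at runtime, so quoting the type avoids evaluating the type expression and lets names imported only under `TYPE_CHECKING` be used. A minimal sketch, where `as_decimal` is a hypothetical function and `Decimal` stands in for any import you want to keep out of runtime (for example, one that would cause an import cycle):

```python
from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    # Seen by the type checker only; at runtime this import never executes.
    from decimal import Decimal


def as_decimal(value: object) -> "Decimal":
    # typing.cast returns `value` unchanged at runtime. Because the type is a
    # string, Python never looks up the name `Decimal`, so the
    # TYPE_CHECKING-only import above is sufficient.
    return cast("Decimal", value)
```

With the unquoted form `cast(Decimal, value)`, calling the function would raise `NameError`, because the first argument is evaluated before `cast` runs.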

src/zarr/codecs/crc32c_.py

Lines changed: 4 additions & 2 deletions
@@ -40,7 +40,9 @@ async def _decode_single(
         inner_bytes = data[:-4]
 
         # Need to do a manual cast until https://github.com/numpy/numpy/issues/26783 is resolved
-        computed_checksum = np.uint32(crc32c(cast(typing_extensions.Buffer, inner_bytes))).tobytes()
+        computed_checksum = np.uint32(
+            crc32c(cast("typing_extensions.Buffer", inner_bytes))
+        ).tobytes()
         stored_checksum = bytes(crc32_bytes)
         if computed_checksum != stored_checksum:
             raise ValueError(
@@ -55,7 +57,7 @@ async def _encode_single(
     ) -> Buffer | None:
         data = chunk_bytes.as_numpy_array()
         # Calculate the checksum and "cast" it to a numpy array
-        checksum = np.array([crc32c(cast(typing_extensions.Buffer, data))], dtype=np.uint32)
+        checksum = np.array([crc32c(cast("typing_extensions.Buffer", data))], dtype=np.uint32)
         # Append the checksum (as bytes) to the data
         return chunk_spec.prototype.buffer.from_array_like(np.append(data, checksum.view("B")))
 
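The checksum codec keeps a CRC-32C of the chunk in its last four bytes: `_encode_single` appends them, and `_decode_single` splits them off with `data[:-4]` and compares. The sketch below reproduces that framing with a slow, dependency-free CRC-32C (Castagnoli polynomial, reflected form `0x82F63B78`) in place of the `crc32c` function the codec calls. Packing the checksum little-endian is an assumption here; it matches `np.uint32(...).tobytes()` on little-endian hosts.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli). Slow, but needs no third-party package."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def encode(chunk: bytes) -> bytes:
    # Append the checksum as 4 little-endian bytes.
    return chunk + struct.pack("<I", crc32c(chunk))


def decode(encoded: bytes) -> bytes:
    # Split off the trailing 4-byte checksum and verify it against the payload.
    inner, stored = encoded[:-4], encoded[-4:]
    computed = struct.pack("<I", crc32c(inner))
    if computed != stored:
        raise ValueError(f"Checksum mismatch: stored {stored!r}, computed {computed!r}")
    return inner
```

`crc32c(b"123456789")` returns `0xE3069283`, the standard CRC-32C check value, which is a quick way to confirm the polynomial and bit order.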

src/zarr/codecs/sharding.py

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ class _ShardIndex(NamedTuple):
     def chunks_per_shard(self) -> ChunkCoords:
         result = tuple(self.offsets_and_lengths.shape[0:-1])
         # The cast is required until https://github.com/numpy/numpy/pull/27211 is merged
-        return cast(ChunkCoords, result)
+        return cast("ChunkCoords", result)
 
     def _localize_chunk(self, chunk_coords: ChunkCoords) -> ChunkCoords:
         return tuple(
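`chunks_per_shard` derives the chunk grid of a shard from the index array's shape by dropping its last axis. Assuming, as the attribute name `offsets_and_lengths` implies, that the index holds one `(offset, length)` pair per chunk in a trailing axis of size 2, a NumPy sketch with a hypothetical 2 x 3 chunk grid looks like this:

```python
import numpy as np

# Hypothetical index for one shard holding a 2 x 3 grid of chunks: the
# trailing axis of length 2 stores an (offset, length) pair per chunk.
offsets_and_lengths = np.zeros((2, 3, 2), dtype=np.uint64)
offsets_and_lengths[1, 2] = (128, 64)  # chunk (1, 2): starts at byte 128, 64 bytes long


def chunks_per_shard(index: np.ndarray) -> tuple[int, ...]:
    # Every axis except the trailing (offset, length) axis counts chunks.
    return tuple(index.shape[0:-1])


offset, length = offsets_and_lengths[1, 2]
print(chunks_per_shard(offsets_and_lengths))  # (2, 3)
```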

src/zarr/codecs/transpose.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ def parse_transpose_order(data: JSON | Iterable[int]) -> tuple[int, ...]:
         raise TypeError(f"Expected an iterable. Got {data} instead.")
     if not all(isinstance(a, int) for a in data):
         raise TypeError(f"Expected an iterable of integers. Got {data} instead.")
-    return tuple(cast(Iterable[int], data))
+    return tuple(cast("Iterable[int]", data))
 
 
 @dataclass(frozen=True)

src/zarr/core/_info.py

Lines changed: 3 additions & 1 deletion
@@ -67,7 +67,7 @@ def byte_info(size: int) -> str:
     return f"{size} ({human_readable_size(size)})"
 
 
-@dataclasses.dataclass(kw_only=True)
+@dataclasses.dataclass(kw_only=True, frozen=True, slots=True)
 class ArrayInfo:
     """
     Visual summary for an Array.
@@ -79,6 +79,7 @@ class ArrayInfo:
     _type: Literal["Array"] = "Array"
     _zarr_format: ZarrFormat
     _data_type: np.dtype[Any] | DataType
+    _fill_value: object
     _shape: tuple[int, ...]
     _shard_shape: tuple[int, ...] | None = None
     _chunk_shape: tuple[int, ...] | None = None
@@ -97,6 +98,7 @@ def __repr__(self) -> str:
             Type : {_type}
             Zarr format : {_zarr_format}
             Data type : {_data_type}
+            Fill value : {_fill_value}
             Shape : {_shape}""")
 
         if self._shard_shape is not None:

src/zarr/core/array.py

Lines changed: 13 additions & 12 deletions
@@ -903,7 +903,7 @@ async def open(
         store_path = await make_store_path(store)
         metadata_dict = await get_array_metadata(store_path, zarr_format=zarr_format)
         # TODO: remove this cast when we have better type hints
-        _metadata_dict = cast(ArrayV3MetadataDict, metadata_dict)
+        _metadata_dict = cast("ArrayV3MetadataDict", metadata_dict)
         return cls(store_path=store_path, metadata=_metadata_dict)
 
     @property
@@ -1399,7 +1399,7 @@ async def _set_selection(
             if isinstance(array_like, np._typing._SupportsArrayFunc):
                 # TODO: need to handle array types that don't support __array_function__
                 # like PyTorch and JAX
-                array_like_ = cast(np._typing._SupportsArrayFunc, array_like)
+                array_like_ = cast("np._typing._SupportsArrayFunc", array_like)
             value = np.asanyarray(value, dtype=self.metadata.dtype, like=array_like_)
         else:
             if not hasattr(value, "shape"):
@@ -1413,7 +1413,7 @@ async def _set_selection(
                 value = value.astype(dtype=self.metadata.dtype, order="A")
             else:
                 value = np.array(value, dtype=self.metadata.dtype, order="A")
-        value = cast(NDArrayLike, value)
+        value = cast("NDArrayLike", value)
         # We accept any ndarray like object from the user and convert it
         # to a NDBuffer (or subclass). From this point onwards, we only pass
         # Buffer and NDBuffer between components.
@@ -1702,6 +1702,7 @@ def _info(
         return ArrayInfo(
             _zarr_format=self.metadata.zarr_format,
             _data_type=_data_type,
+            _fill_value=self.metadata.fill_value,
             _shape=self.shape,
             _order=self.order,
             _shard_shape=self.shards,
@@ -2436,11 +2437,11 @@ def __getitem__(self, selection: Selection) -> NDArrayLikeOrScalar:
         """
         fields, pure_selection = pop_fields(selection)
         if is_pure_fancy_indexing(pure_selection, self.ndim):
-            return self.vindex[cast(CoordinateSelection | MaskSelection, selection)]
+            return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
         elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
             return self.get_orthogonal_selection(pure_selection, fields=fields)
         else:
-            return self.get_basic_selection(cast(BasicSelection, pure_selection), fields=fields)
+            return self.get_basic_selection(cast("BasicSelection", pure_selection), fields=fields)
 
     def __setitem__(self, selection: Selection, value: npt.ArrayLike) -> None:
         """Modify data for an item or region of the array.
@@ -2535,11 +2536,11 @@ def __setitem__(self, selection: Selection, value: npt.ArrayLike) -> None:
         """
         fields, pure_selection = pop_fields(selection)
         if is_pure_fancy_indexing(pure_selection, self.ndim):
-            self.vindex[cast(CoordinateSelection | MaskSelection, selection)] = value
+            self.vindex[cast("CoordinateSelection | MaskSelection", selection)] = value
         elif is_pure_orthogonal_indexing(pure_selection, self.ndim):
             self.set_orthogonal_selection(pure_selection, value, fields=fields)
         else:
-            self.set_basic_selection(cast(BasicSelection, pure_selection), value, fields=fields)
+            self.set_basic_selection(cast("BasicSelection", pure_selection), value, fields=fields)
 
     @_deprecate_positional_args
     def get_basic_selection(
@@ -3657,7 +3658,7 @@ def update_attributes(self, new_attributes: dict[str, JSON]) -> Array:
         # TODO: remove this cast when type inference improves
         new_array = sync(self._async_array.update_attributes(new_attributes))
         # TODO: remove this cast when type inference improves
-        _new_array = cast(AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata], new_array)
+        _new_array = cast("AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]", new_array)
         return type(self)(_new_array)
 
     def __repr__(self) -> str:
@@ -4252,7 +4253,7 @@ async def init_array(
         serializer=serializer,
         dtype=dtype_parsed,
     )
-    sub_codecs = cast(tuple[Codec, ...], (*array_array, array_bytes, *bytes_bytes))
+    sub_codecs = cast("tuple[Codec, ...]", (*array_array, array_bytes, *bytes_bytes))
     codecs_out: tuple[Codec, ...]
     if shard_shape_parsed is not None:
         index_location = None
@@ -4523,7 +4524,7 @@ def _parse_keep_array_attr(
             compressors = "auto"
     if serializer == "keep":
         if zarr_format == 3 and data.metadata.zarr_format == 3:
-            serializer = cast(SerializerLike, data.serializer)
+            serializer = cast("SerializerLike", data.serializer)
         else:
             serializer = "auto"
     if fill_value is None:
@@ -4692,7 +4693,7 @@ def _parse_chunk_encoding_v3(
     if isinstance(filters, dict | Codec):
         maybe_array_array = (filters,)
     else:
-        maybe_array_array = cast(Iterable[Codec | dict[str, JSON]], filters)
+        maybe_array_array = cast("Iterable[Codec | dict[str, JSON]]", filters)
     out_array_array = tuple(_parse_array_array_codec(c) for c in maybe_array_array)
 
     if serializer == "auto":
@@ -4711,7 +4712,7 @@ def _parse_chunk_encoding_v3(
     if isinstance(compressors, dict | Codec):
         maybe_bytes_bytes = (compressors,)
     else:
-        maybe_bytes_bytes = cast(Iterable[Codec | dict[str, JSON]], compressors)
+        maybe_bytes_bytes = cast("Iterable[Codec | dict[str, JSON]]", compressors)
 
     out_bytes_bytes = tuple(_parse_bytes_bytes_codec(c) for c in maybe_bytes_bytes)
 

src/zarr/core/array_spec.py

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ def from_dict(cls, data: ArrayConfigParams) -> Self:
         """
         kwargs_out: ArrayConfigParams = {}
         for f in fields(ArrayConfig):
-            field_name = cast(Literal["order", "write_empty_chunks"], f.name)
+            field_name = cast("Literal['order', 'write_empty_chunks']", f.name)
             if field_name not in data:
                 kwargs_out[field_name] = zarr_config.get(f"array.{field_name}")
             else:
else:

src/zarr/core/buffer/core.py

Lines changed: 6 additions & 6 deletions
@@ -159,7 +159,7 @@ def create_zero_length(cls) -> Self:
         if cls is Buffer:
             raise NotImplementedError("Cannot call abstract method on the abstract class 'Buffer'")
         return cls(
-            cast(ArrayLike, None)
+            cast("ArrayLike", None)
         ) # This line will never be reached, but it satisfies the type checker
 
     @classmethod
@@ -207,7 +207,7 @@ def from_buffer(cls, buffer: Buffer) -> Self:
         if cls is Buffer:
             raise NotImplementedError("Cannot call abstract method on the abstract class 'Buffer'")
         return cls(
-            cast(ArrayLike, None)
+            cast("ArrayLike", None)
         ) # This line will never be reached, but it satisfies the type checker
 
     @classmethod
@@ -227,7 +227,7 @@ def from_bytes(cls, bytes_like: BytesLike) -> Self:
         if cls is Buffer:
             raise NotImplementedError("Cannot call abstract method on the abstract class 'Buffer'")
         return cls(
-            cast(ArrayLike, None)
+            cast("ArrayLike", None)
         ) # This line will never be reached, but it satisfies the type checker
 
     def as_array_like(self) -> ArrayLike:
@@ -371,7 +371,7 @@ def create(
                 "Cannot call abstract method on the abstract class 'NDBuffer'"
             )
         return cls(
-            cast(NDArrayLike, None)
+            cast("NDArrayLike", None)
         ) # This line will never be reached, but it satisfies the type checker
 
     @classmethod
@@ -408,7 +408,7 @@ def from_numpy_array(cls, array_like: npt.ArrayLike) -> Self:
                 "Cannot call abstract method on the abstract class 'NDBuffer'"
             )
         return cls(
-            cast(NDArrayLike, None)
+            cast("NDArrayLike", None)
         ) # This line will never be reached, but it satisfies the type checker
 
     def as_ndarray_like(self) -> NDArrayLike:
@@ -440,7 +440,7 @@ def as_scalar(self) -> ScalarType:
         """Returns the buffer as a scalar value"""
         if self._data.size != 1:
             raise ValueError("Buffer does not contain a single scalar value")
-        return cast(ScalarType, self.as_numpy_array()[()])
+        return cast("ScalarType", self.as_numpy_array()[()])
 
     @property
     def dtype(self) -> np.dtype[Any]:
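The `buffer/core.py` hunks all share one guard: a classmethod on the abstract base raises `NotImplementedError`, and the trailing `return cls(cast("ArrayLike", None))` exists only so the type checker sees a return value. A hypothetical pair of classes (names are illustrative, not zarr's) shows how the pattern plays out: concrete subclasses override the classmethod, so the placeholder return is reached only by a subclass that forgets to.

```python
from typing import cast


class BaseBuffer:
    """Hypothetical abstract base mirroring the guard in the hunks above."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    @classmethod
    def create_zero_length(cls) -> "BaseBuffer":
        if cls is BaseBuffer:
            raise NotImplementedError("Cannot call abstract method on the abstract class 'BaseBuffer'")
        # Only reached by a subclass that did not override this method. The cast
        # is a runtime no-op, so such a subclass would receive None as its data.
        return cls(cast("bytes", None))


class BytesBuffer(BaseBuffer):
    @classmethod
    def create_zero_length(cls) -> "BytesBuffer":
        # Concrete subclasses provide the real implementation.
        return cls(b"")
```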

src/zarr/core/buffer/cpu.py

Lines changed: 2 additions & 1 deletion
@@ -154,7 +154,8 @@ def create(
         order: Literal["C", "F"] = "C",
         fill_value: Any | None = None,
     ) -> Self:
-        if fill_value is None:
+        # np.zeros is much faster than np.full, and therefore using it when possible is better.
+        if fill_value is None or (isinstance(fill_value, int) and fill_value == 0):
             return cls(np.zeros(shape=tuple(shape), dtype=dtype, order=order))
         else:
             return cls(np.full(shape=tuple(shape), fill_value=fill_value, dtype=dtype, order=order))
