
Commit f3605a6

pjbull and jayqi authored
Fixes for LocalClient bugs: glob in local clients to match cloud clients, reset default storage work on default client (#436)
* update glob behavior and test
* Typing and changelog
* no override
* Self typing import
* Simpler fix
* Fix 414
* message for 414
* Change reset path for default
* Fix linting
* More waiting styles
* Change LocalClient to not explicitly store default storage directory (#462)
* Change LocalClient to not explicitly store default storage directory
* Remove extraneous file

---------

Co-authored-by: Jay Qi <[email protected]>

---------

Co-authored-by: Jay Qi <[email protected]>
Co-authored-by: Jay Qi <[email protected]>
1 parent 7cbff39 commit f3605a6


5 files changed: 90 additions and 27 deletions

HISTORY.md

Lines changed: 7 additions & 2 deletions
```diff
@@ -4,16 +4,21 @@
 
 - Allow `CloudPath` objects to be loaded/dumped through pickle format repeatedly. (Issue [#450](https://github.com/drivendataorg/cloudpathlib/issues/450))
 - Fixes typo in `FileCacheMode` where values were being filled by envvar `CLOUPATHLIB_FILE_CACHE_MODE` instead of `CLOUDPATHLIB_FILE_CACHE_MODE`. (PR [#424](https://github.com/drivendataorg/cloudpathlib/pull/424)
-- Fix `CloudPath` cleanup via `CloudPath.__del__` when `Client` encounters an exception during initialization and does not create a `file_cache_mode` attribute. (Issue [#372](https://github.com/drivendataorg/cloudpathlib/issues/372), thanks to [@bryanwweber](https://github.com/bryanwweber))
+- Fix `CloudPath` cleanup via `CloudPath.__del__` when `Client` encounters an exception during initialization and does not create a `file_cache_mode` attribute. (Issue [#372](https://github.com/drivendataorg/cloudpathlib/issues/372), thanks to [@bryanwweber](https://github.com/bryanwweber))
 - Drop support for Python 3.7; pin minimal `boto3` version to Python 3.8+ versions. (PR [#407](https://github.com/drivendataorg/cloudpathlib/pull/407))
 - fix: use native `exists()` method in `GSClient`. (PR [#420](https://github.com/drivendataorg/cloudpathlib/pull/420))
 - Enhancement: lazy instantiation of default client (PR [#432](https://github.com/drivendataorg/cloudpathlib/issues/432), Issue [#428](https://github.com/drivendataorg/cloudpathlib/issues/428))
 - Adds existence check before downloading in `download_to` (Issue [#430](https://github.com/drivendataorg/cloudpathlib/issues/430), PR [#432](https://github.com/drivendataorg/cloudpathlib/pull/432))
 - Add env vars `CLOUDPATHLIB_FORCE_OVERWRITE_FROM_CLOUD` and `CLOUDPATHLIB_FORCE_OVERWRITE_TO_CLOUD`. (Issue [#393](https://github.com/drivendataorg/cloudpathlib/issues/393), PR [#437](https://github.com/drivendataorg/cloudpathlib/pull/437))
+- Fixed `glob` for `cloudpathlib.local.LocalPath` and subclass implementations to match behavior of cloud versions for parity in testing. (Issue [#415](https://github.com/drivendataorg/cloudpathlib/issues/415), [PR #436](https://github.com/drivendataorg/cloudpathlib/pull/436))
+- Changed how `cloudpathlib.local.LocalClient` and subclass implementations track the default local storage directory (used to simulate the cloud) used when no local storage directory is explicitly provided. ([PR #436](https://github.com/drivendataorg/cloudpathlib/pull/436), [PR #462](https://github.com/drivendataorg/cloudpathlib/pull/462))
+  - Changed `LocalClient` so that client instances using the default storage access the default local storage directory through the `get_default_storage_dir` rather than having an explicit reference to the path set at instantiation. This means that calling `get_default_storage_dir` will reset the local storage for all clients using the default local storage, whether the client has already been instantiated or is instantiated after resetting. This fixes unintuitive behavior where `reset_local_storage` did not reset local storage when using the default client. (Issue [#414](https://github.com/drivendataorg/cloudpathlib/issues/414))
+  - Added a new `local_storage_dir` property to `LocalClient`. This will return the current local storage directory used by that client instance.
+by reference through the `get_default_ rather than with an explicit.
 
 ## v0.18.1 (2024-02-26)
 
-- Fixed import error due to incompatible `google-cloud-storage` by not using `transfer_manager` if it is not available. ([Issue #408](https://github.com/drivendataorg/cloudpathlib/issues/408), [PR #410](https://github.com/drivendataorg/cloudpathlib/pull/410))
+- Fixed import error due to incompatible `google-cloud-storage` by not using `transfer_manager` if it is not available. ([Issue #408](https://github.com/drivendataorg/cloudpathlib/issues/408), [PR #410](https://github.com/drivendataorg/cloudpathlib/pull/410))
 
 Includes all changes from v0.18.0.
 
```
cloudpathlib/local/localclient.py

Lines changed: 34 additions & 17 deletions
```diff
@@ -6,7 +6,7 @@
 import shutil
 from tempfile import TemporaryDirectory
 from time import sleep
-from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union
+from typing import Callable, ClassVar, Dict, Iterable, List, Optional, Tuple, Union
 
 from ..client import Client
 from ..enums import FileCacheMode
@@ -17,7 +17,12 @@ class LocalClient(Client):
     """Abstract client for accessing objects the local filesystem. Subclasses are as a monkeypatch
     substitutes for normal Client subclasses when writing tests."""
 
-    _default_storage_temp_dir = None
+    # Class-level variable to tracks the default storage directory for this client class
+    # that is used if a client is instantiated without a directory being explicitly provided
+    _default_storage_temp_dir: ClassVar[Optional[TemporaryDirectory]] = None
+
+    # Instance-level variable that tracks the local storage directory for this client
+    _local_storage_dir: Optional[Union[str, os.PathLike]]
 
     def __init__(
         self,
@@ -28,10 +33,7 @@ def __init__(
         content_type_method: Optional[Callable] = mimetypes.guess_type,
         **kwargs,
     ):
-        # setup caching and local versions of file. use default temp dir if not provided
-        if local_storage_dir is None:
-            local_storage_dir = self.get_default_storage_dir()
-        self._local_storage_dir = Path(local_storage_dir)
+        self._local_storage_dir = local_storage_dir
 
         super().__init__(
             local_cache_dir=local_cache_dir,
@@ -41,24 +43,45 @@ def __init__(
 
     @classmethod
     def get_default_storage_dir(cls) -> Path:
+        """Return the default storage directory for this client class. This is used if a client
+        is instantiated without a storage directory being explicitly provided. In this usage,
+        "storage" refers to the local storage that simulates the cloud.
+        """
         if cls._default_storage_temp_dir is None:
             cls._default_storage_temp_dir = TemporaryDirectory()
             _temp_dirs_to_clean.append(cls._default_storage_temp_dir)
         return Path(cls._default_storage_temp_dir.name)
 
     @classmethod
     def reset_default_storage_dir(cls) -> Path:
+        """Reset the default storage directly. This tears down and recreates the directory used by
+        default for this client class when instantiating a client without explicitly providing
+        a storage directory. In this usage, "storage" refers to the local storage that simulates
+        the cloud.
+        """
         cls._default_storage_temp_dir = None
         return cls.get_default_storage_dir()
 
+    @property
+    def local_storage_dir(self) -> Path:
+        """The local directory where files are stored for this client. This storage directory is
+        the one that simulates the cloud. If no storage directory was provided on instantiating the
+        client, the default storage directory for this client class is used.
+        """
+        if self._local_storage_dir is None:
+            # No explicit local storage was provided on instantiating the client.
+            # Use the default storage directory for this class.
+            return self.get_default_storage_dir()
+        return Path(self._local_storage_dir)
+
     def _cloud_path_to_local(self, cloud_path: "LocalPath") -> Path:
-        return self._local_storage_dir / cloud_path._no_prefix
+        return self.local_storage_dir / cloud_path._no_prefix
 
     def _local_to_cloud_path(self, local_path: Union[str, os.PathLike]) -> "LocalPath":
         local_path = Path(local_path)
         cloud_prefix = self._cloud_meta.path_class.cloud_prefix
         return self.CloudPath(
-            f"{cloud_prefix}{PurePosixPath(local_path.relative_to(self._local_storage_dir))}"
+            f"{cloud_prefix}{PurePosixPath(local_path.relative_to(self.local_storage_dir))}"
         )
 
     def _download_file(self, cloud_path: "LocalPath", local_path: Union[str, os.PathLike]) -> Path:
```
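The core of the #414 fix is that a client now stores `None` instead of a resolved path when no storage directory is given, and the new `local_storage_dir` property looks up the class-level default on every access. The sketch below isolates that pattern using only the standard library. `SketchClient` is a hypothetical stand-in, not cloudpathlib's API; the module-level `_temp_dirs_to_clean` list mirrors the one referenced in the diff and keeps replaced temporary directories alive until cleanup.

```python
import os
import tempfile
from pathlib import Path
from typing import ClassVar, List, Optional, Union

# Hypothetical stand-in for the module-level cleanup list referenced in the diff.
_temp_dirs_to_clean: List[tempfile.TemporaryDirectory] = []


class SketchClient:
    """Hypothetical reduction of LocalClient to its storage-directory logic."""

    # Shared by every instance of the class; replaced wholesale on reset.
    _default_storage_temp_dir: ClassVar[Optional[tempfile.TemporaryDirectory]] = None

    def __init__(self, local_storage_dir: Optional[Union[str, os.PathLike]] = None):
        # Keep the raw argument; None means "use whatever the class default is at access time".
        self._local_storage_dir = local_storage_dir

    @classmethod
    def get_default_storage_dir(cls) -> Path:
        if cls._default_storage_temp_dir is None:
            cls._default_storage_temp_dir = tempfile.TemporaryDirectory()
            _temp_dirs_to_clean.append(cls._default_storage_temp_dir)
        return Path(cls._default_storage_temp_dir.name)

    @classmethod
    def reset_default_storage_dir(cls) -> Path:
        # Drop the class-level default; the next lookup creates a fresh, empty directory.
        cls._default_storage_temp_dir = None
        return cls.get_default_storage_dir()

    @property
    def local_storage_dir(self) -> Path:
        # Resolved on each access, so existing default-storage clients see a reset too.
        if self._local_storage_dir is None:
            return self.get_default_storage_dir()
        return Path(self._local_storage_dir)


client = SketchClient()  # created before the reset, like the default client
(client.local_storage_dir / "file.txt").write_text("hello")
SketchClient.reset_default_storage_dir()
print((client.local_storage_dir / "file.txt").exists())  # False: the client follows the new default
```

Under the previous eager version, `__init__` copied the result of `get_default_storage_dir()` into the instance, so a client created before `reset_default_storage_dir()` kept pointing at the old directory. Deferring resolution to the property trades one extra lookup per access for consistent behavior across all clients that share the default.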
```diff
@@ -89,15 +112,9 @@ def _is_file(self, cloud_path: "LocalPath") -> bool:
     def _list_dir(
         self, cloud_path: "LocalPath", recursive=False
     ) -> Iterable[Tuple["LocalPath", bool]]:
-        if recursive:
-            return (
-                (self._local_to_cloud_path(obj), obj.is_dir())
-                for obj in self._cloud_path_to_local(cloud_path).glob("**/*")
-            )
-        return (
-            (self._local_to_cloud_path(obj), obj.is_dir())
-            for obj in self._cloud_path_to_local(cloud_path).iterdir()
-        )
+        pattern = "**/*" if recursive else "*"
+        for obj in self._cloud_path_to_local(cloud_path).glob(pattern):
+            yield (self._local_to_cloud_path(obj), obj.is_dir())
 
     def _md5(self, cloud_path: "LocalPath") -> str:
         return md5(self._cloud_path_to_local(cloud_path).read_bytes()).hexdigest()
```
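The `_list_dir` rewrite routes the non-recursive branch through `Path.glob("*")` instead of `Path.iterdir()`. A relevant standard-library difference: iterating `iterdir()` on a missing directory raises `FileNotFoundError`, whereas `glob()` on a missing directory simply yields nothing, which is closer to how a cloud listing of a nonexistent prefix behaves. The sketch below reproduces both shapes on plain `pathlib` paths; `list_dir_iterdir` and `list_dir_glob` are hypothetical helpers with the cloud-path conversion stripped out.

```python
from pathlib import Path
from typing import Iterator, Tuple


def list_dir_iterdir(root: Path, recursive: bool = False) -> Iterator[Tuple[Path, bool]]:
    """Shape of the old code: recursive via glob, non-recursive via iterdir."""
    if recursive:
        return ((p, p.is_dir()) for p in root.glob("**/*"))
    return ((p, p.is_dir()) for p in root.iterdir())


def list_dir_glob(root: Path, recursive: bool = False) -> Iterator[Tuple[Path, bool]]:
    """Shape of the new code: both branches go through glob with a chosen pattern."""
    pattern = "**/*" if recursive else "*"
    for p in root.glob(pattern):
        yield (p, p.is_dir())


missing = Path("does") / "not" / "exist"
print(list(list_dir_glob(missing)))  # [] for a directory that does not exist
```

Because the new version uses `yield`, `_list_dir` is now a generator function: no filesystem access happens until the caller starts iterating, in both the recursive and non-recursive cases.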

tests/test_caching.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -4,7 +4,7 @@
 from pathlib import Path
 
 import pytest
-from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed
+from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
 
 from cloudpathlib.enums import FileCacheMode
 from cloudpathlib.exceptions import (
@@ -505,8 +505,9 @@ def test_manual_cache_clearing(rig: CloudProviderTestRig):
     # in CI there can be a lag before the cleanup actually happens
     @retry(
         retry=retry_if_exception_type(AssertionError),
-        wait=wait_fixed(1),
-        stop=stop_after_attempt(5),
+        wait=wait_random_exponential(multiplier=0.5, max=5),
+        stop=stop_after_attempt(10),
+        reraise=True,
     )
     def _resilient_assert():
         gc.collect() # force gc before asserting
```
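The test now waits with a randomized, exponentially widening window (capped at 5 seconds) for up to 10 attempts, and `reraise=True` makes tenacity surface the final `AssertionError` itself rather than its own `RetryError` wrapper. The stdlib-only sketch below approximates that policy so the schedule is easy to inspect; `retry_on_assertion` is a hypothetical helper, and its window formula `min(multiplier * 2**(attempt - 1), max_wait)` is an assumption meant to mirror tenacity's documented "exponentially widening window", not a copy of its implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_assertion(
    fn: Callable[[], T],
    max_attempts: int = 10,
    multiplier: float = 0.5,
    max_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: retry fn on AssertionError with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except AssertionError:
            if attempt == max_attempts:
                # Analogous to reraise=True: propagate the original AssertionError.
                raise
            # Upper bound grows 0.5, 1, 2, 4, 5, 5, ... with the defaults above;
            # the actual wait is drawn uniformly from [0, window].
            window = min(multiplier * 2 ** (attempt - 1), max_wait)
            sleep(random.uniform(0, window))
    raise RuntimeError("unreachable")
```

Compared with the old fixed one-second wait over five attempts, the early retries are cheaper when cleanup is quick, while the later ones allow a longer total wait for slow CI runners; the random jitter keeps concurrent test workers from retrying in lockstep.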

tests/test_cloudpath_instantiation.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -92,6 +92,9 @@ def test_dependencies_not_loaded(rig, monkeypatch):
     with pytest.raises(MissingDependenciesError):
         rig.create_cloud_path("dir_0/file0_0.txt")
 
+    # manual reset for teardown order so teardown doesn't fail
+    monkeypatch.setattr(rig.path_class._cloud_meta, "dependencies_loaded", True)
+
 
 def test_is_pathlike(rig):
     p = rig.create_cloud_path("dir_0")
```

tests/test_local.py

Lines changed: 42 additions & 5 deletions
```diff
@@ -59,6 +59,8 @@ def test_default_storage_dir(client_class, monkeypatch):
     p1.write_text("hello")
     assert p1.exists()
     assert p1.read_text() == "hello"
+
+    # p2 uses a new client, but the simulated "cloud" should be the same
     assert p2.exists()
     assert p2.read_text() == "hello"
 
@@ -76,16 +78,51 @@ def test_reset_default_storage_dir(client_class, monkeypatch):
     cloud_prefix = client_class._cloud_meta.path_class.cloud_prefix
 
     p1 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")
-    client_class.reset_default_storage_dir()
-    p2 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")
-
     assert not p1.exists()
-    assert not p2.exists()
-
     p1.write_text("hello")
     assert p1.exists()
     assert p1.read_text() == "hello"
+
+    client_class.reset_default_storage_dir()
+
+    # We've reset the default storage directory, so the file should be gone
+    assert not p1.exists()
+
+    # Also should be gone for p2, which uses a new client that is still using default storage dir
+    p2 = client_class().CloudPath(f"{cloud_prefix}drive/file.txt")
     assert not p2.exists()
 
     # clean up
     client_class.reset_default_storage_dir()
+
+
+def test_reset_default_storage_dir_with_default_client():
+    """Test that reset_default_storage_dir resets the storage used by all clients that are using
+    the default storage directory, such as the default client.
+
+    Regression test for https://github.com/drivendataorg/cloudpathlib/issues/414
+    """
+    # try default client instantiation
+    from cloudpathlib.local import LocalS3Path, LocalS3Client
+
+    s3p = LocalS3Path("s3://drive/file.txt")
+    assert not s3p.exists()
+    s3p.write_text("hello")
+    assert s3p.exists()
+
+    LocalS3Client.reset_default_storage_dir()
+    s3p2 = LocalS3Path("s3://drive/file.txt")
+    assert not s3p2.exists()
+
+
+@pytest.mark.parametrize("client_class", [LocalAzureBlobClient, LocalGSClient, LocalS3Client])
+def test_glob_matches(client_class, monkeypatch):
+    if client_class is LocalAzureBlobClient:
+        monkeypatch.setenv("AZURE_STORAGE_CONNECTION_STRING", "")
+
+    cloud_prefix = client_class._cloud_meta.path_class.cloud_prefix
+    p = client_class().CloudPath(f"{cloud_prefix}drive/not/exist")
+    p.mkdir(parents=True)
+
+    # match CloudPath, which returns empty; not glob module, which raises
+    assert list(p.glob("*")) == []
```
