Commit a6ccf94

Update docs on trust_remote_code defaults to False (#6981)
* Set trust_remote_code defaults to False in docstrings
* Replace warning tip with version added in docstrings
* Update docs
* Rephrase
* Fix typo
1 parent 1d65718 commit a6ccf94

4 files changed: +36 -32 lines changed

docs/source/dataset_script.mdx (+1 -1)
```diff
@@ -12,7 +12,7 @@ as long as your dataset repository has a [required structure](./repository_struc
 
 <Tip warning=true>
 
-In the next major release, the new safety features of 🤗 Datasets will disable running dataset loading scripts by default, and you will have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
+For security reasons, 🤗 Datasets do not allow running dataset loading scripts by default, and you have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
 
 </Tip>
 
```
docs/source/load_hub.mdx (+2 -2)
````diff
@@ -106,7 +106,7 @@ Certain datasets repositories contain a loading script with the Python code used
 Those datasets are generally exported to Parquet by Hugging Face, so that 🤗 Datasets can load the dataset fast and without running a loading script.
 
 Even if a Parquet export is not available, you can still use any dataset with Python code in its repository with `load_dataset`.
-All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get a warning:
+All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get an error:
 
 ```py
 >>> from datasets import get_dataset_config_names, get_dataset_split_names, load_dataset
````
```diff
@@ -120,6 +120,6 @@ All files and code uploaded to the Hub are scanned for malware (refer to the Hub
 
 <Tip warning=true>
 
-In the next major release, the new safety features of 🤗 Datasets will disable running dataset loading scripts by default, and you will have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
+For security reasons, 🤗 Datasets do not allow running dataset loading scripts by default, and you have to pass `trust_remote_code=True` to load datasets that require running a dataset script.
 
 </Tip>
```
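To make the new default concrete, the contract described in the updated `load_hub.mdx` text can be modelled with a small, self-contained toy: a repository that ships a loading script is refused unless `trust_remote_code=True` is passed, while a script-free repository (for example a Parquet export) loads without opting in. This is a sketch under stated assumptions, not the 🤗 Datasets implementation: it does not import the library, `ToyRepo` and `toy_load` are hypothetical names, and the exception type and message are illustrative only.

```python
# Toy model of the documented default; NOT the datasets library's implementation.
# `ToyRepo` and `toy_load` are hypothetical names used only for illustration.
from dataclasses import dataclass


@dataclass
class ToyRepo:
    repo_id: str
    has_loading_script: bool  # True if the repository ships a Python loading script


def toy_load(repo: ToyRepo, trust_remote_code: bool = False) -> str:
    """Refuse to 'run' a loading script unless the caller opted in explicitly."""
    if repo.has_loading_script and not trust_remote_code:
        # Illustrative error; the real library uses its own message.
        raise ValueError(
            f"{repo.repo_id} contains a loading script; "
            "pass trust_remote_code=True to run it."
        )
    # Script-free repositories (e.g. Parquet exports) load without opting in.
    return "ran loading script" if repo.has_loading_script else "read data files"


parquet_repo = ToyRepo("user/parquet-only", has_loading_script=False)
script_repo = ToyRepo("user/with-script", has_loading_script=True)

print(toy_load(parquet_repo))                         # read data files
print(toy_load(script_repo, trust_remote_code=True))  # ran loading script
try:
    toy_load(script_repo)  # trust_remote_code defaults to False, so this raises
except ValueError as err:
    print(f"error: {err}")
```

The opt-in is a keyword argument with a `False` default, so any call site that does not mention `trust_remote_code` fails closed instead of silently executing remote code.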

src/datasets/hub.py (+4 -4)
```diff
@@ -42,15 +42,15 @@ def convert_to_parquet(
 `<org>/<dataset_name>`.
 revision (`str`, *optional*): Branch of the source Hub dataset repository. Defaults to the `"main"` branch.
 token (`bool` or `str`, *optional*): Authentication token for the Hugging Face Hub.
-trust_remote_code (`bool`, defaults to `True`): Whether you trust the remote code of the Hub script-based
+trust_remote_code (`bool`, defaults to `False`): Whether you trust the remote code of the Hub script-based
 dataset to be executed locally on your machine. This option should only be set to `True` for repositories
 where you have read the code and which you trust.
 
-<Tip warning={true}>
+<Changed version="2.20.0">
 
-`trust_remote_code` will default to False in the next major release.
+`trust_remote_code` defaults to `False` if not specified.
 
-</Tip>
+</Changed>
 
 Returns:
 `huggingface_hub.CommitInfo`
```

src/datasets/load.py (+29 -25)
```diff
@@ -1749,18 +1749,19 @@ def dataset_module_factory(
 Directory to read/write data. Defaults to `"~/.cache/huggingface/datasets"`.
 
 <Added version="2.16.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
 Whether or not to allow for datasets defined on the Hub using a dataset script. This option
 should only be set to `True` for repositories you trust and in which you have read the code, as it will
 execute code present on the Hub on your local machine.
 
-<Tip warning={true}>
+<Added version="2.16.0"/>
 
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
 
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+
+</Changed>
 
-<Added version="2.16.0"/>
 **download_kwargs (additional keyword arguments): optional attributes for DownloadConfig() which will override
 the attributes in download_config if supplied.
 
```
```diff
@@ -1961,18 +1962,19 @@ def metric_module_factory(
 dynamic_modules_path (Optional str, defaults to HF_MODULES_CACHE / "datasets_modules", i.e. ~/.cache/huggingface/modules/datasets_modules):
 Optional path to the directory in which the dynamic modules are saved. It must have been initialized with :obj:`init_dynamic_modules`.
 By default, the datasets and metrics are stored inside the `datasets_modules` module.
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
 Whether or not to allow for datasets defined on the Hub using a dataset script. This option
 should only be set to `True` for repositories you trust and in which you have read the code, as it will
 execute code present on the Hub on your local machine.
 
-<Tip warning={true}>
+<Added version="2.16.0"/>
 
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
 
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+
+</Changed>
 
-<Added version="2.16.0"/>
 **download_kwargs (additional keyword arguments): optional attributes for DownloadConfig() which will override
 the attributes in download_config if supplied.
 
```
```diff
@@ -2078,18 +2080,18 @@ def load_metric(
 revision (Optional ``Union[str, datasets.Version]``): if specified, the module will be loaded from the datasets repository
 at this version. By default, it is set to the local version of the lib. Specifying a version that is different from
 your local version of the lib might cause compatibility issues.
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
 Whether or not to allow for datasets defined on the Hub using a dataset script. This option
 should only be set to `True` for repositories you trust and in which you have read the code, as it will
 execute code present on the Hub on your local machine.
 
-<Tip warning={true}>
+<Added version="2.16.0"/>
 
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
 
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
 
-<Added version="2.16.0"/>
+</Changed>
 
 Returns:
 `datasets.Metric`
```
```diff
@@ -2220,18 +2222,19 @@ def load_dataset_builder(
 **Experimental**. Key/value pairs to be passed on to the dataset file-system backend, if any.
 
 <Added version="2.11.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
 Whether or not to allow for datasets defined on the Hub using a dataset script. This option
 should only be set to `True` for repositories you trust and in which you have read the code, as it will
 execute code present on the Hub on your local machine.
 
-<Tip warning={true}>
+<Added version="2.16.0"/>
+
+<Changed version="2.20.0">
 
-`trust_remote_code` will default to False in the next major release.
+`trust_remote_code` defaults to `False` if not specified.
 
-</Tip>
+</Changed>
 
-<Added version="2.16.0"/>
 **config_kwargs (additional keyword arguments):
 Keyword arguments to be passed to the [`BuilderConfig`]
 and used in the [`DatasetBuilder`].
```
```diff
@@ -2481,18 +2484,19 @@ def load_dataset(
 **Experimental**. Key/value pairs to be passed on to the dataset file-system backend, if any.
 
 <Added version="2.11.0"/>
-trust_remote_code (`bool`, defaults to `True`):
+trust_remote_code (`bool`, defaults to `False`):
 Whether or not to allow for datasets defined on the Hub using a dataset script. This option
 should only be set to `True` for repositories you trust and in which you have read the code, as it will
 execute code present on the Hub on your local machine.
 
-<Tip warning={true}>
+<Added version="2.16.0"/>
 
-`trust_remote_code` will default to False in the next major release.
+<Changed version="2.20.0">
 
-</Tip>
+`trust_remote_code` defaults to `False` if not specified.
+
+</Changed>
 
-<Added version="2.16.0"/>
 **config_kwargs (additional keyword arguments):
 Keyword arguments to be passed to the `BuilderConfig`
 and used in the [`DatasetBuilder`].
```
