Commit de1dfc7

Add datasets for generative models (#69)
* add dsets
* clean up for staging
* restore pyproject.toml
* update: dummy URLs, feature content, collection content
* update hero images
* add using this ds
* add using this ds
* add using this ds
* add using this ds
* add using this ds
* add using this ds
* add urls
* update dates
* add thumbnails
* update using_this_datset, add link blobs image
* capitalize titles
* update code blocks
* add changelog
* expand code binary blobs
* expand code and edit copy d-wave
* cite genomic data
* update download link
* update attribute descriptions
* update date of modification and publication

---------

Co-authored-by: Diego <[email protected]>
Co-authored-by: Diego <[email protected]>
1 parent cd20275 commit de1dfc7

37 files changed, with 765 additions and 1 deletion
Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"slug": "qml-benchmarks",
33
"title": "QML Benchmarks",
4-
"about": "This collection contains datasets from [Better than classical? The subtle art of benchmarking quantum machine learning models](https://arxiv.org/abs/2403.07059). These datasets offer several classification tasks that can be used for benchmarking quantum machine learning models.",
4+
"about": "This collection contains datasets from [Better than classical? The subtle art of benchmarking quantum machine learning models](https://arxiv.org/abs/2403.07059) and [Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits](https://arxiv.org/abs/2503.02934). These datasets offer several classification and generative modeling tasks that can be used for benchmarking quantum machine learning models.",
55
"thumbnail": "DatasetCollections_Benchmarks_Thumb.png"
66
}
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
@misc{recioarmengol2025trainclassicaldeployquantum,
2+
title={Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits},
3+
author={Erik Recio-Armengol and Shahnawaz Ahmed and Joseph Bowles},
4+
year={2025},
5+
eprint={2503.02934},
6+
archivePrefix={arXiv},
7+
primaryClass={quant-ph},
8+
url={https://arxiv.org/abs/2503.02934},
9+
}
10+
11+
@misc{bowles2025binarizedmnist,
12+
title={Binarized MNIST},
13+
author={Joseph Bowles},
14+
howpublished={\url{https://pennylane.ai/datasets/other/binarized-mnist}},
15+
year={2025}
16+
}
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
{
2+
"slug": "binarized-mnist",
3+
"class": {
4+
"slug": "binarized-mnist",
5+
"name": "BinarizedMNIST",
6+
"attributeList": [
7+
{
8+
"name": "train",
9+
"pythonType": "dict",
10+
"doc": "Training data `train['inputs']` and labels `train['labels']`. Each input is a binary array of shape (784,) that when reshaped to (28,28) gives a 2D image corresponding to a digit between 0 and 9. Labels are integers between 0 and 9, corresponding to the digit in the image."
11+
},
12+
{
13+
"name": "test",
14+
"pythonType": "dict",
15+
"doc": "Test data `test['inputs']` and labels `test['labels']`. Each input is a binary array of shape (784,) that when reshaped to (28,28) gives a 2D image corresponding to a digit between 0 and 9. Labels are integers between 0 and 9, corresponding to the digit in the image."
16+
}
17+
]
18+
},
19+
"collection": {
20+
"$path": "/other/qml-benchmarks/_meta/collection.json"
21+
},
22+
"data": [
23+
{
24+
"dataUrl": "https://datasets.cloud.pennylane.ai/user/05b7fb8e-464b-4f6b-a0e2-c5ed4ee502cb",
25+
"parameters": {
26+
"name": "binarized-mnist"
27+
},
28+
"extra": {}
29+
}
30+
],
31+
"downloadName": "binarized-mnist",
32+
"features": [
33+
{
34+
"slug": "dataset-attributes",
35+
"title": "Dataset Attributes",
36+
"type": "DATA",
37+
"content": {
38+
"$path": "features/dataset-attributes.md"
39+
}
40+
}
41+
],
42+
"meta": {
43+
"$path": "meta.json"
44+
},
45+
"parameterTree": null,
46+
"extra": {}
47+
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
|Name|Type|Description|
2+
|-|-|-|
3+
|`train`|`dict`|Training data `train['inputs']` and labels `train['labels']`. Each input is a binary array of shape (784,) that when reshaped to (28,28) gives a 2D image corresponding to a digit between 0 and 9. Labels are integers between 0 and 9, corresponding to the digit in the image.|
4+
|`test`|`dict`|Test data `test['inputs']` and labels `test['labels']`. Each input is a binary array of shape (784,) that when reshaped to (28,28) gives a 2D image corresponding to a digit between 0 and 9. Labels are integers between 0 and 9, corresponding to the digit in the image.|
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
{
2+
"abstract": null,
3+
"authors": [
4+
{
5+
"name": "Joseph Bowles",
6+
"username": "josephbowles"
7+
}
8+
],
9+
"basedOnPapers": false,
10+
"citation": {
11+
"$path": "citation.txt"
12+
},
13+
"changelog": [
14+
"version 0.1 : initial public release"
15+
],
16+
"description": "Binarized version of the MNIST handwritten digits dataset",
17+
"license": "[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)",
18+
"dateOfLastModification": "2025-04-14",
19+
"dateOfPublication": "2025-04-14",
20+
"sourceCodeUrl": "https://github.com/XanaduAI/scaling-gqml",
21+
"tags": [],
22+
"title": "Binarized MNIST",
23+
"usingThisDataset": {
24+
"$path": "using_this_dataset.md"
25+
},
26+
"heroImage": "https://assets.cloud.pennylane.ai/datasets/generic/hero/Datasets_GenericHero_2.png",
27+
"thumbnail": "Dataset_QML_MNIST_Thumb.png",
28+
"extra": {}
29+
}
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
Data for benchmarking machine learning models, taken from
2+
[Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits](https://arxiv.org/abs/2503.02934).
3+
This dataset is a binarized version of the well-known MNIST handwritten digits dataset.
4+
5+
**Description of the dataset**
6+
7+
The dataset consists of bit strings of length 784 that correspond to flattened images of size
8+
(28,28). The dataset is generated by normalizing the pixel values of the original MNIST dataset
9+
to [0,1] and then thresholding at 0.5 to create binary images. There are a total of 50000 training
10+
inputs and 10000 test inputs. Labels range from 0 to 9 and indicate the corresponding digits in the images.
11+
Please see the ``Source code`` tab to check how the data was generated.
12+
13+
**Example usage**
14+
15+
```pycon
16+
>>> import pennylane as qml
>>> [ds] = qml.data.load("other", name="binarized-mnist")
17+
>>>
18+
>>> ds.train['inputs']
19+
array([[0, 0, 0, ..., 0, 0, 0],
20+
[0, 0, 0, ..., 0, 0, 0],
21+
[0, 0, 0, ..., 0, 0, 0],
22+
...,
23+
[0, 0, 0, ..., 0, 0, 0],
24+
[0, 0, 0, ..., 0, 0, 0],
25+
[0, 0, 0, ..., 0, 0, 0]])
26+
>>> ds.train['labels']
27+
array([5, 0, 4, ..., 8, 4, 8])
28+
```
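The binarization step described above (scale pixels to [0,1], then threshold at 0.5) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function name `binarize_images`, the assumption of 8-bit grayscale input, the strict `>` comparison at the threshold, and the random stand-in images are all assumptions. The `Source code` tab remains the reference for how the data was actually generated.

```python
import numpy as np


def binarize_images(images, threshold=0.5):
    """Scale 8-bit pixel values to [0, 1], threshold them, and flatten each image.

    Hypothetical helper: assumes pixel values in 0..255 and a strict ">" comparison.
    """
    images = np.asarray(images, dtype=np.float64)
    normalized = images / 255.0  # assumption: 8-bit grayscale input
    binary = (normalized > threshold).astype(np.int64)
    # Flatten every image into a single bit string (28 * 28 = 784 for MNIST).
    return binary.reshape(len(binary), -1)


# Stand-in data: two random 28x28 "images" instead of real MNIST digits.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 28, 28))
bits = binarize_images(fake_images)
print(bits.shape)  # (2, 784)
```

Reshaping a row back with `np.reshape(bits[0], (28, 28))` recovers the 2D layout of the image.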
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
@misc{recioarmengol2025trainclassicaldeployquantum,
2+
title={Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits},
3+
author={Erik Recio-Armengol and Shahnawaz Ahmed and Joseph Bowles},
4+
year={2025},
5+
eprint={2503.02934},
6+
archivePrefix={arXiv},
7+
primaryClass={quant-ph},
8+
url={https://arxiv.org/abs/2503.02934},
9+
}
10+
11+
@misc{bowles2025binaryblobs,
12+
title={Binary blobs},
13+
author={Joseph Bowles and Shahnawaz Ahmed},
14+
howpublished={\url{https://pennylane.ai/datasets/other/binary-blobs}},
15+
year={2025}
16+
}
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
{
2+
"slug": "binary-blobs",
3+
"class": {
4+
"slug": "binary-blobs",
5+
"name": "BinaryBlobs",
6+
"attributeList": [
7+
{
8+
"name": "train",
9+
"pythonType": "dict",
10+
"doc": "Training data `train['inputs']` and labels `train['labels']`. Each input is a binary array of shape (16,) that when reshaped to shape (4,4) gives a 2D image corresponding to one of 8 patterns. Labels are integers from 0 to 7 specifying the pattern of the corresponding input."
11+
},
12+
{
13+
"name": "test",
14+
"pythonType": "dict",
15+
"doc": "Test data `test['inputs']` and labels `test['labels']`. Each input is a binary array of shape (16,) that when reshaped to shape (4,4) gives a 2D image corresponding to one of 8 patterns. Labels are integers from 0 to 7 specifying the pattern of the corresponding input."
16+
}
17+
]
18+
},
19+
"collection": {
20+
"$path": "/other/qml-benchmarks/_meta/collection.json"
21+
},
22+
"data": [
23+
{
24+
"dataUrl": "https://datasets.cloud.pennylane.ai/user/7a8439b3-4e76-4397-b25a-44bf8f0b7224",
25+
"parameters": {
26+
"name": "binary-blobs"
27+
},
28+
"extra": {}
29+
}
30+
],
31+
"downloadName": "binary-blobs",
32+
"features": [
33+
{
34+
"slug": "dataset-attributes",
35+
"title": "Dataset Attributes",
36+
"type": "DATA",
37+
"content": {
38+
"$path": "features/dataset-attributes.md"
39+
}
40+
}
41+
],
42+
"meta": {
43+
"$path": "meta.json"
44+
},
45+
"parameterTree": null,
46+
"extra": {}
47+
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
|Name|Type|Description|
2+
|-|-|-|
3+
|`train`|`dict`|Training data `train['inputs']` and labels `train['labels']`. Each input is a binary array of shape (16,) that when reshaped to shape (4,4) gives a 2D image corresponding to one of 8 patterns. Labels are integers from 0 to 7 specifying the pattern of the corresponding input.|
4+
|`test`|`dict`|Test data `test['inputs']` and labels `test['labels']`. Each input is a binary array of shape (16,) that when reshaped to shape (4,4) gives a 2D image corresponding to one of 8 patterns. Labels are integers from 0 to 7 specifying the pattern of the corresponding input.|
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
{
2+
"abstract": null,
3+
"authors": [
4+
{
5+
"name": "Joseph Bowles",
6+
"username": "josephbowles"
7+
},
8+
{
9+
"name": "Shahnawaz Ahmed"
10+
}
11+
],
12+
"basedOnPapers": false,
13+
"citation": {
14+
"$path": "citation.txt"
15+
},
16+
"changelog": [
17+
"version 0.1 : initial public release"
18+
],
19+
"description": "Binary blobs dataset",
20+
"license": "[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)",
21+
"dateOfLastModification": "2025-04-14",
22+
"dateOfPublication": "2025-04-14",
23+
"sourceCodeUrl": "https://github.com/XanaduAI/scaling-gqml",
24+
"tags": [],
25+
"title": "Binary Blobs",
26+
"usingThisDataset": {
27+
"$path": "using_this_dataset.md"
28+
},
29+
"heroImage": "https://assets.cloud.pennylane.ai/datasets/generic/hero/Datasets_GenericHero_2.png",
30+
"thumbnail": "Dataset_QML_BinaryBlobs_Thumb.png",
31+
"extra": {}
32+
}
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
Data for benchmarking machine learning models, taken from
2+
[Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits](https://arxiv.org/abs/2503.02934).
3+
The Binary Blobs dataset can be seen as a binary version of the [Gaussian blobs](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
4+
dataset for continuous data. This is the specific dataset that appeared in the above paper; more general datasets can be
5+
constructed via the [qml_benchmarks](https://github.com/XanaduAI/qml-benchmarks) package.
6+
7+
**Description of the dataset**
8+
9+
The dataset consists of bit strings of length 16. To generate samples, one of the 8 following patterns
10+
is selected at random (shown here reshaped to size (4,4)):
11+
12+
<p style="text-align: center"><img src="https://assets.cloud.pennylane.ai/datasets/generic/using_this_dataset/8blobs.png" alt="patterns" width="70%"/></p>
13+
14+
Each bit is then flipped with 5% probability. There are 5000 training points and 10000 test points.
15+
If needed, labels that correspond to the 8 patterns can also be accessed. Please see the ``Source code`` tab to check how the data was generated.
16+
17+
**Example usage**
18+
19+
```pycon
20+
>>> import numpy as np
>>> import pennylane as qml
>>> [ds] = qml.data.load("other", name="binary-blobs")
21+
>>>
22+
>>> blob_vector = ds.train['inputs'][0]
23+
>>> blob_array = np.reshape(blob_vector, (4,4))
24+
>>> print(blob_array)
25+
[[0. 0. 1. 1.]
26+
[0. 0. 1. 1.]
27+
[0. 0. 0. 0.]
28+
[0. 0. 0. 0.]]
29+
>>> ds.train['labels'][0]
30+
1
31+
>>> blob_vector = ds.test['inputs'][10]
32+
>>> blob_array = np.reshape(blob_vector, (4,4))
33+
>>> print(blob_array)
34+
[[1. 0. 1. 0.]
35+
[0. 1. 0. 0.]
36+
[0. 0. 1. 0.]
37+
[0. 0. 0. 1.]]
38+
>>> ds.test['labels'][10]
39+
5
40+
```
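The sampling procedure described above (pick a pattern at random, then flip each bit with 5% probability) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `sample_noisy_patterns` and `corner_block` are hypothetical, patterns are assumed to be chosen uniformly, and the four placeholder patterns below are not the 8 patterns of the actual dataset, which are defined by the figure and the source code.

```python
import numpy as np


def sample_noisy_patterns(patterns, n_samples, flip_prob=0.05, seed=None):
    """Draw patterns uniformly at random and flip each bit with probability flip_prob.

    Hypothetical helper; returns (inputs, labels) with inputs of shape (n_samples, n_bits).
    """
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns, dtype=np.int64)
    # Label i means "the sample was generated from patterns[i]".
    labels = rng.integers(0, len(patterns), size=n_samples)
    clean = patterns[labels]
    # Independent bit-flip mask: True where the bit is flipped.
    flips = rng.random(clean.shape) < flip_prob
    inputs = np.where(flips, 1 - clean, clean)
    return inputs, labels


def corner_block(row, col):
    """Placeholder pattern: a 2x2 block of ones inside a 4x4 grid, flattened to 16 bits."""
    grid = np.zeros((4, 4), dtype=np.int64)
    grid[row:row + 2, col:col + 2] = 1
    return grid.reshape(16)


# Four placeholder patterns (NOT the dataset's 8 patterns).
placeholder_patterns = [corner_block(r, c) for r in (0, 2) for c in (0, 2)]
inputs, labels = sample_noisy_patterns(placeholder_patterns, n_samples=5, seed=0)
print(inputs.shape, labels.shape)  # (5, 16) (5,)
```

Setting `flip_prob=0` returns the clean patterns, which is a quick way to check that labels and inputs line up.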
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
@misc{recioarmengol2025trainclassicaldeployquantum,
2+
title={Train on classical, deploy on quantum: scaling generative quantum machine learning to a thousand qubits},
3+
author={Erik Recio-Armengol and Shahnawaz Ahmed and Joseph Bowles},
4+
year={2025},
5+
eprint={2503.02934},
6+
archivePrefix={arXiv},
7+
primaryClass={quant-ph},
8+
url={https://arxiv.org/abs/2503.02934},
9+
}
10+
11+
@misc{bowles2025dwave,
12+
title={D-Wave},
13+
author={Joseph Bowles},
14+
howpublished={\url{https://pennylane.ai/datasets/other/d-wave}},
15+
year={2025}
16+
}
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
{
2+
"slug": "d-wave",
3+
"class": {
4+
"slug": "d-wave",
5+
"name": "DWave",
6+
"attributeList": [
7+
{
8+
"name": "train",
9+
"pythonType": "numpy.ndarray",
10+
"doc": "Array corresponding to 60,000 bit strings of length 484. Each bit string corresponds to a measurement on 484 qubits."
11+
},
12+
{
13+
"name": "test",
14+
"pythonType": "numpy.ndarray",
15+
"doc": "Array corresponding to 10,000 bit strings of length 484. Each bit string corresponds to a measurement on 484 qubits."
16+
}
17+
]
18+
},
19+
"collection": {
20+
"$path": "/other/qml-benchmarks/_meta/collection.json"
21+
},
22+
"data": [
23+
{
24+
"dataUrl": "https://datasets.cloud.pennylane.ai/user/34c2da34-5512-41de-bb27-628b5c03e543",
25+
"parameters": {
26+
"name": "d-wave"
27+
},
28+
"extra": {}
29+
}
30+
],
31+
"downloadName": "d-wave",
32+
"features": [
33+
{
34+
"slug": "dataset-attributes",
35+
"title": "Dataset Attributes",
36+
"type": "DATA",
37+
"content": {
38+
"$path": "features/dataset-attributes.md"
39+
}
40+
}
41+
],
42+
"meta": {
43+
"$path": "meta.json"
44+
},
45+
"parameterTree": null,
46+
"extra": {}
47+
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
|Name|Type|Description|
2+
|-|-|-|
3+
|`train`|`numpy.ndarray`|Array corresponding to 60,000 bit strings of length 484. Each bit string corresponds to a measurement on 484 qubits.|
4+
|`test`|`numpy.ndarray`|Array corresponding to 10,000 bit strings of length 484. Each bit string corresponds to a measurement on 484 qubits.|
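Since each row is a bit string of measurement outcomes on 484 qubits, a common first step is to convert bits to spin values and look at per-qubit averages. The sketch below is illustrative only: it uses random stand-in data rather than the real `train` array, the helper name `bits_to_spins` is hypothetical, and the mapping 0 → +1, 1 → −1 is an assumed convention that should be checked against the source code before use.

```python
import numpy as np


def bits_to_spins(bits):
    """Map bits {0, 1} to spins {+1, -1} (assumed convention: 0 -> +1, 1 -> -1)."""
    return 1 - 2 * np.asarray(bits, dtype=np.int64)


# Stand-in for the dataset: 100 random bit strings of length 484.
rng = np.random.default_rng(0)
fake_samples = rng.integers(0, 2, size=(100, 484))

spins = bits_to_spins(fake_samples)
# Average spin of each qubit over all samples (values lie in [-1, 1]).
magnetization = spins.mean(axis=0)
print(spins.shape, magnetization.shape)  # (100, 484) (484,)
```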
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
{
2+
"abstract": null,
3+
"authors": [
4+
{
5+
"name": "Joseph Bowles",
6+
"username": "josephbowles"
7+
}
8+
],
9+
"basedOnPapers": false,
10+
"citation": {
11+
"$path": "citation.txt"
12+
},
13+
"changelog": [
14+
"version 0.1 : initial public release"
15+
],
16+
"description": "Dataset of sampled spin configurations from a D-Wave Advantage system",
17+
"license": "[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)",
18+
"dateOfLastModification": "2025-04-14",
19+
"dateOfPublication": "2025-04-14",
20+
"sourceCodeUrl": "https://github.com/XanaduAI/scaling-gqml",
21+
"tags": [],
22+
"title": "D-Wave",
23+
"usingThisDataset": {
24+
"$path": "using_this_dataset.md"
25+
},
26+
"heroImage": "https://assets.cloud.pennylane.ai/datasets/generic/hero/Datasets_GenericHero_2.png",
27+
"thumbnail": "Dataset_QML_DWave_Thumb.png",
28+
"extra": {}
29+
}
