Commit eabe8e2

Merge branch 'develop' into 10517-dataset-types #10517

2 parents 68e4a60 + a6b5498
File tree

2 files changed: +23 −16 lines changed

doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
src/main/java/propertyFiles/Bundle.properties

2 files changed

+23
-16
lines changed

doc/sphinx-guides/source/developers/s3-direct-upload-api.rst (+21 −14)
@@ -18,7 +18,7 @@ Direct upload involves a series of three activities, each involving interacting
 This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload.
 Administrators should be aware that partial transfers, where a client starts uploading the file/parts of the file and does not contact the server to complete/cancel the transfer, will result in data stored in S3 that is not referenced in the Dataverse installation (e.g. should be considered temporary and deleted.)
 
-
+
 Requesting Direct Upload of a DataFile
 --------------------------------------
 To initiate a transfer of a file to S3, make a call to the Dataverse installation indicating the size of the file to upload. The response will include a pre-signed URL(s) that allow the client to transfer the file. Pre-signed URLs include a short-lived token authorizing the action represented by the URL.
@@ -29,7 +29,7 @@ To initiate a transfer of a file to S3, make a call to the Dataverse installatio
   export SERVER_URL=https://demo.dataverse.org
   export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV
   export SIZE=1000000000
-
+
   curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE"
 
 The response to this call, assuming direct uploads are enabled, will be one of two forms:
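
As an illustration for reviewers, not part of the diff: a minimal sketch of driving this endpoint and extracting what later steps need. The full example responses are elided from these hunks, so the jq paths here (a single ``url`` field for one-part uploads, a ``storageIdentifier`` reused later) are assumptions about the response shape, not the guide's exact field names.

    RESPONSE=$(curl -s -H "X-Dataverse-key:$API_TOKEN" \
      "$SERVER_URL/api/datasets/:persistentId/uploadurls?persistentId=$PERSISTENT_IDENTIFIER&size=$SIZE")

    # The storage identifier is needed again when registering the file with the dataset.
    STORAGE_ID=$(echo "$RESPONSE" | jq -r '.data.storageIdentifier')
    # Single-part form: one pre-signed URL (field name assumed); multipart
    # responses instead carry per-part URLs plus abort/complete URIs.
    UPLOAD_URL=$(echo "$RESPONSE" | jq -r '.data.url // empty')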
@@ -71,7 +71,12 @@ The call will return a 400 (BAD REQUEST) response if the file is larger than wha
 
 In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...
 
-The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.
+.. _direct-upload-to-s3:
+
+Upload Files to S3
+------------------
+
+The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.
 
 
 In the single part case, only one call to the supplied URL is required:
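
The PUT command itself falls outside the changed lines, so here is a hedged sketch; the ``x-amz-tagging`` value follows the S3 tagging discussion in the next hunk and should be treated as an assumption rather than the guide's exact command.

    # Single-part upload sketch: UPLOAD_URL is the pre-signed URL obtained above.
    # Omit the tagging header if S3 tagging is disabled for the store.
    curl -i -H "x-amz-tagging:dv-state=temp" -X PUT -T file1.txt "$UPLOAD_URL"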

@@ -88,21 +93,23 @@ Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the
 Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.
 
 In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
-The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
+The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
 
 To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a json object including the part eTags:
 
 .. code-block:: bash
 
   curl -X PUT "$SERVER_URL/api/datasets/mpload?..." -d '{"1":"<eTag1 string>","2":"<eTag2 string>","3":"<eTag3 string>","4":"<eTag4 string>","5":"<eTag5 string>"}'
-
+
 If the client is unable to complete the multipart upload, it should call the abort URL:
 
 .. code-block:: bash
-
+
   curl -X DELETE "$SERVER_URL/api/datasets/mpload?..."
-
-
+
+.. note::
+   If you encounter an ``HTTP 501 Not Implemented`` error, ensure the ``Content-Length`` header is correctly set to the file or chunk size. This issue may arise when streaming files or chunks asynchronously to S3 via ``PUT`` requests, particularly if the library or tool you're using doesn't set the ``Content-Length`` header automatically.
+
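
To put the new note in context, a minimal end-to-end sketch of the multipart flow it applies to. PART_URLS, PART_SIZE, and COMPLETE_URI are assumed to come from the uploadurls response; those names are illustrative assumptions, since the full response examples are elided above.

    # Split the file into partSize-byte slices: part.00, part.01, ...
    split -b "$PART_SIZE" -d file1.bin part.

    ETAGS="{"
    N=1
    for PART in part.*; do
      URL=$(echo "$PART_URLS" | jq -r --arg n "$N" '.[$n]')
      # curl -T derives Content-Length from the file size; clients that stream
      # chunks themselves must set the header explicitly or risk the HTTP 501
      # described in the note above.
      ETAG=$(curl -s -i -X PUT -T "$PART" "$URL" | tr -d '\r' | awk -F': ' 'tolower($1)=="etag" {print $2}')
      ETAGS="$ETAGS\"$N\":$ETAG,"
      N=$((N+1))
    done
    ETAGS="${ETAGS%,}}"

    # Conclude with the 'complete' URI, sending the collected part eTags.
    curl -X PUT "$SERVER_URL$COMPLETE_URI" -d "$ETAGS"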
 .. _direct-add-to-dataset-api:
 
 Adding the Uploaded File to the Dataset
@@ -114,10 +121,10 @@ jsonData normally includes information such as a file description, tags, provena
 * "storageIdentifier" - String, as specified in prior calls
 * "fileName" - String
 * "mimeType" - String
-* fixity/checksum: either:
+* fixity/checksum: either:
 
 * "md5Hash" - String with MD5 hash value, or
-* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
+* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
 
 The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512

@@ -129,7 +136,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
   export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
 
   curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
-
+
 Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
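
The '123456' checksum in the hunk above is a placeholder; a sketch of computing the real digest for the SHA-1 case shown, reusing the example's file name and storage identifier:

    # Compute the SHA-1 of the file so the '@value' field matches its content.
    CHECKSUM=$(sha1sum file1.txt | awk '{print $1}')
    export JSON_DATA="{'description':'My description.', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '$CHECKSUM'}}"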

@@ -173,10 +180,10 @@ jsonData normally includes information such as a file description, tags, provena
 * "storageIdentifier" - String, as specified in prior calls
 * "fileName" - String
 * "mimeType" - String
-* fixity/checksum: either:
+* fixity/checksum: either:
 
 * "md5Hash" - String with MD5 hash value, or
-* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
+* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
 
 The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
 Note that the API call does not validate that the file matches the hash value supplied. If a Dataverse instance is configured to validate file fixity hashes at publication time, a mismatch would be caught at that time and cause publication to fail.
@@ -189,7 +196,7 @@ Note that the API call does not validate that the file matches the hash value su
   export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "forceReplace":"true", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}'
 
   curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"
-
+
 Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

src/main/java/propertyFiles/Bundle.properties (+2 −2)
@@ -2456,7 +2456,7 @@ permission.publishDataverse=Publish a dataverse
 permission.managePermissionsDataFile=Manage permissions for a file
 permission.managePermissionsDataset=Manage permissions for a dataset
 permission.managePermissionsDataverse=Manage permissions for a dataverse
-permission.editDataset=Edit a dataset's metadata
+permission.editDataset=Edit a dataset's metadata, license, terms and add/delete files
 permission.editDataverse=Edit a dataverse's metadata, facets, customization, and templates
 permission.downloadFile=Download a file
 permission.viewUnpublishedDataset=View an unpublished dataset and its files
@@ -2810,7 +2810,7 @@ permission.PublishDataverse.desc=Publish a dataverse
 permission.ManageFilePermissions.desc=Manage permissions for a file
 permission.ManageDatasetPermissions.desc=Manage permissions for a dataset
 permission.ManageDataversePermissions.desc=Manage permissions for a dataverse
-permission.EditDataset.desc=Edit a dataset's metadata
+permission.EditDataset.desc=Edit a dataset's metadata, license, terms and add/delete files
 permission.EditDataverse.desc=Edit a dataverse's metadata, facets, customization, and templates
 permission.DownloadFile.desc=Download a file
 permission.ViewUnpublishedDataset.desc=View an unpublished dataset and its files
