doc/sphinx-guides/source/developers/s3-direct-upload-api.rst (+21 -14)
@@ -18,7 +18,7 @@ Direct upload involves a series of three activities, each involving interacting
This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload.
Administrators should be aware that partial transfers, where a client starts uploading the file or parts of the file and does not contact the server to complete or cancel the transfer, will leave data in S3 that is not referenced in the Dataverse installation (i.e., data that should be considered temporary and deleted).

Requesting Direct Upload of a DataFile
--------------------------------------
To initiate a transfer of a file to S3, make a call to the Dataverse installation indicating the size of the file to upload. The response will include one or more pre-signed URLs that allow the client to transfer the file. Pre-signed URLs include a short-lived token authorizing the action represented by the URL.
@@ -29,7 +29,7 @@ To initiate a transfer of a file to S3, make a call to the Dataverse installatio
The response to this call, assuming direct uploads are enabled, will be one of two forms:
@@ -71,7 +71,12 @@ The call will return a 400 (BAD REQUEST) response if the file is larger than wha

In the example responses above, the URLs, which are very long, have been omitted. These URLs reference the S3 server and the specific object identifier that will be used, starting with, for example, https://demo-dataverse-bucket.s3.amazonaws.com/10.5072/FK2FOQPJS/177883b000e-49cedef268ac?...

.. _direct-upload-to-s3:

Upload Files to S3
------------------

The client must then use the URL(s) to PUT the file, or if the file is larger than the specified partSize, parts of the file.

In the single part case, only one call to the supplied URL is required:
@@ -88,21 +93,23 @@ Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the
Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.

In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
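
The slicing described above can be sketched as follows. This is a minimal illustration, not part of the Dataverse API: the function name ``part_ranges`` is hypothetical, and numbering parts from 1 is an assumption based on how S3 numbers multipart uploads.

```python
def part_ranges(file_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each slice of a file.

    Every part is part_size bytes long except the last one, which holds
    whatever bytes remain. Part numbers start at 1 (an assumption).
    """
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    ranges = []
    offset = 0
    part_number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        ranges.append((part_number, offset, length))
        offset += length
        part_number += 1
    return ranges
```

Each tuple identifies the bytes to read from the file and PUT to the URL supplied for that part. The client would record the 'eTag' returned for each part number.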

To successfully conclude the multipart upload, the client must call the 'complete' URI, sending a JSON object including the part eTags:
If you encounter an ``HTTP 501 Not Implemented`` error, ensure the ``Content-Length`` header is correctly set to the file or chunk size. This issue may arise when streaming files or chunks asynchronously to S3 via ``PUT`` requests, particularly if the library or tool you're using doesn't set the ``Content-Length`` header automatically.
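
As a hedged sketch of one way to avoid the missing header, the request below is built from an in-memory ``bytes`` chunk, so the length is known up front and is set explicitly. It uses only the Python standard library; the function name ``build_put_request`` and the example URL are hypothetical.

```python
import urllib.request


def build_put_request(presigned_url: str, chunk: bytes) -> urllib.request.Request:
    """Build a PUT request for one file or part with an explicit Content-Length.

    Passing the chunk as bytes (rather than a stream or generator) means its
    size is known before sending, so the header can always be set.
    """
    return urllib.request.Request(
        presigned_url,
        data=chunk,
        method="PUT",
        headers={"Content-Length": str(len(chunk))},
    )
```

Sending the request with ``urllib.request.urlopen(request)`` would perform the actual upload. Other HTTP libraries may need an equivalent explicit setting when the body is streamed.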

.. _direct-add-to-dataset-api:

Adding the Uploaded File to the Dataset
@@ -114,10 +121,10 @@ jsonData normally includes information such as a file description, tags, provena
* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* fixity/checksum: either:

  * "md5Hash" - String with MD5 hash value, or
  * "checksum" - JSON object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the ``edu.harvard.iq.dataverse.DataFile.CheckSumType`` class and currently include MD5, SHA-1, SHA-256, and SHA-512.
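
The fields listed above could be assembled as in the following sketch. The helper name ``build_json_data`` is hypothetical, and using the algorithm names exactly as written above for the "@type" value is an assumption. Only the "checksum" form is shown.

```python
import hashlib
import json

# Maps the algorithm names used in this guide to hashlib constructor names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


def build_json_data(storage_identifier: str, file_name: str, mime_type: str,
                    content: bytes, algorithm: str = "SHA-256") -> str:
    """Return a jsonData string with a "checksum" object for the given content."""
    digest = hashlib.new(HASHLIB_NAMES[algorithm], content).hexdigest()
    return json.dumps({
        "storageIdentifier": storage_identifier,  # as returned by the prior calls
        "fileName": file_name,
        "mimeType": mime_type,
        "checksum": {"@type": algorithm, "@value": digest},
    })
```

The returned string is what would be passed as the ``jsonData`` form field. Optional entries such as a description or tags can be added to the dictionary before serializing.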
@@ -129,7 +136,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3 or has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores, the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique. The supplied storage identifier must also be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
@@ -173,10 +180,10 @@ jsonData normally includes information such as a file description, tags, provena
* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
* "mimeType" - String
* fixity/checksum: either:

  * "md5Hash" - String with MD5 hash value, or
  * "checksum" - JSON object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
Note that the API call does not validate that the file matches the hash value supplied. If a Dataverse instance is configured to validate file fixity hashes at publication time, a mismatch would be caught at that time and cause publication to fail.
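
Because the server does not check the supplied hash at this point, a client may want to verify it locally before making the call. The following is a minimal sketch of such a check for the "md5Hash" case. The function names are hypothetical, and the file is read in blocks so that large files do not need to fit in memory.

```python
import hashlib


def file_md5(path: str, block_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it block by block."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_supplied_hash(path: str, supplied_md5: str) -> bool:
    """Compare a local file's MD5 with the value about to be sent in jsonData."""
    return file_md5(path) == supplied_md5.strip().lower()
```

A client could call ``matches_supplied_hash`` just before submitting and abort on ``False``, rather than discovering the mismatch when publication fails.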
@@ -189,7 +196,7 @@ Note that the API call does not validate that the file matches the hash value su
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3 or has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
With current S3 stores, the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique. The supplied storage identifier must also be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.