Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files).

In some circumstances, it may be useful to move or copy files into Dataverse's storage manually or via external tools and then add them to a dataset (i.e. without involving Dataverse in the file transfer itself).
Two API calls are available for this use case: one to add files to a dataset and one to replace files that were already in the dataset.
These calls were developed as part of Dataverse's direct upload mechanism and are detailed in :doc:`/developers/s3-direct-upload-api`.
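As a rough sketch, the two calls can be addressed as below. The ``addFiles`` path is an assumption to verify against the linked guide; the ``replaceFiles`` path appears later on this page. The server URL and persistent identifier are hypothetical.

```python
# Sketch of the two dataset-level endpoints for files that already sit in
# storage. "addFiles" is an assumption to verify against the
# s3-direct-upload-api guide; "replaceFiles" is shown later on this page.
server_url = "https://demo.dataverse.org"
pid = "doi:10.5072/FK2/EXAMPLE"  # hypothetical dataset persistent identifier

add_url = f"{server_url}/api/datasets/:persistentId/addFiles?persistentId={pid}"
replace_url = f"{server_url}/api/datasets/:persistentId/replaceFiles?persistentId={pid}"

print(add_url)
print(replace_url)
```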
Report the data (file) size of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: The ``id`` returned in the json response is the id of the file metadata version.
Adding File Metadata
~~~~~~~~~~~~~~~~~~~~

This API call requires a ``jsonString`` expressing the metadata of multiple files. It adds file metadata to the database table where the file has already been copied to the storage.

The jsonData object includes values for:

* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:

  * "md5Hash" - String with MD5 hash value, or
  * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

A curl example using a ``PERSISTENT_ID``:

* ``SERVER_URL`` - e.g. https://demo.dataverse.org
* ``API_TOKEN`` - API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header. For more details, see the :doc:`auth` section.
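To make the field list concrete, here is a hedged Python sketch that assembles such a ``jsonData`` string for one file already in storage. The file name, contents, directory, and storage identifier are hypothetical; only the field names come from the list above.

```python
import hashlib
import json

# Hypothetical file already copied into the store out-of-band.
file_bytes = b"example file contents\n"

file_metadata = {
    "description": "My description.",
    "directoryLabel": "data/subdir1",
    "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
    "fileName": "file1.txt",
    "mimeType": "text/plain",
    # fixity via the plain MD5-string form:
    "md5Hash": hashlib.md5(file_bytes).hexdigest(),
}

# The call expresses the metadata of multiple files, so jsonData is a list.
json_data = json.dumps([file_metadata])
print(json_data)
```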
File changed: doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
To add multiple Uploaded Files to the Dataset
---------------------------------------------

Once the files exist in the s3 bucket, a final API call is needed to add all the files to the Dataset. In this API call, additional metadata is added using the "jsonData" parameter.
jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, each object must also include values for:

* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
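A hedged sketch of such a ``jsonData`` array for two files, using the "checksum" object form described elsewhere on this page instead of the "md5Hash" shortcut. File names, contents, and storage identifiers are hypothetical.

```python
import hashlib
import json

def entry(name, storage_id, payload):
    # Build one per-file object; the "checksum" object form names the
    # algorithm explicitly instead of using the "md5Hash" shortcut.
    return {
        "description": f"Direct upload of {name}",
        "directoryLabel": "data",
        "storageIdentifier": storage_id,
        "fileName": name,
        "mimeType": "text/plain",
        "checksum": {
            "@type": "SHA-1",
            "@value": hashlib.sha1(payload).hexdigest(),
        },
    }

json_data = json.dumps([
    entry("file1.txt", "s3://demo-bucket:176e28068b0-1", b"first file\n"),
    entry("file2.txt", "s3://demo-bucket:176e28068b0-2", b"second file\n"),
])
print(json_data)
```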
Replacing an existing file in the Dataset
-----------------------------------------

Once the file exists in the s3 bucket, a final API call is needed to register it as a replacement of an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must include values for:

* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
Note that the API call does not validate that the file matches the hash value supplied.

.. code-block:: bash

  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
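For illustration, a hedged sketch of a single-file ``jsonData`` object for this call, including the ``forceReplace`` flag mentioned above. All values are hypothetical.

```python
import hashlib
import json

# Hypothetical replacement content already uploaded to the store.
replacement_bytes = b"a,b\n1,2\n"

json_data = json.dumps({
    "description": "Corrected version of the data file.",
    "forceReplace": True,  # allow the mimetype to change, per the text above
    "storageIdentifier": "s3://demo-bucket:176e28068b0-1c3f80357c42",
    "fileName": "data_v2.csv",
    "mimeType": "text/csv",
    "md5Hash": hashlib.md5(replacement_bytes).hexdigest(),
})
print(json_data)
```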
Replacing multiple existing files in the Dataset
------------------------------------------------

Once the replacement files exist in the s3 bucket, a final API call is needed to register them as replacements for existing files. In this API call, additional metadata is added using the "jsonData" parameter.
jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, each object must include some additional values:

* "fileToReplaceId" - the id of the file being replaced
* "forceReplace" - whether to replace a file with one of a different mimetype (optional, default is false)
* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:

  * "md5Hash" - String with MD5 hash value, or
  * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
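The per-object fields above can be sketched as follows; SHA-256 is used since it is among the allowed algorithms. File ids, storage identifiers, and contents are hypothetical.

```python
import hashlib
import json

# Hypothetical (database id, name, contents) for two files being replaced.
targets = [
    (101, "table1.tab", b"col1\tcol2\n"),
    (102, "table2.tab", b"col3\tcol4\n"),
]

replacements = []
for file_id, name, payload in targets:
    replacements.append({
        "fileToReplaceId": file_id,
        "forceReplace": False,
        "description": f"Replacement for file {file_id}",
        "directoryLabel": "data",
        "storageIdentifier": f"s3://demo-bucket:176e28068b0-{file_id}",
        "fileName": name,
        "mimeType": "text/tab-separated-values",
        "checksum": {
            "@type": "SHA-256",  # one of the allowed algorithms listed above
            "@value": hashlib.sha256(payload).hexdigest(),
        },
    })

json_data = json.dumps(replacements)
print(json_data)
```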
.. code-block:: bash

  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

The JSON object returned as a response from this API call includes a "data" object that indicates how many of the file replacements succeeded and provides per-file error messages for those that don't.
Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
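Since the exact response layout is not reproduced here, the following sketch invents placeholder key names inside "data" purely to illustrate checking the per-file outcome; adapt it to the actual response returned by your installation.

```python
import json

# Hypothetical response text; the key names inside "data" are invented
# placeholders, not the documented layout.
response_text = (
    '{"status": "OK", "data": {"succeeded": 1, "failed": 1,'
    ' "errors": {"table2.tab": "storageIdentifier not found"}}}'
)

data = json.loads(response_text)["data"]
print(f"succeeded: {data['succeeded']}, failed: {data['failed']}")
for file_name, message in data.get("errors", {}).items():
    print(f"{file_name}: {message}")
```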