
Commit f63f0e8

Merge pull request #9018 from GlobalDataverseCommunityConsortium/GDCC/9005-replaceFiles_api_call
GDCC/9005 replace files api call
2 parents 03afc7f + c22545b commit f63f0e8

File tree

10 files changed: +518 −208 lines

@@ -0,0 +1,3 @@
+9005
+
+Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files)

doc/sphinx-guides/source/api/native-api.rst

+7 −42
@@ -1511,6 +1511,13 @@ The fully expanded example above (without environment variables) looks like this
 
   curl -H X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.5072/FK2/J8SJZB -F 'jsonData={"description":"A remote image.","storageIdentifier":"trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png","checksumType":"MD5","md5Hash":"509ef88afa907eaf2c17c1c8d8fde77e","label":"testlogo.png","fileName":"testlogo.png","mimeType":"image/png"}'
 
+Adding Files To a Dataset via Other Tools
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In some circumstances, it may be useful to move or copy files into Dataverse's storage manually or via external tools and then add them to a dataset (i.e. without involving Dataverse in the file transfer itself).
+Two API calls are available for this use case: one to add files to a dataset and one to replace files that were already in the dataset.
+These calls were developed as part of Dataverse's direct upload mechanism and are detailed in :doc:`/developers/s3-direct-upload-api`.
+
 Report the data (file) size of a Dataset
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
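For orientation, here is a minimal sketch of the two calls the new section points to, with a placeholder token and DOI; ``$JSON_DATA`` is assumed to be set as in the examples in :doc:`/developers/s3-direct-upload-api`, where both calls are documented in full.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV

  # add files whose bytes were placed in storage out-of-band
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

  # replace files that are already in the dataset
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"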

@@ -2366,48 +2373,6 @@ The fully expanded example above (without environment variables) looks like this
 Note: The ``id`` returned in the json response is the id of the file metadata version.
 
 
-
-Adding File Metadata
-~~~~~~~~~~~~~~~~~~~~
-
-This API call requires a ``jsonString`` expressing the metadata of multiple files. It adds file metadata to the database table where the file has already been copied to the storage.
-
-The jsonData object includes values for:
-
-* "description" - A description of the file
-* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
-* "storageIdentifier" - String
-* "fileName" - String
-* "mimeType" - String
-* "fixity/checksum" either:
-
-  * "md5Hash" - String with MD5 hash value, or
-  * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
-
-.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
-
-A curl example using an ``PERSISTENT_ID``
-
-* ``SERVER_URL`` - e.g. https://demo.dataverse.org
-* ``API_TOKEN`` - API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header. For more details, see the :doc:`auth` section.
-* ``PERSISTENT_IDENTIFIER`` - Example: ``doi:10.5072/FK2/7U7YBV``
-
-.. code-block:: bash
-
-  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
-  export SERVER_URL=https://demo.dataverse.org
-  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
-  export JSON_DATA="[{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}, \
-  {'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53', 'fileName':'file2.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123789'}}]"
-
-  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
-
-The fully expanded example above (without environment variables) looks like this:
-
-.. code-block:: bash
-
-  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/datasets/:persistentId/addFiles?persistentId=doi:10.5072/FK2/7U7YBV -F jsonData='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}, {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'
-
 Updating File Metadata
 ~~~~~~~~~~~~~~~~~~~~~~
 

doc/sphinx-guides/source/developers/s3-direct-upload-api.rst

+101 −3
@@ -122,7 +122,7 @@ To add multiple Uploaded Files to the Dataset
 ---------------------------------------------
 
 Once the files exist in the s3 bucket, a final API call is needed to add all the files to the Dataset. In this API call, additional metadata is added using the "jsonData" parameter.
-jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for:
+jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, each object must also include values for:
 
 * "description" - A description of the file
 * "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
@@ -154,7 +154,7 @@ Replacing an existing file in the Dataset
 -----------------------------------------
 
 Once the file exists in the s3 bucket, a final API call is needed to register it as a replacement of an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
-jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must also include values for:
+jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must include values for:
 
 * "storageIdentifier" - String, as specified in prior calls
 * "fileName" - String
@@ -172,9 +172,107 @@ Note that the API call does not validate that the file matches the hash value su
   export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   export SERVER_URL=https://demo.dataverse.org
   export FILE_IDENTIFIER=5072
-  export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'forceReplace':'true', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
+  export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "forceReplace":"true", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}'
 
   curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"
 
 Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
+
+Replacing multiple existing files in the Dataset
+------------------------------------------------
+
+Once the replacement files exist in the s3 bucket, a final API call is needed to register them as replacements for existing files. In this API call, additional metadata is added using the "jsonData" parameter.
+jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, each object must include some additional values:
+
+* "fileToReplaceId" - the id of the file being replaced
+* "forceReplace" - whether to replace a file with one of a different mimetype (optional, default is false)
+* "description" - A description of the file
+* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
+* "storageIdentifier" - String
+* "fileName" - String
+* "mimeType" - String
+* "fixity/checksum" either:
+
+  * "md5Hash" - String with MD5 hash value, or
+  * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
+
+
+The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
+  export JSON_DATA='[{"fileToReplaceId": 10, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}},{"fileToReplaceId": 11, "forceReplace": true, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'
+
+  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
+
+The JSON object returned as a response from this API call includes a "data" object that indicates how many of the file replacements succeeded and provides per-file error messages for those that failed, e.g.
+
+.. code-block::
+
+  {
+    "status": "OK",
+    "data": {
+      "Files": [
+        {
+          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
+          "errorMessage": "Bad Request:The file to replace does not belong to this dataset.",
+          "fileDetails": {
+            "fileToReplaceId": 10,
+            "description": "My description.",
+            "directoryLabel": "data/subdir1",
+            "categories": [
+              "Data"
+            ],
+            "restrict": "false",
+            "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
+            "fileName": "file1.Bin",
+            "mimeType": "application/octet-stream",
+            "checksum": {
+              "@type": "SHA-1",
+              "@value": "123456"
+            }
+          }
+        },
+        {
+          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
+          "successMessage": "Replaced successfully in the dataset",
+          "fileDetails": {
+            "description": "My description.",
+            "label": "file2.txt",
+            "restricted": false,
+            "directoryLabel": "data/subdir1",
+            "categories": [
+              "Data"
+            ],
+            "dataFile": {
+              "persistentId": "",
+              "pidURL": "",
+              "filename": "file2.txt",
+              "contentType": "text/plain",
+              "filesize": 2407,
+              "description": "My description.",
+              "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
+              "rootDataFileId": 11,
+              "previousDataFileId": 11,
+              "checksum": {
+                "type": "SHA-1",
+                "value": "123789"
+              }
+            }
+          }
+        }
+      ],
+      "Result": {
+        "Total number of files": 2,
+        "Number of files successfully replaced": 1
+      }
+    }
+  }
+
+
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method.
+With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
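Since the call reports per-file success and failure rather than failing outright, a script can inspect the "Result" object from the sample response above. A minimal sketch, assuming ``jq`` is installed and the variables from the example are already set:

.. code-block:: bash

  # returns the count of successful replacements from the response shown above
  curl -s -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA" \
    | jq '.data.Result."Number of files successfully replaced"'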

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java

+4 −0
@@ -1544,6 +1544,10 @@ public void finalizeFileDelete(Long dataFileId, String storageLocation) throws I
             throw new IOException("Attempted to permanently delete a physical file still associated with an existing DvObject "
                     + "(id: " + dataFileId + ", location: " + storageLocation);
         }
+        if (storageLocation == null || storageLocation.isBlank()) {
+            throw new IOException("Attempted to delete a physical file with no location "
+                    + "(id: " + dataFileId + ", location: " + storageLocation);
+        }
         StorageIO<DvObject> directStorageAccess = DataAccess.getDirectStorageIO(storageLocation);
         directStorageAccess.delete();
     }

src/main/java/edu/harvard/iq/dataverse/EditDatafilesPage.java

+1 −2
@@ -590,8 +590,7 @@ public String init() {
                 datafileService,
                 permissionService,
                 commandEngine,
-                systemConfig,
-                licenseServiceBean);
+                systemConfig);
 
         fileReplacePageHelper = new FileReplacePageHelper(addReplaceFileHelper,
                 dataset,

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

+73 −4
@@ -2452,8 +2452,7 @@ public Response addFileToDataset(@PathParam("id") String idSupplied,
                 fileService,
                 permissionSvc,
                 commandEngine,
-                systemConfig,
-                licenseSvc);
+                systemConfig);
 
 
         //-------------------
@@ -3388,14 +3387,84 @@ public Response addFilesToDataset(@PathParam("id") String idSupplied,
                 this.fileService,
                 this.permissionSvc,
                 this.commandEngine,
-                this.systemConfig,
-                this.licenseSvc
+                this.systemConfig
         );
 
         return addFileHelper.addFiles(jsonData, dataset, authUser);
 
     }
 
+    /**
+     * Replace multiple Files in an existing Dataset
+     *
+     * @param idSupplied
+     * @param jsonData
+     * @return
+     */
+    @POST
+    @Path("{id}/replaceFiles")
+    @Consumes(MediaType.MULTIPART_FORM_DATA)
+    public Response replaceFilesInDataset(@PathParam("id") String idSupplied,
+            @FormDataParam("jsonData") String jsonData) {
+
+        if (!systemConfig.isHTTPUpload()) {
+            return error(Response.Status.SERVICE_UNAVAILABLE, BundleUtil.getStringFromBundle("file.api.httpDisabled"));
+        }
+
+        // -------------------------------------
+        // (1) Get the user from the API key
+        // -------------------------------------
+        User authUser;
+        try {
+            authUser = findUserOrDie();
+        } catch (WrappedResponse ex) {
+            return error(Response.Status.FORBIDDEN, BundleUtil.getStringFromBundle("file.addreplace.error.auth")
+            );
+        }
+
+        // -------------------------------------
+        // (2) Get the Dataset Id
+        // -------------------------------------
+        Dataset dataset;
+
+        try {
+            dataset = findDatasetOrDie(idSupplied);
+        } catch (WrappedResponse wr) {
+            return wr.getResponse();
+        }
+
+        dataset.getLocks().forEach(dl -> {
+            logger.info(dl.toString());
+        });
+
+        //------------------------------------
+        // (2a) Make sure dataset does not have package file
+        // --------------------------------------
+
+        for (DatasetVersion dv : dataset.getVersions()) {
+            if (dv.isHasPackageFile()) {
+                return error(Response.Status.FORBIDDEN,
+                        BundleUtil.getStringFromBundle("file.api.alreadyHasPackageFile")
+                );
+            }
+        }
+
+        DataverseRequest dvRequest = createDataverseRequest(authUser);
+
+        AddReplaceFileHelper addFileHelper = new AddReplaceFileHelper(
+                dvRequest,
+                this.ingestService,
+                this.datasetService,
+                this.fileService,
+                this.permissionSvc,
+                this.commandEngine,
+                this.systemConfig
+        );
+
+        return addFileHelper.replaceFiles(jsonData, dataset, authUser);
+
+    }
+
     /**
      * API to find curation assignments and statuses
     *
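Because the new endpoint consumes multipart form data and reads its metadata from a ``jsonData`` form field (via ``@FormDataParam``), it is exercised like the curl example in the s3-direct-upload guide; a minimal sketch, assuming the token, server, DOI, and jsonData variables from the documentation examples above are set:

.. code-block:: bash

  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"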
