
Commit ed8e966

Merge pull request #9003 from ErykKul/9002_allow_direct_upload_setting
9002 allow direct upload setting
2 parents f8e8d82 + 3ecd118 commit ed8e966


4 files changed, +37 -26 lines changed

@@ -0,0 +1,5 @@
+A Dataverse installation can now be configured to allow out-of-band upload by setting the `dataverse.files.<id>.upload-out-of-band` JVM option to `true`.
+
+By default, Dataverse supports uploading files via the [add a file to a dataset](https://dataverse-guide--9003.org.readthedocs.build/en/9003/api/native-api.html#add-a-file-to-a-dataset) API. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
+
+With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the [Adding the Uploaded file to the Dataset](https://dataverse-guide--9003.org.readthedocs.build/en/9003/developers/s3-direct-upload-api.html#adding-the-uploaded-file-to-the-dataset) API call (described in the [Direct DataFile Upload/Replace API](https://dataverse-guide--9003.org.readthedocs.build/en/9003/developers/s3-direct-upload-api.html) page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.
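When a file already sits in the store, registering it comes down to sending the right `jsonData` to the add call. A minimal Java sketch of preparing that value: the class and method names are hypothetical, the field names (`storageIdentifier`, `fileName`, `mimeType`, `checksum` with `@type`/`@value`) are assumed to follow the `jsonData` examples on the Direct DataFile Upload/Replace API page and should be checked there, and SHA-1 is used only as an example checksum algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Dataverse: builds the jsonData form field used to
// register a file that was already uploaded to the store out-of-band.
public class OutOfBandJsonData {

    /** Hex-encoded SHA-1 digest of the content (SHA-1 chosen only as an example). */
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Wraps a value in double quotes, escaping only backslashes and quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Assumed field names; compare with the jsonData examples in the API guide. */
    static String buildJsonData(String storageIdentifier, String fileName,
                                String mimeType, String sha1) {
        return "{"
                + "\"storageIdentifier\":" + quote(storageIdentifier) + ","
                + "\"fileName\":" + quote(fileName) + ","
                + "\"mimeType\":" + quote(mimeType) + ","
                + "\"checksum\":{\"@type\":\"SHA-1\",\"@value\":" + quote(sha1) + "}"
                + "}";
    }

    public static void main(String[] args) {
        byte[] content = "example file content\n".getBytes(StandardCharsets.UTF_8);
        // Hypothetical storage identifier for a file already placed in the bucket.
        String jsonData = buildJsonData("s3://my-bucket:18b2f3c4d5e-0123456789ab",
                "file1.txt", "text/plain", sha1Hex(content));
        System.out.println(jsonData);
    }
}
```

The printed string would then be supplied as the `jsonData` form field of the add call, as in the `-F "jsonData=$JSON_DATA"` curl examples in the guide.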

doc/sphinx-guides/source/developers/s3-direct-upload-api.rst

+4 -4
@@ -115,7 +115,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
 
 curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
 
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
 
 To add multiple Uploaded Files to the Dataset
@@ -146,7 +146,7 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
 
 curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"
 
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
 
 
@@ -176,7 +176,7 @@ Note that the API call does not validate that the file matches the hash value su
 
 curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"
 
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
 
 Replacing multiple existing files in the Dataset
@@ -274,5 +274,5 @@ The JSON object returned as a response from this API call includes a "data" that
 }
 
 
-Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method.
+Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method. Enabling out-of-band uploads is described at :ref:`file-storage` in the Configuration Guide.
 
 With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
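The storage-identifier rules repeated in these notes (right bucket, PID authority/identifier in the key, a unique id, and a store-id prefix) can be illustrated with a small Java sketch. It assumes a bucket key layout of `<authority>/<identifier>/<unique-id>` and a `storageIdentifier` of the form `<store-id>://<bucket>:<unique-id>`, modeled on the internally generated examples the guide refers to. The class name, method names, and unique-id shape (hex timestamp, dash, 12 random hex digits) are hypothetical choices, so check both formats against your installation before relying on them.

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of Dataverse: composes identifiers for a file
// that will be uploaded out-of-band and then registered via the add call.
public class OutOfBandStorageId {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Unique id: hex timestamp, a dash, and 12 random hex digits (assumed shape). */
    static String newUniqueId() {
        long random48 = RANDOM.nextLong() & 0xFFFFFFFFFFFFL; // keep the low 48 bits
        return Long.toHexString(System.currentTimeMillis()) + "-" + String.format("%012x", random48);
    }

    /** Splits e.g. "doi:10.5072/FK2/ABCDEF" into {"10.5072", "FK2/ABCDEF"}. */
    static String[] authorityAndIdentifier(String persistentId) {
        String withoutProtocol = persistentId.substring(persistentId.indexOf(':') + 1);
        int slash = withoutProtocol.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Unexpected PID format: " + persistentId);
        }
        return new String[] { withoutProtocol.substring(0, slash), withoutProtocol.substring(slash + 1) };
    }

    /** Object key inside the store's bucket: <authority>/<identifier>/<uniqueId> (assumed layout). */
    static String objectKey(String persistentId, String uniqueId) {
        String[] parts = authorityAndIdentifier(persistentId);
        return parts[0] + "/" + parts[1] + "/" + uniqueId;
    }

    /** Value for the storageIdentifier field: <storeId>://<bucket>:<uniqueId> (assumed format). */
    static String storageIdentifier(String storeId, String bucket, String uniqueId) {
        return storeId + "://" + bucket + ":" + uniqueId;
    }

    public static void main(String[] args) {
        String uniqueId = newUniqueId(); // generate once, use in both places
        System.out.println(objectKey("doi:10.5072/FK2/ABCDEF", uniqueId));
        System.out.println(storageIdentifier("s3", "my-bucket", uniqueId));
    }
}
```

The same unique id has to appear both in the object key used for the out-of-band upload and in the `storageIdentifier` sent in `jsonData`, which is why the sketch generates it once and reuses it.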

doc/sphinx-guides/source/installation/config.rst

+26 -21
@@ -508,6 +508,10 @@ A Dataverse installation can alternately store files in a Swift or S3-compatible
 
 A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web-accessible trusted remote store.
 
+A Dataverse installation can be configured to allow out-of-band upload by setting the ``dataverse.files.<id>.upload-out-of-band`` JVM option to ``true``.
+By default, Dataverse supports uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server).
+With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the :ref:`Adding the Uploaded file to the Dataset <direct-add-to-dataset-api>` API call (described in the :doc:`/developers/s3-direct-upload-api` page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.
+
 The following sections describe how to set up various types of stores and how to configure for multiple stores.
 
 Multi-store Basics
@@ -800,27 +804,28 @@ List of S3 Storage Options
 .. table::
 :align: left
 
-=========================================== ================== ========================================================================== =============
-JVM Option                                  Value              Description                                                                Default value
-=========================================== ================== ========================================================================== =============
-dataverse.files.storage-driver-id           <id>               Enable <id> as the default storage driver.                                 ``file``
-dataverse.files.<id>.type                   ``s3``             **Required** to mark this storage as S3 based.                             (none)
-dataverse.files.<id>.label                  <?>                **Required** label to be shown in the UI for this storage                  (none)
-dataverse.files.<id>.bucket-name            <?>                The bucket name. See above.                                                (none)
-dataverse.files.<id>.download-redirect      ``true``/``false`` Enable direct download or proxy through Dataverse.                         ``false``
-dataverse.files.<id>.upload-redirect        ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store.          ``false``
-dataverse.files.<id>.ingestsizelimit        <size in bytes>    Maximum size of direct upload files that should be ingested                (none)
-dataverse.files.<id>.url-expiration-minutes <?>                If direct uploads/downloads: time until links expire. Optional.            60
-dataverse.files.<id>.min-part-size          <?>                Multipart direct uploads will occur for files larger than this. Optional.  ``1024**3``
-dataverse.files.<id>.custom-endpoint-url    <?>                Use custom S3 endpoint. Needs URL either with or without protocol.         (none)
-dataverse.files.<id>.custom-endpoint-region <?>                Only used when using custom endpoint. Optional.                            ``dataverse``
-dataverse.files.<id>.profile                <?>                Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
-dataverse.files.<id>.proxy-url              <?>                URL of a proxy protecting the S3 store. Optional.                          (none)
-dataverse.files.<id>.path-style-access      ``true``/``false`` Use path style buckets instead of subdomains. Optional.                    ``false``
-dataverse.files.<id>.payload-signing        ``true``/``false`` Enable payload signing. Optional.                                          ``false``
-dataverse.files.<id>.chunked-encoding       ``true``/``false`` Disable chunked encoding. Optional.                                        ``true``
-dataverse.files.<id>.connection-pool-size   <?>                The maximum number of open connections to the S3 server.                   ``256``
-=========================================== ================== ========================================================================== =============
+=========================================== ================== =================================================================================== =============
+JVM Option                                  Value              Description                                                                         Default value
+=========================================== ================== =================================================================================== =============
+dataverse.files.storage-driver-id           <id>               Enable <id> as the default storage driver.                                          ``file``
+dataverse.files.<id>.type                   ``s3``             **Required** to mark this storage as S3 based.                                      (none)
+dataverse.files.<id>.label                  <?>                **Required** label to be shown in the UI for this storage                           (none)
+dataverse.files.<id>.bucket-name            <?>                The bucket name. See above.                                                         (none)
+dataverse.files.<id>.download-redirect      ``true``/``false`` Enable direct download or proxy through Dataverse.                                  ``false``
+dataverse.files.<id>.upload-redirect        ``true``/``false`` Enable direct upload of files added to a dataset in the S3 store.                   ``false``
+dataverse.files.<id>.upload-out-of-band     ``true``/``false`` Allow upload of files by out-of-band methods (using some tool other than Dataverse) ``false``
+dataverse.files.<id>.ingestsizelimit        <size in bytes>    Maximum size of direct upload files that should be ingested                         (none)
+dataverse.files.<id>.url-expiration-minutes <?>                If direct uploads/downloads: time until links expire. Optional.                     60
+dataverse.files.<id>.min-part-size          <?>                Multipart direct uploads will occur for files larger than this. Optional.           ``1024**3``
+dataverse.files.<id>.custom-endpoint-url    <?>                Use custom S3 endpoint. Needs URL either with or without protocol.                  (none)
+dataverse.files.<id>.custom-endpoint-region <?>                Only used when using custom endpoint. Optional.                                     ``dataverse``
+dataverse.files.<id>.profile                <?>                Allows the use of AWS profiles for storage spanning multiple AWS accounts.          (none)
+dataverse.files.<id>.proxy-url              <?>                URL of a proxy protecting the S3 store. Optional.                                   (none)
+dataverse.files.<id>.path-style-access      ``true``/``false`` Use path style buckets instead of subdomains. Optional.                             ``false``
+dataverse.files.<id>.payload-signing        ``true``/``false`` Enable payload signing. Optional.                                                   ``false``
+dataverse.files.<id>.chunked-encoding       ``true``/``false`` Disable chunked encoding. Optional.                                                 ``true``
+dataverse.files.<id>.connection-pool-size   <?>                The maximum number of open connections to the S3 server.                            ``256``
+=========================================== ================== =================================================================================== =============
 
 .. table::
 :align: left
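The "Default value" column of the S3 options table can be read as the fallback used when a property is absent. A minimal Java sketch of that lookup, assuming the options are available as `java.util.Properties` (for example `System.getProperties()`, which is what the `StorageIO` code in this commit queries) and interpreting the `1024**3` default as 1073741824. The class and method names are hypothetical, and only a subset of the table's options is covered.

```java
import java.util.Properties;

// Hypothetical read-only view of one S3 store's options, not part of Dataverse.
// Defaults are copied from the "List of S3 Storage Options" table.
public class S3StoreOptions {

    private final String storeId;
    private final Properties props;

    S3StoreOptions(String storeId, Properties props) {
        this.storeId = storeId;
        this.props = props;
    }

    /** Looks up dataverse.files.<storeId>.<option>, falling back to the documented default. */
    private String get(String option, String defaultValue) {
        return props.getProperty("dataverse.files." + storeId + "." + option, defaultValue);
    }

    boolean downloadRedirect() { return Boolean.parseBoolean(get("download-redirect", "false")); }
    boolean uploadRedirect() { return Boolean.parseBoolean(get("upload-redirect", "false")); }
    boolean uploadOutOfBand() { return Boolean.parseBoolean(get("upload-out-of-band", "false")); }
    int urlExpirationMinutes() { return Integer.parseInt(get("url-expiration-minutes", "60")); }
    // Table default is 1024**3, i.e. 1073741824.
    long minPartSize() { return Long.parseLong(get("min-part-size", Long.toString(1024L * 1024 * 1024))); }
    String customEndpointRegion() { return get("custom-endpoint-region", "dataverse"); }
    boolean chunkedEncoding() { return Boolean.parseBoolean(get("chunked-encoding", "true")); }
    int connectionPoolSize() { return Integer.parseInt(get("connection-pool-size", "256")); }

    public static void main(String[] args) {
        // In a running installation these values would come from System.getProperties().
        Properties props = new Properties();
        props.setProperty("dataverse.files.s3.type", "s3");
        props.setProperty("dataverse.files.s3.upload-out-of-band", "true");
        S3StoreOptions s3 = new S3StoreOptions("s3", props);
        System.out.println("upload-out-of-band: " + s3.uploadOutOfBand());
        System.out.println("upload-redirect: " + s3.uploadRedirect());
        System.out.println("min-part-size: " + s3.minPartSize());
    }
}
```

Keeping the defaults in one `get` call per option mirrors how the table is organized and makes it easy to see which values an unconfigured store ends up with.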

src/main/java/edu/harvard/iq/dataverse/dataaccess/StorageIO.java

+2 -1
@@ -606,7 +606,8 @@ public static String getDriverPrefix(String driverId) {
     }
 
     public static boolean isDirectUploadEnabled(String driverId) {
-        return Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-redirect"));
+        return (DataAccess.S3.equals(driverId) && Boolean.parseBoolean(System.getProperty("dataverse.files." + DataAccess.S3 + ".upload-redirect"))) ||
+            Boolean.parseBoolean(System.getProperty("dataverse.files." + driverId + ".upload-out-of-band"));
     }
 
     //Check that storageIdentifier is consistent with store's config
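As committed, `isDirectUploadEnabled` returns `true` for any store with `upload-out-of-band` set, while the `upload-redirect` clause is consulted only when `driverId` equals `DataAccess.S3`, and it then reads `dataverse.files.<DataAccess.S3>.upload-redirect`. The following Java sketch is a hypothetical standalone variant, not the project's code: it keys the `upload-redirect` check on the store's `dataverse.files.<id>.type` option (documented in the options table as marking a store as S3 based) instead of on the store id, and reads from a `Properties` object so the logic can be exercised without setting JVM options. `S3_TYPE = "s3"` is an assumption about the value of `DataAccess.S3`.

```java
import java.util.Properties;

// Hypothetical standalone variant of StorageIO.isDirectUploadEnabled, for illustration only.
public class DirectUploadCheck {

    // Assumption: mirrors the value of DataAccess.S3 in Dataverse.
    static final String S3_TYPE = "s3";

    private static String option(Properties props, String driverId, String name) {
        return props.getProperty("dataverse.files." + driverId + "." + name);
    }

    /**
     * True when the store is S3-typed and has upload-redirect enabled, or when the
     * store (of any type) allows out-of-band upload. Missing options count as false.
     */
    static boolean isDirectUploadEnabled(Properties props, String driverId) {
        boolean s3Redirect = S3_TYPE.equals(option(props, driverId, "type"))
                && Boolean.parseBoolean(option(props, driverId, "upload-redirect"));
        boolean outOfBand = Boolean.parseBoolean(option(props, driverId, "upload-out-of-band"));
        return s3Redirect || outOfBand;
    }

    /** Same check against the JVM's system properties, where -D options end up. */
    static boolean isDirectUploadEnabled(String driverId) {
        return isDirectUploadEnabled(System.getProperties(), driverId);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("dataverse.files.s3test.type", "s3");
        p.setProperty("dataverse.files.s3test.upload-redirect", "true");
        p.setProperty("dataverse.files.local.type", "file");
        p.setProperty("dataverse.files.local.upload-out-of-band", "true");
        System.out.println(isDirectUploadEnabled(p, "s3test")); // true: S3 type + upload-redirect
        System.out.println(isDirectUploadEnabled(p, "local"));  // true: upload-out-of-band
        System.out.println(isDirectUploadEnabled(p, "other"));  // false: nothing configured
    }
}
```

Passing `Properties` explicitly keeps the decision logic testable, while the one-argument overload keeps the same call shape as the committed method.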
