
Commit 6be46c6

Merge branch 'develop' into 10517-dataset-types #10517

Conflicts:
    src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
    tests/integration-tests.txt

2 parents: 67e9971 + cf174b2

25 files changed: +1021 -133 lines
@@ -0,0 +1,3 @@
+### Improved JSON Schema validation for datasets
+
+JSON Schema validation has been enhanced with checks for required and allowed child objects and with type checking for field types, including `primitive`, `compound`, and `controlledVocabulary`. Error messages are now more user-friendly, to help pinpoint issues in the dataset JSON. See [Retrieve a Dataset JSON Schema for a Collection](https://guides.dataverse.org/en/6.3/api/native-api.html#retrieve-a-dataset-json-schema-for-a-collection) in the API Guide and PR #10543.
@@ -0,0 +1,3 @@
+A new optional query parameter, "returnDetails", has been added to the "dataverses/{identifier}/facets/" endpoint to include detailed information about each DataverseFacet.
+
+A new endpoint, "datasetfields/facetables", lists all facetable dataset fields defined in the installation.
@@ -0,0 +1,9 @@
+## Release Highlights
+
+### Pre-Publish File DOI Reservation with DataCite
+
+Dataverse installations using DataCite (or other persistent identifier (PID) providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded, rather than at publication time. Note that reserving file DOIs can slow uploads with large numbers of files, so administrators may need to adjust timeouts (specifically any Apache "``ProxyPass / ajp://localhost:8009/ timeout=``" setting in the recommended Dataverse configuration).
+
+## Major Use Cases
+
+- Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time. (Issue #7068, PR #7334)
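For illustration, a minimal sketch of the kind of Apache adjustment the note above refers to. The directive itself is quoted from the release note; the timeout value of 600 seconds is purely an assumption to be tuned per installation, not a recommendation from this commit.

```apache
# Hypothetical example only: raise the AJP proxy timeout (in seconds) so that
# uploads with many files have time to reserve their file PIDs.
ProxyPass / ajp://localhost:8009/ timeout=600
```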

doc/sphinx-guides/source/api/native-api.rst

+56 -6
@@ -224,6 +224,22 @@ The fully expanded example above (without environment variables) looks like this

     curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/facets"

+By default, this endpoint will return an array including the facet names. If more detailed information is needed, we can set the query parameter ``returnDetails`` to ``true``, which will return the display name and id in addition to the name for each facet:
+
+.. code-block:: bash
+
+  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+  export SERVER_URL=https://demo.dataverse.org
+  export ID=root
+
+  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/dataverses/$ID/facets?returnDetails=true"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/dataverses/root/facets?returnDetails=true"
+
 Set Facets for a Dataverse Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -566,9 +582,7 @@ The fully expanded example above (without environment variables) looks like this
 Retrieve a Dataset JSON Schema for a Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset. This
-first version of the schema only includes required elements and fields. In the future we plan to improve the schema by adding controlled
-vocabulary and more robust dataset field format testing:
+Retrieves a JSON schema customized for a given collection in order to validate a dataset JSON file prior to creating the dataset:

 .. code-block:: bash
@@ -593,8 +607,22 @@ While it is recommended to download a copy of the JSON Schema from the collection

 Validate Dataset JSON File for a Collection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Validates a dataset JSON file customized for a given collection prior to creating the dataset. The validation only tests for json formatting
-and the presence of required elements:
+Validates a dataset JSON file customized for a given collection prior to creating the dataset.
+
+The validation tests for:
+
+- JSON formatting
+- required fields
+- typeClass must follow these rules:
+
+  - if multiple = true then value must be a list
+  - if typeClass = ``primitive`` the value object is a String or a List of Strings depending on the multiple flag
+  - if typeClass = ``compound`` the value object is a FieldDTO or a List of FieldDTOs depending on the multiple flag
+  - if typeClass = ``controlledVocabulary`` the values are checked against the list of allowed values stored in the database
+- typeName validations (child objects with their required and allowed typeNames are configured automatically by the database schema). Examples include:
+
+  - dsDescription validation includes checks for typeName = ``dsDescriptionValue`` (required) and ``dsDescriptionDate`` (optional)
+  - datasetContact validation includes checks for typeName = ``datasetContactName`` (required) and ``datasetContactEmail``; ``datasetContactAffiliation`` (optional)

 .. code-block:: bash
@@ -4826,6 +4854,28 @@ The fully expanded example above (without environment variables) looks like this

     curl "https://demo.dataverse.org/api/metadatablocks/citation"

+.. _dataset-fields-api:
+
+Dataset Fields
+--------------
+
+List All Facetable Dataset Fields
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+List all facetable dataset fields defined in the installation.
+
+.. code-block:: bash
+
+  export SERVER_URL=https://demo.dataverse.org
+
+  curl "$SERVER_URL/api/datasetfields/facetables"
+
+The fully expanded example above (without environment variables) looks like this:
+
+.. code-block:: bash
+
+  curl "https://demo.dataverse.org/api/datasetfields/facetables"
+
 .. _Notifications:

 Notifications
@@ -5242,7 +5292,7 @@ The fully expanded example above (without environment variables) looks like this
 Reserve a PID
 ~~~~~~~~~~~~~

-Reserved a PID for a dataset. A superuser API token is required.
+Reserve a PID for a dataset if not yet registered, and, if FilePIDs are enabled, reserve any file PIDs that are not yet registered. A superuser API token is required.

 .. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
@@ -0,0 +1,102 @@
+{
+  "datasetVersion": {
+    "license": {
+      "name": "CC0 1.0",
+      "uri": "http://creativecommons.org/publicdomain/zero/1.0"
+    },
+    "metadataBlocks": {
+      "citation": {
+        "fields": [
+          {
+            "value": "HTML & More",
+            "typeClass": "primitive",
+            "multiple": false,
+            "typeName": "title"
+          },
+          {
+            "value": [
+              {
+                "authorName": {
+                  "value": "Markup, Marty",
+                  "typeClass": "primitive",
+                  "multiple": false,
+                  "typeName": "authorName"
+                },
+                "authorAffiliation": {
+                  "value": "W4C",
+                  "typeClass": "primitive",
+                  "multiple": false,
+                  "typeName": "authorAffiliation"
+                }
+              }
+            ],
+            "typeClass": "compound",
+            "multiple": true,
+            "typeName": "author"
+          },
+          {
+            "value": [
+              {
+                "datasetContactEmail": {
+                  "typeClass": "primitive",
+                  "multiple": false,
+                  "typeName": "datasetContactEmail",
+                  "value": "[email protected]"
+                },
+                "datasetContactName": {
+                  "typeClass": "primitive",
+                  "multiple": false,
+                  "typeName": "datasetContactName",
+                  "value": "Markup, Marty"
+                }
+              }
+            ],
+            "typeClass": "compound",
+            "multiple": true,
+            "typeName": "datasetContact"
+          },
+          {
+            "value": [
+              {
+                "dsDescriptionValue": {
+                  "value": "BEGIN<br></br>END",
+                  "multiple": false,
+                  "typeClass": "primitive",
+                  "typeName": "dsDescriptionValue"
+                },
+                "dsDescriptionDate": {
+                  "typeName": "dsDescriptionDate",
+                  "multiple": false,
+                  "typeClass": "primitive",
+                  "value": "2021-07-13"
+                }
+              }
+            ],
+            "typeClass": "compound",
+            "multiple": true,
+            "typeName": "dsDescription"
+          },
+          {
+            "value": [
+              "Medicine, Health and Life Sciences"
+            ],
+            "typeClass": "controlledVocabulary",
+            "multiple": true,
+            "typeName": "subject"
+          },
+          {
+            "typeName": "language",
+            "multiple": true,
+            "typeClass": "controlledVocabulary",
+            "value": [
+              "English",
+              "Afar",
+              "aar"
+            ]
+          }
+        ],
+        "displayName": "Citation Metadata"
+      }
+    }
+  }
+}

src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java

-8
@@ -1248,14 +1248,6 @@ public List<Long> selectFilesWithMissingOriginalSizes() {
     }


-    /**
-     * Check that a identifier entered by the user is unique (not currently used
-     * for any other study in this Dataverse Network). Also check for duplicate
-     * in the remote PID service if needed
-     * @param datafileId
-     * @param storageLocation
-     * @return {@code true} iff the global identifier is unique.
-     */
     public void finalizeFileDelete(Long dataFileId, String storageLocation) throws IOException {
         // Verify that the DataFile no longer exists:
         if (find(dataFileId) != null) {

src/main/java/edu/harvard/iq/dataverse/DataverseServiceBean.java

+22 -5
@@ -22,7 +22,7 @@
 import edu.harvard.iq.dataverse.storageuse.StorageQuota;
 import edu.harvard.iq.dataverse.util.StringUtil;
 import edu.harvard.iq.dataverse.util.SystemConfig;
-import edu.harvard.iq.dataverse.util.json.JsonUtil;
+
 import java.io.File;
 import java.io.IOException;
 import java.sql.Timestamp;

@@ -34,6 +34,7 @@
 import java.util.logging.Logger;
 import java.util.Properties;

+import edu.harvard.iq.dataverse.validation.JSONDataValidation;
 import jakarta.ejb.EJB;
 import jakarta.ejb.Stateless;
 import jakarta.inject.Inject;

@@ -888,14 +889,16 @@ public List<Object[]> getDatasetTitlesWithinDataverse(Long dataverseId) {
         return em.createNativeQuery(cqString).getResultList();
     }

-
     public String getCollectionDatasetSchema(String dataverseAlias) {
+        return getCollectionDatasetSchema(dataverseAlias, null);
+    }
+    public String getCollectionDatasetSchema(String dataverseAlias, Map<String, Map<String,List<String>>> schemaChildMap) {

         Dataverse testDV = this.findByAlias(dataverseAlias);

         while (!testDV.isMetadataBlockRoot()) {
             if (testDV.getOwner() == null) {
-                break; // we are at the root; which by defintion is metadata blcok root, regarldess of the value
+                break; // we are at the root; which by definition is metadata block root, regardless of the value
             }
             testDV = testDV.getOwner();
         }

@@ -932,6 +935,8 @@ public String getCollectionDatasetSchema(String dataverseAlias) {
                 dsft.setRequiredDV(dsft.isRequired());
                 dsft.setInclude(true);
             }
+            List<String> childrenRequired = new ArrayList<>();
+            List<String> childrenAllowed = new ArrayList<>();
             if (dsft.isHasChildren()) {
                 for (DatasetFieldType child : dsft.getChildDatasetFieldTypes()) {
                     DataverseFieldTypeInputLevel dsfIlChild = dataverseFieldTypeInputLevelService.findByDataverseIdDatasetFieldTypeId(testDV.getId(), child.getId());

@@ -944,8 +949,18 @@ public String getCollectionDatasetSchema(String dataverseAlias) {
                         child.setRequiredDV(child.isRequired() && dsft.isRequired());
                         child.setInclude(true);
                     }
+                    if (child.isRequired()) {
+                        childrenRequired.add(child.getName());
+                    }
+                    childrenAllowed.add(child.getName());
                 }
             }
+            if (schemaChildMap != null) {
+                Map<String, List<String>> map = new HashMap<>();
+                map.put("required", childrenRequired);
+                map.put("allowed", childrenAllowed);
+                schemaChildMap.put(dsft.getName(), map);
+            }
             if(dsft.isRequiredDV()){
                 requiredDSFT.add(dsft);
             }

@@ -1021,11 +1036,13 @@ private String getCustomMDBSchema (MetadataBlock mdb, List<DatasetFieldType> req
     }

     public String isDatasetJsonValid(String dataverseAlias, String jsonInput) {
-        JSONObject rawSchema = new JSONObject(new JSONTokener(getCollectionDatasetSchema(dataverseAlias)));
+        Map<String, Map<String,List<String>>> schemaChildMap = new HashMap<>();
+        JSONObject rawSchema = new JSONObject(new JSONTokener(getCollectionDatasetSchema(dataverseAlias, schemaChildMap)));

-        try {
+        try {
            Schema schema = SchemaLoader.load(rawSchema);
            schema.validate(new JSONObject(jsonInput)); // throws a ValidationException if this object is invalid
+           JSONDataValidation.validate(schema, schemaChildMap, jsonInput); // throws a ValidationException if any objects are invalid
        } catch (ValidationException vx) {
            logger.info(BundleUtil.getStringFromBundle("dataverses.api.validate.json.failed") + " " + vx.getErrorMessage());
            String accumulatedexceptions = "";
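As an aside, here is a minimal standalone sketch of the two-step validation flow used in ``isDatasetJsonValid`` above, assuming the org.everit.json.schema library that provides ``SchemaLoader``/``Schema``/``ValidationException`` here. The file names are placeholders, the example child map mirrors the dsDescription rules described in the API Guide, and ``JSONDataValidation`` (the project's new class) is only referenced in a comment, not reimplemented.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.everit.json.schema.Schema;
import org.everit.json.schema.SchemaLoader;
import org.everit.json.schema.ValidationException;
import org.json.JSONObject;
import org.json.JSONTokener;

public class DatasetJsonValidationSketch {

    public static void main(String[] args) throws Exception {
        // Placeholder inputs: a schema previously retrieved via "Retrieve a Dataset
        // JSON Schema for a Collection" and the dataset JSON to be validated.
        String schemaJson = Files.readString(Path.of("dataset-schema.json"));
        String datasetJson = Files.readString(Path.of("dataset-create.json"));

        // Shape of the child map built by getCollectionDatasetSchema(alias, schemaChildMap):
        // parent typeName -> {"required" -> [...], "allowed" -> [...]}.
        Map<String, Map<String, List<String>>> schemaChildMap = new HashMap<>();
        schemaChildMap.put("dsDescription", Map.of(
                "required", List.of("dsDescriptionValue"),
                "allowed", List.of("dsDescriptionValue", "dsDescriptionDate")));

        try {
            Schema schema = SchemaLoader.load(new JSONObject(new JSONTokener(schemaJson)));
            schema.validate(new JSONObject(datasetJson)); // structural JSON Schema checks
            // In Dataverse, JSONDataValidation.validate(schema, schemaChildMap, datasetJson)
            // would then apply the typeClass/typeName rules described in the API Guide.
            System.out.println("dataset JSON passed schema validation");
        } catch (ValidationException vx) {
            vx.getAllMessages().forEach(System.out::println);
        }
    }
}
```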
@@ -0,0 +1,29 @@
+package edu.harvard.iq.dataverse.api;
+
+import edu.harvard.iq.dataverse.DatasetFieldServiceBean;
+import edu.harvard.iq.dataverse.DatasetFieldType;
+import jakarta.ejb.EJB;
+import jakarta.ws.rs.*;
+import jakarta.ws.rs.core.Response;
+
+import java.util.List;
+
+import static edu.harvard.iq.dataverse.util.json.JsonPrinter.jsonDatasetFieldTypes;
+
+/**
+ * Api bean for managing dataset fields.
+ */
+@Path("datasetfields")
+@Produces("application/json")
+public class DatasetFields extends AbstractApiBean {
+
+    @EJB
+    DatasetFieldServiceBean datasetFieldService;
+
+    @GET
+    @Path("facetables")
+    public Response listAllFacetableDatasetFields() {
+        List<DatasetFieldType> datasetFieldTypes = datasetFieldService.findAllFacetableFieldTypes();
+        return ok(jsonDatasetFieldTypes(datasetFieldTypes));
+    }
+}
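To exercise the new endpoint without curl, here is a hypothetical client-side sketch using java.net.http; it is equivalent to the curl example added to the guide above. The base URL is the demo server from the guide and is an assumption to substitute for your own installation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListFacetableFieldsSketch {
    public static void main(String[] args) throws Exception {
        // Assumed base URL; no API token is needed for this listing endpoint per the guide.
        String serverUrl = "https://demo.dataverse.org";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/api/datasetfields/facetables"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Expect a JSON envelope with "status" and a "data" array of facetable dataset fields.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```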

src/main/java/edu/harvard/iq/dataverse/api/Datasets.java

-1
@@ -1,7 +1,6 @@
 package edu.harvard.iq.dataverse.api;

 import com.amazonaws.services.s3.model.PartETag;
-
 import edu.harvard.iq.dataverse.*;
 import edu.harvard.iq.dataverse.DatasetLock.Reason;
 import edu.harvard.iq.dataverse.actionlogging.ActionLogRecord;

src/main/java/edu/harvard/iq/dataverse/api/Dataverses.java

+15 -8
@@ -855,22 +855,29 @@ public Response setMetadataRoot(@Context ContainerRequestContext crc, @PathParam
     /**
      * return list of facets for the dataverse with alias `dvIdtf`
      */
-    public Response listFacets(@Context ContainerRequestContext crc, @PathParam("identifier") String dvIdtf) {
+    public Response listFacets(@Context ContainerRequestContext crc,
+                               @PathParam("identifier") String dvIdtf,
+                               @QueryParam("returnDetails") boolean returnDetails) {
         try {
-            User u = getRequestUser(crc);
-            DataverseRequest r = createDataverseRequest(u);
+            User user = getRequestUser(crc);
+            DataverseRequest request = createDataverseRequest(user);
             Dataverse dataverse = findDataverseOrDie(dvIdtf);
-            JsonArrayBuilder fs = Json.createArrayBuilder();
-            for (DataverseFacet f : execCommand(new ListFacetsCommand(r, dataverse))) {
-                fs.add(f.getDatasetFieldType().getName());
+            List<DataverseFacet> dataverseFacets = execCommand(new ListFacetsCommand(request, dataverse));
+
+            if (returnDetails) {
+                return ok(jsonDataverseFacets(dataverseFacets));
+            } else {
+                JsonArrayBuilder facetsBuilder = Json.createArrayBuilder();
+                for (DataverseFacet facet : dataverseFacets) {
+                    facetsBuilder.add(facet.getDatasetFieldType().getName());
+                }
+                return ok(facetsBuilder);
             }
-            return ok(fs);
         } catch (WrappedResponse e) {
            return e.getResponse();
        }
    }

-
    @GET
    @AuthRequired
    @Path("{identifier}/featured")