
Commit 9505141

add docs for Croissant, tweak exporter docs #10341
1 parent 77c7102 commit 9505141

File tree

6 files changed

+66
-27
lines changed

doc/release-notes/10341-croissant.md

+1
@@ -0,0 +1 @@
1+
A new metadata export format called Croissant is now available as an external metadata exporter. When enabled, it replaces the Schema.org JSON-LD format in the `<head>` of dataset landing pages. For details, see admin/discoverability.html#schema-org-json-ld-croissant-metadata and #10341

doc/sphinx-guides/source/admin/discoverability.rst

+10-3
@@ -30,14 +30,21 @@ The HTML source of a dataset landing page includes "DC" (Dublin Core) ``<meta>``
3030
<meta name="DC.type" content="Dataset"
3131
<meta name="DC.title" content="..."
3232

33-
Schema.org JSON-LD Metadata
34-
+++++++++++++++++++++++++++
33+
.. _schema.org-head:
3534

36-
The HTML source of a dataset landing page includes Schema.org JSON-LD metadata like this::
35+
Schema.org JSON-LD/Croissant Metadata
36+
+++++++++++++++++++++++++++++++++++++
37+
38+
The ``<head>`` of the HTML source of a dataset landing page includes Schema.org JSON-LD metadata like this::
3739

3840

3941
<script type="application/ld+json">{"@context":"http://schema.org","@type":"Dataset","@id":"https://doi.org/...
4042

43+
If you enable the Croissant metadata export format (see :ref:`external-exporters`) the ``<head>`` will show Croissant metadata instead. It looks similar, but you should see ``"cr": "http://mlcommons.org/croissant/"`` in the output.
44+
45+
For backward compatibility, if you enable Croissant, the older Schema.org JSON-LD format (``schema.org`` in the API) will still be available from both the web interface (see :ref:`metadata-export-formats`) and the API (see :ref:`export-dataset-metadata-api`).
46+
47+
The Dataverse team has been working with Google on both formats. Google has `indicated <https://github.com/mlcommons/croissant/issues/530#issuecomment-1964227662>`_ that for `Google Dataset Search <https://datasetsearch.research.google.com>`_ (the main reason we started adding this extra metadata in the ``<head>`` of dataset pages), Croissant is the successor to the older format.
4148

4249
.. _discovery-sign-posting:
4350
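
As a rough way to check which format a landing page is serving, the sketch below parses the page's HTML, pulls out the ``application/ld+json`` script, and looks for the ``cr`` prefix mentioned in the discoverability.rst hunk above. It assumes the prefix appears as a key of a JSON object under ``@context``; the names ``JsonLdExtractor`` and ``head_metadata_format`` are hypothetical and not part of Dataverse.

```python
import json
from html.parser import HTMLParser

# Namespace quoted in the docs; its position inside "@context" is an assumption.
CROISSANT_NS = "http://mlcommons.org/croissant/"


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def head_metadata_format(html):
    """Return "croissant", "schema.org", or None for a landing page's HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(doc, dict):
            continue
        context = doc.get("@context")
        if isinstance(context, dict) and context.get("cr") == CROISSANT_NS:
            return "croissant"
        if context is not None:
            return "schema.org"
    return None
```

Calling ``head_metadata_format`` on the saved HTML of a dataset page should tell you whether the Croissant exporter has taken over the ``<head>``; fetching the page is left out so the sketch stays free of network assumptions.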

doc/sphinx-guides/source/api/native-api.rst

+5-2
@@ -1150,16 +1150,19 @@ The fully expanded example above (without environment variables) looks like this
11501150
11511151
.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite``, ``oai_datacite`` and ``dataverse_json``. Descriptive names can be found under :ref:`metadata-export-formats` in the User Guide.
11521152

1153+
.. note:: Additional exporters can be enabled, as described under :ref:`external-exporters` in the Installation Guide. To discover the machine-readable name of each exporter (e.g. ``ddi``), check :ref:`inventory-of-external-exporters` or ``getFormatName`` in the exporter's source code.
11531154

11541155
Schema.org JSON-LD
11551156
^^^^^^^^^^^^^^^^^^
11561157

1157-
Please note that the ``schema.org`` format has changed in backwards-incompatible ways after Dataverse Software version 4.9.4:
1158+
Please note that the ``schema.org`` format has changed in backwards-incompatible ways after Dataverse 4.9.4:
11581159

11591160
- "description" was a single string and now it is an array of strings.
11601161
- "citation" was an array of strings and now it is an array of objects.
11611162

1162-
Both forms are valid according to Google's Structured Data Testing Tool at https://search.google.com/structured-data/testing-tool . (This tool will report "The property affiliation is not recognized by Google for an object of type Thing" and this known issue is being tracked at https://github.com/IQSS/dataverse/issues/5029 .) Schema.org JSON-LD is an evolving standard that permits a great deal of flexibility. For example, https://schema.org/docs/gs.html#schemaorg_expected indicates that even when objects are expected, it's ok to just use text. As with all metadata export formats, we will try to keep the Schema.org JSON-LD format your Dataverse installation emits backward-compatible to made integrations more stable, despite the flexibility that's afforded by the standard.
1163+
Both forms are valid according to Google's Structured Data Testing Tool at https://search.google.com/structured-data/testing-tool . Schema.org JSON-LD is an evolving standard that permits a great deal of flexibility. For example, https://schema.org/docs/gs.html#schemaorg_expected indicates that even when objects are expected, it's ok to just use text. As with all metadata export formats, we will try to keep the Schema.org JSON-LD format your Dataverse installation emits backward-compatible to make integrations more stable, despite the flexibility that's afforded by the standard.
1164+
1165+
The standard has further evolved into a format called Croissant. For details, see :ref:`schema.org-head` in the Admin Guide.
11631166

11641167
List Files in a Dataset
11651168
~~~~~~~~~~~~~~~~~~~~~~~
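
To illustrate how the machine-readable exporter names from the notes in the native-api.rst hunk are used, here is a minimal sketch that builds an export request URL. It assumes the export endpoint is ``/api/datasets/export`` with ``exporter`` and ``persistentId`` query parameters, as in the export API section those notes belong to; the server URL and DOI are placeholders, and ``export_url`` is a hypothetical helper.

```python
from urllib.parse import urlencode


def export_url(server_url, persistent_id, exporter):
    """Build the URL for exporting one dataset's metadata in a given format.

    `exporter` is the machine-readable name, e.g. "ddi", "schema.org",
    or "croissant" once that external exporter has been enabled.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server_url.rstrip('/')}/api/datasets/export?{query}"


# Placeholder server and DOI, for illustration only.
url = export_url("https://demo.dataverse.org", "doi:10.5072/FK2/ABC123", "croissant")
print(url)
```

Swapping ``"croissant"`` for ``"schema.org"`` requests the older format, which the discoverability hunk says remains available from the API.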

doc/sphinx-guides/source/installation/advanced.rst

+37-18
@@ -119,27 +119,46 @@ To activate in your Dataverse installation::
119119

120120
.. _external-exporters:
121121

122-
Installing External Metadata Exporters
123-
++++++++++++++++++++++++++++++++++++++
122+
External Metadata Exporters
123+
+++++++++++++++++++++++++++
124124

125-
As of Dataverse Software 5.14 Dataverse supports the use of external Exporters as a way to add additional metadata
126-
export formats to Dataverse or replace the built-in formats. This should be considered an **experimental** capability
127-
in that the mechanism is expected to evolve and using it may require additional effort when upgrading to new Dataverse
128-
versions.
125+
Dataverse 5.14+ supports the configuration of external metadata exporters (just "external exporters" or "exporters" for short) as a way to add additional metadata export formats or replace built-in formats. For a list of built-in formats, see :ref:`metadata-export-formats` in the User Guide.
129126

130-
This capability is enabled by specifying a directory in which Dataverse should look for third-party Exporters. See
131-
:ref:`dataverse.spi.exporters.directory`.
127+
This should be considered an **experimental** capability in that the mechanism is expected to evolve and using it may require additional effort when upgrading to new Dataverse versions.
132128

133-
See :doc:`/developers/metadataexport` for details about how to develop new Exporters.
129+
Enabling External Exporters
130+
^^^^^^^^^^^^^^^^^^^^^^^^^^^
134131

135-
An minimal example Exporter is available at https://github.com/gdcc/dataverse-exporters. The community is encourage to
136-
add additional exporters (and/or links to exporters elsewhere) in this repository. Once you have downloaded the
137-
dataverse-spi-export-examples-1.0.0.jar (or other exporter jar), installed it in the directory specified above, and
138-
restarted your Payara server, the new exporter should be available.
132+
Use the :ref:`dataverse.spi.exporters.directory` configuration option to specify a directory from which external exporters (JAR files) should be loaded.
139133

140-
The example dataverse-spi-export-examples-1.0.0.jar replaces the ``JSON`` export with a ``MyJSON in <locale>`` version
141-
that just wraps the existing JSON export object in a new JSON object with the key ``inputJson`` containing the original
142-
JSON.(Note that the ``MyJSON in <locale>`` label will appear in the dataset Metadata Export download menu immediately,
143-
but the content for already published datasets will only be updated after you delete the cached exports and/or use a
144-
reExport API call (see :ref:`batch-exports-through-the-api`).)
134+
.. _inventory-of-external-exporters:
145135

136+
Inventory of External Exporters
137+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
138+
139+
Known external exporters are listed below. Development takes place at https://github.com/gdcc/dataverse-exporters, where you are encouraged to check for new exporters or contribute one!
140+
141+
In the list below, the name of each exporter is followed by the machine-readable name in parentheses for use in APIs (see :ref:`export-dataset-metadata-api` in the API Guide).
142+
143+
Croissant (``croissant``)
144+
~~~~~~~~~~~~~~~~~~~~~~~~~
145+
146+
`Croissant <https://github.com/mlcommons/croissant>`_ is oriented toward machine learning and exposes variable-level metadata. When enabled, it replaces the Schema.org JSON-LD shown in the ``<head>`` of a dataset page, as described under :ref:`schema.org-head` in the Admin Guide.
147+
148+
You can download the Croissant exporter JAR from FIXME.
149+
150+
The source can be found in the `"croissant" <https://github.com/gdcc/dataverse-exporters/tree/main/croissant>`_ directory of the exporters repo.
151+
152+
MyJSON (``dataverse_json``)
153+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
154+
155+
MyJSON is a minimal example exporter that demonstrates how to override a built-in metadata format. Specifically, it replaces the ``dataverse_json`` format (Dataverse's native JSON format), shown as "JSON" in the GUI, with a "MyJSON in <locale>" version that just wraps the existing JSON export object in a new JSON object with the key ``inputJson`` containing the original JSON.
156+
157+
You can download the MyJSON exporter JAR from https://github.com/gdcc/dataverse-exporters where you should look under "prebuilt-examples" for a file called something like dataverse-spi-export-examples-x.x.x.jar.
158+
159+
The source can be found in the `"dataverse-spi-export-examples" <https://github.com/gdcc/dataverse-exporters/tree/main/dataverse-spi-export-examples>`_ directory of the exporters repo.
160+
161+
Developing New Exporters
162+
^^^^^^^^^^^^^^^^^^^^^^^^
163+
164+
See :doc:`/developers/metadataexport` for details about how to develop new exporters.
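
The shape of the MyJSON output described in the advanced.rst hunk can be sketched in a few lines. This is only an illustration of the wrapping, not the exporter's actual code (which lives in the exporters repo); ``wrap_like_myjson`` and the sample dataset fields are hypothetical.

```python
import json


def wrap_like_myjson(dataset_json):
    """Wrap an existing JSON export object under an ``inputJson`` key."""
    return {"inputJson": dataset_json}


# Made-up stand-in for a native JSON export.
original = {"id": 42, "identifier": "FK2/ABC123"}
print(json.dumps(wrap_like_myjson(original), indent=2))
```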

doc/sphinx-guides/source/installation/config.rst

+10-3
@@ -3109,12 +3109,19 @@ Can also be set via any `supported MicroProfile Config API source`_, e.g. the en
31093109
dataverse.spi.exporters.directory
31103110
+++++++++++++++++++++++++++++++++
31113111

3112-
This JVM option is used to configure the file system path where external Exporter JARs can be placed. See :ref:`external-exporters` for more information.
3112+
For some background, see :ref:`external-exporters` and :ref:`inventory-of-external-exporters`.
3113+
3114+
This JVM option is used to configure the file system path where external exporter JARs should be loaded from.
31133115

31143116
``./asadmin create-jvm-options '-Ddataverse.spi.exporters.directory=PATH_LOCATION_HERE'``
31153117

3116-
If this value is set, Dataverse will examine all JARs in the specified directory and will use them to add, or replace existing, metadata export formats.
3117-
If this value is not set (the default), Dataverse will not use external Exporters.
3118+
If this value is set, Dataverse will examine all JARs in the specified directory and will use them to add new metadata export formats or (if the machine-readable name used in :ref:`export-dataset-metadata-api` is the same) replace built-in metadata export formats.
3119+
3120+
If this value is not set (the default), Dataverse will not load any external exporters.
3121+
3122+
If you place a new JAR in this directory, you must restart Payara for Dataverse to load it.
3123+
3124+
If the JAR is for an exporter that replaces a built-in format, you must delete the cached exports and/or use a reExport API call (see :ref:`batch-exports-through-the-api`) for the new format to be visible for existing datasets.
31183125

31193126
Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_SPI_EXPORTERS_DIRECTORY``.
31203127
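
The add-or-replace rule in the config.rst hunk (an external exporter with the same machine-readable name replaces the built-in one, otherwise it is added) can be pictured as a dictionary merge. The sketch below is a conceptual model only and makes no claim about how Dataverse actually loads JARs; ``effective_exporters`` and the sample labels are hypothetical.

```python
def effective_exporters(built_in, external):
    """Map machine-readable name -> exporter, letting external entries win.

    Both arguments are dicts keyed by machine-readable format name.
    The inputs are left unmodified.
    """
    merged = dict(built_in)   # start from the built-in formats
    merged.update(external)   # same name replaces, new name adds
    return merged


built_in = {"dataverse_json": "built-in JSON", "schema.org": "built-in Schema.org"}
external = {"dataverse_json": "MyJSON", "croissant": "Croissant"}
print(effective_exporters(built_in, external))
```

In this model, ``dataverse_json`` now points at MyJSON while ``schema.org`` is untouched and ``croissant`` is new, which matches the behavior the hunk describes for the two exporters in the inventory.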

doc/sphinx-guides/source/user/dataset-management.rst

+3-1
@@ -25,7 +25,7 @@ For more details about what Citation and Domain Specific Metadata is supported p
2525
Supported Metadata Export Formats
2626
---------------------------------
2727

28-
Once a dataset has been published, its metadata can be exported in a variety of other metadata standards and formats, which help make datasets more discoverable and usable in other systems, such as other data repositories. On each dataset page's metadata tab, the following exports are available:
28+
Once a dataset has been published, its metadata can be exported in a variety of other metadata standards and formats, which help make datasets more :doc:`discoverable </admin/discoverability>` and usable in other systems, such as other data repositories. On each dataset page's metadata tab, the following exports are available:
2929

3030
- Dublin Core
3131
- DDI (Data Documentation Initiative Codebook 2.5)
@@ -36,6 +36,8 @@ Once a dataset has been published, its metadata can be exported in a variety of
3636
- OpenAIRE
3737
- Schema.org JSON-LD
3838

39+
Additional formats can be enabled. See :ref:`inventory-of-external-exporters` in the Installation Guide.
40+
3941
Each of these metadata exports contains the metadata of the most recently published version of the dataset.
4042

4143
.. _adding-new-dataset:
