avoid expensive Solr join for public dvObjects in search (experimental) (#10555)
* avoid expensive Solr join when guest users search (affects IP Groups) #10554
* fix copy/paste error, target doc for file, not dataset #10554
* Checking a few experimental changes into the branch:
Jim's soft commit fixes from 10547;
A quick experiment, replacing join on public objects with a boolean publicObject_b:true
for logged-in users as well (with a join added just for their own personal documents;
groups are ignored for now). #10554
* Step 3 of the performance improvement effort relying on a boolean "publicObject" flag for
published documents: now for logged-in users, AND with support for groups.
Group support is experimental but appears to be working. #10554
* Modified the implementation for the guest user, to support IP groups. #10554
* Removed the few autocommit-related changes previously borrowed from 10547, to keep things separate and clear, for testing etc. #10554
* Reorganized the optimized code in SearchServiceBean; combined the code block
for the guest and authenticated users. #10554
* updated the release note. #10554
* Removed the warning from the IP groups guide about the effect of the new
search optimization feature that was no longer true. #10554
* Updated the section of the guide describing the new Solr optimization
feature flags. #10554
* Updated the performance section of the guide. #10554
* Modified IndexServiceBean to use the new feature flag, that has been separated from the flag that
enables the search-side optimization;
Fixed the groups sub-query for the guest user. #10554
* cosmetic #10554
* doc tweaks #10554
* no-op code cleanup, correct case of publicObject_b #10554
---------
Co-authored-by: Leonid Andreev <[email protected]>
Two experimental feature flags, "add-publicobject-solr-field" and "avoid-expensive-solr-join", have been added to change how Solr documents are indexed for public objects and how Solr queries are constructed to accommodate access to restricted content (drafts, etc.). It is hoped that they will improve performance, especially on large instances and under load.
Before the search feature flag ("avoid-expensive-solr-join") can be turned on, the indexing flag ("add-publicobject-solr-field") must be enabled and a full reindex performed. Otherwise, publicly available objects will NOT be shown in search results.
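The ordering requirement can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not Solr or Dataverse code: the `Doc` record and the `published` field are assumptions made for illustration, and only the `publicObject_b` field name comes from the feature-flag documentation. It shows that a filter relying on the boolean field finds nothing in an index built before that field existed, which is why the full reindex has to come first.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Toy model (NOT Solr, NOT Dataverse code) of why the indexing flag and a
 * full reindex must precede the search flag. Doc and "published" are hypothetical.
 */
public class ReindexOrderDemo {

    // Hypothetical indexed document: an id plus its stored fields.
    record Doc(String id, Map<String, Object> fields) {}

    // Mimics the optimized filter: keep only docs where publicObject_b == true.
    static List<String> publicIdsViaBooleanField(List<Doc> index) {
        return index.stream()
                .filter(d -> Boolean.TRUE.equals(d.fields().get("publicObject_b")))
                .map(Doc::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Index built before "add-publicobject-solr-field" was enabled: no boolean field.
        List<Doc> oldIndex = List.of(
                new Doc("dataset-1", Map.of("published", true)),
                new Doc("file-7", Map.of("published", true)));

        // The same content after enabling the flag and running a full reindex.
        List<Doc> reindexed = List.of(
                new Doc("dataset-1", Map.of("published", true, "publicObject_b", true)),
                new Doc("file-7", Map.of("published", true, "publicObject_b", true)));

        System.out.println(publicIdsViaBooleanField(oldIndex));  // prints []
        System.out.println(publicIdsViaBooleanField(reindexed)); // prints [dataset-1, file-7]
    }
}
```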
For details see https://dataverse-guide--10555.org.readthedocs.build/en/10555/installation/config.html#feature-flags and #10555.
doc/sphinx-guides/source/developers/performance.rst
@@ -118,6 +118,10 @@ Solr
While in the past Solr performance hasn't been much of a concern, in recent years we've noticed performance problems when Harvard Dataverse is under load. Improvements were made in `PR #10050 <https://github.com/IQSS/dataverse/pull/10050>`_, for example.
We are tracking performance problems in `#10469 <https://github.com/IQSS/dataverse/issues/10469>`_.
In a meeting with a Solr expert on 2024-05-10 we were advised to avoid joins as much as possible. (It was acknowledged that many Solr users make use of joins because they have to, like we do, to keep some documents private.) Toward that end we have added two feature flags called ``avoid-expensive-solr-join`` and ``add-publicobject-solr-field`` as explained under :ref:`feature-flags`. We confirmed experimentally that performing the join on all the public objects (published collections, datasets and files), i.e., the bulk of the content in the search index, was indeed very expensive, especially on an instance the size of the IQSS production archive and under indexing load. We also confirmed that the join was unnecessary for these objects and were able to replace it with a boolean field stored directly in the indexed documents, which is what the two feature flags above implement. However, as of this writing, this mechanism should still be considered experimental.
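To make the change in query shape concrete, here is an illustrative Java sketch that builds two versions of a Solr permission filter as plain strings. It is not Dataverse's actual code: the field names `definitionPointDocId` and `discoverableBy` and the `group_public` token are hypothetical placeholders. Only `publicObject_b` comes from the feature-flag documentation, and `{!join from=... to=...}` and `_query_` are standard Solr query syntax. Real queries (IP groups, guest handling, etc.) are more involved than this.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch, not Dataverse code: contrasts a permission filter that
 * joins over all public objects with one that uses a boolean field instead.
 * JOIN_FROM, PERMS_FIELD and PUBLIC_GROUP are HYPOTHETICAL placeholders.
 */
public class PermissionFilterSketch {

    static final String JOIN_FROM = "definitionPointDocId"; // hypothetical field name
    static final String PERMS_FIELD = "discoverableBy";     // hypothetical field name
    static final String PUBLIC_GROUP = "group_public";      // hypothetical group token

    // Solr join query parser syntax: {!join from=<field> to=<field>}<query>
    static String joinOver(List<String> groups) {
        return "{!join from=" + JOIN_FROM + " to=id}"
                + PERMS_FIELD + ":(" + String.join(" OR ", groups) + ")";
    }

    static String buildFilter(boolean avoidExpensiveJoin, List<String> userGroups) {
        if (!avoidExpensiveJoin) {
            // Original shape: the public group is inside the join, so every
            // published collection, dataset and file goes through the join.
            List<String> groups = new ArrayList<>();
            groups.add(PUBLIC_GROUP);
            groups.addAll(userGroups);
            return joinOver(groups);
        }
        // Optimized shape: public objects match a plain boolean field.
        if (userGroups.isEmpty()) {
            return "publicObject_b:true";
        }
        // The join now only covers content visible via personal or group permissions;
        // _query_ embeds it as a nested query inside the boolean expression.
        return "publicObject_b:true OR _query_:\"" + joinOver(userGroups) + "\"";
    }

    public static void main(String[] args) {
        List<String> groups = List.of("group_user7");
        System.out.println(buildFilter(false, groups));
        System.out.println(buildFilter(true, groups));
        System.out.println(buildFilter(true, List.of()));
    }
}
```

The design point the sketch is meant to show: with the boolean field, the large public portion of the index is matched by a cheap term lookup, and the join is left to operate only on the comparatively small set of restricted documents.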
doc/sphinx-guides/source/installation/config.rst
@@ -3268,6 +3268,12 @@ please find all known feature flags below. Any of these flags can be activated u
* - api-session-auth
- Enables API authentication via session cookie (JSESSIONID). **Caution: Enabling this feature flag exposes the installation to CSRF risks!** We expect this feature flag to be temporary (only used by frontend developers, see `#9063 <https://github.com/IQSS/dataverse/issues/9063>`_) and for the feature to be removed in the future.
- ``Off``
* - avoid-expensive-solr-join
- Changes the way Solr queries are constructed for public content (published Collections, Datasets and Files). It removes a very expensive Solr join on all such documents, improving overall performance, especially for large instances under heavy load. Before this feature flag is enabled, the corresponding indexing feature (see next feature flag) must be turned on and a full reindex performed (otherwise public objects are not going to be shown in search results). See :doc:`/admin/solr-search-index`.
- ``Off``
* - add-publicobject-solr-field
- Adds an extra boolean field, ``publicObject_b``, set to ``true`` for public content (published Collections, Datasets and Files). Once the content has been reindexed with this field, Solr queries can rely on it instead of a very expensive join on all such documents, significantly improving overall performance (by enabling the feature flag above, ``avoid-expensive-solr-join``). The two flags are separate so that an instance can reindex its holdings before enabling the optimization in searches, thus avoiding having its public objects temporarily disappear from search results while the reindexing is in progress.
- ``Off``
**Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development </container/dev-usage>`, you can set them in the `docker compose <https://docs.docker.com/compose/environment-variables/set-environment-variables/>`_ file.
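Applying that naming pattern to the two new Solr flags gives the variable names printed by the small Java sketch below. It only mirrors the ``api-session-auth`` to ``DATAVERSE_FEATURE_API_SESSION_AUTH`` example (uppercase, hyphens replaced by underscores) and is an illustration, not part of Dataverse. The loop order reflects the rollout order described for these flags: indexing flag first, full reindex, then the search flag.

```java
import java.util.List;
import java.util.Locale;

/** Illustration only: derives feature-flag environment variable names. */
public class FeatureFlagEnvVars {

    // Follows the documented example: api-session-auth -> DATAVERSE_FEATURE_API_SESSION_AUTH
    static String envVarFor(String flag) {
        return "DATAVERSE_FEATURE_" + flag.toUpperCase(Locale.ROOT).replace('-', '_');
    }

    public static void main(String[] args) {
        // Enable the indexing flag (and reindex) before the search flag.
        for (String flag : List.of("add-publicobject-solr-field", "avoid-expensive-solr-join")) {
            System.out.println(envVarFor(flag) + "=1");
        }
    }
}
```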