Add 3rd minimal whoosh index for performance #1877

Merged: 3 commits, May 13, 2025

UlrichB22
Collaborator

@UlrichB22 UlrichB22 commented Mar 21, 2025

Related to #1725.

After updating to this version, the indexes must be dropped and rebuilt:

moin index-destroy
moin index-create
moin index-build

This will take some time for large wikis.
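
For wikis that are rebuilt from a script, the three steps above could be wrapped as in the following minimal sketch. It assumes the `moin` executable is on the PATH and that each subcommand returns a non-zero exit code on failure. The `rebuild_indexes` helper and its `run` parameter are hypothetical names used for illustration, not part of moin.

```python
import subprocess
from typing import Callable, Sequence

# Subcommands taken from the PR description, in the order given there.
REBUILD_STEPS = ("index-destroy", "index-create", "index-build")


def _run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def rebuild_indexes(
    run: Callable[[Sequence[str]], int] = _run_command,
    moin: str = "moin",
) -> list:
    """Run each rebuild step in order and stop at the first failure.

    `run` is injectable so the sequence can be checked without a wiki.
    Assumption: a non-zero exit code means the step failed.
    Returns the subcommands that completed successfully.
    """
    done = []
    for step in REBUILD_STEPS:
        if run([moin, step]) != 0:
            break  # do not build on top of a failed destroy or create
        done.append(step)
    return done
```

Calling `rebuild_indexes()` with no arguments invokes the real `moin` command, so it should only be run inside the wiki's environment.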

The index-build subcommand will create a new third index called LATEST_META. This index will be much smaller than LATEST_REVS and will only contain the usual metadata fields, but no content or content ngrams.

Many queries to check the existence of an item or to check ACL rights only require a few metadata fields. Queries against this small index are very fast and improve response times for large wikis.

The new index is called LATEST_META, and the parameter that selects it in various methods and functions is called “short”.
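
The idea can be illustrated with a small, self-contained sketch that uses plain dictionaries in place of whoosh indexes. The field names (`name`, `acl`, `content`, `content_ngram`) and the helpers `build_indexes`, `lookup`, and `item_exists` are assumptions made for illustration only; they are not moin's actual schema or API. Only the index names and the `short` parameter name come from this PR.

```python
from typing import Any, Dict, List, Optional

# Assumption: which fields count as "content" is hypothetical here.
CONTENT_FIELDS = {"content", "content_ngram"}

LATEST_REVS = "latest_revs"  # full index, including content
LATEST_META = "latest_meta"  # small index, metadata only

Doc = Dict[str, Any]


def build_indexes(revisions: List[Doc]) -> Dict[str, Dict[str, Doc]]:
    """Build a full index and a metadata-only index, keyed by item name."""
    full = {rev["name"]: rev for rev in revisions}
    meta = {
        name: {k: v for k, v in rev.items() if k not in CONTENT_FIELDS}
        for name, rev in full.items()
    }
    return {LATEST_REVS: full, LATEST_META: meta}


def lookup(indexes: Dict[str, Dict[str, Doc]], name: str,
           short: bool = False) -> Optional[Doc]:
    """Return the stored document for `name`, or None if it is absent.

    short=True reads from the small metadata-only index.
    """
    index = indexes[LATEST_META if short else LATEST_REVS]
    return index.get(name)


def item_exists(indexes: Dict[str, Dict[str, Doc]], name: str) -> bool:
    # An existence check needs no content, so the short index is enough.
    return lookup(indexes, name, short=True) is not None
```

In this sketch, callers that only need metadata pass `short=True`, while search code keeps using the full index, so search results are unaffected.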

If you revert to a version prior to this change, you should delete and recreate the indexes. Advanced users can remove the new index files (latest_meta) in the wiki/index directory instead.

@UlrichB22 UlrichB22 marked this pull request as draft March 21, 2025 19:24
@RogerHaase
Member

A guess is you thought about removing content from the latest_revs indexes. Is it easier to create a 3rd index or did you do some benchmarking?

@UlrichB22
Collaborator Author

> A guess is you thought about removing content from the latest_revs indexes. Is it easier to create a 3rd index or did you do some benchmarking?

Some time ago I tested removing the content; doing so affects searching and search results. IMO the different NGRAMs have the biggest impact on the index size. This solution with a third index should not change the search functionality.

@UlrichB22 UlrichB22 changed the title DRAFT: Add 3rd minimal whoosh index for performance Add 3rd minimal whoosh index for performance Apr 25, 2025
@UlrichB22 UlrichB22 marked this pull request as ready for review April 25, 2025 08:31
@UlrichB22
Collaborator Author

With the latest commit the recommended changes have been applied.
The new index is now called 'latest_meta'.

Running 'moin run' without recreating indexes will result in the following error message:

Error: Wiki index 'latest_meta' missing. Please see https://github.com/moinwiki/moin/pull/1877

@UlrichB22
Collaborator Author

@ThomasWaldmann, @RogerHaase, may I ask you to review this PR? IMO this change is urgently needed for large wikis (e.g. python.org) to get reasonable response times. Thank you.

@RogerHaase
Member

RogerHaase commented May 13, 2025

Sorry for the delay. Busy with other things, no time for moin.

For my wiki on Windows with 900+ items, response times for the +index view dropped from about 5 seconds to about 2 seconds. Very nice fix.

@RogerHaase RogerHaase merged commit 39a8217 into moinwiki:master May 13, 2025
8 checks passed
@UlrichB22 UlrichB22 deleted the add_whoosh_idx branch May 19, 2025 17:56