-
Notifications
You must be signed in to change notification settings - Fork 27
Submodule xxHash #187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Submodule xxHash #187
Conversation
git submodule points to xxHash commit f2c52f1236a50d754b07f584ce4592de1df8c0f7, I assume that's the version that was added to eckit source tree at the time? The commit does not seem to point to specific tag and is ~5 years old, I think we can also consider updating to latest tag v0.8.3 at some point. |
Private downstream CI failed. |
Yes indeed, I've deliberately pointed to the same commit. The reason is hashing is performance sensitive, and a change to the xxHash version should track any possible change in this regard (that I wasnted to conflate the discussion topic here). |
Private downstream CI failed. |
If the upgrade is so sensitive to performance, would it make sense to also include a microbenchmark that test hashing performance as as sort of "in case we want to upgrade run this back-to-back" style test/script? I would volunteer to add some code for that based on google benchmark if you are interested, in another PR of course. |
@pmaciel do you know if this would require any changes to ecbuild and how it clones / updates repositories? |
Private downstream CI failed. |
Private downstream CI failed. |
… C++ 17, removed in C++ 20)
Private downstream CI failed. |
Private downstream CI failed. |
I like this change a lot and think that moving to git submodules is a step in the right direction. |
It shouldn't affect ecbuild if we transition to refer to repositories using submodules. To ease transition for ecbuild-based bundles, one could add the --recurse-submodules at the ecbuild's git clone call, yes 👍 |
Sorry I missed your comment - you're right, we don't have performance testing, and you're righ again, it should be done separately and with broader concern than just this core functionality. I wholeheartedly recommend we start an initiative for this, I have a suite that was designed for this, using pytest-benchmark, but the use case was very narrow (mir only). I'm not sure how performance-sensitive is the stack, but the hashing function is a small, highly reused function right at the centre (hence here not updating the xxHash version itself). Honestly, I cannot see it having a very big change because the optimisations are largely present, just not recently up-to-date. |
This is indeed an interesting way to go! Good catch regarding ecbuild/ecbundle. Also any of our GitHub Actions that clone repos would need to be modified. I'm just trying to collect all the places where we would need to make changes in order to accommodate this. |
This is for discussion only, for the moment.
This PR suggests the replacement of the copy of an external library
src/eckit/contrib/xxhash
for the git submodule
third_party/xxHash
You will need to clone with --recurse-submodules (The README.md is changed to reflect this).
Git submodules are a well-established practice and the way of collaboration on complex projects. It allows seamless integration with external repositories, while maintaining the code bases separate. The point is to include third-party software and external contributions not as a copy, but pointing to existing external repositories, so that it is much easier to track changes (eg. the current version of xxHash we include is 2020's, and a specific commit, not an officialy released version.) This PR actually points to the specific commit already in the develop branch so there's no code/functionality change.
Open issues raised here are:
This comes in advance of possible inclusion of libfmt as its codebase is quite large, and we're even considering including the library partially (so eg. avoiding testing and documentation, both of which are of value.). I would also recommend submodules for this (and any other) future third-party package.