Skip to content

Issue #1080: Replace the original usage of string.GetHashCode() with … #1121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

billchungiii
Copy link

Issue #1080: Replace the original usage of string.GetHashCode() with hash value calculation using the MurmurHash algorithm.

… with hash value calculation using the MurmurHash algorithm.
/// </summary>
/// <param name="name">input</param>
/// <returns>hash value</returns>
private static int GetMurmurHash(string name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we could move this into a util class, like Strings, so it can be reused more easily in future.

/// </summary>
/// <param name="name">input</param>
/// <returns>hash value</returns>
private static int GetMurmurHash(string name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation could be wrapped in unchecked { ... }.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion, I have modified the code. 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can still throw if someone compiles in checked mode. The casts between int/uint need to be unchecked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your reply, The result of hash % 55536 ranges from 0 to 55,535, which is a safe range since it does not exceed the positive integer limit. Therefore, (int)(hash % 55536) will not cause an overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there are still two places that would fail checked arithmetic.

hash ^= (uint)(name[currentIndex + 2] << 16);

If the char has the high bit set, the result is a negative integer. Casting to uint will throw I think.

Similarly:

                uint k = (uint)(name[currentIndex] |
                                (name[currentIndex + 1] << 8) |
                                (name[currentIndex + 2] << 16) |
                                (name[currentIndex + 3] << 24));

Here's an example:

https://sharplab.io/#v2:EYLgHgbALANALiATgVwHYB8DGALAppga1wBMBYAKAG8KACOgAQEYA6JgTgAoPkBLVOAJQcARIB4NwCD7wgNoAGALo0APIpqMIAgQG4KAXyA=

The code seems to assume that char is int8 but they're actually int16.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advice, I modify the code.

@drewnoakes
Copy link
Member

Murmur hash seems intended for use with bytes, but we have characters. How about using FNV hash instead, with something like:

public static int StableStringHash(string text)
{
    unchecked
    {
        const uint fnvOffset = 2166136261;
        const uint fnvPrime = 16777619;

        uint hash = fnvOffset;
        foreach (char c in text)
        {
            hash ^= (byte)(c & 0xFF);
            hash *= fnvPrime;
            hash ^= (byte)(c >> 8);
            hash *= fnvPrime;
        }

        return (int)hash;
    }
}

Again, I think this should be in a utility class, like Strings.

@drewnoakes
Copy link
Member

This change means that applications using the newer code won't be able to connect with apps using the older code, even if both are on .NET Framework.

I think the best approach would therefore actually be to just use the same implementation of string hashing that exists in .NET Framework here.

@drewnoakes
Copy link
Member

Please take a look at: #1122

@drewnoakes
Copy link
Member

We want the compatibility offered by #1122, so I merged that approach. It also addresses the other feedback here (checked block and moved to utility class).

Thanks for getting things moving here.

@drewnoakes drewnoakes closed this May 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants