-
Notifications
You must be signed in to change notification settings - Fork 755
Issue #1080: Replace the original usage of string.GetHashCode() with … #1121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… with hash value calculation using the MurmurHash algorithm.
/// </summary> | ||
/// <param name="name">input</param> | ||
/// <returns>hash value</returns> | ||
private static int GetMurmurHash(string name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if we could move this into a util class, like Strings
, so it can be reused more easily in future.
/// </summary> | ||
/// <param name="name">input</param> | ||
/// <returns>hash value</returns> | ||
private static int GetMurmurHash(string name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation could be wrapped in unchecked { ... }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion, I have modified the code. 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can still throw if someone compiles in checked mode. The casts between int/uint need to be unchecked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your reply, The result of hash % 55536 ranges from 0 to 55,535, which is a safe range since it does not exceed the positive integer limit. Therefore, (int)(hash % 55536) will not cause an overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are still two places that would fail checked arithmetic.
hash ^= (uint)(name[currentIndex + 2] << 16);
If the char has the high bit set, the result is a negative integer. Casting to uint
will throw I think.
Similarly:
uint k = (uint)(name[currentIndex] |
(name[currentIndex + 1] << 8) |
(name[currentIndex + 2] << 16) |
(name[currentIndex + 3] << 24));
Here's an example:
The code seems to assume that char is int8
but they're actually int16
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your advice, I modify the code.
Murmur hash seems intended for use with bytes, but we have characters. How about using FNV hash instead, with something like: public static int StableStringHash(string text)
{
unchecked
{
const uint fnvOffset = 2166136261;
const uint fnvPrime = 16777619;
uint hash = fnvOffset;
foreach (char c in text)
{
hash ^= (byte)(c & 0xFF);
hash *= fnvPrime;
hash ^= (byte)(c >> 8);
hash *= fnvPrime;
}
return (int)hash;
}
} Again, I think this should be in a utility class, like |
This change means that applications using the newer code won't be able to connect with apps using the older code, even if both are on .NET Framework. I think the best approach would therefore actually be to just use the same implementation of string hashing that exists in .NET Framework here. |
Please take a look at: #1122 |
We want the compatibility offered by #1122, so I merged that approach. It also addresses the other feedback here ( Thanks for getting things moving here. |
Issue #1080: Replace the original usage of string.GetHashCode() with hash value calculation using the MurmurHash algorithm.