Replies: 16 comments 2 replies
-
The string hash doesn’t need to be cryptographically secure, doesn’t it? We use hash randomization with a cryptographically secure seed. |
Beta Was this translation helpful? Give feedback.
-
My understanding (not an expert) is that there is cryptographically secure (e.g. SHA-512 or whatever), and that's definitely irrelevant to this use case. But then there's denial-of-service proof, which is a much lower bar but still requires some of the same mathematical/cryptographic analysis to prove? And you do want DoS-proof. And that analysis has been done for SIP 2-4, which is why it's everyone's current default, but not for |
Beta Was this translation helpful? Give feedback.
-
So what I meant to say in initial issue was not so much " |
Beta Was this translation helpful? Give feedback.
-
How would an attacker DoS me with an "insecure" hash, given that hash
randomization's seed is secure?
…On Mon, Sep 27, 2021 at 8:25 AM Itamar Turner-Trauring < ***@***.***> wrote:
So what I meant to say in initial issue was not so much "ahash is
cryptographically insecure" (since that's fine) but rather "ahash needs
more analysis to prove it's DoS-proof".
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#88 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAWCWMVRGR7WP3NKSH763Y3UECEFDANCNFSM5E2NXBDQ>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
--Guido van Rossum (python.org/~guido)
|
Beta Was this translation helpful? Give feedback.
-
https://www.python.org/dev/peps/pep-0456/ has a bunch of details; just learned from reading it that for a bad hash, the seed can be recovered by an attacker. |
Beta Was this translation helpful? Give feedback.
-
In that case, we should probably table this improvement until the security folks have had time to look at this. |
Beta Was this translation helpful? Give feedback.
-
String hashing is done once per string, when strings are created. It doesn't impact Reducing the time to hash strings will improve startup times, though. That is when much hashing occurs. |
Beta Was this translation helpful? Give feedback.
-
Correction: it's done lazily. So typically on the first time the string is looked up in any dict (including interning). A value of -1 indicates that the hash hasn't been computed yet. I wouldn't want to change this, since most strings aren't used to look anything up in a dictionary (e.g. when reading lines from a file). |
Beta Was this translation helpful? Give feedback.
-
I may be wrong, but the My impression is that xxh3 (the third iteration of xxHash) is both widely used and extremely fast: https://cyan4973.github.io/xxHash/ Note: xxHash APIs accept an optional seed, making it keyable. I have no idea whether it's DoS-resistant. |
Beta Was this translation helpful? Give feedback.
-
That might be a good solution too, yes, the main point is that it is likely possible to switch to a faster hash than currently used one while still preserving anti-DoS guarantees. |
Beta Was this translation helpful? Give feedback.
-
On Tue, Sep 28, 2021 at 1:02 AM Guido van Rossum ***@***.***> wrote:
How would an attacker DoS me with an "insecure" hash, given that hash
randomization's seed is secure?
Attackers may guess the key by sending many hashes and measuring timings.
So having seed is not enough for anti-HashDoS.
FYI, I had made an issue to change the hash algorithm from SipHash-2-4
to SipHash-1-3, as Ruby and Rust had moved already.
https://bugs.python.org/issue29410
Microbemchmark result:
https://gist.github.com/methane/33c7b01c45ce23b67246f5ddaff9c9e7
Regards,
…--
Inada Naoki ***@***.***>
|
Beta Was this translation helpful? Give feedback.
-
Please be careful when comparing hashing performance benchmark. Some benchmarks show off with impressive numbers like X GB/sec throughput. Dictionary keys are usually short. AFAIK majority of dict key strings are in the range of 5 to 25 characters. Good hashing algorithms perform extra steps to finalize a hash in order to archive good diffusion. The finalization step is a fixed cost and impact short strings. SipHash-1-3 is already faster for shorter strings than SipHash-2-4. It performs one finalization round less. |
Beta Was this translation helpful? Give feedback.
-
In my experience, when hashing 64-bit integers (equivalent to <8 char ASCII strings) |
Beta Was this translation helpful? Give feedback.
-
I think we should wait until |
Beta Was this translation helpful? Give feedback.
-
A while ago I have created a test branch with xxhash3, python/cpython@main...tiran:test_xxhash3 . I selected xxhash3 because it was the least effort to get it working. Could somebody run benchmarks, please? I would like to see if a different hashing algorithms makes a difference at all. I don't have the benchmark tool installed and configured. |
Beta Was this translation helpful? Give feedback.
-
@ericsnowcurrently Could you do this some time next week? |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Given pervasive use of dicts, Python's hash function gets used a lot. There are faster alternatives than SIP 2-4, e.g.
ahash
: https://github.com/tkaitchuck/aHash#comparison-with-other-hashersahash
is still in need of cryptographic analysis to prove it has same protection as SIP 2-4, but (from using it in Rust) switching definitely has a meaningful performance improvement on hashmap lookups.Beta Was this translation helpful? Give feedback.
All reactions