Performance of blake2 #5

Closed
tailhook opened this issue Dec 22, 2016 · 5 comments

@tailhook

Hi,

It looks like the performance of blake2 is very similar to that of sha512 (although blake2 is still faster):

     Running target/release/deps/blake2b-2fc6bb02e2d63899

running 6 tests
test bench_16  ... bench:          45 ns/iter (+/- 6) = 355 MB/s
test bench_1k  ... bench:       2,636 ns/iter (+/- 3) = 388 MB/s
test bench_256 ... bench:         685 ns/iter (+/- 31) = 373 MB/s
test bench_64  ... bench:         170 ns/iter (+/- 9) = 376 MB/s
test bench_64k ... bench:     174,027 ns/iter (+/- 3,512) = 376 MB/s
test bench_8k  ... bench:      21,669 ns/iter (+/- 951) = 378 MB/s

running 6 tests
test bench_16  ... bench:          60 ns/iter (+/- 1) = 266 MB/s
test bench_1k  ... bench:       3,092 ns/iter (+/- 101) = 331 MB/s
test bench_256 ... bench:         779 ns/iter (+/- 15) = 328 MB/s
test bench_64  ... bench:         204 ns/iter (+/- 9) = 313 MB/s
test bench_64k ... bench:     197,618 ns/iter (+/- 8,180) = 331 MB/s
test bench_8k  ... bench:      24,689 ns/iter (+/- 896) = 331 MB/s

But https://blake2.net/ shows that performance should be about 3x faster.

Any ideas? It looks like blake2 does not use SIMD. Is it not needed? Is there a chance that the blake2 crate here will be better optimized in the future? Or is it just because recent processors already execute sha512 much faster?
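For readers checking the numbers: the MB/s column can be reproduced from the ns/iter column. The sketch below assumes the throughput is computed as `bytes * 1000 / ns` with integer truncation, which matches every row above, and assumes the second block of results is the sha512 run (the post implies this but does not label it). The function names are illustrative only.

```rust
/// Throughput in MB/s (1 MB = 10^6 bytes), truncated to an integer.
/// bytes / ns is GB/s, so multiplying by 1000 gives MB/s.
fn throughput_mb_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}

/// How many times faster the `fast_ns` run is than the `slow_ns` run
/// for the same input size.
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

fn main() {
    // (label, input size in bytes, blake2b ns/iter, second run ns/iter)
    // The second run is assumed to be sha512.
    let rows = [
        ("16", 16u64, 45u64, 60u64),
        ("64", 64, 170, 204),
        ("256", 256, 685, 779),
        ("1k", 1024, 2_636, 3_092),
        ("8k", 8 * 1024, 21_669, 24_689),
        ("64k", 64 * 1024, 174_027, 197_618),
    ];
    for &(label, bytes, blake2b_ns, other_ns) in rows.iter() {
        println!(
            "{:>4}: {} MB/s vs {} MB/s, ratio {:.2}",
            label,
            throughput_mb_s(bytes, blake2b_ns),
            throughput_mb_s(bytes, other_ns),
            speedup(other_ns, blake2b_ns)
        );
    }
}
```

Under these assumptions the ratio stays between roughly 1.1 and 1.4 at every input size, well short of the 3x mentioned above.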

@tailhook
Author

Well, here are the results of a benchmark for the code extracted from the Rust compiler:

test bench_16  ... bench:         475 ns/iter (+/- 9) = 33 MB/s
test bench_1k  ... bench:       2,177 ns/iter (+/- 141) = 470 MB/s
test bench_256 ... bench:         536 ns/iter (+/- 10) = 477 MB/s
test bench_64k ... bench:     131,659 ns/iter (+/- 3,418) = 497 MB/s

The 16-byte case is probably slow because I'm not skipping some setup cost (you can observe benchmarks at the bottom of the gist linked above). Otherwise, it looks like Rust has a much better implementation than this crate (unless I'm testing it wrong).

@newpavlov
Member

newpavlov commented Dec 23, 2016

Thank you for reporting this!

The original code from rust-crypto for blake2 is somewhat messy, so I planned to work on it later; the performance issues will be an additional incentive to do it. I will probably start working on it in January, but if you would like to try to improve this crate before that, you are welcome!

Regarding SIMD, none of the crates currently use explicit SIMD instructions; they only use the fake-simd crate to help the LLVM optimiser and to prepare the ground for future SIMD stabilization. I've tried using simd-alt (a fork of simd with minor API changes) and it gave a measurable improvement, but because it's nightly-only I've decided not to use it for now.
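To illustrate the general idea of a "fake" SIMD type, here is a minimal sketch. It is not the fake-simd crate's actual API: `U64x4` and the function names are hypothetical. It wraps four `u64` lanes in a plain struct with lane-wise operators and applies the BLAKE2b `G` mixing step (rotation constants 32, 24, 16, 63, as given in RFC 7693) to four columns at once. Whether the optimiser turns this into real vector instructions depends on the compiler and target.

```rust
use std::ops::{Add, BitXor};

/// Hypothetical 4-lane vector of u64 values: a plain struct, not a hardware SIMD type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U64x4([u64; 4]);

impl Add for U64x4 {
    type Output = U64x4;
    // Lane-wise wrapping addition.
    fn add(self, rhs: U64x4) -> U64x4 {
        let (a, b) = (self.0, rhs.0);
        U64x4([
            a[0].wrapping_add(b[0]),
            a[1].wrapping_add(b[1]),
            a[2].wrapping_add(b[2]),
            a[3].wrapping_add(b[3]),
        ])
    }
}

impl BitXor for U64x4 {
    type Output = U64x4;
    // Lane-wise XOR.
    fn bitxor(self, rhs: U64x4) -> U64x4 {
        let (a, b) = (self.0, rhs.0);
        U64x4([a[0] ^ b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]])
    }
}

impl U64x4 {
    // Rotate every lane right by the same amount.
    fn rotate_right(self, n: u32) -> U64x4 {
        let a = self.0;
        U64x4([
            a[0].rotate_right(n),
            a[1].rotate_right(n),
            a[2].rotate_right(n),
            a[3].rotate_right(n),
        ])
    }
}

/// Scalar BLAKE2b G step on one column (a, b, c, d) with message words x and y.
fn g_scalar(v: [u64; 4], x: u64, y: u64) -> [u64; 4] {
    let [mut a, mut b, mut c, mut d] = v;
    a = a.wrapping_add(b).wrapping_add(x);
    d = (d ^ a).rotate_right(32);
    c = c.wrapping_add(d);
    b = (b ^ c).rotate_right(24);
    a = a.wrapping_add(b).wrapping_add(y);
    d = (d ^ a).rotate_right(16);
    c = c.wrapping_add(d);
    b = (b ^ c).rotate_right(63);
    [a, b, c, d]
}

/// The same step applied to four columns at once, one column per lane.
fn g_vector(v: [U64x4; 4], x: U64x4, y: U64x4) -> [U64x4; 4] {
    let [mut a, mut b, mut c, mut d] = v;
    a = a + b + x;
    d = (d ^ a).rotate_right(32);
    c = c + d;
    b = (b ^ c).rotate_right(24);
    a = a + b + y;
    d = (d ^ a).rotate_right(16);
    c = c + d;
    b = (b ^ c).rotate_right(63);
    [a, b, c, d]
}

/// Checks that every lane of the vector version equals the scalar version.
fn lanes_match_scalar(v: [U64x4; 4], x: U64x4, y: U64x4) -> bool {
    let out = g_vector(v, x, y);
    (0..4).all(|i| {
        let expected = g_scalar([v[0].0[i], v[1].0[i], v[2].0[i], v[3].0[i]], x.0[i], y.0[i]);
        expected == [out[0].0[i], out[1].0[i], out[2].0[i], out[3].0[i]]
    })
}

fn main() {
    // Arbitrary test state: rows a, b, c, d, each holding four lanes.
    let v = [
        U64x4([1, 2, 3, 4]),
        U64x4([5, 6, 7, 8]),
        U64x4([u64::MAX, 10, 11, 12]),
        U64x4([13, 14, 15, 0x0123_4567_89ab_cdef]),
    ];
    let x = U64x4([100, 200, 300, 400]);
    let y = U64x4([7, 77, 777, 7777]);
    println!("vector G matches scalar G on every lane: {}", lanes_match_scalar(v, x, y));
}
```

Writing the round in terms of such a type keeps the code on stable Rust while leaving a natural place to swap in real SIMD types later, which is the motivation described above.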

@tailhook
Author

tailhook commented Dec 23, 2016

Well, I don't have time to invest in it right now. I'm just trying to choose a good hasher that will serve me in the long term. So I'm going to use blake2 and hopefully will benefit from it even more in the future.

Thank you for your work on this library!

@newpavlov
Member

After some delay, I've started reworking the blake2 crate; you can see it in #17. For blake2b I am currently getting up to 720 MB/s on my machine, and 470 MB/s for blake2s. I think several parts of the code can be optimized even further.

@newpavlov
Member

I think we can close this issue now. It should definitely be possible to optimize the code even further, but I don't see easy ways to do it for the current implementation.
