use SIMD.jl for x86 and naive_findmin for :aarch64 #151

Open: m-fila wants to merge 5 commits into main

@m-fila
Member

m-fila commented May 14, 2025

Just a test of whether I understand how to replace the fast_findmin implementation as in #83

@Moelf
Member

Moelf commented May 14, 2025

after fixing that, on Apple M4, I get:

julia +1.12 --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32


# main
Processed 100 events 32 times
Average time per event 135.5111340625 ± 2.9835167009602035 μs
Lowest time per event 131.43084000000002 μs

# this PR
Processed 100 events 32 times
Average time per event 138.62045531249998 ± 3.5853189437051123 μs
Lowest time per event 134.35666 μs

# main with 1.12-beta3
Processed 100 events 32 times
Average time per event 147.96477937499995 ± 4.669684823430356 μs
Lowest time per event 139.74125 μs

# this PR with 1.12-beta3
Processed 100 events 32 times
Average time per event 152.88778656250003 ± 5.847938860407047 μs
Lowest time per event 142.36916000000002 μs

this looks not bad??!? @graeme-a-stewart Although there's an unfortunate degradation on 1.12
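
To put numbers on "not bad" and on the 1.12 degradation, the mean times can be compared as relative changes. This is only a sketch: the values are copied from the Apple M4 results above, and `relative_change` is a hypothetical helper, not part of the package.

```julia
# Hypothetical helper: fractional change of `candidate` relative to `baseline`
# (0.02 would mean 2% slower, a negative value would mean faster).
relative_change(baseline, candidate) = (candidate - baseline) / baseline

# Mean time per event in μs, copied from the Apple M4 output above.
main_mean = 135.5111340625
pr_mean = 138.62045531249998
main_beta3_mean = 147.96477937499995

pr_vs_main = relative_change(main_mean, pr_mean)
beta3_vs_main = relative_change(main_mean, main_beta3_mean)

println("this PR vs main:     ", round(100 * pr_vs_main; digits = 1), " %")
println("1.12-beta3 vs main:  ", round(100 * beta3_vs_main; digits = 1), " %")
```

On these numbers the PR costs roughly 2% on the mean, which is comparable to the quoted spread, while the 1.12-beta3 slowdown on main is several times larger.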

Co-authored-by: Jerry Ling

codecov bot commented May 14, 2025

Codecov Report

Attention: Patch coverage is 97.05882% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.48%. Comparing base (0f47837) to head (dd97a88).

Files with missing lines Patch % Lines
src/Utils.jl 97.05% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #151      +/-   ##
==========================================
+ Coverage   75.07%   75.48%   +0.40%     
==========================================
  Files          19       19              
  Lines        1252     1277      +25     
==========================================
+ Hits          940      964      +24     
- Misses        312      313       +1     


@m-fila
Member Author

m-fila commented May 14, 2025

Nvidia Grace

julia +1.12 --project instrumented-jetreco.jl --algorithm=AntiKt -R 0.4 ../test/data/events.pp13TeV.hepmc3.gz -m 32

# main
Processed 100 events 32 times
Average time per event 299.6035559375 ± 7.914782725959611 μs
Lowest time per event 280.66586 μs

# this PR
Processed 100 events 32 times
Average time per event 303.1709484375001 ± 8.819781288405308 μs
Lowest time per event 286.57968 μs

# main with 1.12-beta3
Processed 100 events 32 times
Average time per event 304.46982093750006 ± 13.922266876931305 μs
Lowest time per event 281.39805 μs

# this PR with 1.12-beta3
Processed 100 events 32 times
Average time per event 308.6548684375 ± 11.580493168517764 μs
Lowest time per event 288.65558 μs

AMD Ryzen 7 5700G

# main
Processed 100 events 32 times
Average time per event 188.28086562500002 ± 5.529621156269813 μs
Lowest time per event 181.3509 μs

# this PR
Processed 100 events 32 times
Average time per event 197.2376540625 ± 6.685658460456483 μs
Lowest time per event 186.34546999999998 μs

# main with 1.12-beta3
Processed 100 events 32 times
Average time per event 193.79122999999998 ± 11.42792473599183 μs
Lowest time per event 176.08296 μs

# this PR with 1.12-beta3
Processed 100 events 32 times
Average time per event 205.20970093750003 ± 9.736581454767695 μs
Lowest time per event 187.33396000000002 μs

@Moelf
Member

Moelf commented May 14, 2025

> AMD Ryzen 7 5700G

yeah ok, so we use SIMD.jl for x86 and this PR's naive_findmin for ARM?
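
The split proposed here could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `naive_findmin` takes its name from the PR title but its signature is assumed, `simd_findmin`, `USE_NAIVE` and `select_findmin` are hypothetical names, and `Base.findmin` stands in for the SIMD.jl-based path.

```julia
# Sketch only: the signature of `naive_findmin` is assumed, not taken from the PR.
# Plain scalar scan over the first `n` entries, returning (minimum, index).
function naive_findmin(dij::AbstractVector{T}, n::Integer) where {T<:AbstractFloat}
    best = typemax(T)   # Inf for floating-point element types
    best_idx = 0        # stays 0 if n == 0 or no entry is below Inf
    for i in 1:n
        if dij[i] < best
            best = dij[i]
            best_idx = i
        end
    end
    return best, best_idx
end

# Hypothetical stand-in for the SIMD.jl-based implementation used on x86.
simd_findmin(dij, n) = findmin(view(dij, 1:n))

# `Sys.ARCH` is a Symbol such as :x86_64 or :aarch64, fixed for a given Julia build.
const USE_NAIVE = Sys.ARCH === :aarch64

# Pick the implementation by architecture.
select_findmin(dij, n) = USE_NAIVE ? naive_findmin(dij, n) : simd_findmin(dij, n)
```

Since `USE_NAIVE` is a constant, the compiler should be able to drop the unused branch, so the selection itself is not expected to cost anything per call.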

Moelf changed the title from "use naive_findmin" to "use SIMD.jl for x86 and naive_findmin for :aarch64" May 14, 2025
Moelf marked this pull request as ready for review May 14, 2025 19:12
@Moelf
Member

Moelf commented May 14, 2025

what happened to CI lol...

@m-fila
Member Author

m-fila commented May 14, 2025

For completeness, with SIMD for x86_64

AMD Ryzen 7 5700G

# main
Processed 100 events 32 times
Average time per event 187.38033500000006 ± 5.74337091484025 μs
Lowest time per event 177.03909 μs

# this PR
Processed 100 events 32 times
Average time per event 191.52783343750005 ± 6.355636113548961 μs
Lowest time per event 179.90162 μs

# main with 1.12-beta3
Processed 100 events 32 times
Average time per event 195.5927890625 ± 12.11654841840912 μs
Lowest time per event 178.41858 μs

# this PR with 1.12-beta3
Processed 100 events 32 times
Average time per event 194.36534874999998 ± 10.480610090304218 μs
Lowest time per event 176.65239000000003 μs

@Moelf
Member

Moelf commented May 15, 2025

JuliaLang/julia#58418

I will test against this PR later

@giordano
Member

Would be good to run CI on aarch64 (Linux and/or macOS), no?

m-fila mentioned this pull request May 15, 2025
@Moelf
Member

Moelf commented May 15, 2025

julia #58418 on M4
Processed 100 events 32 times
Average time per event 151.89809874999995 ± 5.095528318872066 μs
Lowest time per event 143.37917 μs

so this is the same as nightly, and slower than Julia 1.11
