-
Notifications
You must be signed in to change notification settings - Fork 19
Rocsparse add targeting OLCF Frontier #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
o New bml_sort_rocsparse_ellpack() method to sort csr column indices o New bml_prune_rocsparse_ellpack() method to threshold the csr matrix o Modify bml_add_ellpack() and bml_multiply_ellpack() to use these methods o Progress benchmarks now produce correct results with rocsparse
These were addressed in commit aa03a84
o Comments have been corrected
o code for obtaining nnz on host is revised, eliminating host array update
- host nnz variables are used in rocsparse calls
- the cusparse code from which this was copied should probably be similarly updated
________________________________
From: Jean-Luc Fattebert ***@***.***>
Sent: Monday, June 12, 2023 12:30:28 PM
To: lanl/bml
Cc: Wall, Michael E; Author
Subject: [EXTERNAL] Re: [lanl/bml] Rocsparse add targeting OLCF Frontier (PR #716)
@jeanlucf22 commented on this pull request.
________________________________
In src/C-interface/ellpack/bml_add_ellpack_typed.c<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*discussion_r1227036980__;Iw!!Bt8fGhp8LhKGRg!HLcg1rXNyskCYEXh1zT1E4fTxT86nPgCMLBUnXiFm1AyUu85jPhYvLYu46csgkweNxvlANIlJxUE9cLqhjhXOd-V$>:
+ }
+
+ // Sort the resulting matrix
+ TYPED_FUNC(bml_sort_rocsparse_ellpack) (handle,A);
+ BML_CHECK_ROCSPARSE(rocsparse_set_mat_storage_mode(matA, rocsparse_storage_mode_sorted));
+
+ // Prune (threshold) the resulting matrix
+ TYPED_FUNC(bml_prune_rocsparse_ellpack) (handle,A,threshold);
+
+ // Free the temporary arrays used on the device and host
+#pragma omp target exit data map(delete:csrRowPtrC_tmp[:A_N+1],csrColIndC_tmp[:nnzC_tmp],csrValC_tmp[:nnzC_tmp])
+ free(csrRowPtrC_tmp);
+ free(csrColIndC_tmp);
+ free(csrValC_tmp);
+
+ // Done with matrix multiplication.
incorrect comment
________________________________
In src/C-interface/ellpack/bml_add_ellpack_typed.c<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*discussion_r1227037925__;Iw!!Bt8fGhp8LhKGRg!HLcg1rXNyskCYEXh1zT1E4fTxT86nPgCMLBUnXiFm1AyUu85jPhYvLYu46csgkweNxvlANIlJxUE9cLqhvCpeiu8$>:
+
+ rocsparse_handle handle = NULL;
+ rocsparse_mat_descr matA, matB, matC_tmp;
+
+ BML_CHECK_ROCSPARSE(rocsparse_create_handle(&handle));
+
+ // convert ellpack to cucsr
+ TYPED_FUNC(bml_ellpack2cucsr_ellpack) (A);
+ TYPED_FUNC(bml_ellpack2cucsr_ellpack) (B);
+
+ // ensure the matrices are sorted
+ TYPED_FUNC(bml_sort_rocsparse_ellpack) (handle,A);
+ TYPED_FUNC(bml_sort_rocsparse_ellpack) (handle,B);
+
+ // Create cusparse matrix A and B in CSR format
+ // Note: The following update is not necessary since the ellpack2cucsr
I am not understanding this comment. Should it be here? Or is there an update we can remove?
________________________________
In src/C-interface/ellpack/bml_add_ellpack_typed.c<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*discussion_r1227039341__;Iw!!Bt8fGhp8LhKGRg!HLcg1rXNyskCYEXh1zT1E4fTxT86nPgCMLBUnXiFm1AyUu85jPhYvLYu46csgkweNxvlANIlJxUE9cLqhladWAKq$>:
+ rocsparse_mat_descr matA, matB, matC_tmp;
+
+ BML_CHECK_ROCSPARSE(rocsparse_create_handle(&handle));
+
+ // convert ellpack to cucsr
+ TYPED_FUNC(bml_ellpack2cucsr_ellpack) (A);
+ TYPED_FUNC(bml_ellpack2cucsr_ellpack) (B);
+
+ // ensure the matrices are sorted
+ TYPED_FUNC(bml_sort_rocsparse_ellpack) (handle,A);
+ TYPED_FUNC(bml_sort_rocsparse_ellpack) (handle,B);
+
+ // Create cusparse matrix A and B in CSR format
+ // Note: The following update is not necessary since the ellpack2cucsr
+ // routine updates the csr rowpointers on host and device
+#pragma omp target update from(csrRowPtrA[:A_N+1])
why do we need these on the host?
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*pullrequestreview-1475492245__;Iw!!Bt8fGhp8LhKGRg!HLcg1rXNyskCYEXh1zT1E4fTxT86nPgCMLBUnXiFm1AyUu85jPhYvLYu46csgkweNxvlANIlJxUE9cLqhj0-9IYm$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEP6QVN5HG5GEONGOGLXK5N4JANCNFSM6AAAAAAZDWRBGQ__;!!Bt8fGhp8LhKGRg!HLcg1rXNyskCYEXh1zT1E4fTxT86nPgCMLBUnXiFm1AyUu85jPhYvLYu46csgkweNxvlANIlJxUE9cLqhn2NC6bO$>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
@@ -21,6 +21,13 @@ | |||
#ifdef BML_USE_CUSPARSE | |||
#include <cusparse.h> | |||
#endif | |||
#ifdef BML_USE_ROCSPARSE | |||
// Copy rocsparse headers into src/rocsparse/ and edit rocsparse_functions.h to remove '[[...]]' text | |||
#include "../rocsparse/rocsparse.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this file is missing...
Yes, this has been an issue since late releases of rocm 4, where some C++ syntax was introduced in a C header. Right now the rocm headers need to be copied from the rocm installation into src/C-interface/rocsparse and one of them needs to be edited. There's a fix in rocm 5.5 but that's not yet available on Frontier, when it is (should be soon), we'll just need to include the rocsparse.h header from the rocm installation.
I edited the comments in commit ad5c38f to specify the correct path where the rocsparse headers need to be, the path was incorrect in the previous comments.
________________________________
From: Jean-Luc Fattebert ***@***.***>
Sent: Monday, June 12, 2023 1:34:54 PM
To: lanl/bml
Cc: Wall, Michael E; Author
Subject: [EXTERNAL] Re: [lanl/bml] Rocsparse add targeting OLCF Frontier (PR #716)
@jeanlucf22 commented on this pull request.
________________________________
In src/C-interface/ellpack/bml_add_ellpack_typed.c<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*discussion_r1227149408__;Iw!!Bt8fGhp8LhKGRg!FH7w7vbaeDT1ldz_DL0z2LHO2a5GYauKEUulDFo97UJpRezYjNZdMYe9nGQRytfb45GTKtG5y5Z8KF-gNxUpyEXI$>:
@@ -21,6 +21,13 @@
#ifdef BML_USE_CUSPARSE
#include <cusparse.h>
#endif
+#ifdef BML_USE_ROCSPARSE
+// Copy rocsparse headers into src/rocsparse/ and edit rocsparse_functions.h to remove '[[...]]' text
+#include "../rocsparse/rocsparse.h"
looks like this file is missing...
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*pullrequestreview-1475673136__;Iw!!Bt8fGhp8LhKGRg!FH7w7vbaeDT1ldz_DL0z2LHO2a5GYauKEUulDFo97UJpRezYjNZdMYe9nGQRytfb45GTKtG5y5Z8KF-gN5VX54h-$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEJ54GZRRO32QBO37JTXK5VN5ANCNFSM6AAAAAAZDWRBGQ__;!!Bt8fGhp8LhKGRg!FH7w7vbaeDT1ldz_DL0z2LHO2a5GYauKEUulDFo97UJpRezYjNZdMYe9nGQRytfb45GTKtG5y5Z8KF-gN2CZ4SpL$>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
So we are expecting the user to do that? why not add this modified file to the repo? |
Let's discuss. Personally I don't think it's a good idea to redistribute an edited version of AMD's code in BML, but there should be other ways to handle this. For example, the olcf consultants might be willing to patch the relevant rocm header on frontier/crusher, or maybe we can write a script to copy the headers to the right place and patch the relevant one.
I was hoping AMD would fix this much sooner, I reported the issue way back in rocm 4.X...however, it is fixed in rocm 5.5 so at least it won't be an issue soon.
Meanwhile I'd like to keep this PR focused on the add/multiply methods, and make this a separate issue, addressed in a future PR. The build has worked this way for a while.
…________________________________
From: Jean-Luc Fattebert ***@***.***>
Sent: Monday, June 12, 2023 2:06:29 PM
To: lanl/bml
Cc: Wall, Michael E; Author
Subject: [EXTERNAL] Re: [lanl/bml] Rocsparse add targeting OLCF Frontier (PR #716)
So we are expecting the user to do that? why not add this modified file to the repo?
—
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/lanl/bml/pull/716*issuecomment-1588012478__;Iw!!Bt8fGhp8LhKGRg!FoapLQMzN6RAilCBhA_djFyhCKzF_1ZetGrCRutSj42GJsXaUblDaxm1ZnhSMahjnRQ__lvembCthdH4hSZuEhQi$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AA67VEPXK7BT22R7PR4YEU3XK5ZELANCNFSM6AAAAAAZDWRBGQ__;!!Bt8fGhp8LhKGRg!FoapLQMzN6RAilCBhA_djFyhCKzF_1ZetGrCRutSj42GJsXaUblDaxm1ZnhSMahjnRQ__lvembCthdH4hbstSxkJ$>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
/ccs/proj/csc304
How about putting a copy of that file in our project shared directory at OLCF? /ccs/proj/csc304 |
Working rocsparse add and multiply fix