Use _aligned_malloc instead of posix_memalign on Windows #9
Thanks for this patch, Tony. We have very few (zero?) Windows users/developers locally, so I'm glad someone out there is trying it out with Microsoft tools.
So far I'm mostly interested in using GNU tools on the Microsoft platform (among others), but MSVC might give me some better debugging info, so we'll see. I was very impressed by how quickly BLIS builds and how cleanly it's put together so far. Haven't benchmarked perf yet, but your recent papers look encouraging! If building shared libraries can happen soonish, I'm thinking of trying to plug BLIS into Julia as an alternate BLAS implementation.
Tony, thank you for your kind words regarding cleanliness. Hopefully you find performance to be satisfactory, but if it is not, keep in mind that we still do not have optimized kernels for many architectures, so low observed performance probably just means that the reference implementation is being called for the operation in question. And as you may have noticed, BLIS does not attempt to detect hardware flags (sse, avx, etc), nor does it attempt to adjust cache blocksizes based on the size of the L2/L3 caches; all of this must be done manually, for now. But we would like to think that those are problems that have been solved elsewhere already, and that it will just take someone with the right expertise and motivation to come along and contribute those parts. Of course, if we wait long enough, this will rise to the top of my own queue. :) Please keep in touch!
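For readers wondering what detecting hardware flags could look like, here is a minimal sketch. It is not part of BLIS: the function names are hypothetical, and it assumes GCC or Clang on an x86 target, where the `__builtin_cpu_supports` builtin is available. On other compilers or architectures it simply reports "unknown".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not BLIS API.
   Each returns 1 if the feature is present, 0 if absent,
   and -1 if this compiler/architecture cannot be queried this way. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define EXAMPLE_CAN_QUERY_CPU 1
#else
#define EXAMPLE_CAN_QUERY_CPU 0
#endif

static int example_cpu_has_sse2(void)
{
#if EXAMPLE_CAN_QUERY_CPU
    __builtin_cpu_init();                      /* populate the CPU model data */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return -1;
#endif
}

static int example_cpu_has_avx(void)
{
#if EXAMPLE_CAN_QUERY_CPU
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}

/* Print a one-line summary of what was detected. */
static void example_print_cpu_features(void)
{
    printf("sse2: %d, avx: %d (1 = yes, 0 = no, -1 = unknown)\n",
           example_cpu_has_sse2(), example_cpu_has_avx());
}
```

A build or runtime dispatcher could use results like these to choose between an optimized kernel and the reference implementation. How such a choice would be wired into BLIS itself is left open here.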
+1 to Julia integration. That's a great target for BLIS. On Saturday, June 14, 2014, Tony Kelman wrote:
Jeff Hammond
Tony, as for integrating BLIS into Julia, we would be delighted if you were willing to take on such a project. It would also be interesting to see if the native (or even the object-based) BLIS interfaces would make it easier to include BLAS-like functionality within Julia.
Hi @fgvanzee - I'd certainly be happy to experiment further, maybe starting on Linux first to make life a little easier. The big thing that's missing right now is building BLIS as a shared library. And the various architecture detection features as you said, but for a first experimental cut doing some things manually wouldn't be too bad. Julia's build system is already set up to allow choosing between OpenBLAS and system BLAS (at build time), so adding another option should be feasible. There's some code in the Makefiles for Atlas too, but it might be bitrotting a little.
I'll admit that I don't have much expertise with the issues involved with properly generating shared libraries. I did it for libflame once, but I'm not sure if I was employing best practices, etc. I'll add it to my to-do list and hopefully you can tweak or make suggestions as needed.
This needs fixing properly somehow, but using -O3 (at least with gcc 8.3), we get this:

    Program received signal SIGILL, Illegal instruction.
    0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0) at ref_kernels/bli_cntx_ref.c:456
    456 for ( i = 0; i < BLIS_NUM_LEVEL3_OPS; ++i ) vfuncs[ i ] = NULL;
    (gdb) bt
    #0  0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0) at ref_kernels/bli_cntx_ref.c:456
    #1  0x000000001004c0a8 in bli_cntx_init_power9 (cntx=<optimized out>) at config/power9/bli_cntx_init_power9.c:42
    #2  0x000000001003c85c in bli_gks_register_cntx (id=BLIS_ARCH_POWER9, nat_fp=0x1004c090 <bli_cntx_init_power9>, ref_fp=0x1004c0d0 <bli_cntx_init_power9_ref>, ind_fp=<optimized out>) at frame/base/bli_gks.c:373
    #3  0x000000001003c97c in bli_gks_init () at frame/base/bli_gks.c:155
    #4  0x000000001003cfe8 in bli_init_apis () at frame/base/bli_init.c:78
    #5  0x00007ffff7e045a8 in __pthread_once_slow () from /lib64/libpthread.so.0
    #6  0x00000000100492e8 in bli_pthread_once (once=<optimized out>, init=<optimized out>) at frame/thread/bli_pthread.c:314
    #7  0x000000001003d138 in bli_init_once () at frame/base/bli_init.c:104
    #8  bli_init_auto () at frame/base/bli_init.c:54
    #9  0x0000000010011300 in cdotc_ (n=<optimized out>, x=<optimized out>, incx=<optimized out>, y=<optimized out>, incy=<optimized out>) at frame/compat/bla_dot.c:89
    #10 0x0000000010002a48 in check2_ (sfac=0x103d14dc <sfac>) at blastest/src/cblat1.c:529
    #11 0x0000000010001ef4 in main () at blastest/src/cblat1.c:112
This allows BLIS with the reference configuration to compile with MinGW (gcc 4.8.1 from mingw-builds, using MSYS2). The testsuite doesn't run successfully: it gets to the gemm tests, then stops with an Error 127; see https://gist.github.com/tkelman/e1e619c2f1d2ff410068

If I manually change BLIS_SIMD_ALIGN_SIZE to 1 in bli_config.h, then a MinGW build does pass unit tests. From WinDbg, it looks like there might be an invalid use of _aligned_free with the default alignment of 16.