
Use _aligned_malloc instead of posix_memalign on Windows #9


Merged: @fgvanzee merged 1 commit into flame:master from tkelman:memalign_windows on Jun 14, 2014


@tkelman (Contributor) commented Jun 14, 2014

This allows BLIS with the reference configuration to compile with MinGW (gcc 4.8.1 from mingw-builds, using MSYS2). The test suite doesn't run successfully, though: it reaches the gemm tests, then stops with Error 127; see https://gist.github.com/tkelman/e1e619c2f1d2ff410068

If I manually change BLIS_SIMD_ALIGN_SIZE to 1 in bli_config.h, then a MinGW build does pass unit tests. From WinDbg, it looks like there might be an invalid use of _aligned_free with the default alignment of 16.
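
For readers less familiar with the Windows side of this change, here is a minimal sketch of the allocation pattern the PR title describes: call `_aligned_malloc` on Windows (neither MinGW nor MSVC provides `posix_memalign`) and `posix_memalign` elsewhere, and release each block with its matching deallocator. The wrapper names `portable_aligned_malloc` and `portable_aligned_free` are hypothetical, and this is not the code from commit ad48dca. It also assumes, without confirmation from this thread, that the WinDbg report reflects a mismatched `free`/`_aligned_free` pair, which is one plausible cause of that kind of error.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign under strict -std modes */
#endif

#include <stddef.h>
#include <stdlib.h>

#if defined(_WIN32)
#include <malloc.h> /* _aligned_malloc, _aligned_free (MSVC CRT, also MinGW) */
#endif

/* Hypothetical wrapper, not a BLIS function. Returns NULL on failure. */
static void *portable_aligned_malloc(size_t size, size_t align)
{
#if defined(_WIN32)
    /* Argument order is (size, alignment); alignment must be a power of two. */
    return _aligned_malloc(size, align);
#else
    void *p = NULL;
    /* posix_memalign requires a power-of-two alignment that is also a
       multiple of sizeof(void *); it returns 0 on success. */
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
#endif
}

/* Hypothetical wrapper: frees a block from portable_aligned_malloc. */
static void portable_aligned_free(void *p)
{
#if defined(_WIN32)
    /* A block from _aligned_malloc must go back through _aligned_free;
       handing it to free(), or handing a plain malloc() block to
       _aligned_free(), is invalid. */
    _aligned_free(p);
#else
    free(p); /* posix_memalign blocks are released with plain free() */
#endif
}
```

Note that `posix_memalign` rejects alignments that are not multiples of `sizeof(void *)`, so an alignment of 1, as in the `BLIS_SIMD_ALIGN_SIZE` workaround above, makes the POSIX path of this sketch return NULL.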

fgvanzee added a commit that referenced this pull request Jun 14, 2014
Use _aligned_malloc instead of posix_memalign on Windows
@fgvanzee fgvanzee merged commit ad48dca into flame:master Jun 14, 2014
@fgvanzee (Member)

Thanks for this patch, Tony. We have very few (zero?) Windows users/developers locally, so I'm glad someone out there is trying it out with Microsoft tools.

@tkelman tkelman deleted the memalign_windows branch June 14, 2014 20:12
@tkelman (Contributor, Author) commented Jun 14, 2014

So far I'm mostly interested in using GNU tools on the Microsoft platform (among others), but MSVC might give me some better debugging info, so we'll see.

I was very impressed by how quickly BLIS builds and how cleanly it's put together so far. I haven't benchmarked performance yet, but your recent papers look encouraging! If building shared libraries can happen soonish, I'm thinking of trying to plug BLIS into Julia as an alternate BLAS implementation.

@fgvanzee (Member)

Tony, thank you for your kind words regarding cleanliness. Hopefully you find performance to be satisfactory, but if it is not, keep in mind that we still do not have optimized kernels for many architectures, so low observed performance probably just means that the reference implementation is being called for the operation in question. And as you may have noticed, BLIS does not attempt to detect hardware flags (SSE, AVX, etc.), nor does it attempt to adjust cache blocksizes based on the size of the L2/L3 caches; all of this must be done manually, for now. But we would like to think that those are problems that have been solved elsewhere already, and that it will just take someone with the right expertise and motivation to come along and contribute those parts. Of course, if we wait long enough, this will rise to the top of my own queue. :)
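
As a rough illustration of the hardware-flag detection mentioned above (the cache-blocksize half is not attempted here), GCC and Clang on x86 provide the `__builtin_cpu_supports` builtin. This sketch uses it to choose between two hypothetical kernel sets named "avx" and "reference"; the helper names and the selection policy are assumptions for illustration, not anything BLIS provides.

```c
/* Hypothetical helper: 1 if the CPU reports AVX, 0 if it does not,
   -1 if this compiler/architecture offers no builtin way to ask. */
static int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Only strictly required when called before constructors run,
       but harmless to call here. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical policy: use AVX kernels only when AVX is confirmed;
   otherwise fall back to the portable reference kernels. */
static const char *choose_kernel_set(void)
{
    switch (cpu_has_avx()) {
    case 1:
        return "avx";
    case 0:       /* x86 without AVX */
    default:      /* detection unavailable */
        return "reference";
    }
}
```

Falling back to the reference kernels whenever detection is unavailable keeps results correct on unknown hardware, at the cost of the lower performance described above.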

Please keep in touch!

@jeffhammond (Member)

+1 to Julia integration. That's a great target for BLIS.

Jeff Hammond
http://jeffhammond.github.io/

@fgvanzee (Member)

Tony, as for integrating BLIS into Julia, we would be delighted if you were willing to take on such a project. It would also be interesting to see if the native (or even the object-based) BLIS interfaces would make it easier to include BLAS-like functionality within Julia.

@tkelman (Contributor, Author) commented Jun 23, 2014

Hi @fgvanzee, I'd certainly be happy to experiment further, maybe starting on Linux first to make life a little easier. The big thing that's missing right now is building BLIS as a shared library. The architecture-detection features you mentioned are missing too, but for a first experimental cut, doing some things manually wouldn't be too bad.

Julia's build system is already set up to allow choosing between OpenBLAS and a system BLAS at build time, so adding another option should be feasible. There's some code in the Makefiles for ATLAS too, but it might be bitrotting a little.

@fgvanzee (Member)

I'll admit that I don't have much expertise with the issues involved with properly generating shared libraries. I did it for libflame once, but I'm not sure if I was employing best practices, etc. I'll add it to my to-do list and hopefully you can tweak or make suggestions as needed.

@songmaotian songmaotian mentioned this pull request Apr 22, 2016
@loveshack loveshack mentioned this pull request Mar 5, 2018
loveshack pushed a commit to loveshack/blis that referenced this pull request Sep 24, 2019
This needs fixing properly somehow, but using -O3 (at least with gcc 8.3),
we get this:

Program received signal SIGILL, Illegal instruction.
0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
    at ref_kernels/bli_cntx_ref.c:456
456             for ( i = 0; i < BLIS_NUM_LEVEL3_OPS; ++i ) vfuncs[ i ] = NULL;
(gdb) bt
#0  0x000000001004c660 in bli_cntx_init_power9_ref (cntx=0x103e06b0)
    at ref_kernels/bli_cntx_ref.c:456
#1  0x000000001004c0a8 in bli_cntx_init_power9 (cntx=<optimized out>)
    at config/power9/bli_cntx_init_power9.c:42
#2  0x000000001003c85c in bli_gks_register_cntx (id=BLIS_ARCH_POWER9,
    nat_fp=0x1004c090 <bli_cntx_init_power9>,
    ref_fp=0x1004c0d0 <bli_cntx_init_power9_ref>, ind_fp=<optimized out>)
    at frame/base/bli_gks.c:373
#3  0x000000001003c97c in bli_gks_init () at frame/base/bli_gks.c:155
#4  0x000000001003cfe8 in bli_init_apis () at frame/base/bli_init.c:78
#5  0x00007ffff7e045a8 in __pthread_once_slow () from /lib64/libpthread.so.0
#6  0x00000000100492e8 in bli_pthread_once (once=<optimized out>,
    init=<optimized out>) at frame/thread/bli_pthread.c:314
#7  0x000000001003d138 in bli_init_once () at frame/base/bli_init.c:104
#8  bli_init_auto () at frame/base/bli_init.c:54
#9  0x0000000010011300 in cdotc_ (n=<optimized out>, x=<optimized out>,
    incx=<optimized out>, y=<optimized out>, incy=<optimized out>)
    at frame/compat/bla_dot.c:89
#10 0x0000000010002a48 in check2_ (sfac=0x103d14dc <sfac>)
    at blastest/src/cblat1.c:529
#11 0x0000000010001ef4 in main () at blastest/src/cblat1.c:112
Aaron-Hutchinson pushed a commit to sifive/sifive-blis that referenced this pull request Apr 4, 2023