alignment issues on solaris 11 #1700

Closed
jedele opened this issue Apr 29, 2022 · 29 comments · Fixed by #1724

Comments

@jedele

jedele commented Apr 29, 2022

I have a C application that links with libyara 3.11 and works fine on Ubuntu 20.04.4 LTS.
But when I install everything on Solaris 11 and build it, it does not work with the libyara shared library,
so I statically link the needed functions from libyara into the application, and it builds.

But when I run it on Solaris 11.4.0.15 (sun4v, SPARC),
doing the same operations that work fine on Ubuntu, it gets a bus error. I traced it
down to an alignment error: it appears to me that yr_arena_allocate_memory is returning a buffer that
is not aligned (something like 0x000100b4c301).
Are there any special libtool/make/automake or configuration flags needed to return only aligned memory?
Thank you,
jim edele

@plusvic
Member

plusvic commented Apr 29, 2022

We fixed some alignment issues in this commit:
e1654ae

But that was released in version 4.0.1. Version 3.11 is really old, my recommendation is upgrading to the latest version.

@jedele
Author

jedele commented May 3, 2022

Just for the record.
We upgraded to 4.2.1, latest stable yara. Again, on solaris 11, built everything and this time it built without any
dynamic library issues, but I am still getting a bus error. It is in _yr_re_emit, doing a sth:

libyara.so.9.0.1`_yr_re_emit+0x434 be,pt %icc, -0x354 <libyara.so.9.0.1`_yr_re_emit+0xe0>
libyara.so.9.0.1`_yr_re_emit+0x438 sth %i4, [%o0]

Register %o0 is not on a halfword boundary, I believe:

%o0 = 0x0000000100760c1d

I think this code is near the return from _yr_re_emit, which at first led me to suspect stack corruption,
but I don't really see anything pointing to that. The %o0 address is valid; I can access it in the core.

I tried compiling with -g and without -O3, and I get the same results.

@plusvic
Member

plusvic commented May 3, 2022

I've fixed some alignment issues in this branch:
https://github.com/VirusTotal/yara/tree/fix_alignment_issues

Try with that version and let me know if you find some other alignment issues, there are probably more.

@jedele
Author

jedele commented May 5, 2022

I tried a git clone of the link and ran the usual yara bootstrap.sh and configure, but when I tried make (on
Solaris 11, where I built the 4.2.1 version) I get:
Making all in libyara
gmake[1]: Entering directory '/export/home/james.edele/yara-alignment/libyara'
LEX lexer.c
"/export/home/james.edele/yara-alignment/libyara/lexer.l":line 141: Error: missing translation value
gmake[1]: *** [Makefile:871: lexer.c] Error 1
gmake[1]: Leaving directory '/export/home/james.edele/yara-alignment/libyara'
gmake: *** [Makefile:1177: all-recursive] Error 1

I am not a lex person, and I tried to figure it out, but it's going to take some time.
BTW, I did not install bison or flex because I did not need them to build 4.2.1 or 3.11 on any platform.

@plusvic
Member

plusvic commented May 6, 2022

That's probably because your system has an old version of lex/flex. However re-generating lexer.c from lexer.l should not be necessary, as the lexer.c included in the repo is up-to-date. Probably lexer.c has a timestamp older than lexer.l and that's why gmake thinks that lexer.c should be re-generated. Try touch libyara/lexer.c in order to update its timestamp to the current time. With this I think that gmake won't try to execute flex again and will use the already existing lexer.c.

@jedele
Author

jedele commented May 7, 2022

That tip was helpful. Apparently the git tree already has lexer.c and grammar.c, so normally there is no need to run lex (or flex) or yacc to get the corresponding C files. Solaris's lex and yacc don't work correctly with yara. For example, the yara makefile calls yacc with the -W flag, but Solaris's yacc has no -W option. By touching the C files, there is no need to run lex/yacc. So I got the git tree to build but, alas, I still get the alignment issue in the same place. I tried -g and -O0, but with no better results. I need to figure out exactly which C line in _yr_re_emit is being executed when we get the bus error.

@plusvic
Member

plusvic commented May 10, 2022

@jedele can you confirm if the alignment issues are completely fixed?

@meme
Contributor

meme commented May 10, 2022

I took a look at the patches you've posted but they don't solve the underlying issue, which is that the arena implementation invokes undefined behaviour.

With UBSan enabled it is possible to see this (on amd64 Linux):

runtime error: member access within misaligned address 0x7fb96b0b185c for type 'struct YR_MATCH', which requires 8 byte alignment

If you call, e.g. yr_rules_scan_file which takes a callback, when it is fired with CALLBACK_MSG_RULE_MATCHING (for example) the rule data passed in is allocated on a misaligned address: YR_MATCH requires an 8 byte alignment, but the arena allocator does not respect this. It needs to be updated to give out addresses with 8 byte alignment, just like malloc does.
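
As a rough illustration of what "give out addresses with 8 byte alignment" could look like, here is a minimal bump-allocator sketch. It is not YARA's arena code: `toy_arena`, `align_up` and `toy_arena_alloc` are hypothetical names, and the sketch assumes the backing buffer itself starts on an 8-byte boundary (as memory from malloc would).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator for illustration; not YARA's arena. */
typedef struct
{
  uint8_t* base;   /* start of the backing buffer, assumed 8-byte aligned */
  size_t capacity; /* total bytes available in the buffer */
  size_t used;     /* bytes handed out so far, including padding */
} toy_arena;

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
  return (n + align - 1) & ~(align - 1);
}

/* Return an 8-byte aligned block of `size` bytes, or NULL if it does not fit. */
static void* toy_arena_alloc(toy_arena* a, size_t size)
{
  size_t start = align_up(a->used, 8);

  if (start > a->capacity || size > a->capacity - start)
    return NULL;

  a->used = start + size;
  return a->base + start;
}
```

With this scheme a 5-byte allocation followed by a 3-byte one lands at offsets 0 and 8 rather than 0 and 5, at the cost of a few padding bytes per allocation.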

@jedele
Author

jedele commented May 10, 2022

To answer @plusvic: no, unfortunately I am still seeing alignment issues on the sparcv9 architecture. The SPARC V9 architecture manual states: Half word accesses shall be aligned on 2-byte boundaries; word accesses (which include instruction fetches) shall be aligned on 4-byte boundaries; extended-word and doubleword accesses shall be aligned on 8-byte boundaries; and quadword quantities shall be aligned on 16-byte boundaries. An improperly aligned address in a load, store, or load-store instruction causes a trap to occur. And I agree with meme's earlier comment that the arena allocator should give out addresses with 8-byte alignment.
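
The rule quoted above can be checked mechanically against the addresses reported in this thread. The helper below is a small sketch; `is_aligned` is a hypothetical name, and it assumes the access size is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size. size must be a power of two:
   2 for halfword, 4 for word, 8 for doubleword accesses. */
static bool is_aligned(uint64_t addr, uint64_t size)
{
  return (addr & (size - 1)) == 0;
}
```

For example, `is_aligned(0x0000000100760c1d, 2)` is false, so the `sth` store to that address would trap; any odd address fails every check except byte access.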

@plusvic
Member

plusvic commented May 11, 2022

I've updated fix_alignment_issues branch with commit 568bdd3. Please try that and let me know if a new issue arises.

@jedele
Author

jedele commented May 11, 2022

@plusvic. I was able to install and build with your latest fix. The application I am using on Solaris 11 gets further along but still gets a bus error, another alignment issue, in a different place in the code. The code is doing a
st %g1 [%g3]
and %g3 has
0x000000010044ee3f
which is a valid address in the core.

The back trace shows it in yara_yyparse:
libyara.so.9.0.1`yara_yyparse+0x2d88(10034a890, 1001131e0, ffffffff5540cd10, ffffffff7fffcca0, 4, ffffffff7fffd030)
libyara.so.9.0.1`yr_lex_parse_rules_file+0xb0(100104c80, 10034a890, 10014a830, 1001108a0, 0, ffffffff7fffe7e8)
libyara.so.9.0.1`yr_compiler_add_file+0x1d0(1001131e0, 0, 0, ffffffff7fffed22, 0, 0)
compile_files+0xa4(1001131e0, 100104b00, ffffffff7fffeb18, ffffffff7fffeb20, 100104c80, ffffffff7fffed22)
main+0x190(1, ffffffff7fffeb18, ffffffff7fffeb38, 2, 100104000, 100000)
_start+0x64(0, 0, 0, 0, 0, 0)

Based on the surrounding function calls, yara_yyparse() appears to be calling yr_arena_get_current_offset()
and using the return address as a pointer. I believe it is executing this code in yara_yyparse (in grammar.c):
    int32_t* jmp_offset_addr = (int32_t*) yr_arena_ref_to_ptr(
        compiler->arena, &fixup->ref);

    int32_t jmp_offset =
        yr_arena_get_current_offset(compiler->arena, YR_CODE_SECTION) -
        fixup->ref.offset + 1;

    *jmp_offset_addr = jmp_offset;            <---- bus error here

    // Remove fixup from the stack.
    compiler->fixup_stack_head = fixup->next;
    yr_free(fixup);
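
One common way to make a store like `*jmp_offset_addr = jmp_offset` safe on strict-alignment machines is to copy the bytes with memcpy instead of dereferencing a typed pointer. The sketch below illustrates that general technique only; the helper names are hypothetical and this is not necessarily how the branch fixes it.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at an address that may not be 4-byte aligned.
   memcpy has no alignment requirement, so the compiler emits byte-wise
   or otherwise safe accesses where the hardware needs them. */
static void store_i32_unaligned(void* dst, int32_t value)
{
  memcpy(dst, &value, sizeof(value));
}

/* Load a 32-bit value from an address that may not be 4-byte aligned. */
static int32_t load_i32_unaligned(const void* src)
{
  int32_t value;
  memcpy(&value, src, sizeof(value));
  return value;
}
```

On x86 compilers typically reduce these helpers to a single move, so the portable form usually costs nothing there.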

@plusvic
Member

plusvic commented May 12, 2022

I've fixed more alignment issues in the fix_alignment_issues branch, but I'm afraid there are a lot more, especially in the modules that parse PE files. Parsing PE files while taking all these alignment restrictions into account is going to be really painful.

Also, without a way to run automated tests on this kind of platform, it is going to be hard to keep the code free from issues down the road. I'll try to set up some automated tests with UBSan enabled and see what comes out.

@jedele
Author

jedele commented May 13, 2022

@plusvic I updated my tree to your latest alignment tree, built everything and ran it. Again, a bus error, this
time in yr_arena_write_data(). It's probably when dereferencing ref,
ref->buffer_id = r.buffer_id;
ref->offset = r.offset;
but I admit it's hard to map the optimized machine code back to the source. This is a new
bus error, though; I have never seen it in this code. Maybe your fixes fixed the others?

As far as automated tests go, I can only offer to try anything new you want to provide, since it only takes a few minutes to download and build. And when it crashes, it goes fast. Thanks for the help you have provided so far.

@plusvic
Member

plusvic commented May 13, 2022

Could you post the full stack trace for this crash?

@plusvic
Member

plusvic commented May 13, 2022

@meme could you provide more details about how you got this error?

runtime error: member access within misaligned address 0x7fb96b0b185c for type 'struct YR_MATCH', which requires 8 byte alignment

I'm compiling YARA with -fsanitize=undefined and running the test cases, but I'm not getting any errors at all.

For some reason I don't get the errors on MacOS, but it's working fine on Linux.

@meme
Contributor

meme commented May 13, 2022

@plusvic I produced that error on 4.2.1, as described:

If you call, e.g. yr_rules_scan_file which takes a callback, when it is fired with CALLBACK_MSG_RULE_MATCHING (for example) the rule data passed in is allocated on a misaligned address: YR_MATCH requires an 8 byte alignment, but the arena allocator does not respect this.

I assume the test suite has that exact case? It is possible that you are getting "lucky" and the allocator is returning addresses on an 8 byte alignment due to the nature of the test suite. Perhaps issue multiple callbacks and intermix different callback types to create more allocations on 4 byte boundaries.

@jedele
Author

jedele commented May 13, 2022

@plusvic. Glad to. Here you go:

yaracompilescan:core> ::stack
libyara.so.9.0.1`yr_arena_write_data+0x118(10014a680, 4, ffffffff7fffe908, 18, 0, 100000)
libyara.so.9.0.1`yr_compiler_get_rules+0xd8(1001131e0, ffffffff7fffea20, ffffffff7fffeb30, ffffffff7fffeb30, 100104c80, 0)
main+0x1e0(1, ffffffff7fffeb28, ffffffff7fffeb48, 2, 100104000, 100000)
_start+0x64(0, 0, 0, 0, 0, 0)
yaracompilescan:core>
yaracompilescan:core>
yaracompilescan:core> ::dis yr_arena_write_data+0x118
libyara.so.9.0.1`yr_arena_write_data+0xf0: cmp %g4, %i1
libyara.so.9.0.1`yr_arena_write_data+0xf4: be,pn %icc, +0x1c <libyara.so.9.0.1`yr_arena_write_data+0x110>
libyara.so.9.0.1`yr_arena_write_data+0xf8: mov %o0, %g3
libyara.so.9.0.1`yr_arena_write_data+0xfc: sllx %g4, 0x1, %g2
libyara.so.9.0.1`yr_arena_write_data+0x100: add %g2, %g4, %g2
libyara.so.9.0.1`yr_arena_write_data+0x104: sllx %g2, 0x3, %g2
libyara.so.9.0.1`yr_arena_write_data+0x108: add %i0, %g2, %g2
libyara.so.9.0.1`yr_arena_write_data+0x10c: ldx [%g2 + 0x8], %g3
libyara.so.9.0.1`yr_arena_write_data+0x110: ld [%g1 + 0x4], %g4
libyara.so.9.0.1`yr_arena_write_data+0x114: ldx [%l1 + 0x8], %g5
libyara.so.9.0.1`yr_arena_write_data+0x118: ldx [%g3 + %g4], %g2
libyara.so.9.0.1`yr_arena_write_data+0x11c: cmp %g2, %g5
libyara.so.9.0.1`yr_arena_write_data+0x120: bcs,pt %xcc, +0x28 <libyara.so.9.0.1`yr_arena_write_data+0x148>
libyara.so.9.0.1`yr_arena_write_data+0x124: add %g3, %g4, %g3
libyara.so.9.0.1`yr_arena_write_data+0x128: ldx [%o7 + 0x18], %g4
libyara.so.9.0.1`yr_arena_write_data+0x12c: add %g5, %g4, %g4
libyara.so.9.0.1`yr_arena_write_data+0x130: cmp %g2, %g4
libyara.so.9.0.1`yr_arena_write_data+0x134: bcc,a,pt %xcc, +0x18 <libyara.so.9.0.1`yr_arena_write_data+0x14c>
libyara.so.9.0.1`yr_arena_write_data+0x138: ldx [%g1 + 0x8], %g1
libyara.so.9.0.1`yr_arena_write_data+0x13c: sub %g2, %g5, %g2
libyara.so.9.0.1`yr_arena_write_data+0x140: add %o0, %g2, %g2

If you want more disassembly of the surrounding code, let me know.

@plusvic
Member

plusvic commented May 16, 2022

@jedele Could you try the latest version in the branch https://github.com/VirusTotal/yara/tree/fix_alignment_issues?

@jedele
Author

jedele commented May 17, 2022

@plusvic. Today I updated the git tree to your latest and rebuilt everything. It did not crash! But, unfortunately, when I examined the results, they were wrong! What used to match no longer matches. I tried modifying the rules, but it could not match them. I tried a very large sample with a lot of rules. It output a lot of warnings, a lot of "NO MATCH" results, and then hung.
I took a core at the point of hanging. Here is the back trace:

yaracompilescan:core> $c
libyara.so.9.0.1`_yr_re_fiber_sync+0x168(ffffffff7fffe0c0, 10095ba38, 10034bae0, ffffffffffffffff, c2, 100754670)
libyara.so.9.0.1`yr_re_exec+0x61c(0, 0, ffffffff7fffe0c0, 10095ba38, 10034bae0, 100105fdd)
libyara.so.9.0.1`yr_scan_verify_match+0x230(10095b9d0, 1008529d8, 100105fc4, 30, 0, 18)
libyara.so.9.0.1`_yr_scanner_scan_mem_block.isra.1+0xe0(10095b9d0, 100105fc4, ffffffff7fffe748, ffffffff7fffe750, 1a, 8)
libyara.so.9.0.1`yr_scanner_scan_mem_blocks+0x6cc(0, ffffffff7fffe768, b400, ffffffff7d727958, ffffffff7d726000, 0)
libyara.so.9.0.1`yr_scanner_scan_mem+0x6c(10095b9d0, 100105fc4, 30, 0, 10095b9d0, 10095f0a0)
libyara.so.9.0.1`yr_rules_scan_mem+0x50(0, 100105fc4, 30, 4, 1000034e8, 0)
readfile+0xfc(100104000, 10095a510, 10010f710, 100001e90, 100105ff4, 100105fc4)
main+0x208(0, ffffffff7fffead8, ffffffff7fffeaf8, 2, 0, 100000)
_start+0x64(0, 0, 0, 0, 0, 0)
yaracompilescan:core>

yaracompilescan:core> ::dis _yr_re_fiber_sync+0x168
libyara.so.9.0.1`_yr_re_fiber_sync+0x140: cmp %i2, %l3
libyara.so.9.0.1`_yr_re_fiber_sync+0x144: bne,pt %xcc, -0x108 <libyara.so.9.0.1`_yr_re_fiber_sync+0x3c>
libyara.so.9.0.1`_yr_re_fiber_sync+0x148: stx %i5, [%g2]
libyara.so.9.0.1`_yr_re_fiber_sync+0x14c: ba,pt %xcc, -0x64 <libyara.so.9.0.1`_yr_re_fiber_sync+0xe8>
libyara.so.9.0.1`_yr_re_fiber_sync+0x150: clr %g1
libyara.so.9.0.1`_yr_re_fiber_sync+0x154: ldub [%i5 + 0x2], %g1
libyara.so.9.0.1`_yr_re_fiber_sync+0x158: ldub [%i5 + 0x1], %g2
libyara.so.9.0.1`_yr_re_fiber_sync+0x15c: stb %g1, [%fp + 0x776]
libyara.so.9.0.1`_yr_re_fiber_sync+0x160: stb %g2, [%fp + 0x775]
libyara.so.9.0.1`_yr_re_fiber_sync+0x164: ldsh [%fp + 0x775], %g1
libyara.so.9.0.1`_yr_re_fiber_sync+0x168: add %i5, %g1, %i5
libyara.so.9.0.1`_yr_re_fiber_sync+0x16c: cmp %i2, %l3
libyara.so.9.0.1`_yr_re_fiber_sync+0x170: bne,pt %xcc, -0x134 <libyara.so.9.0.1`_yr_re_fiber_sync+0x3c>
libyara.so.9.0.1`_yr_re_fiber_sync+0x174: stx %i5, [%i2]
libyara.so.9.0.1`_yr_re_fiber_sync+0x178: ba,pt %xcc, -0x90 <libyara.so.9.0.1`_yr_re_fiber_sync+0xe8>
libyara.so.9.0.1`_yr_re_fiber_sync+0x17c: clr %g1
libyara.so.9.0.1`_yr_re_fiber_sync+0x180: ldub [%i5 + 0x1], %l6
libyara.so.9.0.1`_yr_re_fiber_sync+0x184: andcc %l5, 0xff, %i5
libyara.so.9.0.1`_yr_re_fiber_sync+0x188: be,pn %icc, +0x44 <libyara.so.9.0.1`_yr_re_fiber_sync+0x1cc>
libyara.so.9.0.1`_yr_re_fiber_sync+0x18c: ldub [%fp + 0x77f], %g1
libyara.so.9.0.1`_yr_re_fiber_sync+0x190: and %l6, 0xff, %g3

I can do more debugging tomorrow to see if we are in some infinite loop in yr_re_fiber_sync. But the fact that the code no longer matches any rules is enough to make me go back to an earlier version (4.2.1) of yara and my code, just to make sure I haven't done anything to cause it.

@plusvic
Member

plusvic commented May 17, 2022

Well, at least it looks like we made some progress. If you can shed some additional light on this new issue, that would be great. I would say that there's more than one issue going on here. The infinite loop in yr_re_fiber_sync is probably unrelated to the rules not matching. So the first thing I would try is reducing the test setup to its bare minimum, ideally a single file that, when scanned with a single rule, doesn't match when it should.

@jedele
Author

jedele commented May 18, 2022

Here is a single rule matching a short list of data. The data, bigfile, is a list of full pathnames.
$ cat rulesfile
rule etcFolderPolicy {
    meta:
        action = "filewatch"
    strings:
        $watch1 = /^\/etc\/.*$/
    condition:
        any of ($watch*)
}

$ cat bigfile
/etc
/etc/alsa
/etc/alsa/conf.d
/etc/alsa/conf.d/50-pulseaudio.conf
/etc/alsa/conf.d/50-arcam-av-ctl.conf
/usr/james.edele

james.edele@dtexsol01 ~/yara-alignment
$ ./yaracompilescan rulesfile /dev/null
warning file:line rulesfile:5 message $watch1 contains .*, .+ or .{x,} consider using .{,N}, .{1,N} or {x,N} with a reasonable value for N - rule
/etc - NO MATCH
/etc/alsa - NO MATCH
/etc/alsa/conf.d - NO MATCH
/etc/alsa/conf.d/50-pulseaudio.conf - NO MATCH
/etc/alsa/conf.d/50-arcam-av-ctl.conf - NO MATCH
/usr/james.edele - NO MATCH

The warning message is pretty normal even when everything works.
This code (yaracompilescan compiled with yara 4.2.1) worked fine on AIX and Ubuntu release 20.04. I want to double-check that now. I may then try it with the latest alignment tree on Ubuntu and AIX. Thanks, Victor.

@jedele
Author

jedele commented May 19, 2022

I built the fix_alignment_issues branch on Ubuntu (20.04) and it worked! I then tried it on AIX, where it reported NO MATCH for the rules and then hung. I did not verify which function it was hung in, but this is the same thing that happened on Solaris 11 (see above). In both the Ubuntu and AIX cases, I verified that yara 4.2.1 worked correctly with my program.

I'll be taking a few days off, but I can start debugging the solaris code when I get back.

@jedele
Author

jedele commented Jun 1, 2022

Apologies for such a long delay w/o comment.
I tried compiling and running my code linked with the alignment code using a simple rule and simple patterns. The program completes but does not match anything. This is code that worked perfectly with yara 4.2.1 on both aix and Linux but crashes with alignment bus errors on Solaris.
Running my code with 7 complex rules (each with a dozen or more patterns along with exclude patterns) it hangs. When it hangs it is looping in _yr_re_fiber_sync() with opcode RE_OPCODE_JUMP. It is in the
"while (fiber != last)" loop with the same opcode (RE_OPCODE_JUMP), it never changes.

For fun, I went back to the yara 4.2.1 code and grabbed the 1 or 2 lines for RE_OPCODE_JUMP and used them. Alas, it gets a bus error.
I can try to get more data for this case and try to isolate the rule and patterns that cause this.

@plusvic
Member

plusvic commented Jun 1, 2022

So, to clarify: you are testing the latest changes in the fix_alignment_issues branch and it gets into an infinite loop on Solaris. Does it work fine on Linux?

Can you verify if jmp_len in the following snippet of _yr_re_fiber_sync is zero?

    case RE_OPCODE_JUMP:
      memcpy(&jmp_len, fiber->ip + 1, sizeof(jmp_len));
      fiber->ip += jmp_len;
      break;

I guess the infinite loop is because somehow jmp_len is zero and the instruction pointer is not advancing.
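
To make the failure mode concrete, here is a toy sketch of a jump-following loop that reads its offset with memcpy, as in the snippet above, and refuses to continue when the offset is zero. Everything in it is hypothetical (the opcode values, `toy_follow_jumps`, the 16-bit offset width); it does not reproduce the real `_yr_re_fiber_sync` or the eventual fix, neither of which is shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_OP_JUMP 0x01 /* hypothetical opcode values, not YARA's */
#define TOY_OP_HALT 0x02

/* Follow consecutive jump instructions starting at offset ip.
   Each jump is one opcode byte followed by a native-endian int16_t
   offset relative to the opcode. Returns the number of jumps taken,
   or -1 if a jump would not advance (offset 0), leaves the buffer,
   or the step budget runs out. */
static int toy_follow_jumps(
    const uint8_t* code, size_t len, size_t ip, int max_steps)
{
  int jumps = 0;

  while (ip < len && code[ip] == TOY_OP_JUMP)
  {
    int16_t jmp_len;
    ptrdiff_t next;

    if (ip + 1 + sizeof(jmp_len) > len)
      return -1; /* truncated instruction */

    /* The offset sits at ip + 1, which is generally unaligned. */
    memcpy(&jmp_len, code + ip + 1, sizeof(jmp_len));

    if (jmp_len == 0 || jumps >= max_steps)
      return -1; /* a zero offset would spin forever on the same opcode */

    next = (ptrdiff_t) ip + jmp_len;
    if (next < 0)
      return -1; /* jump before the start of the buffer */

    ip = (size_t) next;
    jumps++;
  }

  return jumps;
}
```

Without the zero check, a jump whose decoded offset is 0 leaves `ip` on the same opcode, so the loop never terminates, which matches the hang pattern described above.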

@plusvic
Member

plusvic commented Jun 1, 2022

I think your most recent issue could be fixed by 8546232. Let me know if something changed.

@jedele
Author

jedele commented Jun 2, 2022

You are correct, jmp_len was 0.
And I tried your latest fix on Solaris.
And it seems to work, hallelujah!
I also tried the latest alignment branch on Ubuntu and AIX.
On both it worked!
So with that, I will wait a few more days to make sure nothing else comes up. If none, I'll close this.
Thank you very much for your dedication and great help!
jim

@jedele
Author

jedele commented Jun 6, 2022

This branch worked on Ubuntu 20.04, Solaris 11 and AIX (current release)

@plusvic
Member

plusvic commented Jun 6, 2022

Great, I've merged it in the master branch.

plusvic added a commit that referenced this issue Jun 7, 2022
The x86 platform is very forgiving when you access a 16-bit variable stored at a memory address that is not aligned to a 2-byte boundary, or a 32-bit variable that is not aligned to a 4-byte boundary, and so on. Other platforms, like ARM or SPARC, are not that flexible, and accessing a value by dereferencing a pointer that is not aligned to the size of the value causes a processor fault.

This fixes multiple issues caused by pointers that are not aligned to the size of the value.

Fixes: #1700