UTF-8 accentuated characters causing segfault #340

Rfferrao87 opened this issue Jun 12, 2020 · 0 comments
Open

Rfferrao87 commented Jun 12, 2020

Hi, I'd like to know whether this is an isolated case: some logs containing UTF-8 characters crash lognormalizer and, consequently, rsyslog for me.

Here are my tests:

echo 'msg="Supervisão"' | lognormalizer -r sample.rb -vvv

liblognorm: loading rulebase file 'sample.rb'
liblognorm: rulebase version is 2

liblognorm: read rulebase line[~3]: 'rule=:msg=%msg:string%'
liblognorm: rule line to add: ':msg=%msg:string%'
liblognorm: addSampToTree 0 of 16
liblognorm: parsed literal: 'msg='
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "m" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "m" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7380a0, parser 0x738600
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "s" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "s" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7386b0, parser 0x738600
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "g" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "g" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x738880, parser 0x7385c0
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "=" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "=" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7387a0, parser 0x738a90
liblognorm: parsed field: 'msg'
liblognorm: field type 'string', i 15
liblognorm: ln_pdagAddParserInternal: { "name": "msg", "type": "string" }
liblognorm: ln_pdagAddParserInstance: { "name": "msg", "type": "string" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x738a40, parser 0x738a90
liblognorm: parsed literal: ''
liblognorm: end addSampToTree 16 of 16
liblognorm: optimizing main pdag component
liblognorm: pre sort, parser 0:(null)[7680004]
liblognorm: post sort, parser 0:(null)[7680004]
liblognorm: optimizing 0x7386b0: field 0 type 'literal', name '(null)': 'm':
liblognorm: opt path compact: add 0x738560 to 0x7388d0
liblognorm: delete 0x7386b0[1]: (null)
liblognorm: opt path compact: add 0x738600 to 0x7388d0
liblognorm: delete 0x738880[1]: (null)
liblognorm: opt path compact: add 0x738b20 to 0x7388d0
liblognorm: delete 0x7387a0[1]: (null)
liblognorm: pre sort, parser 0:msg[7680032]
liblognorm: post sort, parser 0:msg[7680032]
liblognorm: optimizing 0x738cb0: field 0 type 'string', name 'msg': 'UNKNOWN':
liblognorm: finished optimizing main pdag component
liblognorm: ---AFTER OPTIMIZATION------------------
liblognorm: MAIN COMPONENT:
liblognorm: subDAG 0x7380a0 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm: field type 'literal', name '(null)': 'msg=': called 0
liblognorm: field type 'literal', name '(null)': 'msg=':
liblognorm:   subDAG 0x738a40 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm:   field type 'string', name 'msg': 'UNKNOWN': called 0
liblognorm:   field type 'string', name 'msg': 'UNKNOWN':
liblognorm:     subDAG [TERM] 0x738cb0 (children: 0 parsers, ref 1) [called 0, backtracked 0]
liblognorm: MAIN COMPONENT (alternative):
liblognorm: 0x7380a0[ref 1]:
liblognorm:   0x738a40[ref 1]: msg=
liblognorm:     0x738cb0[ref 1]: msg=%msg:string%
liblognorm: =======================================
number of tree nodes: 6
liblognorm: MAIN COMPONENT:
liblognorm: subDAG 0x7380a0 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm: field type 'literal', name '(null)': 'msg=': called 0
liblognorm: field type 'literal', name '(null)': 'msg=':
liblognorm:   subDAG 0x738a40 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm:   field type 'string', name 'msg': 'UNKNOWN': called 0
liblognorm:   field type 'string', name 'msg': 'UNKNOWN':
liblognorm:     subDAG [TERM] 0x738cb0 (children: 0 parsers, ref 1) [called 0, backtracked 0]
liblognorm: MAIN COMPONENT (alternative):
liblognorm: 0x7380a0[ref 1]:
liblognorm:   0x738a40[ref 1]: msg=
liblognorm:     0x738cb0[ref 1]: msg=%msg:string%
To normalize: 'msg="Supervisão"'
liblognorm: 0: enter parser, dag node 0x7380a0, json 0x738910
liblognorm: 0/0:trying 'literal' parser for field '(null)', data 'msg='
liblognorm: parser lookup returns 0, pParsed 4
liblognorm: 0: potential hit, trying subtree 0x738a40
liblognorm: 4: enter parser, dag node 0x738a40, json 0x738910
liblognorm: 4/0:trying 'string' parser for field 'msg', data 'UNKNOWN'
Segmentation fault (core dumped)

The rulebase contents are the following:

version=2

rule=:msg=%msg:string%

Can you help me figure this out?

Rfferrao87 changed the title from "UTF-8 characters causing segfault" to "UTF-8 accentuated characters causing segfault" on Jan 16, 2022
EHerzog76 added a commit to EHerzog76/liblognorm that referenced this issue Mar 6, 2022
julthomas added a commit to julthomas/liblognorm that referenced this issue Aug 24, 2022
In struct data_String, the perm_chars member is declared as
an array indexed from 0 to 255:

    char perm_chars[256];

Bytes > 127, for instance in UTF-8 data, cause a segfault: where char
is signed, such a byte is negative, and casting it to (unsigned) yields
a huge index. I believe casting to (unsigned char) instead of (unsigned)
should ensure that perm_chars is accessed at index 0..255.

I came across this segfault with the following sample string. Rsyslog
crashes in stringIsPermittedChar() when it processes the byte 0xe2.

echo 'd’ouverture' |od -t x1 -a
0000000  64  e2  80  99  6f  75  76  65  72  74  75  72  65  0a
          d   b nul  em   o   u   v   e   r   t   u   r   e  nl
0000016

Note that the parser.c line numbers reported by gdb do not match master,
because the backtrace comes from a 2.0.6 build with other patches applied:

#0  0x00007f9a73034713 in stringIsPermittedChar (data=0x55bd3a1a42a0, c=-30 '\342') at parser.c:3236
#1  0x00007f9a73034c63 in ln_v2_parseString (npb=0x7f9a5292d670, offs=0x7f9a5292d600, pdata=0x55bd3a1a42a0, parsed=0x7f9a5292d5f8, value=0x7f9a5292d5f0) at parser.c:3363
#2  0x00007f9a73029279 in tryParser (npb=0x7f9a5292d670, dag=0x55bd3a1a11a0, offs=0x7f9a5292d600, pParsed=0x7f9a5292d5f8, value=0x7f9a5292d5f0, prs=0x55bd3a14ea00) at pdag.c:1454
#3  0x00007f9a730295db in ln_normalizeRec (npb=0x7f9a5292d670, dag=0x55bd3a1a11a0, offs=0, bPartialMatch=0, json=0x7f9a4c4dd260, endNode=0x7f9a5292d6a0) at pdag.c:1575
#4  0x00007f9a730299ac in ln_normalize (ctx=0x55bd3a14e680, str=0x7f9a4c4e19a0 "d’ouverture de session :#011#011{837B856E-32B3-78E2-3D77-4E95F8904E71}#015#012#015#012Informations sur le processus :#015#012#011ID du processus :#011#0110x0#015#012#011Nom du processus :#011#01"..., strLen=8021, json_p=0x7f9a5292d6d8) at pdag.c:1653
#5  0x00007f9a7302459e in doAction (pMsgData=0x7f9a5292d740, pWrkrData=0x7f9a4c4daad0) at mmnormalize.c:259
...

This should fix issue rsyslog#340 "UTF-8 accentuated
characters causing segfault". The patch in this pull request is
also similar.
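
The out-of-bounds index described in the commit message above can be illustrated with a minimal sketch. It is not the liblognorm source: struct perm_table and the three helper functions are hypothetical names introduced only for this example, and the only thing taken from the commit message is the 256-entry perm_chars array and the two casts being compared. It assumes a platform where plain char is signed and 8 bits wide, which matches the c=-30 '\342' shown in frame #0 of the backtrace.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for struct data_String: only the lookup table. */
struct perm_table {
    char perm_chars[256];
};

/* Index as the commit message describes the faulty code: the char is
 * converted straight to unsigned. Where char is signed, byte 0xe2 is -30,
 * which wraps to UINT_MAX - 29, far outside the 256-entry table. */
static size_t index_with_unsigned_cast(char c)
{
    return (size_t)(unsigned)c;
}

/* Index with the proposed cast: the value is always in 0..255. */
static size_t index_with_unsigned_char_cast(char c)
{
    return (size_t)(unsigned char)c;
}

/* Table lookup using the bounded index (illustrative helper). */
static int is_permitted(const struct perm_table *t, char c)
{
    size_t i = index_with_unsigned_char_cast(c);
    assert(i < sizeof t->perm_chars);
    return t->perm_chars[i] != 0;
}
```

With c set to (char)0xe2, the first helper returns an index in the billions on such a platform, while the second returns 226, so only the second is safe to use on perm_chars.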
julthomas added a commit to zenetys/rpm-rsyslog that referenced this issue Sep 7, 2022
The bug is described in the following liblognorm pull request
and issue:

- string: fix out of bound access in perm_chars causing segfault
  rsyslog/liblognorm#364

- UTF-8 accentuated characters causing segfault
  rsyslog/liblognorm#340