This repository was archived by the owner on Jun 20, 2024. It is now read-only.

failed: ipset v6.32: Element cannot be deleted from the set: it's not added #3296

Open
bboreham opened this issue May 10, 2018 · 12 comments


@bboreham
Contributor

Excerpt from log, focusing on one IP address:

INFO: 2018/05/09 16:00:59.820416 Starting Weaveworks NPC 2.3.0; node name "container11"
INFO: 2018/05/09 16:00:59.820625 Serving /metrics on :6781
Wed May  9 16:00:59 2018 <5> ulogd.c:843 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
DEBU: 2018/05/09 16:00:59.836833 Got list of ipsets: [weave-local-pods weave-?b%zl9GIe0AET1(QI^7NWe*fO weave-iuZcey(5DeXbzgRFs8Szo]+@p weave-a4Xwd(HKP4gdUva6MiTEIZAnp weave-z~y01unAQHA]WxHG!ALB)5]}s weave-merwHPWBXl40TI+Z;zAiZs*_y weave-6iyod:sR_fJJ!asxT3lEJWI[d weave-E.1.0W^NGSp]0_t5WwH/]gX@L weave-k?Z;25^M}|1s7P3|H9i;*;MhG weave-U2!Dhw$r^7X%1=}Rnjqffc87_ weave-$@i@JIAK[omT8D]7^N@EPQ={9 weave-cFh|zJj^65nuF3x;YP1O.tN%9 weave-^/BTzYx)2*FacLfNjGa)Wjg=4 weave-0EHD/vdN#O4]V?o4Tx7kS;APH weave-4vtqMI+kx/2]jD%_c0S%thO%V]
DEBU: 2018/05/09 16:00:59.836864 Flushing ipset 'weave-local-pods'
...
INFO: 2018/05/09 16:01:05.590549 adding entry 10.104.238.41 to weave-z~y01unAQHA]WxHG!ALB)5]}s of e639917e-5394-11e8-8e7a-005056bf0013
INFO: 2018/05/09 16:01:05.590580 added entry 10.104.238.41 to weave-z~y01unAQHA]WxHG!ALB)5]}s of e639917e-5394-11e8-8e7a-005056bf0013
INFO: 2018/05/09 16:01:05.591479 adding entry 10.104.238.41 to weave-a4Xwd(HKP4gdUva6MiTEIZAnp of e639917e-5394-11e8-8e7a-005056bf0013
INFO: 2018/05/09 16:01:05.591502 added entry 10.104.238.41 to weave-a4Xwd(HKP4gdUva6MiTEIZAnp of e639917e-5394-11e8-8e7a-005056bf0013
INFO: 2018/05/09 16:39:05.125088 deleting entry 10.104.238.41 from weave-a4Xwd(HKP4gdUva6MiTEIZAnp of e639917e-5394-11e8-8e7a-005056bf0013
INFO: 2018/05/09 16:39:05.125110 deleted entry 10.104.238.41 from weave-a4Xwd(HKP4gdUva6MiTEIZAnp of e639917e-5394-11e8-8e7a-005056bf0013
FATA: 2018/05/09 16:39:05.133155 update pod: ipset [del weave-a4Xwd(HKP4gdUva6MiTEIZAnp 10.104.238.41] failed: ipset v6.32: Element cannot be deleted from the set: it's not added
: exit status 1

It looks to me like the address was added only once and deleted only once from that set.

@brb
Contributor

brb commented May 10, 2018

This is very strange. NPC tried to delete 10.104.238.41 only once, and its deletion failed. Could it be that someone else has deleted the element from the ipset (e.g. another weave-npc process started by accident)?

Also, from the full log I see that the user hasn't created any NetworkPolicy, so can I assume that they do not use it? If so, a quick fix could be to stop weave-npc from starting when EXPECT_NPC=0.

@bboreham
Contributor Author

We did not see any evidence of another weave-npc. We checked a number of nodes, and on every one weave-npc had crashed with the same symptom.

Agreed on the quick workaround; the cloud.weave.works launch generator has an option disable-npc to set EXPECT_NPC=0 and remove the container from the daemonset spec.

@brb
Contributor

brb commented May 10, 2018

Is it reproducible? If so, what is the kernel version? I ask because I can provide a small eBPF program that tracks insertions into and deletions from ipsets at the kernel level, to see whether the entry was actually inserted.

@bboreham
Contributor Author

Seemed to happen on every node after a few hours.

Kernel:

Linux kubemaster11 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@bboreham
Contributor Author

I wrote a program that goes through the ipset lines in the full weave-npc log file and performs all the create/add/del operations, and it didn't crash. I ran it on Ubuntu 16.04 with a 4.13 kernel.

main.go.txt
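
For readers without access to the attachment, here is a rough sketch of the same idea. It is not the attached program: instead of calling ipset it replays the "added entry" / "deleted entry" lines against an in-memory model, and the function name `replay`, the regular expression, and the sample set and pod names are all assumptions made for illustration. It also ignores the pod UID, which is a simplification.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// opLine matches completed operations in the weave-npc log format shown in
// the excerpt, for example:
//   "... added entry 10.104.238.41 to <set> of <uid>"
//   "... deleted entry 10.104.238.41 from <set> of <uid>"
// It assumes set names contain no whitespace.
var opLine = regexp.MustCompile(`(added|deleted) entry (\S+) (?:to|from) (\S+) of (\S+)`)

// replay applies every add/delete found in log to an in-memory model and
// returns a description of each delete that targets an entry not in the set.
func replay(log string) []string {
	sets := map[string]map[string]bool{} // set name -> entries
	var problems []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := opLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not an ipset operation line
		}
		op, entry, set := m[1], m[2], m[3]
		if sets[set] == nil {
			sets[set] = map[string]bool{}
		}
		switch op {
		case "added":
			sets[set][entry] = true
		case "deleted":
			if !sets[set][entry] {
				problems = append(problems, fmt.Sprintf("del %s %s: not in set", set, entry))
			}
			delete(sets[set], entry)
		}
	}
	return problems
}

func main() {
	// Hypothetical log lines in the same shape as the excerpt.
	log := `INFO: ... added entry 10.104.238.41 to weave-a of pod-1
INFO: ... deleted entry 10.104.238.41 from weave-a of pod-1
INFO: ... deleted entry 10.104.238.41 from weave-a of pod-1`
	for _, p := range replay(log) {
		fmt.Println(p)
	}
}
```

A sequence like the one in the excerpt (one add, one delete) produces no report from this model, which fits the observation that the logged operations look consistent on their own.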

@murali-reddy
Contributor

murali-reddy commented Aug 2, 2018

This seems to be related to kernel issue https://bugzilla.netfilter.org/show_bug.cgi?id=1119, which is fixed in kernels > 4.11.

https://github.com/projectcalico/felix/issues/1347

@bboreham
Contributor Author

bboreham commented Aug 2, 2018

Great find!
Should we add a work-around - ignore this error from ipset on kernel < 4.11 ?

@murali-reddy
Contributor

Sure. I will add an exception.

@brb
Contributor

brb commented Aug 2, 2018

> Should we add a work-around - ignore this error from ipset on kernel < 4.11 ?

I don't think we can work around it this way. According to the bug report, the problem is that an ipset delete might evict other ipset members. So, the work-around would be to check whether members stored in the ipset match IP addrs stored internally in weave-npc each time we do any manipulation to the ipset.

@brb
Contributor

brb commented Aug 2, 2018

Also, the problem seems to have been introduced in kernel 4.2 and fixed in 4.11. So the user's kernel, 4.4.0-124-generic #148-Ubuntu, falls into this range.
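
A minimal sketch of such a range check, assuming the affected range is 4.2 up to but not including 4.11 as stated above. The names `parseMajorMinor` and `inAffectedRange` are hypothetical and this is not the code from the eventual fix. A version check is only an approximation, since distribution kernels may carry backported fixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor numbers from a kernel release
// string such as "4.4.0-124-generic" (the format printed by `uname -r`).
func parseMajorMinor(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor field may carry a suffix when no patch level follows, e.g. "11-rc1".
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// inAffectedRange reports whether release falls in [4.2, 4.11).
// The bounds are taken from this thread and are an assumption here.
func inAffectedRange(release string) (bool, error) {
	major, minor, err := parseMajorMinor(release)
	if err != nil {
		return false, err
	}
	return major == 4 && minor >= 2 && minor < 11, nil
}

func main() {
	for _, r := range []string{"4.4.0-124-generic", "4.13.0-45-generic"} {
		affected, err := inAffectedRange(r)
		fmt.Println(r, affected, err)
	}
}
```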

@murali-reddy
Contributor

> So, the work-around would be to check whether members stored in the ipset match IP addrs stored internally in weave-npc each time we do any manipulation to the ipset.

So are you saying we can run into a situation where no error is reported, yet some entries still get evicted as a result of this bug? That seems like an even worse scenario.

@brb
Contributor

brb commented Aug 3, 2018

Yep, removing entry A might evict entry B as well.
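
To make the proposed check concrete, here is a hedged sketch of comparing the members reported by `ipset list <set>` with the entries weave-npc believes it has added. The function names `parseMembers` and `diffMembers` and the sample output are illustrative assumptions, not weave-npc code; the sketch assumes the listing covers a single set and that members follow a `Members:` line.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseMembers returns the entries listed after the "Members:" line of
// `ipset list <set>` output. A blank line ends the member section.
func parseMembers(listOutput string) map[string]bool {
	members := map[string]bool{}
	inMembers := false
	sc := bufio.NewScanner(strings.NewReader(listOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "Members:":
			inMembers = true
		case line == "":
			inMembers = false
		case inMembers:
			// Keep only the first field, dropping per-entry options such as a timeout.
			members[strings.Fields(line)[0]] = true
		}
	}
	return members
}

// diffMembers returns entries that should be re-added (expected but absent)
// and entries that should be removed (present but not expected).
func diffMembers(expected []string, actual map[string]bool) (missing, unexpected []string) {
	want := map[string]bool{}
	for _, e := range expected {
		if want[e] {
			continue // skip duplicates in the expected list
		}
		want[e] = true
		if !actual[e] {
			missing = append(missing, e)
		}
	}
	for e := range actual {
		if !want[e] {
			unexpected = append(unexpected, e)
		}
	}
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	// Illustrative output only; the set name and addresses are made up.
	out := "Name: weave-example\nType: hash:ip\nMembers:\n10.32.0.2\n10.32.0.9\n"
	missing, unexpected := diffMembers([]string{"10.32.0.2", "10.32.0.3"}, parseMembers(out))
	fmt.Println("re-add:", missing)
	fmt.Println("remove:", unexpected)
}
```

Running a comparison like this after each manipulation would catch the silent case, where a delete succeeds but takes another entry with it.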

@murali-reddy murali-reddy self-assigned this Aug 8, 2018
murali-reddy added a commit that referenced this issue Aug 10, 2018
if the kernel version is in affected range of Kernels, then resync the entries to
expected set of entries.

Fixes #3296 failed: ipset v6.32: Element cannot be deleted from the set: it's not added
murali-reddy added a commit that referenced this issue Aug 10, 2018
if the kernel version is in affected range of Kernels, then resync the entries to
expected set of entries.

Kernel bug: https://bugzilla.netfilter.org/show_bug.cgi?id=1119

Fixes #3296 failed: ipset v6.32: Element cannot be deleted from the set: it's not added