OFP TCP Keepalive Timer is not working as expected due to keepalive count (t_keepcnt) is not incremented and validated while processing Keepalive Timer #280

Open
manishmatey opened this issue Aug 8, 2022 · 7 comments

Comments

@manishmatey

Hi Team,

Once a TCP connection is established, it should ideally be terminated when no data has been exchanged between client and server for some time.
TCP starts a keepalive timer, and if no data is exchanged between client and server over a few keepalive packets (ideally 8 to 10), the connection is dropped.

I could not find any place where the TCP keepalive count is incremented and checked while the TCP keepalive timer is processed.
Can somebody confirm whether this is a bug in the OFP code itself?
Or, if it is not, where is the TCP keepalive count incremented in the OFP TCP code?

Any help will be appreciated.

@bogdanPricope
Contributor

Hi,

My understanding is that it validates the elapsed time (in ticks) against the maximum time to wait:

ofp_tcp_timer.c
ofp_tcp_timer_keep()
{
    ......
    if ((always_keepalive || (inp->inp_socket->so_options & OFP_SO_KEEPALIVE)) &&
        tp->t_state <= TCPS_CLOSING) {
        if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >=
            TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))
            goto dropit;
        .........................
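
To make the role of the keepalive count concrete, here is a minimal, self-contained sketch of the time-based check quoted above. It assumes, as in the FreeBSD code this function appears to derive from, that TP_MAXIDLE(tp) expands to the keepalive count multiplied by the keepalive interval; whether OFP defines it the same way is not confirmed here. The names `keep_cfg`, `keep_maxidle` and `keep_should_drop` are hypothetical and not part of OFP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-connection keepalive settings (in ticks). */
struct keep_cfg {
    uint32_t keepidle;   /* idle time before the first probe */
    uint32_t keepintvl;  /* spacing between probes */
    uint32_t keepcnt;    /* number of unanswered probes tolerated */
};

/* Assumption: max idle = keepcnt * keepintvl, as in FreeBSD's TP_MAXIDLE. */
static uint32_t keep_maxidle(const struct keep_cfg *c)
{
    return c->keepcnt * c->keepintvl;
}

/*
 * Mirrors the quoted condition: drop once the time since the last received
 * segment reaches keepidle + maxidle. The unsigned subtraction, cast to a
 * signed value, keeps the comparison correct when the tick counter wraps.
 */
static bool keep_should_drop(uint32_t now, uint32_t rcvtime,
                             const struct keep_cfg *c)
{
    return (int32_t)(now - rcvtime) >= (int32_t)(c->keepidle + keep_maxidle(c));
}
```

Under that assumption the count is never incremented per probe; it only scales the idle budget, which would explain why no counter update shows up in the timer handler.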

I don't use the same codebase ... but what are the values of always_keepalive and inp->inp_socket->so_options for you?

Best regards,
Bogdan

P.S. Please have a look at my work on NFP (my version of ofp): http://www.netinosoft.org
Feedback will be much appreciated.

@manishmatey
Author

manishmatey commented Aug 8, 2022

Hi @bogdanPricope

Here are the values you asked for:
always_keepalive = 1 and inp->inp_socket->so_options = 0

In my case, the if condition below never becomes true, and TCP keeps resetting the keepalive timer forever.

if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >=
TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))

I printed all four values:
[Ticks=775071425 - Recv Time=775065425] = 6000
[Keepidle=720000 + Maxidle=60000] = 780000

A condition like 6000 >= 780000 will not become true any time soon: both Ticks and Recv Time keep increasing, while the right-hand side is the large constant 780000.
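
The logged numbers are easier to judge once converted to wall-clock time. The sketch below assumes a tick rate of 100 ticks per second (an assumption, chosen because it makes Keepidle=720000 equal to 2 hours); `TICKS_PER_SEC`, `ticks_to_seconds` and `ticks_until_drop` are illustrative names, not OFP identifiers.

```c
#include <stdint.h>

/* Assumption: 100 timer ticks per second (not confirmed by the source). */
#define TICKS_PER_SEC 100u

/* Convert a tick count to whole seconds under the assumed tick rate. */
static uint32_t ticks_to_seconds(uint32_t ticks)
{
    return ticks / TICKS_PER_SEC;
}

/* Ticks still to elapse before the idle time reaches the drop threshold
 * (0 if the threshold has already been reached). */
static uint32_t ticks_until_drop(uint32_t idle, uint32_t keepidle,
                                 uint32_t maxidle)
{
    uint32_t threshold = keepidle + maxidle;
    return idle >= threshold ? 0u : threshold - idle;
}
```

With the values above, ticks_to_seconds(6000) gives 60 s of idle time against a threshold of ticks_to_seconds(780000) = 7800 s (130 minutes), so the drop can only happen if Recv Time stops advancing for that long.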

Regards,
Manish

@bogdanPricope
Contributor

Hi @manishmatey

I did a little experiment. I changed:

  • keepidle to 5 minutes instead of 120 minutes
  • keepintvl to 20 s instead of 75 s
  • added a log message:
    printf("Ticks: %d, t_rcvtime: %d -> %d vs %d\n",
           nfp_timer_ticks(0), tp->t_rcvtime,
           (int)(nfp_timer_ticks(0) - tp->t_rcvtime),
           TP_KEEPIDLE(tp) + TP_MAXIDLE(tp));

I am getting this:
I 1007 0:788517888 httpd.c:173] accept fd=1
Ticks: 6971, t_rcvtime: 1007 -> 5964 vs 46000
Ticks: 8991, t_rcvtime: 1007 -> 7984 vs 46000
Ticks: 11011, t_rcvtime: 1007 -> 10004 vs 46000
Ticks: 13031, t_rcvtime: 1007 -> 12024 vs 46000
Ticks: 15051, t_rcvtime: 1007 -> 14044 vs 46000
Ticks: 17071, t_rcvtime: 1007 -> 16064 vs 46000
Ticks: 19091, t_rcvtime: 1007 -> 18084 vs 46000
Ticks: 21111, t_rcvtime: 1007 -> 20104 vs 46000
Ticks: 23131, t_rcvtime: 1007 -> 22124 vs 46000
Ticks: 25151, t_rcvtime: 1007 -> 24144 vs 46000
Ticks: 27171, t_rcvtime: 1007 -> 26164 vs 46000
Ticks: 29191, t_rcvtime: 1007 -> 28184 vs 46000
Ticks: 31211, t_rcvtime: 1007 -> 30204 vs 46000
Ticks: 33231, t_rcvtime: 1007 -> 32224 vs 46000
Ticks: 35251, t_rcvtime: 1007 -> 34244 vs 46000
Ticks: 37271, t_rcvtime: 1007 -> 36264 vs 46000
Ticks: 39291, t_rcvtime: 1007 -> 38284 vs 46000
Ticks: 41311, t_rcvtime: 1007 -> 40304 vs 46000
Ticks: 43331, t_rcvtime: 1007 -> 42324 vs 46000
Ticks: 45351, t_rcvtime: 1007 -> 44344 vs 46000
Ticks: 47371, t_rcvtime: 1007 -> 46364 vs 46000
tcp drop returned: 0x41c07048!!!

That is, the connection was dropped after the specified time. Note that t_rcvtime remains constant as there is no traffic from the other side....
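
The log above can be replayed with a small simulation. It takes the first firing tick (6971), the spacing between firings (2020 ticks) and the threshold (46000) directly from the log, holds t_rcvtime at 1007 because the peer is silent, and counts timer firings until the drop condition holds. `firings_until_drop` is a hypothetical helper, not NFP or OFP code.

```c
#include <stdint.h>

/*
 * Count how many times the keepalive timer fires before the idle time
 * (now - rcvtime) reaches the threshold. rcvtime never changes, which
 * models a peer that sends nothing at all.
 * Returns -1 if `limit` firings pass without a drop (guards step == 0).
 */
static int firings_until_drop(uint32_t first_fire, uint32_t step,
                              uint32_t rcvtime, uint32_t threshold,
                              int limit)
{
    uint32_t now = first_fire;

    for (int n = 1; n <= limit; n++) {
        if ((int32_t)(now - rcvtime) >= (int32_t)threshold)
            return n;        /* the n-th firing takes the "dropit" path */
        now += step;         /* timer re-armed, fires again one step later */
    }
    return -1;
}
```

firings_until_drop(6971, 2020, 1007, 46000, 1000) returns 21, matching the 21 "Ticks:" lines printed before the drop. The threshold itself is consistent with 5 minutes (30000 ticks) plus 8 probes at 20 s each (16000 ticks) at 100 ticks per second, though the count of 8 and the tick rate are inferred rather than stated.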

Now, you may add similar log messages and check if the connection is dropped in your case.

@manishmatey
Author

Hi @bogdanPricope

I have a few questions about the above test:

  1. Is t_rcvtime not updated when a keepalive response packet is received?

  2. Are you dropping the keepalive responses in the above test?

Regards,
Manish

@bogdanPricope
Contributor

Hi @manishmatey

My understanding is that this is the case where the remote device becomes inaccessible because of a connectivity issue or because the remote has crashed. To simulate this, I shut down the network interface of the remote device; that is, there are NO keepalive responses (received or sent).

I don't understand your point: you are actively using the keepalive mechanism to keep the connection up. If you get responses, it means the connection is up and should not be terminated.

@manishmatey
Author

manishmatey commented Aug 17, 2022

Hi @bogdanPricope,

Thanks for the reply. Here is my understanding:
When TCP keepalive packets get no reply, the TCP connection is dropped based on the if condition below:

if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >= TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))

The keepalive count variable t_keepcnt is not used anywhere to drop the TCP connection.
Currently I see TCP keepalive packets being exchanged every 1 minute because of the condition [if (delta > 6000) delta = 6000;]. Please check the code below:

File: ofp_tcp_timer.c
Function: ofp_tcp_timer_activate(struct tcpcb *tp, int timer_type, uint32_t delta)
    case TT_KEEP:
        if (delta > 6000) delta = 6000;
        t_callout = &tp->t_timers->tt_keep;
        f_callout = ofp_tcp_timer_keep;
        break;
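
The effect of that clamp can be shown in isolation. The sketch below is a simplified model, not the OFP function: `clamp_keep_delta`, `firings_for` and `KEEP_DELTA_MAX` are hypothetical names, and the 6000-tick limit is taken from the snippet above.

```c
#include <stdint.h>

/* Upper bound applied to the keepalive timer delay, from the quoted code. */
#define KEEP_DELTA_MAX 6000u

/* Model of the TT_KEEP case: any longer request is shortened to the cap. */
static uint32_t clamp_keep_delta(uint32_t delta)
{
    return delta > KEEP_DELTA_MAX ? KEEP_DELTA_MAX : delta;
}

/* Number of times the callout would run to cover `total` ticks when
 * every arm of the timer is clamped as above. */
static uint32_t firings_for(uint32_t total)
{
    uint32_t n = 0;

    while (total > 0) {
        total -= clamp_keep_delta(total);
        n++;
    }
    return n;
}
```

A requested delay of 720000 ticks is cut to 6000, so the handler would run 120 times over the idle period instead of once; if each run sends a probe, that matches the probe-per-minute behavior described above.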

If I comment out the condition [if (delta > 6000) delta = 6000;], the keepalive probes start after 2 hours. Is there a problem with this condition? For now I have commented it out as a workaround.

Regards,
Manish Tiwari

@bogdanPricope
Contributor

Hi @manishmatey

I get that in your case the remote side is still accessible ... it just stopped sending "payload" data.

What you are seeing is that keepalive messages are sent every minute despite a TCP_KEEPIDLE ("The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes") of 120 minutes.
As far as I understand, this is a bug (probably caused by a workaround for the fact that OFP did not support long timers).
This 'if (delta > 6000) delta = 6000;' is not in FreeBSD, so it is probably an OFP addition.
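
One way a stack with a short maximum timer could still honor a long TCP_KEEPIDLE is to re-arm silently in chunks and start probing only once the connection has really been idle long enough. This is a hypothetical sketch of that idea, not the FreeBSD or OFP implementation; `TIMER_MAX` and `next_silent_delay` are invented for illustration, and the 6000-tick limit is assumed from the clamp discussed here.

```c
#include <stdint.h>

/* Assumed longest delay the underlying timer can handle, in ticks. */
#define TIMER_MAX 6000u

/*
 * Called when the keepalive callout fires during the idle phase.
 * Returns the delay for the next silent re-arm, or 0 when the connection
 * has been idle for at least keepidle ticks and probing may start.
 */
static uint32_t next_silent_delay(uint32_t idle, uint32_t keepidle)
{
    if (idle >= keepidle)
        return 0u;                            /* idle long enough: probe now */

    uint32_t left = keepidle - idle;          /* ticks still to wait */
    return left > TIMER_MAX ? TIMER_MAX : left;
}
```

The design choice is that the clamp limits only how long a single arm lasts, not when the first probe is sent, so a short timer no longer turns into a probe every minute.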

However, even without the unwanted keepalives the behavior will not change: after 120 minutes (TCP_KEEPIDLE) OFP will send a keepalive and the remote side will answer, so t_rcvtime will be updated and the connection will not be dropped.

I am not a TCP expert, but my understanding is that this is the expected behavior for the keepalive mechanism: it is not meant to monitor TCP payload traffic, only to check that the remote is alive and that the connection is still active on the remote side (was not closed, etc.).
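
That behavior can be illustrated with a small model: if every probe is acknowledged, t_rcvtime is refreshed and the idle time never reaches the drop threshold. The sketch assumes that a keepalive reply updates t_rcvtime and that the reply arrives immediately; `dropped_after` is a hypothetical helper, not OFP code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simulate `rounds` keepalive cycles. In each cycle the connection sits idle
 * for keepidle ticks, then a probe is sent. If peer_answers is true the reply
 * resets rcvtime; otherwise probes repeat every keepintvl ticks until the
 * idle time reaches keepidle + maxidle.
 * Returns true if the connection ends up dropped.
 */
static bool dropped_after(int rounds, bool peer_answers,
                          uint32_t keepidle, uint32_t keepintvl,
                          uint32_t maxidle)
{
    uint32_t now = 0, rcvtime = 0;

    if (keepintvl == 0)
        return false;                         /* avoid an endless loop on bad input */

    for (int r = 0; r < rounds; r++) {
        now += keepidle;                      /* idle period elapses */
        if (peer_answers) {
            rcvtime = now;                    /* reply refreshes t_rcvtime */
            continue;
        }
        while (now - rcvtime < keepidle + maxidle)
            now += keepintvl;                 /* unanswered probes accumulate */
        return true;                          /* threshold reached: drop */
    }
    return false;
}
```

With an answering peer the function returns false however many rounds are simulated, while a silent peer leads to a drop in the first round.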

Regards,
Bogdan

2 participants