Description
I tried sending a lot of data continuously. The socket reached a point in which it was blocking and POLLOUT was NEVER back on.
I debugged the code and tried to understand why that happend. I saw that it always happens after SSL_read fails. So I added debug logs and saw that:
<Logs from sending full chunks of data>
LEVEL: [8] line: [...] I: lws_plat_mbedtls_net_recv: EAGAIN or EWOULDBLOCK
LEVEL: [4] line: [...] N: ssl_pm_read: mbedtls_ssl_read says -0x6900
LEVEL: [16] line: [...] D: lws_ssl_capable_read: [wsicli|0|WS/h1/default/....]: SSL_read says -1
LEVEL: [16] line: [...] D: lws_ssl_capable_read: [wsicli|0|WS/h1/default/....]: ssl err 3 errno 22
LEVEL: [8] line: [...] I: lws_ssl_capable_read: WANT_WRITE
<From this point POLLOUT is never active for nothing other than read_wants_write>
You can see that mbedtls_ssl_read
fails with -0x6900 (WANT_READ) but lws_ssl_capable_read
falsely got WANT_WRITE
:
lws_ssl_capable_read(...) {
...
n = SSL_read(wsi->tls.ssl, buf, (int)len);
...
if (n < 0) {
m = SSL_get_error(wsi->tls.ssl, n); /* This uses ssl->err which isn't really set in ssl_pm_read */
if (m == SSL_ERROR_WANT_WRITE || SSL_want_write(wsi->tls.ssl)) {
/* This condition is met because last error was from a blocking send */
lwsl_info("%s: WANT_WRITE\n", __func__);
lwsl_debug("%s: LWS_SSL_CAPABLE_MORE_SERVICE\n", lws_wsi_tag(wsi));
wsi->tls_read_wanted_write = 1;
lws_callback_on_writable(wsi);
return LWS_SSL_CAPABLE_MORE_SERVICE;
}
This leads into the following flow from lws_service_fd_tsi
:
if ((pollfd->revents & LWS_POLLOUT) == LWS_POLLOUT &&
wsi->tls_read_wanted_write) {
/*
* If this wsi has a pending WANT_WRITE from SSL_read(), it has
* asked for a callback on writeable so it can retry the read.
*
* Let's consume the POLLOUT by turning it into a POLLIIN, and
* setting a flag to request a new writeable
*/
wsi->tls_read_wanted_write = 0;
pollfd->revents &= ~(LWS_POLLOUT);
pollfd->revents |= LWS_POLLIN;
cow = 1;
}
This flow disables LWS_POLLOUT.
We got WANT_WRITE instead of WANT_READ because ssl_pm_read
doesn't really set the ssl->err
(Like ssl_pm_send
does). Instead it just returns -1 ignoring the real error.
To fix this you need to add a switch case:
int ssl_pm_read(SSL *ssl, void *buffer, int len)
{
int ret;
struct ssl_pm *ssl_pm = (struct ssl_pm *)ssl->ssl_pm;
ret = mbedtls_ssl_read(&ssl_pm->ssl, buffer, (size_t)len);
if (ret < 0) {
// lwsl_notice("%s: mbedtls_ssl_read says -0x%x\n", __func__, -ret);
SSL_DEBUG(SSL_PLATFORM_ERROR_LEVEL, "mbedtls_ssl_read() return -0x%x", -ret);
if (ret == MBEDTLS_ERR_NET_CONN_RESET ||
#if defined(MBEDTLS_VERSION_NUMBER) && MBEDTLS_VERSION_NUMBER >= 0x03000000
ret <= MBEDTLS_ERR_SSL_HANDSHAKE_FAILURE) /* fatal errors */
#else
ret <= MBEDTLS_ERR_SSL_NO_USABLE_CIPHERSUITE) /* fatal errors */
#endif
{
ssl->err = SSL_ERROR_SYSCALL;
}
switch (ret) {
case MBEDTLS_ERR_NET_CONN_RESET:
ssl->err = SSL_ERROR_SYSCALL;
break;
case MBEDTLS_ERR_SSL_WANT_WRITE:
ssl->err = SSL_ERROR_WANT_WRITE;
break;
case MBEDTLS_ERR_SSL_WANT_READ:
ssl->err = SSL_ERROR_WANT_READ;
break;
default:
break;
}
ret = -1;
}
return ret;
}
After this fix I'm able to send large amount of data without any problem.