Skip to content

Websocket fails to send large amount of data due to problem in ssl_pm_read #3315

Closed
@Davidovory03

Description

@Davidovory03

I tried sending a lot of data continuously. The socket reached a point in which it was blocking and POLLOUT was NEVER back on.
I debugged the code and tried to understand why that happend. I saw that it always happens after SSL_read fails. So I added debug logs and saw that:

<Logs from sending full chunks of data>
LEVEL: [8] line: [...] I: lws_plat_mbedtls_net_recv: EAGAIN or EWOULDBLOCK
LEVEL: [4] line: [...] N: ssl_pm_read: mbedtls_ssl_read says -0x6900
LEVEL: [16] line: [...] D: lws_ssl_capable_read: [wsicli|0|WS/h1/default/....]: SSL_read says -1
LEVEL: [16] line: [...] D: lws_ssl_capable_read: [wsicli|0|WS/h1/default/....]: ssl err 3 errno 22
LEVEL: [8] line: [...] I: lws_ssl_capable_read: WANT_WRITE
<From this point POLLOUT is never active for nothing other than read_wants_write>

You can see that mbedtls_ssl_read fails with -0x6900 (WANT_READ) but lws_ssl_capable_read falsely got WANT_WRITE:

lws_ssl_capable_read(...) {
...
        n = SSL_read(wsi->tls.ssl, buf, (int)len);
...
        if (n < 0) {
		m = SSL_get_error(wsi->tls.ssl, n); /* This uses ssl->err which isn't really set in ssl_pm_read */
	        if (m == SSL_ERROR_WANT_WRITE || SSL_want_write(wsi->tls.ssl)) {    
/* This condition is met because last error was from a blocking send */
			lwsl_info("%s: WANT_WRITE\n", __func__);
			lwsl_debug("%s: LWS_SSL_CAPABLE_MORE_SERVICE\n", lws_wsi_tag(wsi));
			wsi->tls_read_wanted_write = 1;
			lws_callback_on_writable(wsi);
			return LWS_SSL_CAPABLE_MORE_SERVICE;
		}

This leads into the following flow from lws_service_fd_tsi:

	if ((pollfd->revents & LWS_POLLOUT) == LWS_POLLOUT &&
	    wsi->tls_read_wanted_write) {
		/*
		 * If this wsi has a pending WANT_WRITE from SSL_read(), it has
		 * asked for a callback on writeable so it can retry the read.
		 *
		 *  Let's consume the POLLOUT by turning it into a POLLIIN, and
		 *  setting a flag to request a new writeable
		 */
		wsi->tls_read_wanted_write = 0;
		pollfd->revents &= ~(LWS_POLLOUT);
		pollfd->revents |= LWS_POLLIN;
		cow = 1;
	}

This flow disables LWS_POLLOUT.

We got WANT_WRITE instead of WANT_READ because ssl_pm_read doesn't really set the ssl->err (Like ssl_pm_send does). Instead it just returns -1 ignoring the real error.
To fix this you need to add a switch case:

int ssl_pm_read(SSL *ssl, void *buffer, int len)
{
    int ret;
    struct ssl_pm *ssl_pm = (struct ssl_pm *)ssl->ssl_pm;

    ret = mbedtls_ssl_read(&ssl_pm->ssl, buffer, (size_t)len);
    if (ret < 0) {
//	    lwsl_notice("%s: mbedtls_ssl_read says -0x%x\n", __func__, -ret);
        SSL_DEBUG(SSL_PLATFORM_ERROR_LEVEL, "mbedtls_ssl_read() return -0x%x", -ret);
        if (ret == MBEDTLS_ERR_NET_CONN_RESET ||
#if defined(MBEDTLS_VERSION_NUMBER) && MBEDTLS_VERSION_NUMBER >= 0x03000000
	    ret <= MBEDTLS_ERR_SSL_HANDSHAKE_FAILURE) /* fatal errors */
#else
            ret <= MBEDTLS_ERR_SSL_NO_USABLE_CIPHERSUITE) /* fatal errors */
#endif
        {
            ssl->err = SSL_ERROR_SYSCALL;
        }
        switch (ret) {
        case MBEDTLS_ERR_NET_CONN_RESET:
            ssl->err = SSL_ERROR_SYSCALL;
            break;
        case MBEDTLS_ERR_SSL_WANT_WRITE:
            ssl->err = SSL_ERROR_WANT_WRITE;
            break;
        case MBEDTLS_ERR_SSL_WANT_READ:
            ssl->err = SSL_ERROR_WANT_READ;
            break;
        default:
            break;
        }
        ret = -1;
    }

    return ret;
}

After this fix I'm able to send large amount of data without any problem.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions