-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
OSError: [Errno 9] Bad file descriptor with uvloop #10506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Also having trouble with file descriptors on the latest versions of BlockingIOError: [Errno 115] Operation now in progress
File "asyncio/selector_events.py", line 509, in _sock_connect
sock.connect(address)
RuntimeError: File descriptor 265 is used by transport <_SelectorSocketTransport fd=265 read=polling write=<idle, bufsize=0>>
File "usr/local/lib/python3.10/site-packages/<internal library>.py", line 53, in mpub_asyncio
async with self.session_asyncio.post(f"{self.url}/mpub", timeout=timeout, **kwargs) as r:
File "aiohttp/client.py", line 1425, in __aenter__
self._resp: _RetType = await self._coro
File "ddtrace/contrib/trace_utils_async.py", line 35, in wrapper
return await func(mod, pin, wrapped, instance, args, kwargs)
File "ddtrace/contrib/aiohttp/patch.py", line 108, in _traced_clientsession_request
resp = await func(*args, **kwargs) # type: aiohttp.ClientResponse
File "aiohttp/client.py", line 703, in _request
conn = await self._connector.connect(
File "ddtrace/contrib/aiohttp/patch.py", line 62, in connect
result = await self.__wrapped__.connect(req, *args, **kwargs)
File "aiohttp/connector.py", line 548, in connect
proto = await self._create_connection(req, traces, timeout)
File "aiohttp/connector.py", line 1056, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
File "aiohttp/connector.py", line 1380, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
File "aiohttp/connector.py", line 1116, in _wrap_create_connection
sock = await aiohappyeyeballs.start_connection(
File "aiohappyeyeballs/impl.py", line 93, in start_connection
raise first_exception
File "aiohappyeyeballs/impl.py", line 71, in start_connection
sock = await _connect_sock(
File "aiohappyeyeballs/impl.py", line 163, in _connect_sock
await loop.sock_connect(sock, address)
File "asyncio/selector_events.py", line 499, in sock_connect
self._sock_connect(fut, sock, address)
File "asyncio/selector_events.py", line 515, in _sock_connect
self._ensure_fd_no_transport(fd)
File "asyncio/selector_events.py", line 248, in _ensure_fd_no_transport
raise RuntimeError( |
@mklokocka Are you using uvloop as well? |
|
The trace back you posted has uvloop in it. |
I'm talking about @mklokocka comment |
@mklokocka Does the problem still happen if you disable the patching of the internals in https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/contrib/internal/aiohttp/patch.py ? The previous issues thatw ere reported focused on the Which version of |
related aiohttp issue aio-libs/aiohttp#10506 In #101 we replaced the staggered race implementation since the cpython version had races that were not fixed at the time. cpython has since updated the implementation to fix additional races. Our current implementation still has problems with cancellation and cpython has fixed that in python/cpython#128475 and python/cpython#124847 This PR ports the latest cpython implementation
Please update to |
@bdraco Thanks, the project ran for a few hours with the latest |
Thanks. If the problem reoccurs, let me know and we can reopen |
@bdraco Hello! We've updated aiohappyeyeballs to 2.6.1 and it haven't helped with errors from my original message. |
I've dug though the aiohappyeyeballs code again, and can find no place where we can work around this. so for uvloop, we are waiting for MagicStack/uvloop#646 to get fixed upstream In the mean time you might be able to get away with disabling happyeyeballs on the connector if you don't have any use cases where you have non working IPv6 https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.TCPConnector
|
Hi @bdraco! We've tried setting |
Disabling happy eyeballs will only reduce the chance the problem can happen with uvloop. There isn't anything we can do further but wait for uvloop to fix the issue. |
In our case, all the domains we make requests to with aiohttp resolve to a single IPv4. Perhaps disabling Happy Eyeballs couldn't help us because Happy Eyeballs isn't used in the first place? One thought keeps really bothering me – why don't we see any similar errors on aiohttp 3.9.5, but on 3.10-3.11.13, with Happy Eyeballs disabled, we do? As if something else has changed. |
#10464 was probably the most relevant change in 3.11.13 |
One difference between how It probably makes sense to re-raise the OSError from the |
…ose connector socket This is a followup to #10464 to handle the case where socket.close() can also raise. This matches the logic we have in aiohappyeyeballs: https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227 fixes #10506
@tkukushkin Can you give #10551 a try? |
…close connector socket (#10551) <!-- Thank you for your contribution! --> ## What do these changes do? This is a followup to #10464 to handle the case where `socket.close()` can also raise. This matches the logic we have in aiohappyeyeballs: https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227 We shouldn't raising `OSError` externally from this method as callers expect a `ClientError` ## Are there changes in behavior for the user? bugfix ## Is it a substantial burden for the maintainers to support this? no ## Related issue number fixes #10506 ## Checklist - [x] I think the code is well written - [x] Unit tests for the changes exist - [x] Documentation reflects the changes - [x] If you provide code modification, please add yourself to `CONTRIBUTORS.txt` * The format is <Name> <Surname>. * Please keep alphabetical order, the file is sorted by names. - [x] Add a new news fragment into the `CHANGES/` folder * name it `<issue_or_pr_num>.<type>.rst` (e.g. `588.bugfix.rst`) * if you don't have an issue number, change it to the pull request number after creating the PR * `.bugfix`: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations. * `.feature`: A new behavior, public APIs. That sort of stuff. * `.deprecation`: A declaration of future API removals and breaking changes in behavior. * `.breaking`: When something public is removed in a breaking way. Could be deprecated in an earlier release. * `.doc`: Notable updates to the documentation structure or build process. * `.packaging`: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions. * `.contrib`: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment. * `.misc`: Changes that are hard to assign to any of the above categories. * Make sure to use full sentences with correct case and punctuation, for example: ```rst Fixed issue with non-ascii contents in doctest text files -- by :user:`contributor-gh-handle`. ``` Use the past tense or the present tense a non-imperative mood, referring to what's changed compared to the last released version of this project. (cherry picked from commit d067260)
…close connector socket (#10551) <!-- Thank you for your contribution! --> ## What do these changes do? This is a followup to #10464 to handle the case where `socket.close()` can also raise. This matches the logic we have in aiohappyeyeballs: https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227 We shouldn't raising `OSError` externally from this method as callers expect a `ClientError` ## Are there changes in behavior for the user? bugfix ## Is it a substantial burden for the maintainers to support this? no ## Related issue number fixes #10506 ## Checklist - [x] I think the code is well written - [x] Unit tests for the changes exist - [x] Documentation reflects the changes - [x] If you provide code modification, please add yourself to `CONTRIBUTORS.txt` * The format is <Name> <Surname>. * Please keep alphabetical order, the file is sorted by names. - [x] Add a new news fragment into the `CHANGES/` folder * name it `<issue_or_pr_num>.<type>.rst` (e.g. `588.bugfix.rst`) * if you don't have an issue number, change it to the pull request number after creating the PR * `.bugfix`: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations. * `.feature`: A new behavior, public APIs. That sort of stuff. * `.deprecation`: A declaration of future API removals and breaking changes in behavior. * `.breaking`: When something public is removed in a breaking way. Could be deprecated in an earlier release. * `.doc`: Notable updates to the documentation structure or build process. * `.packaging`: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions. * `.contrib`: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment. * `.misc`: Changes that are hard to assign to any of the above categories. * Make sure to use full sentences with correct case and punctuation, for example: ```rst Fixed issue with non-ascii contents in doctest text files -- by :user:`contributor-gh-handle`. ``` Use the past tense or the present tense a non-imperative mood, referring to what's changed compared to the last released version of this project. (cherry picked from commit d067260)
…ionError when failing to explicitly close connector socket (#10561) **This is a backport of PR #10551 as merged into master (d067260).** <!-- Thank you for your contribution! --> ## What do these changes do? This is a followup to #10464 to handle the case where `socket.close()` can also raise. This matches the logic we have in aiohappyeyeballs: https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227 We shouldn't raising `OSError` externally from this method as callers expect a `ClientError` ## Are there changes in behavior for the user? bugfix ## Is it a substantial burden for the maintainers to support this? no ## Related issue number fixes #10506 ## Checklist - [x] I think the code is well written - [x] Unit tests for the changes exist - [x] Documentation reflects the changes - [x] If you provide code modification, please add yourself to `CONTRIBUTORS.txt` * The format is <Name> <Surname>. * Please keep alphabetical order, the file is sorted by names. - [x] Add a new news fragment into the `CHANGES/` folder * name it `<issue_or_pr_num>.<type>.rst` (e.g. `588.bugfix.rst`) * if you don't have an issue number, change it to the pull request number after creating the PR * `.bugfix`: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations. * `.feature`: A new behavior, public APIs. That sort of stuff. * `.deprecation`: A declaration of future API removals and breaking changes in behavior. * `.breaking`: When something public is removed in a breaking way. Could be deprecated in an earlier release. * `.doc`: Notable updates to the documentation structure or build process. * `.packaging`: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions. * `.contrib`: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment. * `.misc`: Changes that are hard to assign to any of the above categories. * Make sure to use full sentences with correct case and punctuation, for example: ```rst Fixed issue with non-ascii contents in doctest text files -- by :user:`contributor-gh-handle`. ``` Use the past tense or the present tense a non-imperative mood, referring to what's changed compared to the last released version of this project. Co-authored-by: J. Nick Koston <[email protected]>
…ionError when failing to explicitly close connector socket (#10562) **This is a backport of PR #10551 as merged into master (d067260).** <!-- Thank you for your contribution! --> ## What do these changes do? This is a followup to #10464 to handle the case where `socket.close()` can also raise. This matches the logic we have in aiohappyeyeballs: https://github.com/aio-libs/aiohappyeyeballs/blob/e3bd5bdf44f5d187802de6dcb08d27e1ca6da048/src/aiohappyeyeballs/impl.py#L227 We shouldn't raising `OSError` externally from this method as callers expect a `ClientError` ## Are there changes in behavior for the user? bugfix ## Is it a substantial burden for the maintainers to support this? no ## Related issue number fixes #10506 ## Checklist - [x] I think the code is well written - [x] Unit tests for the changes exist - [x] Documentation reflects the changes - [x] If you provide code modification, please add yourself to `CONTRIBUTORS.txt` * The format is <Name> <Surname>. * Please keep alphabetical order, the file is sorted by names. - [x] Add a new news fragment into the `CHANGES/` folder * name it `<issue_or_pr_num>.<type>.rst` (e.g. `588.bugfix.rst`) * if you don't have an issue number, change it to the pull request number after creating the PR * `.bugfix`: A bug fix for something the maintainers deemed an improper undesired behavior that got corrected to match pre-agreed expectations. * `.feature`: A new behavior, public APIs. That sort of stuff. * `.deprecation`: A declaration of future API removals and breaking changes in behavior. * `.breaking`: When something public is removed in a breaking way. Could be deprecated in an earlier release. * `.doc`: Notable updates to the documentation structure or build process. * `.packaging`: Notes for downstreams about unobvious side effects and tooling. Changes in the test invocation considerations and runtime assumptions. * `.contrib`: Stuff that affects the contributor experience. e.g. Running tests, building the docs, setting up the development environment. * `.misc`: Changes that are hard to assign to any of the above categories. * Make sure to use full sentences with correct case and punctuation, for example: ```rst Fixed issue with non-ascii contents in doctest text files -- by :user:`contributor-gh-handle`. ``` Use the past tense or the present tense a non-imperative mood, referring to what's changed compared to the last released version of this project. Co-authored-by: J. Nick Koston <[email protected]>
@bdraco Hello! Sorry for not answering you, you made the release faster than I was able to check it. We still see these errors with aiohttp 3.11.16 and aiohappyeyeballs 2.6.1.
And this is still relevant, everything works perfectly with 3.9.5 and we have a lot of errors with any version after. Once we had 700 such errors in 1 minute from one container. I understand that root cause of the problem is inside uvloop, but I still don't understand why aiohttp 3.9.5 works fine. |
@tkukushkin The problem only happens with 3.11 Line 1122 in 98add82
3.9 Line 1025 in e057906
|
Got it, thanks a lot! Is it possible to work around this problem somehow? Because 700 errors in one minute is too much. |
I haven’t tested this, but something along these lines might help work around the issue. That said, I’m hesitant to take another stab at shipping a diff --git a/aiohttp/connector.py b/aiohttp/connector.py
index 08e6ae275..46386a987 100644
--- a/aiohttp/connector.py
+++ b/aiohttp/connector.py
@@ -1123,6 +1123,15 @@ class TCPConnector(BaseConnector):
async with ceil_timeout(
timeout.sock_connect, ceil_threshold=timeout.ceil_threshold
):
+ if self._happy_eyeballs_delay is None:
+ first_addr_infos = addr_infos[0]
+ address_tuple = first_addr_infos[4]
+ host: str = address_tuple[0]
+ port: int = address_tuple[1]
+ return await self._loop.create_connection(
+ host, port, *args, **kwargs
+ )
+ else:
sock = await aiohappyeyeballs.start_connection(
addr_infos=addr_infos,
local_addr_infos=self._local_addr_infos,
@@ -1131,7 +1140,9 @@ class TCPConnector(BaseConnector):
loop=self._loop,
socket_factory=self._socket_factory,
)
- return await self._loop.create_connection(*args, **kwargs, sock=sock)
+ return await self._loop.create_connection(
+ *args, **kwargs, sock=sock
+ )
except cert_errors as exc:
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
except ssl_errors as exc:
@@ -1324,7 +1335,10 @@ class TCPConnector(BaseConnector):
)
except (ClientConnectorError, asyncio.TimeoutError) as exc:
last_exc = exc
- aiohappyeyeballs.pop_addr_infos_interleave(addr_infos, self._interleave)
+ aiohappyeyeballs.pop_addr_infos_interleave(
+ addr_infos,
+ None if self._happy_eyeballs_delay is None else self._interleave,
+ )
continue
if req.is_ssl() and fingerprint: |
I would really like to test this workaround, but I need wheels. Maybe it's possible to make a test release (like a release candidate), so it will not be delivered to people who will not request this specific test version? IMHO it could be a good approach to test other fixes in future as well. |
It would take me a couple of hours to put something like that together, since I’d need to adjust the existing tests. I’ve already spent a few weeks trying to work around this issue in uvloop, so it’s a bit discouraging to keep pushing on it—especially when the proposed fix hasn’t gotten any attention upstream. |
Thank you so much for your efforts trying to work around this issue on the aiohttp side. |
I'll leave this open, and when I have some cycles, I'll take a look at it again. |
Also, in the meantime, I added a note to the top of this issue and pinned it, in the hopes that someone has a relationship with the uvloop maintainers, who will see this issue, and might be able to help get the fix moving upstream. |
Also, since you're using Python 3.13, have you considered disabling uvloop? |
Yes, I've tried to disable uvloop, but the performance impact was significant enough in our case. I have three ideas at the moment:
|
One and two are probably decent strategies, I wouldn't do three because you're gonna end up in a panic the next time there's some type of vulnerability that has to be fixed. |
Yes, I agree, and we already need some fixes from newer releases in one of our services. |
Hi! Today we've tried to monkeypatch aiohttp like that: import asyncio
from typing import Any
import aiohappyeyeballs
from aiohttp import ClientConnectorError, ClientRequest, ClientTimeout, TCPConnector
from aiohttp.client_exceptions import ClientConnectorCertificateError, ClientConnectorSSLError, cert_errors, ssl_errors
from aiohttp.client_proto import ResponseHandler
from aiohttp.helpers import ceil_timeout
async def _wrap_create_connection(
self: TCPConnector,
*args: Any,
addr_infos: list[aiohappyeyeballs.AddrInfoType],
req: ClientRequest,
timeout: ClientTimeout,
client_error: type[Exception] = ClientConnectorError,
**kwargs: Any,
) -> tuple[asyncio.Transport, ResponseHandler]:
try:
async with ceil_timeout(timeout.sock_connect, ceil_threshold=timeout.ceil_threshold):
first_addr_infos = addr_infos[0]
address_tuple = first_addr_infos[4]
host: str = address_tuple[0]
port: int = address_tuple[1]
return await self._loop.create_connection(*args, host=host, port=port, **kwargs) # type: ignore[call-overload]
except cert_errors as exc:
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
except ssl_errors as exc:
raise ClientConnectorSSLError(req.connection_key, exc) from exc
except OSError as exc:
if exc.errno is None and isinstance(exc, asyncio.TimeoutError):
raise
raise client_error(req.connection_key, exc) from exc
TCPConnector._wrap_create_connection = _wrap_create_connection # type: ignore[method-assign] And it perfectly works. Haven't seen any errors about bad file descriptors for 3 hours. Before that, we saw at least several errors every minute. |
That's great, let me know if it's still working well at the end of the week. The connection order probably isn't ideal from the address list, but it may already be sorted. More testing would be required to make sure we could ship a change like that. |
Looks like the uvloop PR might be moving along. 🤞 |
I hope so. By the way, our monkey patched version of aiohttp still works great in production. We don't have IPv6 in production, but we use aiohttp for tests too, and
|
Pin aiohttp to 3.9.5 (downgrade). This is a workaround to aio-libs/aiohttp#10506, which causes bad sockets (especially under load).
Note: we are waiting for MagicStack/uvloop#646 to get fixed upstream
Describe the bug
Hello! After updating aiohttp to 3.11.13 we see new errors.
To Reproduce
I don't know how to reproduce these errors, but it seems they are somehow connected with cancellation.
Expected behavior
No errors
Logs/tracebacks
Python Version
3.13.1
aiohttp Version
3.11.13
multidict Version
6.1.0
propcache Version
0.3.0
yarl Version
1.18.3
OS
Debian 12.9
Related component
Client
Additional context
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: