Mockttp stops responding to requests after random amount of time/requests... #190
Comments
Alright, so apparently the proxy still correctly receives the connection, as the debug log keeps outputting: Handling request for https://x.com/...
But nothing else is shown, and beforeResponse is definitely not called. Restarting the server with mockServer.stop() and rerunning the setup scripts makes the proxy server work again.
I have found that the "abort" event is being fired multiple times. There is no error in the object passed, but I found that a tag is added: 'passthrough-error:ECONNRESET'
This seems to happen when the clients show that HTTP2 error, but I do not understand which end is actually "bailing out". The "abort" also happens when the requests are hanging, but the behaviour in the browser is different: as already stated, it just hangs in the connecting state. The client is set to time out and retry the call every 10 seconds, so it does not receive an HTTP error.
If that can be of any help, I've managed to grab a netlog from Chromium. Here's the part of interest, I think: t= 93370 [st= 52538] HTTP2_SESSION_RECV_RST_STREAM
I don't see any other hints of errors. The complete log is too big to post here, but I'd be happy to provide it if needed. When the proxy stops responding, nothing is displayed in the netlog besides the connection resets (aborts) that are the expected behaviour for requests longer than 10 seconds: t=123661 [st= 82829] HTTP2_SESSION_SEND_RST_STREAM
I kept digging, and tried listening to the internal http2 error events to see if I could get something: // eslint-disable-next-line @typescript-eslint/ban-ts-comment
Digging into the source code, this seemed like it would propagate to polyglot, which I tested using the connect event first. But nothing got logged. The aborts are different when it is a one-time http2 error than when the server stops responding. The
Another hint I had was about the volume of sessions and requests made by the clients, as the frequency of the errors increases dramatically with the number of clients the app opens.
I have tried playing with the polyglot internal code to add Note that this should not be a problem with the target server, as everything works perfectly when not using the proxy. I am now thinking this has to do with the passThrough part of the code...
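For anyone wanting to quantify the aborts described above, here is a minimal sketch that pulls the error code out of tags shaped like the one reported ('passthrough-error:ECONNRESET') and tallies them. The helper names are hypothetical, and passing in the tags array of the object received by the "abort" listener is an assumption about how you would wire it up; only the tag format itself comes from this thread.

```typescript
// Prefix taken from the tag observed in this thread: 'passthrough-error:ECONNRESET'.
const PASSTHROUGH_ERROR_PREFIX = 'passthrough-error:';

// Hypothetical helper: returns the error codes carried by passthrough-error tags.
function passthroughErrorCodes(tags: readonly string[]): string[] {
  return tags
    .filter((tag) => tag.startsWith(PASSTHROUGH_ERROR_PREFIX))
    .map((tag) => tag.slice(PASSTHROUGH_ERROR_PREFIX.length));
}

// Hypothetical helper: adds one aborted request's codes to a running tally.
function recordAbort(counts: Map<string, number>, tags: readonly string[]): void {
  for (const code of passthroughErrorCodes(tags)) {
    counts.set(code, (counts.get(code) ?? 0) + 1);
  }
}
```

Logging the tally periodically would show whether the ECONNRESET rate changes around the moment the proxy stops responding, or whether aborts at that point carry no passthrough-error tag at all.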
This is very interesting! I'm certainly keen to fix any issues around these sorts of use cases, but it's likely that Mockttp isn't normally getting stressed this way, so there might be some issues here to fix. To summarize what it sounds like you're seeing:
Is that all correct? I think you can safely ignore the ECONNRESET/RST_STREAM errors before the failure. It's not unusual for some upstream server connections to occasionally reset (or web pages to reference servers that don't even exist) and when they do Mockttp intentionally triggers downstream connection errors to simulate the same failure as closely as possible. That's not a problem. I think you probably also don't need to worry about debugging httpolyglot and related behaviour - if the debug line is appearing, then the downstream connection does seem to be working, so this would just be a problem with the passthrough logic. It sounds like this is some issue with our upstream connections, which stops new requests being sent after some time. It would be helpful to know:
In general, if the request is being received (appearing in the debug log) the only risky things I can think of that happen before the beforeRequest handler is called are DNS (which will usually come from the cache anyway) and receiving & buffering for the incoming request body (when |
Hello pimterry, you understood correctly except for one point: it is not beforeRequest that does not get called, but beforeResponse. As for ignoring the ECONNRESET/RST_STREAM errors, I understand what you mean, but in this exact case they never happened when not passing through the proxy. Maybe there are retry mechanisms baked into Chromium that are not in the proxy, though. You are also right about httpolyglot: after testing bigger settings and still seeing the problem, I concluded the same as you, the problem is certainly coming from upstream. As for your 4 questions, answered in order:
What I think happens is that upstream is not HTTP/2 (I couldn't really find/understand the passthrough code, but did not see any dependency on the http2 node module). This would mean the upstream server is getting bombarded by those 300 connection requests per second, which could potentially activate DOS protections. But then again, if it was something like that, I'd expect to be blacklisted from the server for some time, yet everything keeps working if I bypass the proxy. Maybe an unhandled/silenced error when the upstream server closes the connection unexpectedly?
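Since the symptom is that requests are received but beforeResponse never fires, one way to measure it is to track which requests are still waiting on upstream. The following is a minimal sketch, not part of Mockttp: InFlightTracker is a hypothetical class, and calling start() from a beforeRequest callback and finish() from beforeResponse (or from the "abort" listener), keyed by the request id, is an assumption about how you would hook it in.

```typescript
// Hypothetical tracker for requests that were received but have not yet
// produced an upstream response.
class InFlightTracker {
  private readonly startedAt = new Map<string, number>();

  // Call when a request is received, e.g. from a beforeRequest callback.
  start(id: string, now: number = Date.now()): void {
    this.startedAt.set(id, now);
  }

  // Call when the upstream response arrives or the request is aborted.
  finish(id: string): void {
    this.startedAt.delete(id);
  }

  // Ids of requests that have been pending for longer than thresholdMs.
  stalled(thresholdMs: number, now: number = Date.now()): string[] {
    const result: string[] = [];
    this.startedAt.forEach((startTime, id) => {
      if (now - startTime > thresholdMs) result.push(id);
    });
    return result;
  }
}
```

With the 10 second client timeout mentioned earlier in the thread, a call such as tracker.stalled(10000) on a timer would report requests that outlived the client's patience, which helps separate occasional upstream resets from the state where nothing gets through.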
Ok, that makes it clear that this is happening in the upstream connection & request process itself then. In this case, you might find it interesting to listen to
It probably is HTTP/2 upstream. If the incoming client connection is HTTP/2, the upstream connection is made using http2-wrapper, which automatically negotiates the use of HTTP/2 or HTTP/1 depending on what the server supports.
That's very helpful, and suggests this is related to the upstream HTTP/2 connections specifically. An interesting test would be to manually set It would be very interesting to know if this works, since that would make it clear that it's an issue in either http2-wrapper, our agent configuration for that, or Node's HTTP/2 module itself.
It depends who you're talking to, but it's possible that they'd blacklist the IP + TLS fingerprint together, and Mockttp will have a different fingerprint from your other client. That's not common but it wouldn't be particularly surprising. That means it's not impossible that this is a DOS protection issue (but it's still not clear). Doing a very quick restart of the proxy and checking if the issue continues would confirm this though. |
Without restarting the process, just calling stop and start again makes the proxy work again. Unfortunately, I have run out of time to fix this case on our project, and we started using a fallback option which covers our needs for now. I kept everything and will try to provide you with a repro when I have a bit more time.
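As a stopgap while the root cause is unknown, the stop/start workaround described above can be wrapped in a small helper. This is a minimal sketch under stated assumptions: makeRestarter and the Restartable interface are hypothetical names, the interface only requires start() and stop() methods returning promises, and setup stands in for whatever script re-registers your rules and listeners.

```typescript
// Minimal structural type: only the two methods this sketch relies on.
interface Restartable {
  start(): Promise<unknown>;
  stop(): Promise<unknown>;
}

// Hypothetical helper: returns a function that stops the server, starts it
// again, then re-runs the caller's setup. Overlapping calls share the single
// restart already in progress instead of stopping the server twice.
function makeRestarter(
  server: Restartable,
  setup: () => Promise<void>
): () => Promise<void> {
  let inProgress: Promise<void> | null = null;

  return () => {
    if (inProgress !== null) return inProgress;
    const run = (async () => {
      try {
        await server.stop();
        await server.start();
        await setup();
      } finally {
        // Allow a later restart once this one has settled.
        inProgress = null;
      }
    })();
    inProgress = run;
    return run;
  };
}
```

Sharing the in-progress promise matters if several stalled clients trigger the restart at the same moment. This only masks the problem, so it is worth logging each restart to keep track of how often the hang occurs.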
Hello there, I apologize in advance for my lack of knowledge/information about the problems I am facing. I'm coming here mostly to get help digging into this further, in the hope of returning the favor by helping find/fix a bug...
Here's the context:
At some point, the Chromium instances stop being able to connect, with the XHR requests stuck at connecting. I am not sure whether this is about connecting to the proxy or to the upstream server.
I've seen it happen a couple times, sometimes 5 minutes after starting the app, sometimes it would be able to run half a day.
I don't see any errors in the node console, which leaves me really unsure where the problem comes from. Listening on tls-client-error and client-error produces no logs either.
To make sure this was not related to other parts of the app, when the server stopped responding to the Chromium instances, I opened a Chrome window with the proxy set up. I could access other websites, but not the targeted domain over HTTPS. The domain was still working over HTTP but not HTTPS, which makes me believe it is something to do with TLS.
On a side note, we get rare ERR_HTTP2_PROTOCOL_ERROR errors on the Chromium side. I am not sure if this is linked in any way, but when we do receive this error, the next requests continue working.
Here's the code we are using:
Please note that debug was not turned on until 1 hour ago, and since then the proxy has been working fine. I will report back if the debug logs show anything when a problem arises.