
Mockttp stops responding to requests after random amount of time/requests... #190

Open
Salketer opened this issue Mar 21, 2025 · 5 comments


Salketer commented Mar 21, 2025

Hello there, I'm sorry in advance for the lack of knowledge/information about the problem I'm facing. I'm mostly here to get help digging into this further, in the hope of returning the favour by helping to find and fix a bug...

Here's the context:

  • Mockttp is used in an app that crawls the web using chromium with --proxy-server switch.
  • We use mockttp as a mitm proxy to see requests/responses.
  • The app runs between 20 and 60 chromium instances, all connecting to the proxy.
  • The pages visited by the chromium instances poll some data via XHR POST every second.
  • Everything happens on the same domain, with rules set for that particular domain.

At some point, the chromium instances stop being able to connect, with the XHR requests stuck in a connecting state. I am not sure whether this part is about connecting to the proxy or to the upstream.

I've seen it happen a couple of times; sometimes 5 minutes after starting the app, sometimes it runs for half a day.

I don't see any errors on the node console, which makes me really hesitant about where the problem comes from. Listening on tls-client-error and client-error produces no logs either.
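For reference, the subscriptions I mean are roughly this (a minimal sketch; I'm just dumping whatever object Mockttp passes to the callback):

// Minimal sketch: log any TLS handshake failures or malformed client requests.
mockServer.on("tls-client-error", (err) => console.warn("tls-client-error:", err));
mockServer.on("client-error", (err) => console.warn("client-error:", err));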

To make sure this was not related to other parts of the app, when the server stopped responding to the chromium instances I opened up a chrome window with the proxy set up: I could access other websites, but not the targeted domain over HTTPS. The domain was still working over HTTP but not HTTPS, which makes me believe it is something regarding TLS.

On a side note, we get rare ERR_HTTP2_PROTOCOL_ERROR errors on the chromium side. I am not sure whether this is linked in any way, but when we do receive this error, the next requests continue working.

Here's the code we are using:

import { getLocal, CompletedRequest } from "mockttp";

const mockServer = getLocal({
  http2: true,
  suggestChanges: false,
  recordTraffic: false,
  debug: true,
  https: {
    keyPath: "./testCA.key",
    certPath: "./testCA.pem",
  },
});

// The clients prefix the user-agent header with an agent id ("<id>]<real UA>");
// strip the id into a request tag and restore the real user-agent upstream.
const beforeRequest = (req: CompletedRequest) => {
  const ua = req.headers["user-agent"].split("]");
  req.tags[0] = ua[0];
  const headers = req.headers;
  headers["user-agent"] = ua[1];
  return {
    headers,
  };
};

// Pass through the polled .cgi requests and hand each response body to the
// matching agent (Arbiter is part of the surrounding app, not Mockttp).
mockServer
  .forAnyRequest()
  .forHost("x.com")
  .withUrlMatching(/\.cgi$/)
  .thenPassThrough({
    beforeRequest,
    beforeResponse: (res, req) => {
      try {
        res.body
          .getText()
          .then((b) => {
            const agent = Arbiter.agents.get(req.tags[0]);
            if (!agent) {
              console.error("Agent not found", req.tags[0], req);
              return;
            }
            agent.on("response", b);
          })
          .catch(console.error);
      } catch (e) {
        console.error(e);
      }
    },
  });

// Any other request to the same host: pass through with the cleaned user-agent.
mockServer
  .forUnmatchedRequest()
  .forHost("x.com")
  .thenPassThrough({
    beforeRequest,
  });

// Everything else: plain passthrough.
mockServer.forUnmatchedRequest().thenPassThrough();

Please note that debug was not turned on until about an hour ago, and since then the proxy has been working fine. I will report back if the debug logs provide anything when a problem arises.

Salketer (Author) commented Mar 21, 2025

Alright, so apparently the proxy still correctly receives the connections, as the debug output keeps showing:

Handling request for https://x.com/...
Request matched rule: Match requests for anything, for host x.com, and matching URL /.cgi$/, and then pass the request through to the target host.

But nothing else is shown; beforeResponse is definitely not called...

Restarting the server with mockServer.stop() and rerunning the setup scripts makes the proxy server work again.
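For context, the restart workaround looks roughly like this (setupRules and proxyPort are placeholders for my own setup code and the fixed port the proxy listens on):

// Rough sketch of the workaround: stop the server, start it again on the same
// port, and re-register the rules shown in the first comment.
await mockServer.stop();
await mockServer.start(proxyPort);
await setupRules(mockServer);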

I have found that the "abort" event is being fired multiple times. There is no error in the object passed, but I found that a tag is added: 'passthrough-error:ECONNRESET'.

This seems to happen when the clients show that HTTP/2 error, but I do not understand which end is actually "bailing out".

The "abort" is also happening when the requests are hanging, but the behaviour on the browser is different as already state it just hands in connecting state. The client is set to timeout and retry the call every 10 seconds so does not receive http error.

If it can be of any help, I've managed to grab a netlog from chromium; here's the part of interest, I think:

t= 93370 [st= 52538] HTTP2_SESSION_RECV_RST_STREAM
--> error_code = "2 (INTERNAL_ERROR)"
--> stream_id = 1203

I don't see any other hints of errors. The complete log is too big to post here; I'd be happy to provide it if needed.

When the proxy stops responding, there is nothing displayed in the netlog besides the connection resets (aborts) that are the expected behaviour for requests taking longer than 10 seconds:

t=123661 [st= 82829] HTTP2_SESSION_SEND_RST_STREAM
--> description = ""
--> error_code = "8 (CANCEL)"
--> stream_id = 2397

I kept digging, and tried listening to the internal HTTP/2 error events to see if I could get something:

// eslint-disable-next-line @typescript-eslint/ban-ts-comment
//@ts-ignore
mockServer.server.on("error", console.log);
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
//@ts-ignore
mockServer.server.on("frameError", console.log);

Digging into the source code, it seemed like this would be propagated to httpolyglot, which I tested using the connect event first. But nothing got logged.

The aborts are different for a one-time HTTP/2 error than for when the server stops responding. The passthrough-error:ECONNRESET tag is only present on the one-time errors, not on the aborts produced when everything stops working.

Another hint I had was about the volume of sessions and requests made by the clients, as the frequency of the errors increases dramatically with the number of clients the app opens. I have tried playing with the httpolyglot internal code to add {maxSessionMemory: 1000, peerMaxConcurrentStreams: 1000}, as suggested in vitejs/vite#6207 where users were seeing the same kind of behaviour as their projects got bigger.
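For clarity, those two settings are plain Node HTTP/2 server options; outside of the patched httpolyglot internals they would look something like this (a sketch for reference only, not what Mockttp itself does; the Node defaults are maxSessionMemory: 10 and peerMaxConcurrentStreams: 100):

import * as http2 from "http2";
import * as fs from "fs";

// Raise the per-session memory budget and the allowed concurrent streams per
// session well above the Node defaults.
const h2Server = http2.createSecureServer({
  key: fs.readFileSync("./testCA.key"),
  cert: fs.readFileSync("./testCA.pem"),
  maxSessionMemory: 1000,
  peerMaxConcurrentStreams: 1000,
});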

Note that this should not be a problem with the target server, as everything works perfectly when not using the proxy. I am now thinking this happens inside the passThrough part of the code...

pimterry (Member) commented:

This is very interesting! I'm certainly keen to fix any issues around these sorts of use cases, but Mockttp probably isn't normally stressed in this way, so there may well be some issues here to fix.

To summarize what it sounds like you're seeing:

  • At some point Chrome stops being able to connect over HTTPS (but HTTP still works fine)
  • Once that happens, it sounds like from the netlog that Chrome is trying to send the requests and they're cancelling after 10 seconds with no response, is that right?
  • When the failure happens, frequent abort events fire from Mockttp.
  • With debug output enabled, you can see that requests are still being received, but they're not reaching beforeRequest or doing anything else.

Is that all correct?

I think you can safely ignore the ECONNRESET/RST_STREAM errors before the failure. It's not unusual for some upstream server connections to occasionally reset (or web pages to reference servers that don't even exist) and when they do Mockttp intentionally triggers downstream connection errors to simulate the same failure as closely as possible. That's not a problem.

I think you probably also don't need to worry about debugging httpolyglot and related behaviour - if the debug line is appearing, then the downstream connection does seem to be working, so this would just be a problem with the passthrough logic. It sounds like this is some issue with our upstream connections, which stops new requests being sent after some time.

It would be helpful to know:

  • If you add a separate callback response rule (e.g. .always().thenCallback(() => ({ statusCode: 200, body: 'hello world' }))) for some special url (e.g. example.com/test), does that work as expected after the failure occurs? (There's a rough sketch of such a rule just after this list.) This will confirm for sure whether this does indeed only affect upstream passthrough connections between Mockttp & the servers, or whether it's an issue in the code between clients & Mockttp.
  • Do requests to other URLs (hitting your forUnmatchedRequest rule with no beforeRequest/Response callback) still work after this happens?
  • Do you still see failures for direct connections to IP addresses after failure occurs? This would help to exclude some DNS causes.
  • Does this happen if you disable HTTP/2 with http2: false?
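
For the first point, a rough sketch of that kind of canary rule (the URL and response are just examples):

// A callback rule that never touches any upstream connection, so it should
// keep responding even if the passthrough logic is stuck:
await mockServer
  .forGet("http://example.com/test")
  .always()
  .thenCallback(() => ({ statusCode: 200, body: 'hello world' }));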

In general, if the request is being received (appearing in the debug log), the only risky things I can think of that happen before the beforeRequest handler is called are DNS (which will usually come from the cache anyway) and receiving & buffering the incoming request body (when beforeRequest is set, we don't call it or continue with the request until we have the full request body available to provide to the callback). I would also have assumed that incoming connections could cause problems, but if you're seeing debug messages for the requests which then fail, then this isn't the problem.

Salketer (Author) commented:

Hello pimterry

You are correct in what you understood, except for one point: it is not beforeRequest that does not get called, but beforeResponse.

As for ignoring the ECONNRESET/RST_STREAM, I understand what you mean, but in this exact case it never happened when not passing through the proxy. Maybe there are retry mechanics baked into chromium that are not in the proxy, though.

You are also right about httpolyglot: after testing bigger settings and still seeing the problem, I concluded the same as you; the problem is certainly coming from upstream.

As for your 4 questions, answered in order:

  • The .always rule works. In fact, all rules continue working unless they use a passthrough and target the same host. Using a mirror host also works, until it crashes too.
  • Other rules are indeed working, even passed-through ones, as long as the domain is not the same.
  • I could not get it to work because of certificate problems with IPs, but I have modified the machine's hosts file to bind the domain to the IP, which should pretty much remove the DNS dependency.
  • This does not happen without HTTP/2, BUT: as I said, the HTTP connection keeps working, and that is HTTP/1.1 even when HTTP/2 is turned on but in a broken state; only HTTPS connections run over HTTP/2, from what I understand. Also, by using chromium we are limited to 6 concurrent connections. This means that the 60 instances, doing requests of less than 200ms each, would execute somewhere close to 300 requests per second; with the concurrency limit, we do 6 every 200ms, for a total of 30 requests per second, 10 times less.

What I think happens is that upstream is not HTTP/2 (I couldn't really find/understand the passthrough code, but I did not see any dependency on the http2 node module). This would mean the upstream server is getting bombarded by those 300 connection requests per second, which could potentially activate DOS protections. But then again, if it were something like that, I'd expect to be blacklisted from the server for some time, yet everything keeps working if I bypass the proxy. Maybe an unhandled/silenced error when the upstream server closes the connection unexpectedly?

pimterry (Member) commented:

It is not the beforeRequest that does not get called but beforeResponse.

Ok, that makes it clear that this is happening in the upstream connection & request process itself then.

In this case, you might find it interesting to listen to rule-event events - this will fire lots of different types of events from the passthrough rule itself, including raw events for the data that is sent to & received from the upstream server. There is a very new passthrough-abort rule event here, which might include details of connection issues upstream.
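As a rough sketch (just dumping every event, since the exact payload differs per event type):

// Log all rule events, including the passthrough/abort events mentioned above.
await mockServer.on("rule-event", (event) => {
  console.log("rule-event:", JSON.stringify(event, null, 2));
});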

What I think happens is that upstream is not HTTP/2 (I couldn't really find/understand the passthrough code but did not see any dependency for the http2 node module.

It probably is HTTP/2 upstream. If the incoming client connection is HTTP/2, the upstream connection is made using http2-wrapper, which handles automatically negotiating use of HTTP/2 or HTTP/1 depending on what the server supports.

This does not happen without HTTP/2

That's very helpful, and suggests this is related to the upstream HTTP/2 connections specifically.

An interesting test would be to manually set shouldTryH2Upstream to false here. In that configuration, your clients can still send HTTP/2, but the server connections will always be HTTP/1. This won't be limited to 6 connections, as Mockttp (and Node.js generally) doesn't use the same constraints as Chromium there, so requests should be sent at basically the same rate as using HTTP/2 end-to-end.

It would be very interesting to know if this works, since that would make it clear that it's an issue in either http2-wrapper, our agent configuration for that, or Node's HTTP/2 module itself.

But then again, if it was something like that I'd expect to be blacklisted from the server for some time, but everything keeps working if I bypass the proxy

It depends who you're talking to, but it's possible that they'd blacklist the IP + TLS fingerprint together, and Mockttp will have a different fingerprint from your other client. That's not common but it wouldn't be particularly surprising. That means it's not impossible that this is a DOS protection issue (but it's still not clear). Doing a very quick restart of the proxy and checking if the issue continues would confirm this though.

Salketer (Author) commented:

Doing a very quick restart of the proxy and checking if the issue continues would confirm this though.

Without restarting the process, just calling stop and start again makes the proxy work again. Unfortunately I have run out of time to fix this case on our project, and we have started using a fallback option which covers our needs for now. I've kept everything and will try to provide you with a repro when I have a bit more time.
