Skip to content

THRIFT-5862: Validate the message size at the endpoint transport only #3127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dengzhhu653
Copy link
Member

@dengzhhu653 dengzhhu653 commented Apr 22, 2025

  • Did you create an Apache Jira ticket? (Request account here, not required for trivial changes)
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

@Jens-G
Copy link
Member

Jens-G commented Apr 22, 2025

Would love to hear the reason for this. Really.

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Apr 22, 2025

-1 Would love to hear the reason for this. Really.

Hi @Jens-G, mainly two reasons:

  • In a chain of transports, every transport(the transport extends TEndpointTransport, except the TLayeredTransport) updates and checks the knownMessageSize/remainingMessageSize based on the bytes fed, this is not straightforward and confused, which one should take effect, what if a transport updates the max message size via TTransport#getConfiguration.setMaxMessageSize.
  • For a service, we should always have an endpoint transport which gets fed from external source in a chain, it's more reasonable to validate the limit here instead of all the downstream transports.

For a chain which has only TSocket, this configured limit doesn't take effect, and no clear way to mark the boundary of the message, i.e, assumption on TTransport#flush to set the message end, though this is the check on the message read.

@Jens-G
Copy link
Member

Jens-G commented Apr 22, 2025

@ all: Opinions?

@Jens-G Jens-G added the java Pull requests that update Java code label Apr 23, 2025
@Jens-G
Copy link
Member

Jens-G commented Apr 29, 2025

I still have problems with this one. Some layered transports do implement specific limits, the most prominent one is TFramedTransport. If we don't want to check that anymore, the whole point of the validation gets lost.

Another example would be compression implemented as a layered transport. Endpoint raeds a 42 KB message which all of a sudden produces petabytes during decompression.

this configured limit doesn't take effect, and no clear way to mark the boundary of the message,

That's why we introduce artificial limits to validate against. If the limit is (just as an example) 10 MB then reading a message of 10 MB plus one byte should fail with the appropriate error message, no matter how much more data the socket would be able to deliver. This is by design.

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Apr 29, 2025

Thank you @Jens-G! Here is my understanding, please correct me if wrong.
The TFramedTransport takes in innerTransport as the source transport, the message limit will be checked at the source innerTransport(If the innerTransport is a TEndpointTransport, otherwise the limit check would be propagated to the source of innerTransport, continue...), this TFramedTransport will validate the frame limit only. For the frame limit we can get from the innerTransport's configuration if any, otherwise default, it's missing in current PR.

Another example would be compression implemented as a layered transport

This is a good case. TZlibTransport for example, it still checks the message limit after the de-compression. we still allow the user to implement their own message limit based on their needs, for example:

public int read(byte[] buf, int off, int len) throws TTransportException {
int iBytesRead = bufferTrans.read(buf, off, len);
consumeReadMessageBytes(iBytesRead);
return iBytesRead;
}

no matter how much more data the socket would be able to deliver. This is by design.

An issue I can think of is how to distinguish different messages a socket might send, a Hive client can re-use the same socket for future uses, either for a different request or polling the request's status from the server.

@Jens-G
Copy link
Member

Jens-G commented May 1, 2025

For the frame limit we can get from the innerTransport's configuration if any, otherwise default, it's missing in current PR.

I was thinking about the frame size delivered with the message. That one is usually way lower than the allowed limit. That's what I meant by

Some layered transports do implement specific limits

@Jens-G
Copy link
Member

Jens-G commented May 1, 2025

for example:

Your example is broken. If someone passes in int.Max for len the function will try to read it w/o complaint. Exactly that should not happen. Asking consumeReadMessageBytes() for validation after you already did the read() it is rather pointless, because the test is intended to happen before we even try to read.

While I appreciate the efforts you put into this, I'm still not convinced (a) we should do this at all and (b) with the quality of the patch. If there are good arguments for a patch that let me believe in it and convince me that my doubts are unfounded, I am perfectly fine with it. But so far, you only managed to increased the latter.

@dengzhhu653
Copy link
Member Author

While I was trying to debug a problem, I found every transport had knownMessageSize and remainingMessageSize, in this example, TSaslServerTransport, TSocket and TMemoryInputTransport:
Screenshot 2025-05-05 at 11 16 03

When "MaxMessageSize reached", only remainingMessageSize in TMemoryInputTransport was reaching the limit, the stack:

org.apache.thrift.transport.TTransportException: MaxMessageSize reached
at org.apache.thrift.transport.TEndpointTransport.countConsumedMessageBytes(TEndpointTransport.java:96)
at org.apache.thrift.transport.TMemoryInputTransport.read(TMemoryInputTransport.java:97)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:390)
at org.apache.thrift.transport.TSaslServerTransport.read(TSaslServerTransport.java:44)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:109)
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:417)
at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:411)
at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(StorageDescriptor.java:1289)

Looks it's of no use to maintain the knownMessageSize and remainingMessageSize in the TSaslServerTransport, TSocket, so I'm thinking of how to improve and trim the validation from the chain. Instead of putting the limit check on the top of the chain(in reality we don't know who will at the top), I try to put it at the bottom(the TSocket) as this PR suggests.

the test is intended to happen before we even try to read.

yeah, this happens on the read on a string or a container as far as I know, which we have some checks on them on the protocol lawyer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
java Pull requests that update Java code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants