
RpcChannel does not notify the callback when a send fails, so the call hangs until it times out #94

@liming30

Description


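When the Netty channel sends a message to the server, the write can fail, but the returned channelFuture is currently never checked. The only thing that keeps the call from hanging forever is the timeout, and by then the original exception stack trace is lost, which makes the problem very hard to troubleshoot.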
https://github.com/baidu/Jprotobuf-rpc-socket/blob/master/jprotobuf-rpc-core/src/main/java/com/baidu/jprotobuf/pbrpc/transport/RpcChannel.java#L141
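A minimal sketch of the missing check, assuming the write result can be modeled with the JDK's `CompletableFuture<Void>` in place of Netty's `ChannelFuture` so that it runs without Netty on the classpath. In Netty itself the equivalent is `channel.writeAndFlush(msg).addListener((ChannelFutureListener) f -> { if (!f.isSuccess()) { /* handle f.cause() */ } })`. The class `WriteFailureListenerSketch`, its `writeAndFlush` stand-in and the `encodedSize` parameter are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Sketch only: CompletableFuture<Void> stands in for Netty's ChannelFuture.
class WriteFailureListenerSketch {

    // Hypothetical stand-in for channel.writeAndFlush(...): as in Netty, an exception
    // thrown while encoding fails the returned future instead of reaching the caller.
    static CompletableFuture<Void> writeAndFlush(int encodedSize) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        try {
            // Throws IllegalArgumentException("Negative initial size: ...") when
            // encodedSize < 0, the same root cause as in the captured stack trace.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(encodedSize);
            future.complete(null);
        } catch (RuntimeException e) {
            future.completeExceptionally(e);
        }
        return future;
    }

    // Attach a completion listener so a failed write is reported, not silently dropped.
    static void send(int encodedSize, Consumer<Throwable> onFailure) {
        writeAndFlush(encodedSize).whenComplete((ignored, cause) -> {
            if (cause != null) {
                onFailure.accept(cause);
            }
        });
    }
}
```

Without the listener, the failed future in this sketch is simply discarded, which mirrors what happens to the unchecked channelFuture today.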

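Below is the exception information for one particular case, captured with arthas. The exception is not handled anywhere, so the caller simply waits until the RPC call times out.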

method=io.netty.channel.AbstractChannelHandlerContext.writeAndFlush location=AtExit
ts=2024-02-20 11:52:40; [cost=0.064681ms] result=@ArrayList[
    io.netty.handler.codec.EncoderException: java.lang.RuntimeException: Negative initial size: -736704836
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856)
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856)
	at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:304)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
	at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: Negative initial size: -736704836
	at com.baidu.jprotobuf.pbrpc.data.RpcDataPackage.write(RpcDataPackage.java:687)
	at com.baidu.jprotobuf.pbrpc.transport.handler.RpcDataPackageEncoder.encode(RpcDataPackageEncoder.java:88)
	at com.baidu.jprotobuf.pbrpc.transport.handler.RpcDataPackageEncoder.encode(RpcDataPackageEncoder.java:1)
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
	... 21 more
Caused by: java.lang.IllegalArgumentException: Negative initial size: -736704836
	at java.base/java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:76)
	at com.baidu.jprotobuf.pbrpc.data.RpcDataPackage.write(RpcDataPackage.java:668)
	... 24 more

I think we should invoke the callback when the send fails, so that the correct exception information is propagated to the caller.
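One way this could look, sketched under assumptions: the client keeps a map of pending calls keyed by a request id, and when the write future fails it removes the pending entry and completes it exceptionally with the original cause, so the caller receives the real exception (for example the `EncoderException` above) immediately instead of a timeout. `FailFastRpcClientSketch`, `call`, `onResponse` and the `transport` function are hypothetical names, not Jprotobuf-rpc-socket's actual API, and `CompletableFuture` stands in for both Netty's `ChannelFuture` and the RPC callback.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Sketch only: hypothetical client that fails a pending call as soon as its write fails.
class FailFastRpcClientSketch {
    private final Map<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    // Hypothetical transport: sends a request and returns the write result.
    private final Function<byte[], CompletableFuture<Void>> transport;

    FailFastRpcClientSketch(Function<byte[], CompletableFuture<Void>> transport) {
        this.transport = transport;
    }

    CompletableFuture<byte[]> call(byte[] request) {
        long id = nextId.incrementAndGet();
        CompletableFuture<byte[]> response = new CompletableFuture<>();
        pending.put(id, response); // register before writing, so no outcome is missed
        transport.apply(request).whenComplete((ignored, cause) -> {
            if (cause != null) {
                // Write failed: fail the call now with the original exception.
                CompletableFuture<byte[]> failed = pending.remove(id);
                if (failed != null) {
                    failed.completeExceptionally(cause);
                }
            }
        });
        return response;
    }

    // Response path: the id would come from the response header in a real protocol.
    void onResponse(long id, byte[] body) {
        CompletableFuture<byte[]> call = pending.remove(id);
        if (call != null) {
            call.complete(body);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch, removing the entry in the failure path also means that a timeout handler (not shown) looking up the same id would find nothing left to fail, so the caller sees the original exception rather than a generic timeout.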

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
