Skip to content
This repository was archived by the owner on Aug 2, 2022. It is now read-only.

fix alerting stats API for ES 7.3.2 #128

Merged

Conversation

jinsoor-amzn
Copy link
Contributor

Issue #, if available:
#99 (comment)

Description of changes:
Error stack trace

[2019-12-02T12:03:13,225][DEBUG][c.a.o.a.c.a.n.ScheduledJobsStatsTransportAction] [Data1] failed to execute on node [EgCz2GQcQASrEP6tNRIAGQ]
org.elasticsearch.transport.RemoteTransportException: [Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]]
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]
    at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:213) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:141) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.3.2.jar:7.3.2]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
    at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.lang.IllegalStateException: node must not be null
    at com.amazon.opendistroforelasticsearch.alerting.core.action.node.ScheduledJobStats.<init>(ScheduledJobStats.kt:46) ~[?:?]
    at com.amazon.opendistroforelasticsearch.alerting.core.action.node.ScheduledJobsStatsTransportAction.newNodeResponse(ScheduledJobsStatsTransportAction.kt:76) ~[?:?]
    at com.amazon.opendistroforelasticsearch.alerting.core.action.node.ScheduledJobsStatsTransportAction.newNodeResponse(ScheduledJobsStatsTransportAction.kt:39) ~[?:?]
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.read(TransportNodesAction.java:186) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.read(TransportNodesAction.java:183) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(TransportService.java:1092) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(TransportService.java:1079) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:209) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:141) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) ~[?:?]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) ~[?:?]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) ~[?:?]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) ~[?:?]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[?:?]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) ~[?:?]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
    at java.lang.Thread.run(Thread.java:835) ~[?:?]

The error was happening because ES was creating a new object of ScheduledJobStats with empty constructor. But as part of object variable we try to use the node variable which was not initialized yet.

(This only happens during multi-node ES cluster because ES needs to recreate the node and it does this by creating a new "empty" object and "read" the data that has been transported.)

I have modified the code such that it only reads the variable when it is used within the toXContent.

Testing of changes:
I have tested locally using two different ES processes running.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor

@lucaswin-amzn lucaswin-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for finding the root cause of this issue.

@lucaswin-amzn lucaswin-amzn merged commit f3df2f2 into opendistro-for-elasticsearch:master Dec 2, 2019
tlfeng pushed a commit that referenced this pull request Feb 6, 2021
Fix multi node alerting stats API.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants