Skip to content

Some messages are lost #51

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tarasrng opened this issue Jun 11, 2020 · 5 comments
Closed

Some messages are lost #51

tarasrng opened this issue Jun 11, 2020 · 5 comments
Labels

Comments

@tarasrng
Copy link

I'm using the latest logback-gelf appender (3.0.0) and also tried the old one (2.2.0) with Spring Boot 2.2.2.RELEASE (web app)

Not all the messages are reaching Graylog, and when the app is under heavy load almost none of them are logged.

When enabling logback debug I see the following errors very often:

ERROR in de.siegmar.logbackgelf.GelfTcpAppender[GELF] - Error sending message via tcp://my host:12201 java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketException: Broken pipe (Write failed)
	at 	at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
	at 	at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
	at 	at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:138)
	at 	at de.siegmar.logbackgelf.TcpConnection.write(TcpConnection.java:55)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender$2.accept(GelfTcpAppender.java:176)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender$2.accept(GelfTcpAppender.java:173)
	at 	at de.siegmar.logbackgelf.pool.SimpleObjectPool.execute(SimpleObjectPool.java:63)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender.sendMessage(GelfTcpAppender.java:173)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender.appendMessage(GelfTcpAppender.java:147)
	at 	at de.siegmar.logbackgelf.AbstractGelfAppender.append(AbstractGelfAppender.java:104)
	at 	at de.siegmar.logbackgelf.AbstractGelfAppender.append(AbstractGelfAppender.java:27)
	at 	at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
	at 	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	at 	at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:290)


Also, there are the following errors:

ERROR in de.siegmar.logbackgelf.GelfTcpAppender[GELF] - Error sending message via tcp://my host:12201 java.net.SocketException: An established connection was aborted by the software in your host machine
	at java.net.SocketException: An established connection was aborted by the software in your host machine
	at 	at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:421)
	at 	at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:441)
	at 	at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:825)
	at 	at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1007)
	at 	at java.base/java.io.OutputStream.write(OutputStream.java:122)
	at 	at de.siegmar.logbackgelf.TcpConnection.write(TcpConnection.java:55)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender.lambda$sendMessage$1(GelfTcpAppender.java:161)
	at 	at de.siegmar.logbackgelf.pool.SimpleObjectPool.execute(SimpleObjectPool.java:63)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender.sendMessage(GelfTcpAppender.java:161)
	at 	at de.siegmar.logbackgelf.GelfTcpAppender.appendMessage(GelfTcpAppender.java:135)
	at 	at de.siegmar.logbackgelf.AbstractGelfAppender.append(AbstractGelfAppender.java:104)
	at 	at de.siegmar.logbackgelf.AbstractGelfAppender.append(AbstractGelfAppender.java:27)
	at 	at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
	at 	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	at 	at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:290)

Here is my config

<appender name="GELF" class="de.siegmar.logbackgelf.GelfTcpAppender">
        <graylogHost>${MY_GRAYLOG_HOST}</graylogHost>
        <graylogPort>12201</graylogPort>
        <connectTimeout>15000</connectTimeout>
        <reconnectInterval>300</reconnectInterval>
        <maxRetries>2</maxRetries>
        <retryDelay>3000</retryDelay>
        <poolSize>2</poolSize>
        <poolMaxWaitTime>5000</poolMaxWaitTime>
        <encoder class="de.siegmar.logbackgelf.GelfEncoder">
            <originHost>localhost</originHost>
            <includeMdcData>true</includeMdcData>
            <shortPatternLayout class="ch.qos.logback.classic.PatternLayout">
                <pattern>%m%nopex</pattern>
            </shortPatternLayout>
            <fullPatternLayout class="ch.qos.logback.classic.PatternLayout">
                <pattern>%m%n</pattern>
            </fullPatternLayout>
            <staticField>application:${application}</staticField>
        </encoder>
    </appender>

    <appender name="ASYNC_GELF" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="GELF" />
    </appender>

    <root level="info">
        <appender-ref ref="ASYNC_GELF" />
        <appender-ref ref="Console" />
    </root>
@osiegmar
Copy link
Owner

Hello @tarasrng !

Please check AsyncAppender documentation about how to configure queueSize and discardingThreshold regarding dropped messages in heavy load situations.

As the error message from your stack trace 'An established connection was aborted by the software in your host machine' says, it is because of a closed TCP connection. Maybe the connection has been closed by Graylog or a firewall. Depending on the configuration of this library, a retry operation is being performed afterwards with another connection but that retry could end up with a similar exception if this other connection also has been closed.

You could try the yet unreleased version of this library and configure the newly poolMaxIdleTime to automatically close connections after some time of inactivity. I think, this would probably solve your problem. See #49 for more details about this.

Let me know, if that helped and the value you had to use.

@tarasrng
Copy link
Author

@osiegmar thank you for suggestions!
I tried configuring queueSize and discardingThreshold - unfortunately, no luck.
Not sure how poolMaxIdleTime can help, it looks more like a cleanup parameter. Furthermore, I see lost messages even when running a short test app that runs only ~10 seconds.
Maybe retry configuration has to be tuned.
We will also try to check Graylog and monitor traffic using WireShark.

@osiegmar
Copy link
Owner

Indeed, poolMaxIdleTime probably wouldn't help for that short lifetime. If the problem also appear on an application that runs for ~10 seconds, tcpdump or Wireshark are definitely the tools I'd use for debugging. Just another thought – you're shutting down the application gracefully, right? You may check the documentation about stopping logback-classic for additional (debugging) steps in order to ensure graceful shutdown.

@tarasrng
Copy link
Author

Yep, it's a graceful shutdown, and I even do sleep to make sure all the messages have a chance to be logged.

@osiegmar
Copy link
Owner

I'll close this ticket now as I highly doubt that it is related to this library itself. Please re-open it if you have additional information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants