Description
Since upgrading from 24.12.0 to 25.2.0 we have seen a significant increase in memory usage that has caused several JVM OOM errors. Our web3signers are responsible for tens of thousands of keys. The issue surfaced after we onboarded a new batch of ~200 keys, which led to an OOM crash and missed attestations.
You can see from the graph that we didn't have this memory issue on version 24.12.0; as soon as we deployed 25.2.0 on 19/02, memory usage spiked considerably. As a temporary measure we have set -XX:MaxRAMPercentage=50 and given our nodes more memory. Memory usage now generally plateaus, but the plateau only lasts a day or two before it climbs again. This gradual climb continues until we are alerted and have to restart. Originally we ran with -XX:MaxRAMPercentage=25 without any issues or memory spikes. Memory usage for the web3signer had always been consistently low up until 25.2.0.
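For anyone trying to reproduce this, here is a rough sketch of what the two -XX:MaxRAMPercentage settings mean for the heap cap. It assumes no explicit -Xmx is set, so the JVM sizes the maximum heap as roughly that percentage of the memory it detects (host RAM or container limit). The 16 GiB figure and the HeapBudget class are hypothetical and only there for illustration; they are not our actual node sizes.

```java
public class HeapBudget {

    // Approximate default max heap for a given memory limit and
    // -XX:MaxRAMPercentage value. The real JVM also applies alignment,
    // so Runtime.maxMemory() can differ slightly from this figure.
    static long maxHeapBytes(long memoryLimitBytes, double maxRamPercentage) {
        return (long) (memoryLimitBytes * (maxRamPercentage / 100.0));
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long limit = 16 * gib; // hypothetical node memory, not from the issue

        System.out.println("25% cap: " + maxHeapBytes(limit, 25.0) / gib + " GiB");
        System.out.println("50% cap: " + maxHeapBytes(limit, 50.0) / gib + " GiB");

        // What the JVM running this snippet actually resolved as its max heap.
        System.out.println("This JVM max heap (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

Running the same JVM flags with `java -XX:MaxRAMPercentage=50 HeapBudget` on a node shows the cap the JVM actually resolved, which is a useful sanity check before comparing heap graphs between versions.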
When we onboarded the 200 new keys and missed attestations, we noticed considerable JVM garbage collection time on 2 of our 3 signers, as shown on the graph below (jvm_gc_collection_seconds).
Despite the long GC times we didn't see any reduction in memory for 2 of our signers during this period, but this could be the result of memory pressure at the time.
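To make the "long GC time but no memory reduction" observation easier to check, here is a minimal sketch that reads cumulative GC time and current heap usage through the standard java.lang.management MXBeans. As far as we know, metrics such as jvm_gc_collection_seconds are derived from the same GarbageCollectorMXBean counters, but that is an assumption on our side. The GcSnapshot class is hypothetical and reports on the JVM it runs in; it is not part of Web3Signer.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSnapshot {

    // Sum of cumulative collection time (ms) across all collectors.
    // getCollectionTime() returns -1 when a collector does not report it,
    // so negative values are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Heap currently in use, in bytes.
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC ms: " + totalGcMillis());
        System.out.println("heap used bytes: " + heapUsedBytes());
    }
}
```

If GC time keeps rising between two snapshots while used heap does not drop, that points to objects being retained rather than garbage waiting to be collected, which would be consistent with the behaviour we are seeing.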
Overall, we see a worrying increase in memory, with a gradual climb over several days. Before 25.2.0 we saw usage of ~1GB with no issues; now it gradually climbs beyond 6GB after 2 or 3 days.