TP-568: Rebind service beans when PU has restarted #121

oskarwiksten · 2021-12-28T11:19:17Z

Makes Astrix clients rebind their service beans whenever any of the PUs in the space has restarted.
The reason for rebinding is to make sure that clients have up-to-date connection details for the hosts on which the space resides, in case any of the server-side PUs have been relocated or restarted.
GS does detect automatically when instances have moved, but only after an initial failed connection to such an outdated instance, which might result in timeouts. With this commit, the intention is that such timeouts will be less likely.
Before this change, if any of the PU instances had been relocated, every client would fail its connection to the old location of the relocated instance at least once.
In particular, this commit intends to resolve the case where
- An instance of a PU has been relocated to a new server
- The old server is unavailable, and network requests to it will time out instead of being rejected
- The Astrix service is called seldom (for example, only at a certain time during the day)
- Previously, such a case would hit a timeout on the first call after the PU had been restarted, for example the next day when the calling app needs the service again.
- With this commit, the calling apps will reconnect some time after the PU has been relocated.
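
The rebind-on-change idea described above can be illustrated with a minimal sketch. This is not the actual `ServiceBeanInstance` code: the class `RebindSketch`, its fields, and the `startTime` property are all hypothetical and only assume that a restarted PU shows up as a change in the discovered service properties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a client-side service bean.
// It is NOT the real Astrix ServiceBeanInstance.
public class RebindSketch {
    private Map<String, String> currentProperties; // null until the first bind
    private boolean bound;
    private int bindCount;

    // Called periodically with the latest properties from service discovery.
    public synchronized void renewLease(Map<String, String> discovered) {
        if (discovered.equals(currentProperties)) {
            return; // nothing changed, keep the existing connection
        }
        if (bound) {
            System.out.println("Service properties have changed, will rebind");
            unbind();
        }
        bind(discovered);
    }

    private void bind(Map<String, String> properties) {
        // Copy the map so later mutations by the caller are detected as changes.
        currentProperties = new HashMap<>(properties);
        bound = true;
        bindCount++;
    }

    private void unbind() {
        bound = false;
        currentProperties = null;
    }

    public synchronized int getBindCount() {
        return bindCount;
    }

    public synchronized boolean isBound() {
        return bound;
    }

    public static void main(String[] args) {
        RebindSketch bean = new RebindSketch();
        Map<String, String> props = new HashMap<>();
        props.put("spaceName", "example-space"); // hypothetical property
        props.put("startTime", "1000");          // hypothetical property
        bean.renewLease(props); // first bind
        bean.renewLease(props); // unchanged, no rebind
        props.put("startTime", "2000"); // simulates a PU restart
        bean.renewLease(props); // properties differ, so the bean rebinds
        System.out.println("bindCount=" + bean.getBindCount());
    }
}
```

In this sketch the rebind happens during lease renewal rather than on the next service call, which is why a seldom-called service would already hold fresh connection details when it is finally invoked.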

Example test case that was used to verify this change:

0min: An app calls an Astrix service on a PU using `@AstrixBroadcast`
1min: One of the PU's partition instances is restarted
3min: The calling app makes the same broadcast call again

Results before this proposed change:

0min: The call succeeds without exception
3min: The calling app logs `WARN: Async execution failed: java.rmi.ConnectException: Broken pipe`
3min: The call succeeds without exception (the code making the Astrix call does not see the exception logged above)
The calling app has therefore tried to make at least one network connection to an old PU instance

Results with this commit:

0min: The call succeeds without exception
1min 15s: The calling app logs `Service properties for bean=... have changed, will rebind`
3min: The call succeeds without exception
The calling app has not logged any WARN messages.
The calling app has not tried to connect to the old PU instance


ath0s

👍

ath0s · 2022-01-03T09:13:43Z

astrix-context/src/main/java/com/avanza/astrix/beans/service/ServiceBeanInstance.java

@@ -118,6 +118,10 @@ public void renewLease() {
 				return;
 			}
 			if (serviceHasChanged(serviceDiscoveryResult.getResult())) {
+				if (isBound() && currentProperties != null) {
+					log.info("Service properties for bean={} astrixBeanId={} have changed, will rebind service bean.", getBeanKey(), id);
+					destroy();


If I understand this correctly, calling destroy here will cause the ServiceBeanInstance to be rebound. So it is not really destroy in the sense that it can no longer be used, but rather unbound. If this is the case, I suggest we rename the method and rephrase the logging a bit to reflect this.

oskarwiksten · 2022-01-03

I have now added another commit that makes these changes.
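
The distinction raised in the review, between unbinding a bean so it can be bound again and destroying it for good, can be sketched as a small state machine. The class `BeanLifecycleSketch` and its states are hypothetical and are not taken from the Astrix code base.

```java
// Hypothetical lifecycle model; the names are illustrative, not Astrix's API.
public class BeanLifecycleSketch {
    public enum State { UNBOUND, BOUND, DESTROYED }

    private State state = State.UNBOUND;

    // Binding is allowed from UNBOUND or BOUND, but never after destroy().
    public void bind() {
        if (state == State.DESTROYED) {
            throw new IllegalStateException("Bean is destroyed and cannot be rebound");
        }
        state = State.BOUND;
    }

    // Releases the current binding; the bean remains usable and may bind again.
    public void unbind() {
        if (state == State.BOUND) {
            state = State.UNBOUND;
        }
    }

    // Terminal operation: the bean can no longer be used.
    public void destroy() {
        state = State.DESTROYED;
    }

    public State getState() {
        return state;
    }

    public static void main(String[] args) {
        BeanLifecycleSketch bean = new BeanLifecycleSketch();
        bean.bind();
        bean.unbind(); // e.g. after service properties changed
        bean.bind();   // rebinding works after unbind
        System.out.println(bean.getState());
    }
}
```

Naming the reversible operation `unbind` keeps `destroy` reserved for the terminal case, which is the intent of the rename suggested above.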

oskarwiksten force-pushed the tp-568-invalidate-connection-on-pu-restart branch from 7358442 to 98d6156 on December 30, 2021 08:16

ath0s approved these changes Jan 3, 2022

TP-568: Rebind service beans when PU has restarted

1578d4a

askoog approved these changes Jan 9, 2022

oskarwiksten merged commit 6b220e6 into master Jan 10, 2022

oskarwiksten deleted the tp-568-invalidate-connection-on-pu-restart branch January 10, 2022 08:38

oskarwiksten mentioned this pull request Jan 12, 2022

TP-568: Invalidate gs connections if PU starttime changes #122

Merged
