Skip to content

TP-568: Rebind service beans when PU has restarted #121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 10, 2022

Conversation

oskarwiksten
Copy link
Contributor

  • Makes Astrix clients rebind their service beans whenever any of the PUs in the space have restarted.
  • Reason for rebinding is to make sure that clients have up-to-date connection details to which hosts that the space resides on, in case any of the server-side PUs have been relocated or restarted.
  • However, GS will detect if any instances have moved automatically, but only after an initial failed connection to any such outdated instances - which might result in timeouts. With this commit, the intention is that such timeouts will be less likely.
  • Before this change, if any of the PU instances had been relocated, all clients would fail their connections to the previous relocated instance at least once.
  • In particular, this commit intends to resolve the case where
    • An instance of a PU has been relocated to a new server
    • The old server is unavailable, and network requests to it will time-out instead of being rejected
    • The astrix service is called seldomly (for example, only at a certain time during the day)
    • Such a case would previously get timeout on the first (next) call after the PU had been restarted. For example, the next day when the calling app needs the service again.
    • With this commit, the calling apps will reconnect some time after the PU has been relocated.

Example testcase that was used to verify this change:

  • 0min: An app calls an astrix service on a PU using @AstrixBroadcast
  • 1min: One instance of the PU partitions is restarted
  • 3min: The calling app makes the same broadcast call again

Results before this proposed change:

  • 0min: The call succeeds without exception
  • 3min: Calling app logs WARN: Async execution failed: java.rmi.ConnectException: Broken pipe
  • 3min: The call succeeds without exception (the code making the astrix call does not see the exception logged above)
  • The calling app has therefore tried to make at least one network connection to an old PU instance

Results with this commit:

  • 0min: The call succeeds without exception
  • 1min 15s: The calling app logs Service properties for bean=... have changed, will rebind
  • 3min: The call succeeds without exception
  • The calling app has not logged any WARN logs.
  • The calling app has not tried to connect to the old PU instance

* Makes Astrix clients rebind their service beans whenever any of the PUs in the space have restarted.
* Reason for rebinding is to make sure that clients have up-to-date connection details to which hosts that the space resides on, in case any of the server-side PUs have been relocated or restarted.
* However, GS will detect if any instances have moved automatically, but only after an initial failed connection to any such outdated instances - which might result in timeouts. With this commit, the intention is that such timeouts will be less likely.
* Before this change, if any of the PU instances had been relocated, all clients would fail their connections to the previous relocated instance at least once.
* In particular, this commit intends to resolve the case where
	* An instance of a PU has been relocated to a new server
	* The old server is unavailable, and network requests to it will time-out instead of being rejected
	* The astrix service is called seldomly (for example, only at a certain time during the day)
	* Such a case would previously get timeout on the first (next) call after the PU had been restarted. For example, the next day when the calling app needs the service again.
	* With this commit, the calling apps will reconnect some time after the PU has been relocated.

Example testcase that was used to verify this change:
* 0min: An app calls an astrix service on a PU using `@AstrixBroadcast`
* 1min: One instance of the PU partitions is restarted
* 3min: The calling app makes the same broadcast call again

Results before this proposed change:
* 0min: The call succeeds without exception
* 3min: Calling app logs `WARN: Async execution failed: java.rmi.ConnectException: Broken pipe`
* 3min: The call succeeds without exception (the code making the astrix call does not see the exception logged above)
* The calling app has therefore tried to make at least one network connection to an old PU instance

Results with this commit:
* 0min: The call succeeds without exception
* 1min 15s: The calling app logs `Service properties for bean=... have changed, will rebind`
* 3min: The call succeeds without exception
* The calling app has not logged any WARN logs.
* The calling app has not tried to connect to the old PU instance
@oskarwiksten oskarwiksten force-pushed the tp-568-invalidate-connection-on-pu-restart branch from 7358442 to 98d6156 Compare December 30, 2021 08:16
Copy link
Contributor

@ath0s ath0s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -118,6 +118,10 @@ public void renewLease() {
return;
}
if (serviceHasChanged(serviceDiscoveryResult.getResult())) {
if (isBound() && currentProperties != null) {
log.info("Service properties for bean={} astrixBeanId={} have changed, will rebind service bean.", getBeanKey(), id);
destroy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, calling destroy here will cause the ServiceBeanInstance to be rebound. So it is not really destroy in the sense that it can no longer be used, but rather unbound. If this is the case, I suggest we rename the method and rephrase the logging a bit to reflect this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added another commit now that makes such changes.

@oskarwiksten oskarwiksten merged commit 6b220e6 into master Jan 10, 2022
@oskarwiksten oskarwiksten deleted the tp-568-invalidate-connection-on-pu-restart branch January 10, 2022 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants