-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Externalize kvm agent storage reboot configuration #4586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Externalize kvm agent storage reboot configuration #4586
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @GutoVeronezi nice feature ...
I was thinking and if we change the name of these files, put something more accurate or self explainable?
-
AgentProperty.java => AgentProperties.java
since we will work to bring all available properties to this file, right? -
AgentPropertyFile.java => AgentPropertiesFileHandler.java
this class sounds like a handler or facility to me, am I right?
Bearing in mind that we are looking to standardize and normalize things here. What do you think of normalizing the KVMHAMonitor.java file, at least on the lines where some changes were needed.
String _hostIP => String hostIP
My eyes bleed when I see things like that in the code.
private static final Logger s_logger = Logger.getLogger(KVMHAMonitor.class); | ||
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<String, NfsStoragePool>(); | ||
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we start to normalize these variable names in this file?
Removing these underscore at the start of any variable. at least where some changes were needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<>(); | |
private final Map<String, NfsStoragePool> storagePool = new ConcurrentHashMap<>(); |
@RodrigoDLopez ;)
(will need work in the rest of the file!!!)
@RodrigoDLopez sure, the new files names will be more intuitive. About the normalizations of the names, I understand what you mean, I'll do it too. |
good work @GutoVeronezi looking forward to see this merged, but I am going to put my foot down for very extensive testing. 👍 |
nice job @GutoVeronezi |
Can anyone review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good (in spite of the unguarded formats in debug statements. but this needs (as said) extensive 3rd party testing.
private static final Logger s_logger = Logger.getLogger(KVMHAMonitor.class); | ||
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<String, NfsStoragePool>(); | ||
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean
private final Map<String, NfsStoragePool> _storagePool = new ConcurrentHashMap<>(); | |
private final Map<String, NfsStoragePool> storagePool = new ConcurrentHashMap<>(); |
@RodrigoDLopez ;)
(will need work in the rest of the file!!!)
|
||
private final String _hostIP; /* private ip address */ | ||
/* private ip address */ | ||
private final String _hostIP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
private final String _hostIP; | |
private final String hostIP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the contributor decides to proceed with renaming this variable, I would also suggest removing the commented line /* private IP address */
, and leave simply hostIP
, or then something as hostPrivateIP
or privateIpAddress
, as this is expected to be the HostVO.privateIpAddress
.
I just did, but this is not what you need most, I'll look in the company to find test effort for us, to get this merged. As a general rule we need two lgtm, on of which should have at least tested extensively and the other at least looked at the code. |
Packaging result: ✔️ centos7 ✖️ centos8 ✔️ debian. SL-JID 547 |
Packaging result: ✖️ centos7 ✖️ centos8 ✖️ debian. SL-JID 561 |
Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 578 |
@blueorangutan test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM.
I added a suggestion regarding the variable naming change; nothing that would block this PR anyway.
|
||
private final String _hostIP; /* private ip address */ | ||
/* private ip address */ | ||
private final String _hostIP; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the contributor decides to proceed with renaming this variable, I would also suggest removing the commented line /* private IP address */
, and leave simply hostIP
, or then something as hostPrivateIP
or privateIpAddress
, as this is expected to be the HostVO.privateIpAddress
.
@DaanHoogland @GabrielBrascher rename done. |
agent/conf/agent.properties
Outdated
@@ -263,3 +263,6 @@ iscsi.session.cleanup.enabled=false | |||
# Automatically clean up iscsi sessions not attached to any VM. | |||
# Should be enabled for users using managed storage for example solidfire. | |||
# Should be disabled for users with unmanaged iscsi connections on their hosts | |||
# | |||
#This parameter specifies if the host must be rebooted when something go wrong with heartbeat. | |||
# reboot.host.and.alert.management.on.hearbeat.timeout=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo hearbeat
=> heartbeat
* Default value: 60000 (ms). | ||
*/ | ||
public static final AgentProperties<Integer> HEARTBEAT_UPDATE_TIMEOUT = new AgentProperties<Integer>("hearbeat.update.timeout", 60000); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the typo error 'hearbeat', and correct it wherever applicable.
@GutoVeronezi is this PR ready for review? |
@sureshanaparti as I mentioned in #4585 (comment), this PR will be rebased if #4585 be merged, therefore it still is a WiP. |
1022328
to
53fb485
Compare
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 701 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM.
i'll be able to test this one on this weekend. perhaps i'll post my results in the next monday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cltgm
@@ -270,3 +270,6 @@ iscsi.session.cleanup.enabled=false | |||
# This parameter specifies the heartbeat update timeout in ms; The default value is 60000ms (1 min). | |||
# Depending on the use case, this timeout might need increasing/decreasing. | |||
# heartbeat.update.timeout=60000 | |||
|
|||
# This parameter specifies if the host must be rebooted when something goes wrong with the heartbeat. | |||
# reboot.host.and.alert.management.on.heartbeat.timeout=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hope the newline is there ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaanHoogland It's there =)
GitHub warns when there is no newline at the end of the file, e.g.:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, you're right. forgot about that.
Trillian test result (tid-1434)
|
Trillian test result (tid-1437)
|
test_vpc_redundant is failing all over the open PRs, but running one last time anyway to be as sure as possible |
Trillian test result (tid-1446)
|
@RodrigoDLopez did you test this? |
Sorry @DaanHoogland I couldn't run the tests yet. I'm having problems with my test environment, but I will run the tests as soon as possible. |
@blueorangutan package |
@GutoVeronezi a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 815 |
@blueorangutan test |
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Trillian test result (tid-1588)
|
@RodrigoDLopez anything to share yet? |
I ran some tests with this propose. But I'm not sure about my environment. I couldn't see my kvm agent stop at any point. forcing my heartbeat timeout on two kvm hosts it's possible to see that the ones without the parameter Kvm-nd01
Kvm-nd02
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @GutoVeronezi.
LGTM based on code review and manual tests done by @RodrigoDLopez.
The test looks good, probably the host was not rebooted due to some test env details. The logic behind loading the agent configuration, validating it and changing the execution flow according to it seems OK.
@DaanHoogland @GabrielBrascher @RodrigoDLopez is this good to go? Is there anything else to do? |
Thanks for testing @RodrigoDLopez |
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
@nvazquez ok, thanks! |
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 981 |
@blueorangutan test |
@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Trillian test result (tid-1754)
|
Description
On KVMHAMonitor, when there is an inconsistency on the heartbeat's file or heartbeat timeout is extrapolated several times, the host is restarted. The host restarting amid several operations might bring inconsistency problems; therefore, it is interesting to let operators choose if they want the host restart or not on the hearbeat's timeout or storage errors.
This PR intends to externalize the heartbeat reboot configuration on KVMHAMonitor. The default value is
true
; therefore, the current behavior is maintained.This PR depends on PR#4585.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
How Has This Been Tested?
It has been tested locally on a test lab.
hearbeat.update.timeout=10
setting toagent.properties
to force a timeout on hearbeat.reboot.host.and.alert.management.on.hearbeat.timeout
setting toagent.properties
and changed it some times;