Library with common methods and default values for High Availability Extension (HA or HAE) tests.
$default_timeout: default scaled timeout for most operations with SUT
$join_timeout: default scaled timeout for ha-cluster-join calls
$softdog_timeout: default scaled timeout for the softdog watchdog
$crm_mon_cmd: crm_mon (crm monitoring) command
$corosync_token: command to filter the value of runtime.config.totem.token
from the output of corosync-cmapctl
$corosync_consensus: command to filter the value of runtime.config.totem.consensus
from the output of corosync-cmapctl
$sbd_watchdog_timeout: command to extract the value of SBD_WATCHDOG_TIMEOUT
from /etc/sysconfig/sbd
$sbd_delay_start: command to extract the value of SBD_DELAY_START
from /etc/sysconfig/sbd
$pcmk_delay_max: command to get the value of the pcmk_delay_max parameter from the STONITH resource in the cluster configuration.
exec_csync();
Runs csync2 -vxF in the SUT to sync files from SUT to the other nodes in the cluster. The first call to csync2 -vxF is sometimes expected to fail, so this method runs the command twice.
add_file_in_csync( value => '/path/to/file', [ conf_file => '/path/to/csync2.cfg' ] );
Adds /path/to/file to a csync2 configuration file in SUT. The path to add must be passed with the named argument value, while the csync2 configuration file can be specified with the named argument conf_file (defaults to /etc/csync2/csync2.cfg). Returns true on success or croaks if command execution fails in SUT.
get_cluster_info();
Returns a hashref containing the info parsed from the CLUSTER_INFOS variable. This does not reflect the current state of the cluster but the intended steady state once the LUNs are configured and the nodes have joined.
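Example (illustrative only; the exact CLUSTER_INFOS format is an assumption, check the setting's documentation): with a value such as hacluster:2:3, of the form cluster_name:num_nodes:num_luns, the returned hashref would expose the cluster name, the number of nodes and the number of LUNs.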
get_cluster_name();
Returns the cluster name, as defined in the CLUSTER_NAME setting. Croaks if the setting is not defined, as it is a mandatory setting for HA tests.
get_hostname();
Returns the hostname, as defined in the HOSTNAME setting. Croaks if the setting is not defined, as it is a mandatory setting for HA tests.
get_node_to_join();
Returns the hostname of the node to join, as defined in the HA_CLUSTER_JOIN setting. Croaks if the setting is not defined, as this setting is mandatory for all nodes that run ha-cluster-join
. As such, avoid scheduling tests that call this method on nodes that would run ha-cluster-init
instead.
get_ip( $node_hostname );
Returns the IP address of a node given its hostname, either by calling the host command in SUT (which in turn does a DNS query on tests using a support server), or by searching for the host entry in SUT's /etc/hosts. Returns 0 on failure.
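Example (illustrative; the hostname is hypothetical): my $node1_ip = get_ip('alpha-node01');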
get_my_ip();
Returns the IP address of SUT or 0 if the address cannot be determined. Special case of get_ip().
get_node_number();
Returns the number of nodes configured in the cluster.
get_node_index();
Returns the index number of the SUT. This information is taken from the node hostnames, so be sure to define proper hostnames in the test settings, for example alpha-node01, alpha-node02, etc.
is_node( $node_number );
Checks whether SUT is the node identified by $node_number. Returns true or false. This information is matched against the node hostname, so be sure to define proper hostnames in the test settings, for example alpha-node01, alpha-node02, etc.
add_to_known_hosts( $host );
Adds $host to the .ssh/known_hosts file of the current user in SUT. Croaks if any of the commands to do so fail.
choose_node( $node_number );
Returns the hostname of the node identified by $node_number. This information relies on the node hostnames, so be sure to define proper hostnames in the test settings, for example alpha-node01, alpha-node02, etc.
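Example (illustrative, using the alpha-node01/alpha-node02 naming convention above): choose_node(2) returns 'alpha-node02', and is_node(2) is true only when SUT is that node.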
save_state();
Prints the cluster configuration and cluster status in SUT, and saves the screenshot.
is_package_installed( $package );
Checks if $package is installed in SUT. Returns true or false.
check_rsc( $resource );
Checks if cluster resource $resource is configured in the cluster. Returns true or false.
ensure_process_running( $process );
Checks for up to $default_timeout seconds whether process $process is running in SUT. Returns 0 if process is running or croaks on timeout.
ensure_resource_running( $resource, $regexp );
Checks for up to $default_timeout seconds whether resource $resource is configured in the cluster, by matching $regexp against the output of crm resource status $resource. Returns 0 on success or croaks on timeout.
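Example (illustrative; the resource name and regular expression are hypothetical): ensure_resource_running('stonith-sbd', 'is running on:\s+\S+');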
ensure_dlm_running();
Checks that the dlm
resource is running in the cluster, and that its associated process (dlm_controld) is running in SUT. Returns 0 if process is running or croaks on error.
write_tag( $tag );
Creates a cluster-specific file in /tmp/ of the SUT with $tag as its content. Returns 0 on success or croaks on failure.
read_tag();
Reads the content of the cluster-specific file created in /tmp/ with write_tag(). Returns the content of the file or croaks on failure.
block_device_real_path( $device );
Returns the real path of the block device specified by $device, as shown by realpath -ePL, or croaks on failure.
lvm_add_filter( $type, $filter );
Add filter $filter of type $type to /etc/lvm/lvm.conf.
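Example (illustrative values): lvm_add_filter('a', '/dev/vda'); in LVM configuration terms, 'a' entries accept devices and 'r' entries reject them.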
lvm_remove_filter( $filter );
Remove filter $filter from /etc/lvm/lvm.conf.
rsc_cleanup( $resource );
Execute a crm resource cleanup
on the resource identified by $resource.
ha_export_logs();
Upload HA-relevant logs from SUT. These include: crm configuration, cluster bootstrap log, corosync configuration, crm report, list of installed packages, list of iSCSI devices, /etc/mdadm.conf, support config and y2logs. If available, logs from the HAWK test, from CTS and from HANA are also included.
check_cluster_state( [ proceed_on_failure => 1 ] );
Checks the state of the cluster. Calls $crm_mon_cmd and inspects its output, checking that the number of nodes reported by crm node list and by $crm_mon_cmd is the same, and then runs crm_verify -LV.
With the named argument proceed_on_failure set to 1, the function will use script_run() and attempt to run all commands in SUT without checking for errors. Without it, the method uses assert_script_run() and will croak on failure.
wait_until_resources_stopped( [ timeout => $timeout, minchecks => $tries ] );
Wait for resources to be stopped. Runs $crm_mon_cmd until there are no resources in stopping state or up to $timeout seconds. Timeout must be specified by the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. The named argument minchecks (defaults to 3, can be disabled with 0) provides a minimum number of times to check independently of the return status; this helps avoid race conditions where the method checks before the HA stack starts to stop the resources. Croaks on timeout.
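Example (illustrative values): wait_until_resources_stopped(timeout => 240, minchecks => 5);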
wait_until_resources_started( [ timeout => $timeout ] );
Wait for resources to be started. Runs crm cluster wait_for_startup
in SUT as well as other verifications on newer versions of SLES (12-SP3+), for up to $timeout seconds for each command. Timeout must be specified by the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. Croaks on timeout.
wait_for_idle_cluster( [ timeout => $timeout ] );
Use cs_wait_for_idle
to wait until the cluster is idle before continuing the tests. Supply a timeout with the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. Croaks on timeout.
get_lun( [ use_once => $bool ] );
Returns a LUN from the LUN list file stored in the support server or in the support NFS share in scenarios without support server. If the named argument use_once is passed and set to true (defaults to true), the returned LUN will be removed from the file, so it will not be selected again. Croaks on failure.
check_device_available( $device, [ $timeout ] );
Checks for the presence of a device in the SUT for up to a defined timeout (defaults to 20 seconds). Returns 0 on success, or croaks on failure.
set_lvm_config( $lvm_config_file, [ use_lvmetad => $val1, locking_type => $val2, use_lvmlockd => $val3, ... ] );
Configures the LVM parameter/value pairs passed as a hash in the LVM configuration file specified by the first argument $lvm_config_file. These LVM parameters are usually use_lvmetad, locking_type and use_lvmlockd, but any other existing parameter from the LVM configuration file is also valid. Parameters that do not exist in the LVM configuration file in SUT will be ignored. Returns 0 on success or croaks on failure.
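Example (illustrative values): set_lvm_config('/etc/lvm/lvm.conf', use_lvmetad => 0, locking_type => 1, use_lvmlockd => 1);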
add_lock_mgr( $lock_manager, [ force => bool ] );
Configures a $lock_manager resource in the cluster configuration on SUT. $lock_manager is usually either clvmd or lvmlockd, but any other cluster primitive could work as well.
Takes a second named argument force which, if set to true, adds --force to the crmsh command. Defaults to false and should be used with care.
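Example (illustrative): add_lock_mgr('lvmlockd'); or, to pass --force to crmsh, add_lock_mgr('lvmlockd', force => 1);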
is_not_maintenance_update( $package );
Checks if the package specified in $package is not targeted by a maintenance update. Returns true if the package is not targeted, i.e., the MAINTENANCE setting is active and the package name appears neither in the BUILD setting nor in the list of packages of the related INCIDENT_ID. Returns false in all other cases. Besides the package $package, it also checks for kernel in the BUILD setting and in the package list, as the tests should always run with updates to the kernel.
activate_ntp();
Enables NTP service in SUT.
script_output_retry_check(cmd=>$cmd, regex_string=>$regex_string, [retry=>$retry, sleep=>$sleep, ignore_failure=>$ignore_failure]);
Executes a command via the script_output subroutine and checks its output against a regular expression. The command output is returned on success; otherwise the command is retried a defined number of times. The test dies after the last unsuccessful retry.
$cmd command being executed.
$regex_string regular expression to check output against.
$retry number of retries. Defaults to 5.
$sleep sleep time between retries. Defaults to 10s.
$ignore_failure do not kill the test upon failure.
Example: script_output_retry_check(cmd=>'hostname', regex_string=>'^node01$', retry=>'100', sleep=>'60', ignore_failure=>'1');
collect_sbd_delay_parameters();
Collects a series of SBD parameters from the SUT and returns them as a hash. Values are collected from /etc/sysconfig/sbd or by filtering the output of corosync-cmapctl. Due to possible race conditions, all these parameters are collected using the helper function script_output_retry_check, also defined in this library.
calculate_sbd_start_delay(\%sbd_parameters);
Calculates the start time delay to use after a node is fenced. This delay is used as a wait time after a node fence to prevent cluster failures in cases where the fenced node restarts too quickly. The delay time is either taken from the SBD configuration variable SBD_DELAY_START, if specified, or calculated with the formula:
corosync token timeout + consensus timeout + pcmk_delay_max + msgwait
The corosync_token and corosync_consensus values are converted to seconds. For diskless SBD, pcmk_delay_max is set to a static 30 seconds.
%sbd_parameters = (
    'corosync_token' => <runtime.config.totem.token>,
    'corosync_consensus' => <runtime.config.totem.consensus>,
    'sbd_watchdog_timeout' => <SBD_WATCHDOG_TIMEOUT>,
    'sbd_delay_start' => <SBD_DELAY_START>,
    'pcmk_delay_max' => <pcmk_delay_max>
);
If the %sbd_parameters argument is omitted, the function will try to obtain the values from the configuration files. See collect_sbd_delay_parameters above.
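As an illustrative calculation only (all values below are hypothetical): with a corosync token of 5000 ms (5 s), a corosync consensus of 6000 ms (6 s), a pcmk_delay_max of 30 s and a msgwait of 10 s, the formula above yields a start delay of 5 + 6 + 30 + 10 = 51 seconds.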
setup_sbd_delay()
This function configures the SBD_DELAY_START parameter in /etc/sysconfig/sbd in the SUT to the value supplied in the HA_SBD_START_DELAY setting, and then calls calculate_sbd_start_delay and set_sbd_service_timeout to set the service timeout for the SBD service in the SUT. It returns the calculated delay. Croaks if any of the commands sent to the SUT fail.
set_sbd_service_timeout($service_timeout)
Set the service timeout for the SBD service in the SUT to the number of seconds passed as argument.
This is accomplished by configuring a systemd override file for the SBD service.
If the override file exists, the function will edit it and replace the timeout there, otherwise it creates the file from scratch.
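For illustration only, such an override could look like the sketch below; the file name and the use of the TimeoutSec directive are assumptions, not necessarily what the library writes:
# /etc/systemd/system/sbd.service.d/sbd_timeout.conf (path assumed)
[Service]
TimeoutSec=<service_timeout>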
check_iscsi_failure();
Workaround for bsc#1129385. Checks the system log for iSCSI connection failures and, if necessary, restarts the iscsi and pacemaker services. Then checks the crm status output against a hardcoded regular expression in order to verify the cluster health.
crm_maintenance_status();
Checks the maintenance mode status. Returns true (maintenance active) or false (maintenance inactive). Croaks if an unknown status is received.
crm_wait_for_maintenance(target_state=>$target_state, [loop_sleep=>$loop_sleep, timeout=>$timeout]);
Wait for maintenance to be turned on or off. Croaks on timeout.
target_state Target state of the maintenance mode (true/false)
loop_sleep Override default sleep value between checks
timeout Override default timeout value
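Example (illustrative values): crm_wait_for_maintenance(target_state => 'true', timeout => 300);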
crm_check_resource_location(resource=>$resource, [wait_for_target=>$wait_for_target, timeout=>$timeout]);
Checks the current resource location and returns the hostname of the node where the resource runs. Can be used to wait for a desired state, e.g. after a failover. Croaks on timeout.
wait_for_target Target location of the resource specified - physical hostname
resource Resource to check
timeout Override default timeout value
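Example (illustrative; the resource and hostname are hypothetical): crm_check_resource_location(resource => 'rsc_ip_HA1', wait_for_target => 'alpha-node02', timeout => 300);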
generate_lun_list()
This generates the information that the nodes need in order to use iSCSI. It is stored in /tmp/$cluster_name-lun.list, from where the nodes can get it using scp.