lib/hacluster.pm

SYNOPSIS

Library with common methods and default values for High Availability Extension (HA or HAE) tests.

Global Variables

$default_timeout: default scaled timeout for most operations with SUT
$join_timeout: default scaled timeout for ha-cluster-join calls
$softdog_timeout: default scaled timeout for the softdog watchdog
$crm_mon_cmd: crm_mon (crm monitoring) command
$corosync_token: command to filter the value of runtime.config.totem.token from the output of corosync-cmapctl
$corosync_consensus: command to filter the value of runtime.config.totem.consensus from the output of corosync-cmapctl
$sbd_watchdog_timeout: command to extract the value of SBD_WATCHDOG_TIMEOUT from /etc/sysconfig/sbd
$sbd_delay_start: command to extract the value of SBD_DELAY_START from /etc/sysconfig/sbd
$pcmk_delay_max: command to get the value of the pcmk_delay_max parameter from the STONITH resource in the cluster configuration.

exec_csync

exec_csync();

Runs csync2 -vxF in the SUT, to sync files from SUT to other nodes in the cluster. Sometimes it is expected that the first call to csync2 -vxF fails, so this method will run the command twice.

add_file_in_csync

add_file_in_csync( value => '/path/to/file', [ conf_file => '/path/to/csync2.cfg' ] );

Adds /path/to/file to a csync2 configuration file in SUT. Path to add must be passed with the named argument value, while csync2 configuration file can be passed on the named argument conf_file (defaults to /etc/csync2/csync2.cfg). Returns true on success or croaks if command execution fails in SUT.

get_cluster_info

get_cluster_info();

Returns a hashref containing the info parsed from the CLUSTER_INFOS variable. This does not reflect the current state of the cluster but the intended steady state once the LUNs are configured and the nodes have joined.

get_cluster_name

get_cluster_name();

Returns the cluster name, as defined in the CLUSTER_NAME setting. Croaks if the setting is not defined, as it is a mandatory setting for HA tests.

get_hostname

get_hostname();

Returns the hostname, as defined in the HOSTNAME setting. Croaks if the setting is not defined, as it is a mandatory setting for HA tests.

get_node_to_join

get_node_to_join();

Returns the hostname of the node to join, as defined in the HA_CLUSTER_JOIN setting. Croaks if the setting is not defined, as this setting is mandatory for all nodes that run ha-cluster-join. As such, avoid scheduling tests that call this method on nodes that would run ha-cluster-init instead.

get_ip

get_ip( $node_hostname );

Returns the IP address of a node given its hostname, either by calling the host command in SUT (which in turn performs a DNS query in tests that use a support server), or by searching for the host entry in SUT's /etc/hosts. Returns 0 on failure.
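The /etc/hosts fallback described above can be illustrated with a minimal, self-contained sketch. The helper name ip_from_hosts and the sample addresses are hypothetical, and the library's own lookup may differ; the sketch only shows the general idea of scanning host entries and returning 0 when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the library): look up $hostname in the
# text of an /etc/hosts file. Returns the first matching IP address, or 0
# when no entry is found.
sub ip_from_hosts {
    my ($hosts_content, $hostname) = @_;
    for my $raw (split /\n/, $hosts_content) {
        (my $line = $raw) =~ s/#.*//;    # drop comments
        my ($ip, @names) = split ' ', $line;
        next unless defined $ip;         # skip blank or comment-only lines
        return $ip if grep { $_ eq $hostname } @names;
    }
    return 0;
}

# Sample data, made up for illustration only.
my $hosts = <<'EOF';
127.0.0.1   localhost
# cluster nodes
10.0.2.15   alpha-node01.example.com alpha-node01
10.0.2.16   alpha-node02.example.com alpha-node02
EOF

print ip_from_hosts($hosts, 'alpha-node02'), "\n";
print ip_from_hosts($hosts, 'alpha-node09'), "\n";
```

Returning 0 rather than dying mirrors the documented failure value, so callers are expected to check the result before using it as an address.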

get_my_ip

get_my_ip();

Returns the IP address of SUT or 0 if the address cannot be determined. Special case of get_ip().

get_node_number

get_node_number();

Returns the number of nodes configured in the cluster.

get_node_index

get_node_index();

Returns the index number of the SUT. This information is taken from the node hostnames, so be sure to define proper hostnames in the tests settings, for example alpha-node01, alpha-node02, etc.

is_node

is_node( $node_number );

Checks whether SUT is the node identified by $node_number. Returns true or false. This information is matched against the node hostname, so be sure to define proper hostnames in the tests settings, for example alpha-node01, alpha-node02, etc.
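Because get_node_index and is_node both depend on the hostname convention, a short sketch may help show why hostnames such as alpha-node01 matter. The helper names below are hypothetical and the pattern used by the library itself may differ; this only assumes that the node number is the trailing run of digits in the hostname.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the trailing number from a node hostname.
# Dies if the hostname does not end in digits.
sub node_index_from_hostname {
    my ($hostname) = @_;
    my ($digits) = $hostname =~ /(\d+)$/
      or die "Hostname '$hostname' does not end in a node number\n";
    return int $digits;    # "01" becomes 1
}

# Hypothetical helper: true when $hostname belongs to node $node_number.
sub hostname_is_node {
    my ($hostname, $node_number) = @_;
    return node_index_from_hostname($hostname) == $node_number ? 1 : 0;
}

print node_index_from_hostname('alpha-node02'), "\n";
print hostname_is_node('alpha-node01', 1), "\n";
```

A hostname without a trailing number gives such a scheme nothing to work with, which is why the test settings need to follow the naming pattern.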

add_to_known_hosts

add_to_known_hosts( $host );

Adds $host to the .ssh/known_hosts file of the current user in SUT. Croaks if any of the commands to do so fail.

choose_node

choose_node( $node_number );

Returns the hostname of the node identified by $node_number. This information relies on the node hostnames, so be sure to define proper hostnames in the tests settings, for example alpha-node01, alpha-node02, etc.

save_state

save_state();

Prints the cluster configuration and cluster status in SUT, and saves the screenshot.

is_package_installed

is_package_installed( $package );

Checks if $package is installed in SUT. Returns true or false.

check_rsc

check_rsc( $resource );

Checks if cluster resource $resource is configured in the cluster. Returns true or false.

ensure_process_running

ensure_process_running( $process );

Checks for up to $default_timeout seconds whether process $process is running in SUT. Returns 0 if process is running or croaks on timeout.

ensure_resource_running

ensure_resource_running( $resource, $regexp );

Checks the output of crm resource status $resource against $regexp for up to $default_timeout seconds, to verify that resource $resource is configured in the cluster. Returns 0 on success or croaks on timeout.

ensure_dlm_running

ensure_dlm_running();

Checks that the dlm resource is running in the cluster, and that its associated process (dlm_controld) is running in SUT. Returns 0 if process is running or croaks on error.

execute_crm_resource_refresh_and_check

execute_crm_resource_refresh_and_check();

Executes crm resource refresh for the specified resource on instance_hostname and checks that crm_failcount returns value=0. Also checks that no failover happens and that the state of the cluster resources is healthy.

write_tag

write_tag( $tag );

Create a cluster-specific file in /tmp/ of the SUT with $tag as its content. Returns 0 on success or croaks on failure.

read_tag

read_tag();

Read the content of the cluster-specific file created in /tmp/ with write_tag(). Returns the content of the file or croaks on failure.

block_device_real_path

block_device_real_path( $device );

Returns the real path of the block device specified by $device as shown by realpath -ePL, or croaks on failure.

lvm_add_filter

lvm_add_filter( $type, $filter );

Add filter $filter of type $type to /etc/lvm/lvm.conf.

lvm_remove_filter

lvm_remove_filter( $filter );

Remove filter $filter from /etc/lvm/lvm.conf.

rsc_cleanup

rsc_cleanup( $resource );

Execute a crm resource cleanup on the resource identified by $resource.

ha_export_logs

ha_export_logs();

Upload HA-relevant logs from SUT. These include: crm configuration, cluster bootstrap log, corosync configuration, crm report, list of installed packages, list of iSCSI devices, /etc/mdadm.conf, support config and y2logs. If available, logs from the HAWK test, from CTS and from HANA are also included.

check_cluster_state

check_cluster_state( [ proceed_on_failure => 1 ] );

Checks the state of the cluster. Calls $crm_mon_cmd and inspects its output checking:

The current state of the cluster.
Inactive resources.
Partition with quorum.

Checks that the reported number of nodes in the output of crm node list and $crm_mon_cmd is the same by calling check_online_nodes.

And runs crm_verify -LV.

With the named argument proceed_on_failure set to 1, the function will use script_run() and attempt to run all commands in SUT without checking for errors. Without it, the method uses assert_script_run() and will croak on failure.

check_online_nodes

check_online_nodes( [ proceed_on_failure => 1 ] );

Checks that the reported number of nodes in the output of crm node list and $crm_mon_cmd is the same.

With the named argument proceed_on_failure set to 1, the function will only report the number of nodes configured and online. Otherwise it will die when the number of configured nodes is different than the number of online nodes, or if it fails to get any of these numbers.

This function is not exported and it's used only by check_cluster_state.

This function requires crmsh-4.4.2 or newer.

wait_until_resources_stopped

wait_until_resources_stopped( [ timeout => $timeout, minchecks => $tries ] );

Wait for resources to be stopped. Runs $crm_mon_cmd until there are no resources in stopping state or up to $timeout seconds. Timeout must be specified by the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. The named argument minchecks (defaults to 3, can be disabled with 0) provides a minimum number of times to check independently of the return status; this helps avoid race conditions where the method checks before the HA stack starts to stop the resources. Croaks on timeout.
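The interaction between timeout, TIMEOUT_SCALE and minchecks can be sketched as a generic polling loop. This is an illustration under stated assumptions, not the library's code: poll_until_stopped, the check callback, and the scale and interval arguments are hypothetical stand-ins for running $crm_mon_cmd in SUT and reading the TIMEOUT_SCALE setting.

```perl
use strict;
use warnings;

# Hypothetical helper: call $args{check} (a coderef returning true while
# resources are still stopping) until it returns false and at least
# `minchecks` checks have been made. Dies once the scaled timeout expires.
# Returns the number of checks performed.
sub poll_until_stopped {
    my (%args) = @_;
    my $check     = $args{check};
    my $timeout   = ($args{timeout} // 120) * ($args{scale} // 1);
    my $minchecks = $args{minchecks} // 3;
    my $interval  = $args{interval} // 5;    # assumed pause between checks

    my $deadline = time() + $timeout;
    my $checks   = 0;
    while (1) {
        my $stopping = $check->();
        $checks++;
        # A single clean result is not trusted until minchecks is reached.
        return $checks if !$stopping && $checks >= $minchecks;
        die "Timed out waiting for resources to stop\n" if time() >= $deadline;
        sleep $interval;
    }
}

# The callback always reports "nothing stopping", yet three checks are made.
print poll_until_stopped(check => sub { 0 }, interval => 0), "\n";
```

With minchecks set to 0 the loop returns on the first clean result, which corresponds to disabling the safeguard.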

wait_until_resources_started

wait_until_resources_started( [ timeout => $timeout ] );

Wait for resources to be started. Runs crm cluster wait_for_startup in SUT as well as other verifications on newer versions of SLES (12-SP3+), for up to $timeout seconds for each command. Timeout must be specified by the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. Croaks on timeout.

wait_for_idle_cluster

wait_for_idle_cluster( [ timeout => $timeout ] );

Use cs_wait_for_idle to wait until the cluster is idle before continuing the tests. Supply a timeout with the named argument timeout (defaults to 120 seconds). This timeout is scaled by the factor specified in the TIMEOUT_SCALE setting. Dies on timeout.

get_lun

get_lun( [ use_once => $bool ] );

Returns a LUN from the LUN list file stored in the support server or in the support NFS share in scenarios without support server. If the named argument use_once is passed and set to true (defaults to true), the returned LUN will be removed from the file, so it will not be selected again. Croaks on failure.

check_device_available

check_device_available( $device, [ $timeout ] );

Checks for the presence of a device in the SUT for up to a defined timeout (defaults to 20 seconds). Returns 0 on success, or croaks on failure.

set_lvm_config

set_lvm_config( $lvm_config_file, [ use_lvmetad => $val1, locking_type => $val2, use_lvmlockd => $val3, ... ] );

Configures the LVM parameters/values pairs passed as a HASH into the LVM configuration file specified by the first argument $lvm_config_file. These LVM parameters are usually use_lvmetad, locking_type and use_lvmlockd but any other existing parameter from the LVM configuration file is also valid. Parameters that do not exist in the LVM configuration file in SUT will be ignored. Returns 0 on success or croaks on failure.
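The "existing parameters are updated, unknown ones are ignored" behaviour can be illustrated on a string copy of a configuration file. The helper apply_lvm_settings is hypothetical and operates on text in memory, whereas the library edits the file in SUT; the sample configuration is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite "param = value" lines in LVM-style
# configuration text. Parameters with no matching line are ignored.
sub apply_lvm_settings {
    my ($content, %settings) = @_;
    for my $param (sort keys %settings) {
        my $value = $settings{$param};
        # Keep the original indentation; commented-out lines do not match.
        $content =~ s/^([ \t]*)\Q$param\E[ \t]*=.*$/$1$param = $value/mg;
    }
    return $content;
}

# Sample configuration text, for illustration only.
my $lvm_conf = <<'EOF';
global {
    use_lvmetad = 1
    locking_type = 1
}
EOF

# use_lvmlockd has no line in the sample, so it is silently skipped.
print apply_lvm_settings($lvm_conf, use_lvmetad => 0, locking_type => 3, use_lvmlockd => 1);
```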

add_lock_mgr

add_lock_mgr( $lock_manager, [ force => bool ] );

Configures a $lock_manager resource in the cluster configuration on SUT. $lock_manager is usually either clvmd or lvmlockd, but any other cluster primitive could work as well.

Takes a second named argument force which if set to true will add --force to the crmsh command. Should be used with care. Defaults to false.

is_not_maintenance_update

is_not_maintenance_update( $package );

Checks if the package specified in $package is not targeted by a maintenance update. Returns true if the package is not targeted, i.e., MAINTENANCE setting is active and package name does not appear in the BUILD setting nor is it in the list of packages in the related INCIDENT_ID. Returns false in all other cases. Besides the package $package, it also checks for kernel in the BUILD setting and the list of packages, as the tests should always run with updates to the kernel.
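The decision described above can be written down as a small predicate. This is a sketch only: not_targeted_by_update and its named arguments are hypothetical, the settings are passed in explicitly instead of being read from the test environment, and substring matching of package names is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) only when maintenance is active and neither
# $package nor "kernel" shows up in the build string or the incident's
# package list. Every other case yields false (0).
sub not_targeted_by_update {
    my ($package, %settings) = @_;
    return 0 unless $settings{maintenance};
    my $build             = $settings{build} // '';
    my @incident_packages = @{ $settings{incident_packages} // [] };
    for my $name ($package, 'kernel') {
        return 0 if $build =~ /\Q$name\E/;
        return 0 if grep { /\Q$name\E/ } @incident_packages;
    }
    return 1;
}

# Made-up values for illustration.
my %settings = (
    maintenance       => 1,
    build             => ':12345:corosync',
    incident_packages => ['corosync', 'libcorosync-devel'],
);

print not_targeted_by_update('pacemaker', %settings), "\n";
print not_targeted_by_update('corosync',  %settings), "\n";
```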

activate_ntp

activate_ntp();

Enables NTP service in SUT.

script_output_retry_check

script_output_retry_check(cmd=>$cmd, regex_string=>$regex_string, [retry=>$retry, sleep=>$sleep, ignore_failure=>$ignore_failure]);

Executes a command via the script_output subroutine and performs a sanity check of its output against a regular expression. The command output is returned on success; otherwise the command is retried a defined number of times. The test dies after the last unsuccessful retry.

$cmd command being executed.

$regex_string regular expression to check output against.

$retry number of retries. Defaults to 5.

$sleep sleep time between retries. Defaults to 10s.

$ignore_failure do not kill the test upon failure.

Example: script_output_retry_check(cmd=>'hostname', regex_string=>'^node01$', retry=>'100', sleep=>'60', ignore_failure=>'1');
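The retry behaviour can be sketched without a SUT by replacing the command with a callback. The helper retry_until_match is hypothetical; in particular, returning undef when ignore_failure is set is an assumption, since the text above only says that the test is not killed.

```perl
use strict;
use warnings;

# Hypothetical helper: call $args{run} up to `retry` times and return the
# first output matching `regex_string`. Dies after the last failed attempt
# unless `ignore_failure` is set, in which case undef is returned (assumed).
sub retry_until_match {
    my (%args) = @_;
    my $run   = $args{run};
    my $regex = $args{regex_string};
    my $retry = $args{retry} // 5;
    my $sleep = $args{sleep} // 10;

    for my $attempt (1 .. $retry) {
        my $output = $run->() // '';
        return $output if $output =~ /$regex/;
        sleep $sleep if $attempt < $retry;    # no pause after the last try
    }
    return undef if $args{ignore_failure};
    die "Output did not match /$regex/ after $retry attempts\n";
}

# Simulated command: empty output first, valid output on the second call.
my @outputs = ('', 'node01');
print retry_until_match(
    run          => sub { shift @outputs },
    regex_string => '^node01$',
    sleep        => 0,
), "\n";
```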

collect_sbd_delay_parameters

collect_sbd_delay_parameters();

Collects a series of SBD parameters from the SUT and returns them in a HASH format. Values are read from /etc/sysconfig/sbd or obtained by filtering the output of corosync-cmapctl. Due to possible race conditions, all these parameters are collected using the helper function script_output_retry_check, also defined in this library.

calculate_sbd_start_delay

calculate_sbd_start_delay(\%sbd_parameters);

Calculates the start delay after a node is fenced. This delay is used as a wait time after a node fence to prevent cluster failures in cases where the fenced node restarts too quickly. The delay is either taken from the SBD configuration variable SBD_DELAY_START, if specified, or calculated by the formula:

corosync token timeout + consensus timeout + pcmk_delay_max + msgwait

The values of corosync_token and corosync_consensus are converted to seconds. For diskless SBD, pcmk_delay_max is set to a static 30 seconds.

%sbd_parameters = (
    'corosync_token' => <runtime.config.totem.token>,
    'corosync_consensus' => <runtime.config.totem.consensus>,
    'sbd_watchdog_timeout' => <SBD_WATCHDOG_TIMEOUT>,
    'sbd_delay_start' => <SBD_DELAY_START>,
    'pcmk_delay_max' => <pcmk_delay_max>
);

If the %sbd_parameters argument is omitted, the function will try to obtain the values from the configuration files. See collect_sbd_delay_parameters.
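A worked instance of the formula may make the unit conversion easier to follow. The sketch below is not the library's implementation: sbd_start_delay is a hypothetical helper, msgwait is passed in directly as its own (assumed) key because its derivation is not described here, and any plain integer in sbd_delay_start is treated as an explicit delay while boolean-style values are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the start delay in seconds.
# corosync_token and corosync_consensus are given in milliseconds,
# pcmk_delay_max and msgwait in seconds.
sub sbd_start_delay {
    my (%p) = @_;

    # Assumption: a plain integer in sbd_delay_start is used as-is.
    my $configured = $p{sbd_delay_start} // '';
    return $configured + 0 if $configured =~ /^\d+$/;

    return $p{corosync_token} / 1000
      + $p{corosync_consensus} / 1000
      + $p{pcmk_delay_max}
      + $p{msgwait};
}

# Sample values: 5 s token + 6 s consensus + 30 s pcmk_delay_max + 10 s msgwait.
print sbd_start_delay(
    corosync_token     => 5000,
    corosync_consensus => 6000,
    pcmk_delay_max     => 30,
    msgwait            => 10,
), "\n";    # prints 51
```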

setup_sbd_delay

setup_sbd_delay()

This function sets the SBD_DELAY_START parameter in /etc/sysconfig/sbd in the SUT to the value supplied in the setting HA_SBD_START_DELAY, and then calls calculate_sbd_start_delay and set_sbd_service_timeout to set the service timeout for the SBD service in the SUT. It returns the calculated delay. Croaks if any of the commands sent to the SUT fail.

set_sbd_service_timeout

set_sbd_service_timeout($service_timeout)

Set the service timeout for the SBD service in the SUT to the number of seconds passed as argument.

This is accomplished by configuring a systemd override file for the SBD service.

If the override file exists, the function will edit it and replace the timeout there, otherwise it creates the file from scratch.
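The edit-or-create logic can be sketched as a pure function over the override file's content. The helper render_sbd_override is hypothetical, and the assumption that the override consists of a [Service] section with a TimeoutSec= line is not confirmed by this document; the file path and name used by the library are not shown here either.

```perl
use strict;
use warnings;

# Hypothetical helper: return the new content of a systemd override file.
# $existing is the current content, or undef if the file does not exist.
sub render_sbd_override {
    my ($existing, $timeout) = @_;

    # No file yet: create the override from scratch.
    return "[Service]\nTimeoutSec=$timeout\n" unless defined $existing;

    # File exists: replace the timeout value in place, keep everything else.
    (my $updated = $existing) =~ s/^TimeoutSec=.*$/TimeoutSec=$timeout/m;
    return $updated;
}

print render_sbd_override(undef, 90);
print render_sbd_override("[Service]\nTimeoutSec=30\n", 120);
```

Note that systemd only picks up a changed override after a daemon reload, a step that falls outside this sketch.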

check_iscsi_failure

check_iscsi_failure();

Workaround for bsc#1129385. Checks the system log for iSCSI connection failures and, if necessary, restarts the iscsi and pacemaker services.

cluster_status_matches_regex

Checks the crm status output against a hardcoded regular expression in order to check the cluster health.

SHOW_CLUSTER_STATUS - Output from 'crm status' command

crm_maintenance_status

crm_maintenance_status();

Check maintenance mode status. Returns true (maintenance active) or false (maintenance inactive). Croaks if unknown status is received.

crm_wait_for_maintenance

crm_wait_for_maintenance(target_state=>$target_state, [loop_sleep=>$loop_sleep, timeout=>$timeout]);

Wait for maintenance to be turned on or off. Croaks on timeout.

target_state Target state of the maintenance mode (true/false)

loop_sleep Override default sleep value between checks

timeout Override default timeout value

crm_check_resource_location

crm_check_resource_location(resource=>$resource, [wait_for_target=>$wait_for_target, timeout=>$timeout]);

Checks the current resource location and returns the hostname of the node. Can be used to wait for a desired state, e.g. after a failover. Croaks on timeout.

wait_for_target Target location of the resource specified - physical hostname

resource Resource to check

timeout Override default timeout value

generate_lun_list

generate_lun_list()

This generates the information that nodes need to use iSCSI. This is stored in /tmp/$cluster_name-lun.list where nodes can get it using scp.

set_cluster_parameter

set_cluster_parameter(resource=>'Totoro', parameter=>'neighbour', value=>'my');

Manage HA cluster parameter using crm shell.

resource: Resource containing parameter
parameter: Parameter name
value: Target parameter value

show_cluster_parameter

show_cluster_parameter(resource=>'Totoro', parameter=>'neighbour');

Show cluster parameter value using CRM shell.

resource: Resource containing parameter
parameter: Parameter name

prepare_console_for_fencing

prepare_console_for_fencing();

Some HA test modules will cause a node to fence. In these cases, the tests will need to assert a grub2 or bootmenu screen, so the modules will need to select the root-console before any calls to assert_screen. On some systems, a simple call to select_console 'root-console' will not work as the console could be "dirty" with messages obscuring the root prompt. This function will pre-select the console without asserting anything on the screen, clear it, and then select it normally.

crm_get_failcount

crm_get_failcount(crm_resource=>'ASCS_00' [, assert_result=>'true']);

Returns failcount number for specified resource.

crm_resource: Cluster resource name
assert_result: Make test fail instead of returning value. Default: 'false'

crm_wait_failcount

crm_wait_failcount(crm_resource=>'ASCS_00' [, timeout=>'60', delay=>'3']);

Waits until the crm fail count reaches a non-zero value. Fails if this does not happen before the timeout expires.

crm_resource: Cluster resource name
timeout: Give up after timeout in sec. Default 60 sec.
delay: Delay between retries. Default: 5 sec

crm_resources_by_class

crm_resources_by_class(primitive_class=>'stonith:external/sbd');

Returns resource name ARRAYREF filtered by class. Refer to CRM help pages for details: crm configure show --help and crm ra classes

primitive_class: CRM resource class name. Example: 'stonith:external/sbd', 'IPaddr2'

crm_resource_locate

crm_resource_locate(crm_resource=>'ASCS_00');

Returns hostname of cluster node where defined crm_resource currently resides.

crm_resource: Cluster resource name

crm_resource_meta_show

crm_resource_meta_show(resource=>'Totoro', meta_argument=>'neighbour');

Return resource meta-argument value.

resource: Resource containing parameter
meta_argument: Meta-argument name

crm_resource_meta_set

crm_resource_meta_set(resource=>'Totoro', meta_argument=>'neighbour', argument_value=>'my');

Change or delete resource meta-argument value.

resource: Resource containing parameter
meta_argument: Meta-argument name
argument_value: Meta-argument value. If undef, meta argument will be removed.

crm_list_options

my $ret = crm_list_options();

Executes a series of crm commands to list metadata options for different resource types (primitive, fencing, cluster attributes) and validates that their XML output is well-formed. This function is designed to test a new feature in crmsh version 5.0.0 and newer, which provides a CLI interface to query resource meta-attributes.

The function will execute the following commands:

crm_resource --list-options primitive --output-as xml
crm_resource --list-options fencing --output-as xml
crm_attribute --list-options cluster --all --output-as=xml

Return values:

1: All commands executed successfully and their XML output was valid.
0: The installed crmsh version is older than 5.0.0. The function performs no operation.
-1: At least one of the commands produced output that was not valid XML.