bityard Blog

// Check_MK - Race Condition in Processing of Piggyback Data

In current versions of Check_MK, namely 1.2.8p24 and 1.4.0p8, both of which are included in the Open Monitoring Distribution, there are several race conditions in the code that processes piggyback data sent by the agents. These race conditions are usually triggered when a host is monitored more than once, e.g. through the use of cluster services, or when a manual check on the command line happens to coincide with a scheduled check from the monitoring core. While not critical to the monitoring process as a whole, the races cause intermittent and annoying UNKNOWN - [Errno 2] No such file or directory errors for the monitored host. This article provides patches for both Check_MK versions in order to deal with the spurious error messages.

The normal order of processing of piggybacked data received from agents seems to be as follows:

  1. The Check_MK server calls the agent on the monitored host.

  2. The agent on the monitored host responds with the output of its own host as well as the piggybacked output from other entities.

  3. The Check_MK server processes the agent's response and extracts the piggybacked data.

  4. If there is already data stored from the piggybacked host, it's considered stale and is thus removed.

  5. The piggybacked data is stored in a file named <AGENT_HOSTNAME> in the directory named <PIGGYBACKED_HOSTNAME> under the path ${OMD_ROOT}/var/check_mk/piggyback/ (e.g.: ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME>).

  6. The Check_MK server lists the contents of the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/. For each file in the list it processes the data contained in that file.

  7. The Check_MK server removes the files in and subsequently the directory ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/ itself.
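
Steps 4 through 7 above can be modelled roughly as follows. This is a simplified sketch and not Check_MK's actual code: the function names, the host names and the temporary directory standing in for ${OMD_ROOT}/var/check_mk/piggyback/ are assumptions made for illustration. Only the ".new." prefix of the temporary file is taken from the traceback shown further below.

```python
import os
import shutil
import tempfile

# Hypothetical stand-in for ${OMD_ROOT}/var/check_mk/piggyback
PIGGYBACK_ROOT = tempfile.mkdtemp(prefix="piggyback-")


def store_piggyback(piggybacked_host, agent_host, data):
    """Steps 4 and 5: replace stale data with the freshly received data."""
    host_dir = os.path.join(PIGGYBACK_ROOT, piggybacked_host)
    if not os.path.isdir(host_dir):
        os.makedirs(host_dir)
    tmp_path = os.path.join(host_dir, ".new." + agent_host)
    with open(tmp_path, "w") as f:
        f.write(data)
    # Renaming over an existing file replaces the stale data in one step
    os.rename(tmp_path, os.path.join(host_dir, agent_host))


def read_piggyback(piggybacked_host):
    """Step 6: collect the data from every file in the host's directory."""
    host_dir = os.path.join(PIGGYBACK_ROOT, piggybacked_host)
    output = {}
    for agent_host in sorted(os.listdir(host_dir)):
        if agent_host.startswith("."):
            continue  # skip temporary files that are still being written
        with open(os.path.join(host_dir, agent_host)) as f:
            output[agent_host] = f.read()
    return output


def remove_piggyback(piggybacked_host):
    """Step 7: remove the files and then the directory itself."""
    shutil.rmtree(os.path.join(PIGGYBACK_ROOT, piggybacked_host))


# Example run for one piggybacked host "hostB" reported by agent "hostA"
store_piggyback("hostB", "hostA", "<<<uptime>>>\n12345\n")
print(read_piggyback("hostB"))
remove_piggyback("hostB")
```

Note that none of the three functions takes a lock, so nothing prevents a second process from running remove_piggyback() while the first one is still inside store_piggyback() or read_piggyback().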

If a host is monitored more than once, e.g. through the use of cluster services, steps 4 through 7 above can overlap for two or more concurrent monitoring processes. Such an overlap can cause one process at step 4 to delete ${OMD_ROOT}/var/check_mk/piggyback/<PIGGYBACKED_HOSTNAME>/<AGENT_HOSTNAME> while another process is still working on it at step 5 or step 6.
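
The overlap can be reproduced deterministically by replaying one possible interleaving in a single script. This is only a sketch: the two "processes" are simulated by the order of the statements, and the host names and the temporary directory standing in for the piggyback path are assumptions.

```python
import errno
import os
import shutil
import tempfile

root = tempfile.mkdtemp(prefix="piggyback-")  # stand-in for the piggyback path
host_dir = os.path.join(root, "hostB")        # <PIGGYBACKED_HOSTNAME>
os.makedirs(host_dir)

# Process A, storing data: writes the temporary file ...
tmp_path = os.path.join(host_dir, ".new.hostA")
with open(tmp_path, "w") as f:
    f.write("<<<uptime>>>\n12345\n")

# Process B, cleaning up: removes the directory in between
shutil.rmtree(host_dir)

# Process A, continued: ... and fails when renaming it into place
try:
    os.rename(tmp_path, os.path.join(host_dir, "hostA"))
    caught = None
except OSError as e:
    caught = e.errno

print(caught == errno.ENOENT)
shutil.rmtree(root)
```

In a real setup the window between writing and renaming is very short, which would explain why the errors only show up intermittently.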

The effects of this issue are mainly intermittent and annoying UNKNOWN - [Errno 2] No such file or directory errors in the Check_MK WebUI, as well as errors like the following examples in the log files:

2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> An exception occured while processing host "hostA"
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> Traceback (most recent call last):
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     status = command_function(command_tuple)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     return mode_function(hostname, ipaddress)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1290, in do_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1506, in do_all_checks_on_host
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     info = get_info_for_check(hostname, ipaddress, infotype)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     ignore_check_interval=True)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 541, in get_realhost_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     store_persisted_info(hostname, persisted)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 567, in store_persisted_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >>     os.rename("%s.#new" % file_path, file_path)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> OSError: [Errno 2] No such file or directory

2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> An exception occured while processing host "hostA"
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> Traceback (most recent call last):
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     status = command_function(command_tuple)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     return mode_function(hostname, ipaddress)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1241, in do_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1457, in do_all_checks_on_host
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     info = get_info_for_check(hostname, ipaddress, infotype)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     ignore_check_interval=True)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 540, in get_realhost_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     store_piggyback_info(hostname, piggybacked)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>   File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 651, in store_piggyback_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >>     os.rename(dir + "/.new." + sourcehost, dir + "/" + sourcehost)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> OSError: [Errno 2] No such file or directory
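
Both tracebacks above end in an os.rename() call that fails with ENOENT. One general way to tolerate this kind of failure is to catch exactly that error and let the caller decide how to proceed. The helper below is a hypothetical illustration of that idea, not the content of the patches referred to in this article and not part of Check_MK.

```python
import errno
import os


def rename_tolerating_enoent(src, dst):
    """Rename src to dst.

    Returns False instead of raising if a path vanished, e.g. because a
    concurrent process cleaned up in the meantime. All other errors are
    passed on unchanged.
    """
    try:
        os.rename(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False
        raise
```

Swallowing only ENOENT keeps genuine problems such as permission errors visible, while the data that was lost to the race is simply delivered again with the next check.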

In version 1.4.0p8 of Check_MK there is already a partial fix for the described issue in the form of Werk 4755 (Git commit 77b3bfc3). For both Check_MK versions the following two trivial patches are provided in order to deal with the spurious error messages:

In the current development version of Check_MK, probably to be named 1.5, a considerable amount of code has been rewritten and moved around. At first glance I couldn't determine whether the issue still exists there.
