====== Check_MK - Race Condition in Processing of Piggyback Data ======

In current versions of [[https://mathias-kettner.de/check_mk_download.php|Check_MK]] -- namely [[https://mathias-kettner.de/check_mk_download_version.php?HTML=yes&version=1.2.8p24&edition=cre|1.2.8p24]] and [[https://mathias-kettner.de/check_mk_download_version.php?HTML=yes&version=1.4.0p8&edition=cre|1.4.0p8]] -- which are included in the [[http://omdistro.org/|Open Monitoring Distribution]], there are several race conditions in the code responsible for processing the piggyback data sent by the agents. These race conditions are usually triggered when a host is monitored more than once, e.g. through the use of cluster services, or when a manual check on the command line happens to coincide with a scheduled check from the monitoring core. While not critical to the monitoring process as a whole, the races cause intermittent and annoying ''UNKNOWN - [Errno 2] No such file or directory'' errors for the monitored host. This article provides patches for both Check_MK versions in order to deal with the spurious error messages.

The normal order of processing piggybacked data received from agents seems to be as follows:

  - The Check_MK server calls the agent on the monitored host.
  - The agent on the monitored host responds with the output of its own host as well as the piggybacked output from other entities.
  - The Check_MK server processes the agent's response and extracts the piggybacked data.
  - If there is already data stored from the piggybacked host, it is considered stale and is thus removed.
  - The piggybacked data is stored in a file named '''' in the directory named '''' under the path ''${OMD_ROOT}/var/check_mk/piggyback/'' (e.g.: ''${OMD_ROOT}/var/check_mk/piggyback//'').
  - The Check_MK server lists the contents of the directory ''${OMD_ROOT}/var/check_mk/piggyback//''. For each file in the list, it processes the data contained in the file.
  - The Check_MK server removes the files in the directory ''${OMD_ROOT}/var/check_mk/piggyback//'' and subsequently the directory itself.

If a host is monitored more than once, e.g. through the use of cluster services, steps 4 through 7 above can overlap for two or more concurrent monitoring processes. Such an overlap can cause one process at step 4 to delete the directory ''${OMD_ROOT}/var/check_mk/piggyback//'' while another process is still working on the directory at step 5 or step 6.

The effects of this issue are mainly intermittent and annoying ''UNKNOWN - [Errno 2] No such file or directory'' errors in the Check_MK WebUI as well as errors like the following examples in the log files:

<code>
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> An exception occured while processing host "hostA"
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> Traceback (most recent call last):
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> status = command_function(command_tuple)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> return mode_function(hostname, ipaddress)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1290, in do_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1506, in do_all_checks_on_host
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> info = get_info_for_check(hostname, ipaddress, infotype)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> ignore_check_interval=True)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 541, in get_realhost_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> store_persisted_info(hostname, persisted)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 567, in store_persisted_info
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> os.rename("%s.#new" % file_path, file_path)
2017-07-18 06:32:07 [4] Check_MK helper [16533]: : >> OSError: [Errno 2] No such file or directory
</code>

<code>
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> An exception occured while processing host "hostA"
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> Traceback (most recent call last):
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 125, in do_keepalive
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> status = command_function(command_tuple)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/keepalive.py", line 437, in execute_keepalive_command
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> return mode_function(hostname, ipaddress)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1241, in do_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> do_all_checks_on_host(hostname, ipaddress, only_check_types)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 1457, in do_all_checks_on_host
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> info = get_info_for_check(hostname, ipaddress, infotype)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 319, in get_info_for_check
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> info = apply_parse_function(get_host_info(hostname, ipaddress, section_name, max_cachefile_age, ignore_check_interval), section_name)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 371, in get_host_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> ignore_check_interval=True)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 540, in get_realhost_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> store_piggyback_info(hostname, piggybacked)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> File "/omd/sites/SITE/share/check_mk/modules/check_mk_base.py", line 651, in store_piggyback_info
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> os.rename(dir + "/.new." + sourcehost, dir + "/" + sourcehost)
2017-03-27 00:02:21 [4] Check_MK helper [27005]: : >> OSError: [Errno 2] No such file or directory
</code>

Version [[https://mathias-kettner.de/check_mk_download_version.php?HTML=yes&version=1.4.0p8&edition=cre|1.4.0p8]] of Check_MK already contains a partial fix for the described issue in the form of [[https://mathias-kettner.de/check_mk_werks.php?werk_id=4755|Werk 4755]] ([[http://git.mathias-kettner.de/git/?p=check_mk.git;a=commit;h=77b3bfc33f30b980db5e61351ae4a86b315ba357|Git commit 77b3bfc3]]).
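
Both tracebacks above end in an ''os.rename()'' call that fails because the target directory has disappeared between the write and the rename. The following is only a minimal sketch of one way to tolerate such a race, written for Python 3 and not taken from Check_MK or from the patches in this article. The function names ''store_piggyback_file'' and ''read_piggyback_files'', their parameters and the retry count are hypothetical and chosen for illustration only.

```python
import errno
import os


def store_piggyback_file(piggyback_dir, sourcehost, payload, retries=3):
    """Write payload to piggyback_dir/sourcehost, tolerating a concurrent removal.

    Hypothetical helper for illustration; this is not the code from
    check_mk_base.py. Returns True on success, False if every attempt
    lost the race against a concurrent cleanup.
    """
    tmp_path = os.path.join(piggyback_dir, ".new." + sourcehost)
    final_path = os.path.join(piggyback_dir, sourcehost)
    for _ in range(retries):
        try:
            # Re-create the directory on every attempt, since another
            # process may have removed it in the meantime.
            os.makedirs(piggyback_dir, exist_ok=True)
            with open(tmp_path, "w") as handle:
                handle.write(payload)
            # Atomic within one file system; fails with ENOENT if the
            # directory or the temporary file vanished.
            os.rename(tmp_path, final_path)
            return True
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            # ENOENT: a concurrent process cleaned up; try again.
    return False


def read_piggyback_files(piggyback_dir):
    """Return {file name: content}, skipping entries that vanish mid-read."""
    result = {}
    try:
        names = os.listdir(piggyback_dir)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return result  # directory already removed: no data
        raise
    for name in names:
        if name.startswith("."):
            continue  # skip temporary ".new.*" files
        try:
            with open(os.path.join(piggyback_dir, name)) as handle:
                result[name] = handle.read()
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
    return result
```

The sketch only treats ''ENOENT'' as a benign outcome of the race and re-raises every other error. It does not add locking, so concurrent processes can still overwrite each other's data; it merely avoids turning the race into an ''UNKNOWN'' check result.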

For both Check_MK versions the following two trivial patches are provided in order to deal with the spurious error messages:

  * {{:2017:07:21:check_mk_base.py_1.2.8p24.patch|Patch for Check_MK 1.2.8p24}}\\ to be applied by the command: <code>root@host:~# patch -b -z .race < check_mk_base.py_1.2.8p24.patch</code>
  * {{:2017:07:21:check_mk_base.py_1.4.0p8.patch|Patch for Check_MK 1.4.0p8}}\\ to be applied by the command: <code>root@host:~# patch -b -z .race < check_mk_base.py_1.4.0p8.patch</code>

In the current development version of Check_MK -- probably to be named 1.5 -- a rather large amount of code has been rewritten and moved. At first glance I couldn't determine whether the issue still exists there.