bityard Blog

// Check_MK Monitoring - SAP Cloud Connector

The SAP Cloud Connector provides a service which is used to connect on-premise systems – non-SAP, SAP ECC and SAP HANA – with applications running on the SAP Cloud Platform. This article introduces a new Check_MK agent and several service checks to monitor the status of the SAP Cloud Connector and its connections to the SAP Cloud Platform.

For the impatient, here is the TL;DR: the Check_MK package of the SAP Cloud Connector monitoring checks:

SAP Cloud Connector monitoring checks (Compatible with Check_MK versions 1.4.0p19 and later)

The sources can be found in my Check_MK repository on GitHub.


Monitoring the SAP Cloud Connector can be done in two different, not mutually exclusive, ways. The first approach uses the traditional application monitoring features, already built into Check_MK, like:

  • the presence and count of the application processes

  • the reachability of the application's TCP ports

  • the validity of the SSL Certificates

  • queries to the application's health-check URL

The first approach is covered by the section Built-in Check_MK Monitoring below.

The second approach uses a new Check_MK agent and several service checks, dedicated to monitoring the internal status of the SAP Cloud Connector and its connections to the SAP Cloud Platform. The new Check_MK agent uses the monitoring API provided by the SAP Cloud Connector in order to monitor the application-specific states and metrics. The monitoring endpoints on the SAP Cloud Connector currently used by this Check_MK agent are:

  • “List of Subaccounts” (URL: https://<scchost>:<sccport>/api/monitoring/subaccounts)

  • “List of Open Connections” (URL: https://<scchost>:<sccport>/api/monitoring/connections/backends)

  • “Performance Monitor Data” (URL: https://<scchost>:<sccport>/api/monitoring/performance/backends)

  • “Top Time Consumers” (URL: https://<scchost>:<sccport>/api/monitoring/performance/toptimeconsumers)
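
As a rough illustration of how such an endpoint can be queried, the following Python sketch fetches the “List of Subaccounts” endpoint with HTTP basic authentication. This is not the agent plugin's actual code; the host name, port and credentials are placeholders, and the use of the requests library simply mirrors the prerequisites described later in this article:

```python
# Hypothetical sketch: query one SCC monitoring endpoint with basic auth.
# Host, port and credentials below are placeholders, not real values.
import json

def build_endpoint_url(host, port, path):
    """Build the URL of an SCC monitoring API endpoint."""
    return "https://%s:%d/api/monitoring/%s" % (host, port, path)

def fetch_endpoint(host, port, path, user, password, timeout=30):
    """Fetch and JSON-decode one monitoring endpoint (needs python-requests)."""
    import requests
    response = requests.get(
        build_endpoint_url(host, port, path),
        auth=(user, password),  # SCC application user, see Prerequisites below
        timeout=timeout,
        verify=False,           # assume a self-signed SCC certificate
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    subaccounts = fetch_endpoint("scchost.example.com", 8443, "subaccounts",
                                 "monitoring_user", "secret")
    print(json.dumps(subaccounts, indent=2))
```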

At the time of writing, there is unfortunately no monitoring endpoint on the SAP Cloud Connector for the Most Recent Requests metric, which is currently only available via the SAP Cloud Connector's WebUI. The Most Recent Requests metric would be much more interesting and useful than the currently available Top Time Consumers or Performance Monitor Data, both of which have limitations.

The application requests covered by the Top Time Consumers metric need a manual acknowledgement inside the SAP Cloud Connector in order to reset events with the longest request runtime, which limits the metric's usability for external monitoring tools. The Performance Monitor Data metric aggregates the application requests into buckets based on their overall runtime. By itself this can be useful for external monitoring tools and is in fact used by the Check_MK agent covered in this article. In the process of runtime bucket aggregation though, the Performance Monitor Data metric hides the much more useful breakdown of each request into runtime subsections (“External (Back-end)”, “Open Connection”, “Internal (SCC)”, “SSO Handling” and “Latency Effects”).

Hopefully the Most Recent Requests metric will in the future also be exposed via the monitoring API of the SAP Cloud Connector. The new Check_MK agent could then be extended to use the newly exposed metric in order to gain a more fine-grained insight into the runtime of application requests through the SAP Cloud Connector.

The second approach is covered by the section SAP Cloud Connector Agent below.

Built-in Check_MK Monitoring

Application Processes

To monitor the SAP Cloud Connector process, use the standard Check_MK check “State and count of processes”. This can be found in the WATO WebUI under:

-> Manual Checks
   -> Applications, Processes & Services
      -> State and count of processes
         -> Create rule in folder ...
            -> Rule Options
               Description: Process monitoring of the SAP Cloud Connector
               Checktype: [ps - State and Count of Processes]
               Process Name: SAP Cloud Connector

            -> Parameters
               [x] Process Matching
               [Exact name of the process without arguments]
               [/opt/sapjvm_8/bin/java]

               [x] Name of the operating system user
               [Exact name of the operating system user]
               [sccadmin]

               [x] Levels for process count
               Critical below [1] processes
               Warning below [1] processes
               Warning above [1] processes
               Critical above [2] processes

            -> Conditions
               Folder [The folder containing the SAP Cloud Connector systems]
               and/or
               Explicit hosts [x]
               Specify explicit host names [SAP Cloud Connector systems]
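
The threshold semantics of the rule above can be sketched in Python. This is an illustration of the configured levels (critical below 1 or above 2 processes, warning above 1), not Check_MK's own ps check implementation, and the ps invocation is just one possible way to count matching processes:

```python
# Illustrative sketch of the process-count rule above; not Check_MK's code.
import subprocess

def count_scc_processes(binary="/opt/sapjvm_8/bin/java", user="sccadmin"):
    """Count processes whose executable path and owner match the rule."""
    output = subprocess.check_output(["ps", "-eo", "user:32,args"],
                                     universal_newlines=True)
    count = 0
    for line in output.splitlines()[1:]:  # skip the ps header line
        fields = line.split(None, 1)
        if (len(fields) == 2 and fields[0] == user
                and fields[1].split()[0] == binary):
            count += 1
    return count

def classify_process_count(count):
    """Map a process count onto the rule's levels:
    critical below 1 or above 2 processes, warning above 1 process."""
    if count < 1 or count > 2:
        return 2  # CRIT
    if count > 1:
        return 1  # WARN
    return 0      # OK
```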

Application Health Check

To implement a rudimentary monitoring of the SAP Cloud Connector application health, use the standard Check_MK check “Check HTTP service” to query the Health Check endpoint of the monitoring API provided by the SAP Cloud Connector. The “Check HTTP service” can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Active checks (HTTP, TCP, etc.)
      -> Check HTTP service
         -> Create rule in folder ...
            -> Rule Options
               Description: SAP Cloud Connector (SCC)

            -> Check HTTP service
               Name: SAP SCC
               [x] Check the URL
               [x] URI to fetch (default is /)
               [/exposed?action=ping]

               [x] TCP Port
               [8443]

               [x] Use SSL/HTTPS for the connection:
               [Use SSL with auto negotiation]

            -> Conditions
               Folder [The folder containing the SAP Cloud Connector systems]
               and/or
               Explicit hosts [x]
               Specify explicit host names [SAP Cloud Connector systems]

SSL Certificates

To monitor the validity of the SSL certificate of the SAP Cloud Connector WebUI, use the standard Check_MK check “Check HTTP service”. The “Check HTTP service” can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Active checks (HTTP, TCP, etc.)
      -> Check HTTP service
         -> Create rule in folder ...
            -> Rule Options
               Description: SAP Cloud Connector (SCC)

            -> Check HTTP service
               Name: SAP SCC Certificate
               [x] Check SSL Certificate Age
               Warning at or below [60] days
               Critical at or below [30] days

               [x] TCP Port
               [8443]

            -> Conditions
               Folder [The folder containing the SAP Cloud Connector systems]
               and/or
               Explicit hosts [x]
               Specify explicit host names [SAP Cloud Connector systems]
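
The “at or below” semantics of the certificate-age levels can be illustrated with a small Python sketch. The notAfter parsing follows the timestamp format used by Python's ssl module, and the 60/30-day defaults below are merely example values, not part of the check plugin:

```python
# Illustrative sketch of a certificate-age check; not Check_MK's own code.
from datetime import datetime

def cert_days_left(not_after, now=None):
    """Days until a certificate's notAfter timestamp, in the format
    returned by ssl.SSLSocket.getpeercert(), e.g. 'Jun  1 12:00:00 2025 GMT'."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    now = now or datetime.utcnow()
    return (expires - now).days

def classify_cert_age(days_left, warn_days=60, crit_days=30):
    """0=OK, 1=WARN, 2=CRIT with 'warning/critical at or below' semantics."""
    if days_left <= crit_days:
        return 2
    if days_left <= warn_days:
        return 1
    return 0
```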

SAP Cloud Connector Agent

The new Check_MK package to monitor the status of the SAP Cloud Connector and its connections to the SAP Cloud Platform consists of three major parts – an agent plugin, two check plugins and several auxiliary files and plugins (WATO plugins, Perf-o-meter plugins, metrics plugins and man pages).

Prerequisites

The following prerequisites are necessary in order for the SAP Cloud Connector agent to work properly:

  • A SAP Cloud Connector application user must be created for the Check_MK agent to be able to authenticate against the SAP Cloud Connector and gain access to the protected monitoring API endpoints. See the article SAP Cloud Connector - Configuring Multiple Local Administrative Users on how to create a new application user.

  • A DNS alias or an additional IP address for the SAP Cloud Connector service.

  • An additional host in Check_MK for the SAP Cloud Connector service with the previously created DNS alias or IP address.

  • Installation of the Python requests library on the Check_MK server. This library is used in the Check_MK agent plugin agent_sapcc to perform the authentication and the HTTP requests against the monitoring API of the SAP Cloud Connector. On RHEL-based systems, for example, it can be installed with:

    root@host:# yum install python-requests
    
  • Installation of the new Check_MK package for the SAP Cloud Connector monitoring checks on the Check_MK server.

SAP Cloud Connector Agent Plugin

The Check_MK agent plugin agent_sapcc is responsible for querying the endpoints of the monitoring API on the SAP Cloud Connector, which are described above. It transforms the data returned from the monitoring endpoints into a format digestible by Check_MK. The following example shows the – anonymized and abbreviated – agent plugin output for a SAP Cloud Connector system:

<<<check_mk>>>
Version: 0.1

<<<sapcc_connections_backends:sep(59)>>>
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,subaccount;abcdefghi

<<<sapcc_performance_backends:sep(59)>>>
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,1,minimumCallDurationMs;10
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,1,numberOfCalls;1
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,2,minimumCallDurationMs;20
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,2,numberOfCalls;36
[...]
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,20,minimumCallDurationMs;3000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,21,minimumCallDurationMs;4000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,22,minimumCallDurationMs;5000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,name;PROTOCOL/sapecc.example.com:44300
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,protocol;PROTOCOL
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,virtualHost;sapecc.example.com
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,virtualPort;44300
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,sinceTime;2019-02-13T08:05:36.084 +0100
subaccounts,abcdefghi,subaccount;abcdefghi

<<<sapcc_performance_toptimeconsumers:sep(59)>>>
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,requests,0,externalTime;373
subaccounts,abcdefghi,requests,0,id;932284302
subaccounts,abcdefghi,requests,0,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,0,openRemoteTime;121
subaccounts,abcdefghi,requests,0,protocol;PROTOCOL
subaccounts,abcdefghi,requests,0,receivedBytes;264
subaccounts,abcdefghi,requests,0,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,0,sentBytes;4650
subaccounts,abcdefghi,requests,0,startTime;2019-02-13T11:31:59.113 +0100
subaccounts,abcdefghi,requests,0,totalTime;536
subaccounts,abcdefghi,requests,0,user;RFC_USER
subaccounts,abcdefghi,requests,0,virtualBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,1,externalTime;290
subaccounts,abcdefghi,requests,1,id;1882731830
subaccounts,abcdefghi,requests,1,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,1,latencyTime;77
subaccounts,abcdefghi,requests,1,openRemoteTime;129
subaccounts,abcdefghi,requests,1,protocol;PROTOCOL
subaccounts,abcdefghi,requests,1,receivedBytes;264
subaccounts,abcdefghi,requests,1,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,1,sentBytes;4639
subaccounts,abcdefghi,requests,1,startTime;2019-02-13T11:31:59.114 +0100
subaccounts,abcdefghi,requests,1,totalTime;532
subaccounts,abcdefghi,requests,1,user;RFC_USER
subaccounts,abcdefghi,requests,1,virtualBackend;sapecc.example.com:PORT
[...]
subaccounts,abcdefghi,requests,49,externalTime;128
subaccounts,abcdefghi,requests,49,id;1774317106
subaccounts,abcdefghi,requests,49,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,49,protocol;PROTOCOL
subaccounts,abcdefghi,requests,49,receivedBytes;263
subaccounts,abcdefghi,requests,49,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,49,sentBytes;4660
subaccounts,abcdefghi,requests,49,startTime;2019-02-16T11:32:09.352 +0100
subaccounts,abcdefghi,requests,49,totalTime;130
subaccounts,abcdefghi,requests,49,user;RFC_USER
subaccounts,abcdefghi,requests,49,virtualBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,sinceTime;2019-02-13T08:05:36.085 +0100
subaccounts,abcdefghi,subaccount;abcdefghi

<<<sapcc_subaccounts:sep(59)>>>
subaccounts,abcdefghi,displayName;Test Application
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,subaccount;abcdefghi
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,connectionCount;8
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,name;abcdefg:hijklmnopqr
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,type;JAVA
subaccounts,abcdefghi,tunnel,connectedSince;2019-02-14T10:11:00.630 +0100
subaccounts,abcdefghi,tunnel,connections;8
subaccounts,abcdefghi,tunnel,state;Connected
subaccounts,abcdefghi,tunnel,user;P123456
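
The key paths in this output are comma-separated, and the value follows after the semicolon separator. As an illustration (not the actual plugin code), such lines can be folded back into a nested structure like this:

```python
# Illustrative parser for the 'a,b,c;value' lines of the agent sections above.

def parse_sapcc_section(lines):
    """Fold semicolon-separated key-path/value lines into a nested dict."""
    tree = {}
    for line in lines:
        path, _, value = line.partition(";")
        node = tree
        keys = path.split(",")
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return tree

section = [
    "subaccounts,abcdefghi,tunnel,state;Connected",
    "subaccounts,abcdefghi,tunnel,connections;8",
    "subaccounts,abcdefghi,regionHost;hana.ondemand.com",
]
parsed = parse_sapcc_section(section)
# parsed["subaccounts"]["abcdefghi"]["tunnel"]["state"] is "Connected"
```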

The agent plugin comes with a Check_MK check plugin of the same name, which is solely responsible for constructing the command line arguments from the WATO configuration and passing them to the Check_MK agent plugin.

With the additional WATO plugin sapcc_agent.py it is possible to configure the username and password for the SAP Cloud Connector application user which is used to connect to the monitoring API. It is also possible to configure the TCP port and the connection timeout for the connection to the monitoring API through the WATO WebUI and thus override the default values. The default value for the TCP port is 8443, the default value for the connection timeout is 30 seconds. The configuration options for the Check_MK agent plugin agent_sapcc can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Datasource Programs
      -> SAP Cloud Connector systems
         -> Create rule in folder ...
            -> Rule Options
               Description: SAP Cloud Connector (SCC)

            -> SAP Cloud Connector systems
               SAP Cloud Connector user name: [username]
               SAP Cloud Connector password: [password]
               SAP Cloud Connector TCP port: [8443]

            -> Conditions
               Folder [The folder containing the SAP Cloud Connector systems]
               and/or
               Explicit hosts [x]
               Specify explicit host names [SAP Cloud Connector systems]

After saving the new rule, restarting Check_MK and doing an inventory on the additional host for the SAP Cloud Connector service in Check_MK, several new services starting with the name prefix SAP CC should appear.

The following image shows a status output example from the WATO WebUI with the service checks HTTP SAP SCC TLS and HTTP SAP SCC TLS Certificate from the Built-in Check_MK Monitoring described above. In addition to those, the example also shows the service checks based on the data from the SAP Cloud Connector Agent. The service checks SAP CC Application Connection, SAP CC Subaccount and SAP CC Tunnel are provided by the check plugin sapcc_subaccounts, the service check SAP CC Perf Backend is provided by the plugin sapcc_performance_backends:

Status output example for the complete monitoring of the SAP Cloud Connector

SAP Cloud Connector Subaccount

The check plugin sapcc_subaccounts implements the three sub-checks sapcc_subaccounts.app_conn, sapcc_subaccounts.info and sapcc_subaccounts.tunnel.

Info

The sub-check sapcc_subaccounts.info just gathers information on several configuration options for each subaccount on the SAP Cloud Connector and displays them in the status details of the check. These configuration options are:

  • subaccount name on the SAP Cloud Platform to which the connection is made.

  • display name of the subaccount.

  • location ID of the subaccount.

  • region host of the SAP Cloud Platform to which the SAP Cloud Connector establishes a connection.

The sub-check sapcc_subaccounts.info always returns an OK status. No performance data is currently reported by this check.

Tunnel

The sub-check sapcc_subaccounts.tunnel is responsible for monitoring each tunnel connection for each subaccount on the SAP Cloud Connector. Upon inventory, this sub-check creates a service check for each tunnel connection found on the SAP Cloud Connector. During normal check execution, the status of the tunnel connection is determined for each inventorized item. If the tunnel connection is not in the Connected state, an alarm is raised accordingly. Additionally, the number of currently active connections over a tunnel as well as the elapsed time in seconds since the tunnel connection was established are determined for each inventorized item. If either the number of currently active connections or the number of seconds since the connection was established is above or below the configured warning and critical threshold values, an alarm is raised accordingly. For both values, performance data is reported by the check.

With the additional WATO plugin sapcc_subaccounts.py it is possible to configure the warning and critical levels for the sub-check sapcc_subaccounts.tunnel through the WATO WebUI and thus override the following default values:

Metric                  Warning Low Threshold   Critical Low Threshold   Warning High Threshold   Critical High Threshold
Number of connections   0                       0                        30                       40
Connection duration     0 sec                   0 sec                    284012568 sec            315569520 sec
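
The lower/upper level pairs in the table above follow the usual “equal or below” / “equal or above” semantics, which can be sketched as follows. This is an illustration using the default tunnel connection-count levels, not the plugin's actual code:

```python
# Illustrative level check for a metric with lower and upper thresholds.

def check_level(value, warn_low=0, crit_low=0, warn_high=30, crit_high=40):
    """0=OK, 1=WARN, 2=CRIT with 'equal or below'/'equal or above' semantics.
    Defaults are the tunnel connection-count levels from the table above."""
    if value <= crit_low or value >= crit_high:
        return 2
    if value <= warn_low or value >= warn_high:
        return 1
    return 0
```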

The configuration options for the tunnel connection levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Subaccounts
            -> Create Rule in Folder ...
               -> Rule Options
                  Description: SAP Cloud Connector Subaccounts

               -> Parameters
                  [x] Number of tunnel connections
                      Warning if equal or below [0] connections
                      Critical if equal or below [0] connections
                      Warning if equal or above [30] connections
                      Critical if equal or above [40] connections
                  [x] Connection time of tunnel connections 
                      Warning if equal or below [0] seconds
                      Critical if equal or below [0] seconds
                      Warning if equal or above [284012568] seconds
                      Critical if equal or above [315569520] seconds

               -> Conditions
                  Folder [The folder containing the SAP Cloud Connector systems]
                  and/or
                  Explicit hosts [x]
                  Specify explicit host names [SAP Cloud Connector systems]
                  and/or
                  Application Or Tunnel Name [x]
                  Specify explicit values [Tunnel name]

The above image with a status output example from the WATO WebUI shows one sapcc_subaccounts.tunnel service check as the last of the displayed items. The service name is prefixed by the string SAP CC Tunnel and followed by the subaccount name, which in this example is anonymized. For each tunnel connection the connection state, the overall number of application connections currently active over the tunnel, the time when the tunnel connection was established and the number of seconds elapsed since establishing the connection are shown. The overall number of currently active application connections is also visualized in the perf-o-meter, with a logarithmic scale growing from left to right.

The following image shows an example of the two metric graphs for a single sapcc_subaccounts.tunnel service check:

Example metric graph for a single sapcc_subaccounts.tunnel service check

The upper graph shows the time elapsed since the tunnel connection was established. The lower graph shows the overall number of application connections currently active over the tunnel connection. Both graphs would also show the warning and critical threshold values, which in this example are currently outside the displayed range of values for the y-axis.

Application Connection

The sub-check sapcc_subaccounts.app_conn is responsible for monitoring each application's connections through each tunnel connection for each subaccount on the SAP Cloud Connector. Upon inventory, this sub-check creates a service check for each application connection found on the SAP Cloud Connector. During normal check execution, the number of currently active connections for each application is determined for each inventorized item. If the value of the currently active connections is above or below the configured warning and critical threshold values, an alarm is raised accordingly. For the number of currently active connections, performance data is reported by the check.

With the additional WATO plugin sapcc_subaccounts.py it is possible to configure the warning and critical levels for the sub-check sapcc_subaccounts.app_conn through the WATO WebUI and thus override the following default values:

Metric                  Warning Low Threshold   Critical Low Threshold   Warning High Threshold   Critical High Threshold
Number of connections   0                       0                        30                       40

The configuration options for the application connection levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Subaccounts
            -> Create Rule in Folder ...
               -> Rule Options
                  Description: SAP Cloud Connector Subaccounts

               -> Parameters
                  [x] Number of application connections
                      Warning if equal or below [0] connections
                      Critical if equal or below [0] connections
                      Warning if equal or above [30] connections
                      Critical if equal or above [40] connections

               -> Conditions
                  Folder [The folder containing the SAP Cloud Connector systems]
                  and/or
                  Explicit hosts [x]
                  Specify explicit host names [SAP Cloud Connector systems]
                  and/or
                  Application Or Tunnel Name [x]
                  Specify explicit values [Application name]

The above image with a status output example from the WATO WebUI shows one sapcc_subaccounts.app_conn service check as the fifth item from the top of the displayed items. The service name is prefixed by the string SAP CC Application Connection and followed by the application name, which in this example is anonymized. For each application connection the number of currently active connections and the connection type are shown. The number of currently active application connections is also visualized in the perf-o-meter, with a logarithmic scale growing from left to right.

The following image shows an example of the metric graph for a single sapcc_subaccounts.app_conn service check:

Example metric graph for a single sapcc_subaccounts.app_conn service check

The graph shows the number of currently active application connections. The graph would also show the warning and critical threshold values, which in this example are currently outside the displayed range of values for the y-axis.

SAP Cloud Connector Performance Backends

The check sapcc_performance_backends is responsible for monitoring the performance of each (on-premise) backend system connected to the SAP Cloud Connector. Upon inventory, this check creates a service check for each backend connection found on the SAP Cloud Connector. During normal check execution, the number of requests to the backend system, categorized into one of the 22 runtime buckets, is determined for each inventorized item. From these raw values, the request rate in requests per second is derived for each of the 22 runtime buckets. Also from the raw values, the following four additional metrics are derived:

  • calls_total: the total request rate over all of the 22 runtime buckets.

  • calls_pct_ok: the relative number of requests in percent with a runtime below a given runtime warning threshold.

  • calls_pct_warn: the relative number of requests in percent with a runtime above a given runtime warning threshold.

  • calls_pct_crit: the relative number of requests in percent with a runtime above a given runtime critical threshold.

If the relative number of requests is above the configured warning and critical threshold values, an alarm is raised accordingly. Performance data is reported by the check for the number of requests in each of the 22 runtime buckets, for the total request rate and for the relative number of requests (calls_pct_ok, calls_pct_warn, calls_pct_crit).
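
The derivation of the four aggregate metrics can be sketched as follows. This is an illustration, not the plugin's actual code: the bucket boundaries and counts are made-up example values, the sketch works on raw call counts rather than per-second rates, and it assumes that calls_pct_warn covers only the range between the warning and critical runtime thresholds, so that the three percentages add up to 100%:

```python
# Illustrative derivation of calls_total / calls_pct_* from runtime buckets.
# Assumption: calls_pct_warn counts requests between the warning and critical
# runtime thresholds, so ok + warn + crit percentages sum to 100%.

def aggregate_buckets(buckets, warn_ms=500, crit_ms=1000):
    """buckets: list of (minimumCallDurationMs, numberOfCalls) pairs.
    Returns (calls_total, calls_pct_ok, calls_pct_warn, calls_pct_crit)."""
    total = sum(calls for _, calls in buckets)
    if total == 0:
        return 0, 0.0, 0.0, 0.0
    warn = sum(calls for ms, calls in buckets if warn_ms <= ms < crit_ms)
    crit = sum(calls for ms, calls in buckets if ms >= crit_ms)
    ok = total - warn - crit
    return (total,
            100.0 * ok / total,
            100.0 * warn / total,
            100.0 * crit / total)

# Example: 50 fast calls, 30 between 500 ms and 1 s, 20 at 1 s or slower.
metrics = aggregate_buckets([(10, 50), (500, 30), (1000, 20)])
```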

With the additional WATO plugin sapcc_performance_backends.py it is possible to configure the warning and critical levels for the check sapcc_performance_backends through the WATO WebUI and thus override the following default values:

Metric                                                    Warning Threshold   Critical Threshold
Request runtime                                           500 msec            1000 msec
Percentage of requests over request runtime thresholds    10%                 5%

The configuration options for the backend performance levels can be found in the WATO WebUI under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Backend Performance
            -> Create Rule in Folder ...
               -> Rule Options
                  Description: SAP Cloud Connector Backend Performance

               -> Parameters
                  [x] Runtime bucket definition and calls per bucket in percent
                      Warning if percentage of calls in warning bucket equal or above [10.00] %
                      Assign calls to warning bucket if runtime equal or above [500] milliseconds
                      Critical if percentage of calls in critical bucket equal or above [5.00] %
                      Assign calls to critical bucket if runtime equal or above [1000] milliseconds

               -> Conditions
                  Folder [The folder containing the SAP Cloud Connector systems]
                  and/or
                  Explicit hosts [x]
                  Specify explicit host names [SAP Cloud Connector systems]
                  and/or
                  Backend Name [x]
                  Specify explicit values [Backend name]

The above image with a status output example from the WATO WebUI shows one sapcc_performance_backends service check as the sixth item from the top of the displayed items. The service name is prefixed by the string SAP CC Perf Backend and followed by a string concatenated from the protocol, FQDN and TCP port of the backend system, which in this example is anonymized. For each backend connection the total number of requests, the total request rate, the percentage of requests below the runtime warning threshold, the percentage of requests above the runtime warning threshold and the percentage of requests above the runtime critical threshold are shown. The relative number of requests in percent is also visualized in the perf-o-meter.

The following image shows an example of the metric graph for the total request rate from the sapcc_performance_backends service check:

Example metric graph for the total request rate from the sapcc_performance_backends service check

The following image shows an example of the metric graph for the relative number of requests from the sapcc_performance_backends service check:

Example metric graph for the relative number of requests from the sapcc_performance_backends service check

The graph shows the percentage of requests below the runtime warning threshold in the color green at the bottom, the percentage of requests above the runtime warning threshold in the color yellow stacked above and the percentage of requests above the runtime critical threshold in the color red stacked at the top.

The following image shows an example of the combined metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends service check:

Example combined metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends service check

To provide a better overview, the individual metrics are grouped together into three graphs. The first graph shows the request rate in the runtime buckets >=10ms, >=20ms, >=30ms, >=40ms, >=50ms, >=75ms and >=100ms. The second graph shows the request rate in the runtime buckets >=125ms, >=150ms, >=200ms, >=300ms, >=400ms, >=500ms, >=750ms and >=1000ms. The third and last graph shows the request rate in the runtime buckets >=1250ms, >=1500ms, >=2000ms, >=2500ms, >=3000ms, >=4000ms and >=5000ms.

The following image shows an example of the individual metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends service check:

Example individual metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends service check

Each of the metric graphs shows exactly the same data as the previously shown combined graphs. The combined metric graphs are actually based on the individual metric graphs for the request rates to a single backend system.

Conclusion

The newly introduced checks for the SAP Cloud Connector enable you to monitor several application-specific aspects of the SAP Cloud Connector with your Check_MK server. The combination of built-in Check_MK monitoring facilities and a new agent plugin for the SAP Cloud Connector complement each other in this regard. While the new SAP Cloud Connector agent plugin for Check_MK utilizes most of the data provided by the monitoring endpoints on the SAP Cloud Connector, a more in-depth monitoring could be achieved if the data from the Most Recent Requests metric were also exposed over the monitoring API of the SAP Cloud Connector. I hope this will be the case in a future release of the SAP Cloud Connector.

I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// Installing Anitya Upstream Release Monitoring on RHEL 7

Anitya is a release monitoring project developed and provided by the Fedora project. It is used to monitor upstream software projects for new releases in an automated fashion. It is thus similar to the uscan command from the Debian devscripts package. Unlike the uscan command, Anitya is written in Python and comes with an integrated WebUI. Also unlike uscan, Anitya uses FedMsg (the Federated Message Bus) to publish messages indicating a detected upstream release event. These messages can in turn be consumed by other systems in order to trigger various actions upon a new upstream release. Anitya can be used “as a service” through https://release-monitoring.org/ or be installed as a dedicated instance. This article describes the necessary steps to set up your own, dedicated Anitya instance on a RHEL 7 system.


According to the official Anitya installation and configuration documentation there are only two steps – pip install anitya and editing /etc/anitya/anitya.toml – needed to set up Anitya. I have found this a bit too optimistic in the case of a RHEL 7 system, which was the installation target in my case. Here is a more detailed description of the steps necessary to get Anitya up and running on RHEL 7:

  1. Install some basic Anitya dependencies like Python, the Python installer, the Apache webserver, the Apache WSGI module and some development files:

    root@host:# yum install python httpd mod_wsgi python2-pip python-devel zeromq-devel
    
  2. Install Anitya and its dependencies:

    root@host:# pip2 install anitya
    

    Unfortunately, the Python 2 environment is necessary since the Apache WSGI module in RHEL 7 is compiled against Python 2.

  3. Adjust the import of the Anitya classes in the WSGI application. Open the file /usr/lib/python2.7/site-packages/anitya/wsgi.py in an editor of your choice:

    root@host:# vi /usr/lib/python2.7/site-packages/anitya/wsgi.py
    

    Alter its contents as shown in the following patch:

    wsgi.py
    --- /usr/lib/python2.7/site-packages/anitya/wsgi.py.orig        2018-12-05 16:59:26.000000000 +0100
    +++ /usr/lib/python2.7/site-packages/anitya/wsgi.py     2018-12-05 16:40:40.000000000 +0100
    @@ -14,7 +14,7 @@
     # You should have received a copy of the GNU General Public License
     # along with this program; if not, write to the Free Software
     # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
    -from .app import create
    +from anitya.app import create
     
     
     application = create()
  4. Create and adjust the basic Anitya configuration.

    Create the Anitya configuration directory:

    root@host:# mkdir /etc/anitya
    

    and download the sample configuration file from the Anitya GitHub repository:

    root@host:# wget -O /etc/anitya/anitya.toml https://raw.githubusercontent.com/release-monitoring/anitya/master/files/anitya.toml.sample
    

    Generate a secret key with e.g. the pwgen command:

    root@host:# pwgen -sncB 32 1
    

    Adjust the several settings in the Anitya configuration file. Open the file /etc/anitya/anitya.toml in an editor of your choice:

    root@host:# vi /etc/anitya/anitya.toml
    

    Alter the four configuration entries admin_email, secret_key, db_url and default_regex as shown in the following example:

    /etc/anitya/anitya.toml
    admin_email = "<YOUR-EMAIL-ADDRESS>"
    secret_key = "<SECRET-KEY>"
    db_url = "sqlite:////var/www/html/anitya.sqlite"
    default_regex = '''(?i)%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?:[-_]
                       (?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'''

    Substitute the placeholder <YOUR-EMAIL-ADDRESS> with the email address at which you want to receive emails about errors of the Anitya application. This address will also be used as part of the request headers which identify the Anitya HTTP user agent. Substitute the placeholder <SECRET-KEY> with the secret key generated above.

    In case a SQLite database is used, make sure the value of the db_url entry points to a location within the filesystem that is reachable from the context of the webserver the Anitya application is running in. If e.g. the Apache webserver is used, the SQLite database should be located somewhere within the DocumentRoot of the Apache VirtualHost.

    Be careful with the syntax of the configuration file and the configuration entries in there. It seems that the configuration parsing part of Anitya is not very robust. A single invalid or syntactically incorrect entry will cause Anitya to skip all other entries as well and fall back to its default values. It took me a while to figure out that the Anitya database was generated and searched in the default location sqlite:////var/tmp/anitya-dev.sqlite no matter which value was given for the db_url configuration entry, just because another configuration entry – specifically the default_regex – had a syntactically incorrect value.
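One way to guard against such silently ignored configuration values is to verify the default_regex outside of Anitya first. The following standalone sketch (the project name "foo" and the tarball filenames are hypothetical examples) compiles the regex the same way Anitya does – substituting the %(name)s placeholder with the project name – and checks that it extracts the expected version:

```python
import re

# The default_regex value from /etc/anitya/anitya.toml; "%(name)s" is a
# placeholder that gets substituted with the monitored project's name.
default_regex = (r"(?i)%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]"
                 r"([^-/_\s]+?)(?:[-_](?:minsrc|src|source|asc|release))?"
                 r"\.(?:tar|t[bglx]z|tbz2|zip)")

# Compile the regex for a hypothetical project "foo" and test it against
# a typical upstream tarball name.
pattern = re.compile(default_regex % {"name": "foo"})
match = pattern.search("foo-1.2.3.tar.gz")
print(match.group(1))  # → 1.2.3
```

If the regex value contains a syntax error, re.compile() raises an exception right away instead of Anitya silently falling back to its built-in defaults.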

  5. Download the Anitya cron job from the Anitya GitHub repository:

    root@host:# cd /usr/lib/python2.7/site-packages/anitya
    root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/<RELEASE>/files/anitya_cron.py
    

    The Anitya cron job should match the installed Anitya release. To ensure this, substitute the placeholder <RELEASE> in the above URL with the installed Anitya version. E.g. for the Anitya version 0.13.2:

    root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/0.13.2/files/anitya_cron.py
    

    Make the Anitya cron job executable:

    root@host:# chmod 755 /usr/lib/python2.7/site-packages/anitya/anitya_cron.py
    

    and adjust the Python interpreter to the RHEL 7 standard by opening the file /usr/lib/python2.7/site-packages/anitya/anitya_cron.py in an editor of your choice:

    root@host:# vi /usr/lib/python2.7/site-packages/anitya/anitya_cron.py
    

    Alter its contents as shown in the following patch:

    anitya_cron.py.patch
    # diff -uwi anitya_cron.py.orig anitya_cron.py
    --- anitya_cron.py.orig 2018-12-23 19:52:31.000000000 +0100
    +++ anitya_cron.py      2018-12-23 19:52:37.000000000 +0100
    @@ -1,4 +1,4 @@
    -#!/usr/bin/env python3
    +#!/usr/bin/env python2
     # -*- coding: utf-8 -*-
     
     import sys
  6. Download the Anitya database creation and initialization script from the Anitya GitHub repository:

    root@host:# cd /usr/lib/python2.7/site-packages/anitya
    root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/master/createdb.py
    

    Run the downloaded script in order to create and initialize the Anitya database:

    root@host:# python createdb.py
    

    To grant access to the webserver the Anitya application is running in, adjust the ownership and the permissions of the Anitya database file:

    root@host:# chown apache:apache /var/www/html/anitya.sqlite
    root@host:# chmod 644 /var/www/html/anitya.sqlite
    
  7. Reapply changes to the database from a missing database version.

    Configure Alembic by opening the file /usr/lib/python2.7/site-packages/anitya/alembic.ini in an editor of your choice:

    root@host:# cd /usr/lib/python2.7/site-packages/anitya
    root@host:# vi alembic.ini
    

    Insert the contents as shown in the following configuration example:

    alembic.ini
    # A generic, single database configuration.
     
    [alembic]
    # path to migration scripts
    script_location = /usr/lib/python2.7/site-packages/anitya/db/migrations
     
    # template used to generate migration files
    # file_template = %%(rev)s_%%(slug)s
     
    # max length of characters to apply to the
    # "slug" field
    #truncate_slug_length = 40
     
    # set to 'true' to run the environment during
    # the 'revision' command, regardless of autogenerate
    # revision_environment = false
     
    # set to 'true' to allow .pyc and .pyo files without
    # a source .py file to be detected as revisions in the
    # versions/ directory
    # sourceless = false
    sqlalchemy.url = sqlite:////var/www/html/anitya.sqlite
     
    # Logging configuration
    [loggers]
    keys = root,sqlalchemy,alembic
     
    [handlers]
    keys = console
     
    [formatters]
    keys = generic
     
    [logger_root]
    level = WARN
    handlers = console
    qualname =
     
    [logger_sqlalchemy]
    level = WARN
    handlers =
    qualname = sqlalchemy.engine
    [logger_alembic]
    level = INFO
    handlers =
    qualname = alembic
     
    [handler_console]
    class = StreamHandler
    args = (sys.stderr,)
    level = NOTSET
    formatter = generic
     
    [formatter_generic]
    format = %(levelname)-5.5s [%(name)s] %(message)s
    datefmt = %H:%M:%S

    Initialize the Alembic table in the Anitya database and show the current database version:

    root@host:# alembic current
    7a8c4aa92678 (head)
    

    Manually change the database version known to Alembic in order to trick it into again applying the missing change to the database:

    root@host:# sqlite3 /var/www/html/anitya.sqlite
    
    sqlite> INSERT INTO alembic_version (version_num) VALUES ('a52d2fe99d4f');
    sqlite> .quit
    

    Reapply the change to the database from the missing database version feeaa70ead67:

    root@host:# alembic upgrade feeaa70ead67
    

    Determine the current “head” database version and manually change the database version known to Alembic:

    root@host:# alembic history
    540bdcf7edbc -> 7a8c4aa92678 (head), Add missing GitHub owner/project pairs
    27342bce1d0f -> 540bdcf7edbc, Convert GitHub URL to owner/project
    [...]
    
    root@host:# sqlite3 /var/www/html/anitya.sqlite
    
    sqlite> UPDATE alembic_version SET version_num = '7a8c4aa92678';
    sqlite> .quit
    
  8. Setup the Anitya cron job with the previously downloaded script by opening the file /etc/cron.d/anitya in an editor of your choice:

    root@host:# vi /etc/cron.d/anitya
    

    Insert the contents as shown in the following example:

    /etc/cron.d/anitya
    */5 * * * * root /usr/lib/python2.7/site-packages/anitya/anitya_cron.py 1>>/var/log/anitya.log 2>&1

    Adjust the execution interval according to your environment and needs.

  9. Create a VirtualHost configuration for e.g. the Apache webserver in whose context the Anitya application will be running, and start the Apache webserver.

    Create the VirtualHost configuration by opening the file /etc/httpd/conf.d/anitya.conf in an editor of your choice:

    root@host:# vi /etc/httpd/conf.d/anitya.conf
    

    Insert the contents as shown in the following configuration example:

    /etc/httpd/conf.d/anitya.conf
    <VirtualHost <IP>:<PORT>>
      DocumentRoot "/var/www/html/"
      ServerName <FQDN>
      <Directory />
        Options FollowSymLinks
        AllowOverride None
      </Directory>
     
      WSGIDaemonProcess anitya user=apache group=apache processes=2 threads=25 python-path=/usr/lib/python2.7/site-packages
      WSGIProcessGroup anitya
      WSGIScriptAlias /anitya /usr/lib/python2.7/site-packages/anitya/wsgi.py process-group=anitya
     
      <Directory "/usr/lib/python2.7/site-packages/anitya">
        <Files wsgi.py>
          Order deny,allow
          Allow from all
          Require all granted
        </Files>
     
        Order allow,deny
        Options Indexes FollowSymLinks
        Allow from all
      </Directory>
     
      Alias "/anitya/static" "/usr/lib/python2.7/site-packages/anitya/static"
      <Directory "/usr/lib/python2.7/site-packages/anitya/static">
        Order allow,deny
        Options Indexes FollowSymLinks
        Allow from all
        Require all granted
      </Directory>
    </VirtualHost>

    Substitute the placeholders <IP> and <PORT> with the IP address and TCP port on which the virtual host should be listening. Substitute the placeholder <FQDN> with the fully qualified domain name under which you want to reach the Anitya application with your browser.

    Enable and start the Apache webserver:

    root@host:# systemctl enable httpd
    root@host:# systemctl start httpd
    

After performing the steps outlined above, your own, dedicated Anitya instance on a RHEL 7 system should be ready. The WebUI of the Anitya instance will be reachable in a browser at the URL http://<FQDN>/anitya, where the placeholder <FQDN> has to be substituted with the fully qualified domain name given in the above Apache VirtualHost configuration.

After logging into Anitya and adding an upstream software project, the next run of the Anitya cron job should discover the initial and current versions of the upstream software project. This should be visible in two places, firstly in the logfile of the Anitya cron job /var/log/anitya.log and secondly on the project's page of the Anitya WebUI. If this is not the case after the next execution interval of the Anitya cron job, review the logfile /var/log/anitya.log for possible errors.

I hope you find the provided Anitya installation and configuration guide for a RHEL 7 target system useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with this guide. In future articles I will – hopefully soon – write about the necessary steps to set up FedMsg (Federated Message Bus) and connect Anitya to it, as well as an enhancement to Anitya itself.

// Check_MK Monitoring - Brocade / Broadcom Fibre Channel Switches

This article provides patches for the standard Check_MK distribution in order to fix and enhance the support for the monitoring of Brocade / Broadcom Fibre Channel Switches.


Out of the box, there is currently already monitoring support available for Brocade / Broadcom Fibre Channel Switches in the standard Check_MK distribution. Unfortunately there are several issues in the check brocade_fcport, which is used to monitor the status and several metrics of the fibre channel switch ports. Those issues prevent the check from working as intended and are fixed in the version provided in this article. Up until recently, there also was no support in the standard Check_MK distribution for monitoring CPU and memory metrics on a switch level and no support for monitoring SFP metrics on a port level. This article introduces the new checks brocade_cpu, brocade_mem and brocade_sfp to cover those metrics. With the most recent upstream version 1.5.x of the Check_MK distribution, there are now the standard checks brocade_sys (covering CPU and memory metrics) and brocade_sfp (covering the SFP port metrics) available, providing basically the same functionality.

For the impatient and TL;DR here is the enhanced version of the brocade_fcport check:

Enhanced version of the brocade_fcport check

And the new brocade_cpu, brocade_mem and brocade_sfp checks:

The new brocade_cpu check
The new brocade_mem check
The new brocade_sfp check

The sources to the new and enhanced versions of all the checks can be found in my Check_MK Plugins repository on GitHub.

Additional Checks

CPU Usage

The Check_MK service check brocade_cpu monitors the current CPU utilization of Brocade / Broadcom fibre channel switches. It uses the SNMP OID swCpuUsage from the fibre channel switch MIB (SW-MIB) in order to create one service check for each CPU found in the system. The current CPU utilization in percent is compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the standard WATO plugin for CPU utilization it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values (warning: 80%; critical: 90% of CPU utilization).

The following image shows a status output example for the brocade_cpu service check from the WATO WebUI:

Status output example for the new brocade_cpu service check

This example shows the current CPU utilization of the system in percent.

The following image shows an example of the service metrics graph for the brocade_cpu service check:

Example service metrics graph for the new brocade_cpu service check

The selected example graph shows the current CPU utilization of the system in percent as well as the default warning and critical threshold values.

Memory Usage

The Check_MK service check brocade_mem monitors the current memory (RAM) usage on Brocade / Broadcom fibre channel switches. It uses the SNMP OID swMemUsage from the fibre channel switch MIB (SW-MIB) in order to create a service check for the overall memory usage on the system. The amount of currently used memory is compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values (warning: 80%; critical: 90% of used memory). The configuration options for the used memory levels can be found under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Operating System Resources
         -> Brocade Fibre Channel Memory Usage
            -> Create rule in folder ...
               [x] Levels for memory usage

The following image shows a status output example for the brocade_mem service check from the WATO WebUI:

Status output example for the new brocade_mem service check

This example shows the current memory utilization of the system in percent.

The following image shows an example of the service metrics graph for the brocade_mem service check:

Example service metrics graph for the new brocade_mem service check

The selected example graph shows the current memory utilization of the system in percent as well as the default warning and critical threshold values.

SFP Health

The Check_MK service check brocade_sfp monitors several metrics of the SFPs in all enabled and active ports of Brocade / Broadcom fibre channel switches. It uses several SNMP OIDs from the fibre channel switch MIB (SW-MIB) and from the Fabric Alliance Extension MIB (FA-EXT-MIB) in order to create one service check for each SFP found in an enabled and active port on the system. The OIDs from the fibre channel switch MIB (swFCPortSpecifier, swFCPortName, swFCPortPhyState, swFCPortOpStatus, swFCPortAdmStatus) are used to determine the number, name and status of the switch port. The OIDs from the Fabric Alliance Extension MIB are used to determine the actual SFP metrics. Those metrics are the SFP temperature (swSfpTemperature), voltage (swSfpVoltage) and current (swSfpCurrent), as well as the optical receive and transmit power (swSfpRxPower and swSfpTxPower) of the SFP. Unfortunately the SFP metrics are not available on all Brocade / Broadcom switch hardware platforms and FabricOS versions. This check was verified to work with Gen6 hardware and FabricOS v8. Another limitation is that the SFP metrics are not available in real time, but are only gathered by the switch from all SFPs at a 5-minute interval. This is most likely a precaution against overloading the limited processing capacity of the SFPs with too many status requests.

The current temperature level as well as the optical receive and transmit power levels are compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values:

Metric Warning Threshold Critical Threshold
System temperature 55°C 65°C
Optical receive power -7.0 dBm -9.0 dBm
Optical transmit power -2.0 dBm -3.0 dBm
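Since the optical power thresholds are given in dBm – a logarithmic scale relative to 1 mW – it can help to convert them to absolute power levels when comparing against SFP datasheets. A small sketch of the conversion (the -7.0 dBm example corresponds to the default receive power warning threshold above):

```python
import math

def dbm_to_mw(dbm):
    # dBm is power relative to 1 mW on a logarithmic scale.
    return 10.0 ** (dbm / 10.0)

def mw_to_dbm(mw):
    # Inverse conversion: absolute power in mW back to dBm.
    return 10.0 * math.log10(mw)

# The default receive power warning threshold of -7.0 dBm corresponds
# to roughly 0.2 mW of optical power.
print(round(dbm_to_mw(-7.0), 3))  # → 0.2
```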

The configuration options for the used temperature and optical receive and transmit power levels can be found under:

-> Host & Service Parameters
   -> Parameters for discovered services
      -> Networking
         -> Brocade Fibre Channel SFP
            -> Create rule in folder ...
               [x] Temperature levels in degrees Celsius
               [x] Receive power levels in dBm
               [x] Transmit power levels in dBm

Currently, the electrical current and voltage metrics of the SFPs are only used for long-term trends via the respective service metric template and thus are not used to raise any alarms. This is due to the fact that there is little to no information available on which precise electrical current and voltage levels would constitute an indicator of an immediate or impending failure state.

The following image shows several status output examples for the brocade_sfp service check from the WATO WebUI:

Status output examples for the new brocade_sfp service check

This example shows SFP Port service check items for eight consecutive ports on a switch. For each item the current temperature, voltage and current, as well as the optical receive and transmit power of the SFP are shown. The optical receive and transmit power levels are also visualized in the Perf-O-Meter, optical receive power growing from the middle to the left, optical transmit power growing from the middle to the right.

The following image shows an example of the service metrics graph for the brocade_sfp service check:

Example service metrics graph for the new brocade_sfp service check

The first graph shows the optical receive and transmit power of the SFP. The second shows the electrical current drawn by the SFP. The third graph shows the temperature of the SFP. The fourth and last graph shows the electrical voltage provided to the SFP.

Modified Check

Fibre Channel Port

The Check_MK service check brocade_fcport has several issues which are addressed by the following set of patches. The patch is actually a monolithic one and is just broken up into individual patches here for ease of discussion.

The first patch is a simple, but ugly workaround to prevent OID_END, which carries the port index information, from being treated as a BINARY SNMP value:

brocade_fcport.patch
--- a/checks/brocade_fcport   2018-11-25 08:43:58.715721271 +0100
+++ b/checks/brocade_fcport   2018-11-25 18:06:04.674930057 +0100
@@ -135,8 +154,8 @@
         bbcredits = None
         if len(if64_info) > 0:
             fcmgmt_portstats = []
-            for oidend, tx_elements, rx_elements, bbcredits_64 in if64_info:
-                if index == oidend.split(".")[-1]:
+            for oidend, dummy, tx_elements, rx_elements, bbcredits_64 in if64_info:
+                if int(index) == int(oidend.split(".")[-1]):
                     fcmgmt_portstats = [
                         binstring_to_int(''.join(map(chr, tx_elements))) / 4,
                         binstring_to_int(''.join(map(chr, rx_elements))) / 4,
@@ -426,6 +477,7 @@
         # Not every device supports that
         (".1.3.6.1.3.94.4.5.1", [
             OID_END,
+            "1",            # Dummy value, otherwise OID_END is also treated as a BINARY value
             BINARY("6"),    # FCMGMT-MIB::connUnitPortStatCountTxElements
             BINARY("7"),    # FCMGMT-MIB::connUnitPortStatCountRxElements
             BINARY("8"),    # FCMGMT-MIB::connUnitPortStatCountBBCreditZero

This is achieved by simply inserting a dummy value into the list of SNMP OIDs in the snmp_info variable. I haven't had time to dig out the root cause of this behaviour, but I guess it must be somewhere in the core SNMP components of Check_MK. The patch also implicitly addresses and fixes an issue where two variables with differently typed contents are compared. This is achieved by simply changing the line:

               if index == oidend.split(".")[-1]:

to:

               if int(index) == int(oidend.split(".")[-1]):

and thus forcing a conversion to integer values.
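The effect of the differently typed comparison can be reproduced in a few lines of plain Python (the OID value below is a hypothetical example):

```python
# OID_END delivers the port index as the last element of a dotted OID
# string, while the port index from the port table may already be an int.
oidend = "1.3.6.1.3.94.4.5.1.16"
index = 16

# Python never considers an int equal to its string representation, so
# the untyped comparison silently fails for every port.
print(index == oidend.split(".")[-1])            # → False
print(int(index) == int(oidend.split(".")[-1]))  # → True
```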

The next patch adds support for additional encoding schemes in the FC-1 layer, which are used for fibre channel beyond the speed of 8 GBit. Up to and including a speed of 8 GBit, fibre channel uses a 8/10b encoding. This means that for every 8 bits of data, 10 bits are actually sent over the fibre channel link. The encoding scheme changes for speeds higher than 8 GBit. 16 GBit fibre channel – like 10 GBit ethernet – uses a 64/66b encoding scheme, 32 GBit fibre channel and higher use a 256/257b encoding scheme. Thus the wirespeed calculations of the brocade_fcport service check are incorrect for speeds above 8 GBit. The following patch addresses and fixes this issue:

brocade_fcport.patch
--- a/checks/brocade_fcport   2018-11-25 08:43:58.715721271 +0100
+++ b/checks/brocade_fcport   2018-11-25 18:06:04.674930057 +0100
@@ -267,8 +283,15 @@
 
     output.append(speedmsg)
 
-    # convert gbit netto link-rate to Byte/s (8/10 enc)
-    wirespeed = gbit * 1000000000.0 * 0.8 / 8
+    if gbit > 16:
+        # convert gbit netto link-rate to Byte/s (256/257 enc)
+        wirespeed = gbit * 1000000000.0 * ( 256.0 / 257.0 ) / 8
+    elif gbit > 8:
+        # convert gbit netto link-rate to Byte/s (64/66 enc)
+        wirespeed = gbit * 1000000000.0 * ( 64.0 / 66.0 ) / 8
+    else:
+        # convert gbit netto link-rate to Byte/s (8/10 enc)
+        wirespeed = gbit * 1000000000.0 * ( 8.0 / 10.0 ) / 8
     in_bytes = 4 * get_rate("brocade_fcport.rxwords.%s" % index, this_time, rxwords)
     out_bytes = 4 * get_rate("brocade_fcport.txwords.%s" % index, this_time, txwords)
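The wirespeed calculation with encoding-aware overhead can be summarized as a standalone function (a sketch of the same logic, not the full check):

```python
def wirespeed_bytes_per_sec(gbit):
    # Convert the gross FC link rate in GBit to a net rate in bytes/s,
    # accounting for the FC-1 encoding overhead of each speed generation.
    if gbit > 16:
        enc = 256.0 / 257.0  # 32 GBit FC and above: 256/257b encoding
    elif gbit > 8:
        enc = 64.0 / 66.0    # 16 GBit FC: 64/66b encoding
    else:
        enc = 8.0 / 10.0     # up to 8 GBit FC: 8/10b encoding
    return gbit * 1000000000.0 * enc / 8

# An 8 GBit link with 8/10b encoding carries at most 800 MByte/s net.
print(wirespeed_bytes_per_sec(8))  # → 800000000.0
```

Note the explicit float literals: under Python 2, which the Check_MK 1.4/1.5 check runtime uses, integer expressions like 256 / 257 would silently evaluate to 0.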

The third patch simply adds the notxcredits counter and its value to the output of the service check. This information is currently missing from the status output of the check and is just added as a convenience:

brocade_fcport.patch
--- a/checks/brocade_fcport   2018-11-25 08:43:58.715721271 +0100
+++ b/checks/brocade_fcport   2018-11-25 18:06:04.674930057 +0100
@@ -361,6 +408,9 @@
             summarystate = max(1, summarystate)
             text += "(!)"
             output.append(text)
+        else:
+            if counter == "notxcredits":
+                output.append(text)
 
     # P O R T S T A T E
     for dev_state, state_key, state_info, warn_states, state_map in [

The last patch addresses and fixes a more serious issue. The details are explained in the comment of the following code snippet. The gist is that the metric notxcredits (“No TX buffer credits”), gathered from the SNMP OID swFCPortNoTxCredits, is calculated incorrectly. This is due to the fact that the brocade_fcport service check treats this metric like all the other error metrics of a switch port (e.g. “CRC errors”, “ENC-Out”, “ENC-In” and “C3 discards”) and puts it in relation to the number of frames transmitted over a link. In reality though, the metric behind the SNMP OID swFCPortNoTxCredits is by definition relative to time. This issue leads to false positives in certain edge cases, for example when a little-utilized switch port sees an otherwise uncritical number of swFCPortNoTxCredits due to the normal activity of the fibre channel flow control. In such a case, a relatively high number of swFCPortNoTxCredits is put into relation to the sum of a relatively low number of frames transmitted over the link and the relatively high number of swFCPortNoTxCredits itself. See the last line of the code snippet below for this calculation. The result is a high value for the metric notxcredits (“No TX buffer credits”) although the fibre channel flow control was working perfectly fine, the configured switch.edgeHoldTime was most likely never reached and frames were never dropped.

In order to address this issue, the following patch adds a few lines of code for a special treatment of the metric notxcredits to the brocade_fcport service check. This is done in the last section of the patch. The second section of the patch is just for the purpose of clarification, as it sets the metric that is being related to, to None in case of the metric notxcredits. The first section of the patch adjusts the default warning and critical thresholds to lower levels. The previous levels were rather high due to the miscalculation explained above. With the new calculation added by the third section of the patch, the default warning and critical threshold levels need to be much more sensitive.

brocade_fcport.patch
--- a/checks/brocade_fcport   2018-11-25 08:43:58.715721271 +0100
+++ b/checks/brocade_fcport   2018-11-25 18:06:04.674930057 +0100
@@ -100,11 +100,11 @@
 factory_settings["brocade_fcport_default_levels"] = {
     "rxcrcs":           (3.0, 20.0),   # allowed percentage of CRC errors
     "rxencoutframes":   (3.0, 20.0),   # allowed percentage of Enc-OUT Frames
     "rxencinframes":    (3.0, 20.0),   # allowed percentage of Enc-In Frames
-    "notxcredits":      (3.0, 20.0),   # allowed percentage of No Tx Credits
+    "notxcredits":      (1.0, 3.0),    # allowed percentage of No Tx Credits
     "c3discards":       (3.0, 20.0),   # allowed percentage of C3 discards
     "assumed_speed":    2.0,           # used if speed not available in SNMP data
 }
 
 
@@ -327,7 +349,7 @@
            ("ENC-Out",              "rxencoutframes",      rxencoutframes,  rxframes_rate),
            ("ENC-In",               "rxencinframes",       rxencinframes,   rxframes_rate),
            ("C3 discards",          "c3discards",          c3discards,      txframes_rate),
-           ("No TX buffer credits", "notxcredits",         notxcredits,     txframes_rate),
+           ("No TX buffer credits", "notxcredits",         notxcredits,     None),
     ]:
         per_sec = get_rate("brocade_fcport.%s.%s" % (counter, index), this_time, value)
         perfdata.append((counter, per_sec))
@@ -338,6 +360,31 @@
                     (counter, item), this_time, per_sec, average)
             perfdata.append( ("%s_avg" % counter, per_sec_avg ) )
 
+        # Calculate error rates
+        if counter == "notxcredits":
+            # Calculate the error rate for "notxcredits" (buffer credit zero). Since this value
+            # is relative to time instead of the number of transmitted frames it needs special
+            # treatment.
+            # Semantics of the buffer credit zero value on Brocade / Broadcom devices:
+            # The switch ASIC checks the buffer credit value of a switch port every 2.5us and
+            # increments the buffer credit zero counter if the buffer credit value is zero.
+            # This means if in a one second interval the buffer credit zero counter increases
+            # by 400000 the link on this switch port is not allowed to send any frames.
+            # By default the edge hold time on a Brocade / Broadcom device is about 200ms:
+            #   switch.edgeHoldTime:220
+            # If a C3 frame remains in the switches queue for more than 220ms without being
+            # given any credits to be transmitted, it is subsequently dropped. Thus the buffer
+            # credit zero counter would optimally be correlated to the C3 discards counter.
+            # Unfortunately the Brocade / Broadcom devices have no egress buffering and do
+            # ingress buffering instead. Thus the C3 discards counters are increased on the
+            # ingress port, while the buffer credit zero counters are increased on the egress
+            # port. The trade-off is to correlate the buffer credit zero counters relative to
+            # the measured time interval.
+            if per_sec > 0:
+                rate = per_sec / 400000.00
+            else:
+                rate = 0
+        else:
             # compute error rate (errors in relation to number of frames) (from 0.0 to 1.0)
             if ref > 0 or per_sec > 0:
                 rate = per_sec / (ref + per_sec)
Conclusion

Adding the three new checks for Brocade / Broadcom fibre channel switches to your Check_MK server enables you to monitor additional CPU and memory aspects as well as the SFPs of your Brocade / Broadcom fibre channel devices. More recent versions of Check_MK should already include the same functionality through the now included standard checks brocade_sys and brocade_sfp.

New Brocade / Broadcom fibre channel devices should pick up the additional service checks immediately. Existing Brocade / Broadcom fibre channel devices might need a Check_MK inventory to be run explicitly on them in order to pick up the additional service checks.

The enhanced version of the brocade_fcport service check addresses and fixes several issues present in the standard Check_MK service check. It should provide you with a more complete and future-proof monitoring of your fibre channel network infrastructure.

I hope you find the provided new and enhanced checks useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.

// SAP Cloud Connector - Configuring Multiple Local Administrative Users

Out of the box, the SAP Cloud Connector does not support the creation of multiple local users for administration purposes through its WebUI. The standard, documented way only provides for the use of LDAP in conjunction with an external user directory in case multiple administrative users are necessary. This in turn introduces an unnecessary overhead and external dependency. It also has rather strict limitations on the names of the groups (admin or sccadmin) that can be used to authorize users for administrative tasks in the SAP Cloud Connector. There is a simple workaround for this limitation, which is described in this blog post.

Since the SAP Cloud Connector uses an embedded Tomcat servlet container as an application server, it also uses some of the infrastructure provided by Tomcat. This includes the local file-based user directory. In order to use multiple, distinctly named administrative users, they simply need to be manually added to this local file-based user directory.

The first step, though, is to create a password hash that is compatible with the Tomcat servlet container. This can be accomplished with the Tomcat command line tool digest.sh or digest.bat. Depending on your operating system this tool might have another name, e.g. /usr/bin/tomcat-digest on RedHat Enterprise Linux. On any system that has this command line tool available (e.g. a RHEL 7 system with the tomcat RPM installed), run the following command:

user@host:$ /usr/bin/tomcat-digest -a sha-256 '<PASSWORD>'

Substitute the placeholder <PASSWORD> with the actual password that will be used for the new administrative user. The output of the above command will be in the form <PASSWORD>:<HASH>. For the following step, only the <HASH> part of the output is needed.
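With the default settings of the digest tool (no salt, a single iteration), the resulting hash should simply be the hex-encoded SHA-256 digest of the password. As a sketch of what the tool computes under that assumption – newer Tomcat versions may salt and iterate the digest by default, in which case the output format differs – the same value can be derived with Python's standard library:

```python
import hashlib

def tomcat_sha256_hash(password: str) -> str:
    """Hex-encoded SHA-256 digest of the password.

    This matches the output of 'digest.sh -a sha-256' only under the
    assumption of an unsalted, single-iteration configuration; newer
    Tomcat versions may salt and iterate the digest by default.
    """
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

print(tomcat_sha256_hash("secret"))
# prints: 2bb80d537b1da3e38bd30361aa855686bde0eacd7162fef6a25fe97bf527a25b
```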

The next step is to manually add the new administrative user to the local file-based user directory of the embedded Tomcat. This is achieved by editing the file config/users.xml in your SAP Cloud Connector installation. In this example the installation was performed in the standard path for Linux systems, which is /opt/sap/scc. Open the file config/users.xml in an editor of your choice:

root@host:# vi /opt/sap/scc/config/users.xml

Alter its contents by adding a new line, like shown in the following example:

/opt/sap/scc/config/users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="admin"/>
  <group groupname="initial"/>
  <user username="Administrator" password="********" roles="admin"/>
  <user username="<USERNAME>" password="<HASH>" roles="admin"/>
  [...]
</tomcat-users>

Substitute the placeholder <USERNAME> with the desired username of the new administrative user and substitute the placeholder <HASH> with the password hash generated above.

Restart the SAP Cloud Connector.

The newly created user should now be able to log in via the WebUI of the SAP Cloud Connector.

This kind of local file-based user directory is fairly simple to manage and to understand. I keep wondering why SAP would not document, or even integrate, this kind of user management in the WebUI of the SAP Cloud Connector. Another issue is the use of only a single authorization role (admin or sccadmin), independent of the particular user directory used. This role grants a user full administrative rights to the SAP Cloud Connector. A more differentiated authorization scheme, with roles for e.g. operations (connector restart and monitoring, no configuration changes) or monitoring (only monitoring, no connector restart, no configuration changes) purposes, would be much better suited to an enterprise environment.

// Experiences with Java and X.509 Certificates - Certificate Revocation Lists

As already mentioned in the article Experiences with Java and X.509 Certificates - Code Signing, I recently had the task of researching two issues involving Java and X.509 certificates. While I'm familiar with X.509 certificates, I'm not too familiar with the inner workings of Java, the Java run-time environment, let alone programming in Java. So this was a good opportunity to familiarize myself with the inner workings of Java and a great learning experience.

The second issue, which this blog post is about, was in the area of TLS-secured HTTPS network connections, naturally involving an X.509 certificate on the server side. In particular, the server processes of two Java application servers were connecting the application logic of two different systems via an API over an HTTPS-based network connection. The source system would make RPC calls to an API on the target system in order to store data in and retrieve data from that system.

The communication between the two systems worked fine over an unencrypted HTTP-based network connection, but failed over a secured HTTPS network connection. The errors reported back to the server process on the source system, which was initiating the communication, pointed in the direction of an issue with the X.509 certificate being used on the target server. The error message looked something like this:

%% Invalidated:  [Session-1, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256]
main, SEND TLSv1.2 ALERT:  fatal, description = certificate_unknown
main, WRITE: TLSv1.2 Alert, length = 2
[Raw write]: length = 7
0000: 15 03 03 00 02 02 2E                               .......
main, called closeSocket()
main, handling exception: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException:
  PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find
  valid certification path to requested target
  Exception Failed to access the WSDL at: https://hostname:port/url/rpc-call. It failed with: 
        sun.security.validator.ValidatorException: PKIX path building failed:
            sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification
                path to requested target.
x.y.z.ExceptionHandler
        at x.y.z.method1(source1.java:line-number)
        at x.y.z.method2(source2.java:line-number)
        at x.y.z.method3(source3.java:line-number)

Checking the certificate used on the side of the target server turned up nothing unusual. The certificate was issued by the internal CA of our infrastructure and, like many other certificates, looked valid. Verifying this by initiating a non-Java based HTTPS request (e.g. with a browser) to the target system worked fine. The certificate on the side of the target server was accepted and a secured HTTPS connection was successfully established. The next debugging step was to check the certificate store on the side of the source server, which was initiating the connection. It showed that a valid entry, containing the full chain of the root and the intermediate certificates of our internal CA, existed.

From an infrastructure point of view, everything looked reasonably good and should work just fine. Still, the connection between the two systems was failing.

Unfortunately, but inevitably, the Java routines responsible for initiating the HTTPS connection on the source system were embedded in programming logic more complex than necessary or useful for the purpose of low-level debugging. For further debugging I wanted a Java-based test program stripped down to the bare minimum of initiating an HTTPS connection. Again, not being a seasoned Java programmer, I managed to cobble together the following test program from several examples on the internet:

CertChecker.java
import java.net.*;
import java.io.*;
import javax.net.ssl.*;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.SecureRandom;
import java.security.cert.*;
import java.util.*;
 
public class CertChecker implements X509TrustManager {
    public static void main(String[] args) throws Exception {
        try {
            // NOTE: an SSLContext with the custom trust manager is set up, but
            // SSLSocketFactory.getDefault() is used below, so it is the JRE's
            // default trust manager that performs the certificate validation
            // and any validation error surfaces during startHandshake()
            SSLContext sc = SSLContext.getInstance("SSL");
            sc.init(null, new TrustManager[]{new CertChecker()}, new SecureRandom());
            SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
            SSLSocket socket = (SSLSocket)factory.createSocket("hostname", port);
            socket.startHandshake();
 
            PrintWriter out = new PrintWriter(
                                  new BufferedWriter(
                                  new OutputStreamWriter(
                                  socket.getOutputStream())));
 
            out.println("GET / HTTP/1.0");
            out.println();
            out.flush();
 
            if (out.checkError())
                System.out.println("SSLSocketClient:  java.io.PrintWriter error");
 
            BufferedReader in = new BufferedReader(
                                    new InputStreamReader(
                                    socket.getInputStream()));
 
            String inputLine;
            while ((inputLine = in.readLine()) != null)
                System.out.println(inputLine);
 
            in.close();
            out.close();
            socket.close();
 
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
 
    private final X509TrustManager defaultTM;
 
    public CertChecker() throws GeneralSecurityException {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);
        defaultTM = (X509TrustManager) tmf.getTrustManagers()[0];
    }
 
    public void checkServerTrusted(X509Certificate[] certs, String authType) {
        if (defaultTM != null) {
            try {
                defaultTM.checkServerTrusted(certs, authType);
                Set<TrustAnchor> trustAnchors = getTrustAnchors();
                System.out.println("Certificate valid");
            } catch (CertificateException ex) {
                System.out.println("Certificate invalid: " + ex.getMessage());
            }
        }
    }
 
    private Set<TrustAnchor> getTrustAnchors() {
        X509Certificate[] acceptedIssuers = defaultTM.getAcceptedIssuers();
        Set<TrustAnchor> trustAnchors = new HashSet<TrustAnchor>();
        for (X509Certificate acceptedIssuer : acceptedIssuers) {
            TrustAnchor trustAnchor = new TrustAnchor(acceptedIssuer, null);
            trustAnchors.add(trustAnchor);
        }
        return trustAnchors;
    }
 
    public void checkClientTrusted(X509Certificate[] chain, String authType) throws CertificateException {
    }
 
    public X509Certificate[] getAcceptedIssuers() {
        return null;
    }
}

In the factory.createSocket() call of the above Java program, the placeholders hostname and port were replaced by the appropriate values for the target server system. The code was then compiled and run against the target server system with the following Java command line call:

user@host:$ java -Djava.security.debug=all -Djavax.net.debug=all CertChecker

The result from this was the following error message output:

[...]
certpath: BasicChecker.updateState issuer: CN=root-ca, DC=domain, DC=tld; subject: CN=intermediate-ca, DC=domain, DC=tld; serial#: ...
certpath: -checker6 validation succeeded
certpath: -Using checker7 ... [sun.security.provider.certpath.RevocationChecker]
certpath: RevocationChecker.check: checking cert
  SN:     26000000 0cff68dd 06c50fb0 4a000100 00000c
  Subject: CN=intermediate-ca, DC=domain, DC=tld
  Issuer: CN=root-ca, DC=domain, DC=tld
certpath: RevocationChecker.checkCRLs() ---checking revocation status ...
certpath: RevocationChecker.checkCRLs() possible crls.size() = 0
certpath: RevocationChecker.checkCRLs() approved crls.size() = 0
certpath: DistributionPointFetcher.getCRLs: Checking CRLDPs for CN=intermediate-ca, DC=domain, DC=tld
certpath: Trying to fetch CRL from DP http://crl-fqdn/filename.crl
certpath: CertStore URI:http://crl-fqdn/filename.crl
certpath: Downloading new CRL...
certpath: Exception fetching CRL:
java.security.cert.CRLException: Empty input
java.security.cert.CRLException: Empty input
        at sun.security.provider.X509Factory.engineGenerateCRL(X509Factory.java:397)
        at java.security.cert.CertificateFactory.generateCRL(CertificateFactory.java:497)
        at sun.security.provider.certpath.URICertStore.engineGetCRLs(URICertStore.java:419)
        at java.security.cert.CertStore.getCRLs(CertStore.java:181)
        at sun.security.provider.certpath.DistributionPointFetcher.getCRL(DistributionPointFetcher.java:245)
        at sun.security.provider.certpath.DistributionPointFetcher.getCRLs(DistributionPointFetcher.java:189)
        at sun.security.provider.certpath.DistributionPointFetcher.getCRLs(DistributionPointFetcher.java:121)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:552)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:465)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:367)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:337)
        at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:219)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:140)
        at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:79)
        at java.security.cert.CertPathValidator.validate(CertPathValidator.java:292)
        at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:347)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:249)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at CertChecker.main(CertChecker.java:31)
[...]
%% Invalidated:  [Session-1, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256]
main, SEND TLSv1.2 ALERT:  fatal, description = certificate_unknown
main, WRITE: TLSv1.2 Alert, length = 2
[Raw write]: length = 7
0000: 15 03 03 00 02 02 2E                               .......
main, called closeSocket()
main, handling exception: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: \
    PKIX path validation failed: java.security.cert.CertPathValidatorException: Could not determine revocation status
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: \
    java.security.cert.CertPathValidatorException: Could not determine revocation status
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:961)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
        at CertChecker.main(CertChecker.java:31)
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: \
    java.security.cert.CertPathValidatorException: Could not determine revocation status
        at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:352)
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:249)
        at sun.security.validator.Validator.validate(Validator.java:260)
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
        ... 8 more
Caused by: java.security.cert.CertPathValidatorException: Could not determine revocation status
        at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:135)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:219)
        at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:140)
        at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:79)
        at java.security.cert.CertPathValidator.validate(CertPathValidator.java:292)
        at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:347)
        ... 14 more
Caused by: java.security.cert.CertPathValidatorException: Could not determine revocation status
        at sun.security.provider.certpath.RevocationChecker.buildToNewKey(RevocationChecker.java:1092)
        at sun.security.provider.certpath.RevocationChecker.verifyWithSeparateSigningKey(RevocationChecker.java:910)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:577)
        at sun.security.provider.certpath.RevocationChecker.checkCRLs(RevocationChecker.java:465)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:367)
        at sun.security.provider.certpath.RevocationChecker.check(RevocationChecker.java:337)
        at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125)
        ... 19 more

The Java exceptions shown above suggested an issue with the Certificate Revocation List (CRL) of the root CA. This was confusing, since the certificate worked when the target server was accessed with a browser. Manually downloading the CRL was also possible. Verifying the downloaded CRL with the help of OpenSSL showed no anomalies to which this issue could be attributed.

The line:

java.security.cert.CRLException: Empty input

from the above error messages led the way to solving this issue. This line could be interpreted in two ways:

  1. either there was a field in the CRL currently being parsed that had an empty or an unexpected value leading to a parse error
  2. or the downloaded CRL data which was handed over to the CRL parsing code was an empty set.

I decided to first check option number 2 and – if it turned out not to be the case – move on from there to check option number 1. Since a manual download of the CRL worked per se, I decided to take a closer look at the download process with the curl command line utility:

user@host:$ curl -v http://crl-fqdn/filename.crl
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.0.0.15...
* Connected to crl-fqdn (10.0.0.15) port 80 (#0)
> GET /filename.crl HTTP/1.1
> User-Agent: curl/7.38.0
> Host: crl-fqdn
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
< Location: https://crl-fqdn/filename.crl
< Server: loadbalancer-vendor
* HTTP/1.0 connection set to keep alive!
< Connection: Keep-Alive
< Content-Length: 0
< 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host crl-fqdn left intact

The HTTP request for the CRL file was in fact answered with an HTTP status code 302 and a new location for the CRL file, redirecting any client to an HTTPS-secured location. What apparently works for any regular browser or – with the “-L” command line option – even for the curl command line utility, seemed to be an issue for the part of the JRE that handles the download of the CRL. Instead of following the HTTP redirect and requesting the CRL from the alternate location, the CRL-handling code within the JRE is given the content data returned by the initial HTTP request, which is an empty set (see the Content-Length: 0 in the above curl output). This leads to the failure to parse the CRL, which leads to the failure to verify the certificate against the CRL, which in turn leads to the failure to establish a TLS-secured HTTPS network connection between the source and the target server system.
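This behavior can be reproduced outside of Java with a few lines of Python. The sketch below spins up a throwaway local HTTP server that mimics the load balancer's redirect (the redirect target hostname and file name are placeholders) and then fetches the CRL URL with a client that, like the JRE's CRL fetcher, does not follow redirects – it ends up with a 302 status and a zero-byte body:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Minimal stand-in for the load balancer: answer every request with a
# 302 redirect to the HTTPS location and an empty body.
class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "https://crl-fqdn" + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example output quiet

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client that, like the JRE's CRL download code, does NOT follow redirects:
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/filename.crl")
resp = conn.getresponse()
body = resp.read()
print(resp.status, len(body))  # prints: 302 0
server.shutdown()
```

Handing this empty body to a CRL parser fails in exactly the way the `java.security.cert.CRLException: Empty input` message describes.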

The immediate and simple solution to this issue was to add an exception to the HTTP-to-HTTPS redirect for the URL of the CRL. This global HTTP-to-HTTPS redirect had been implemented a while ago at the request of the information security department, in an attempt to make the site serving the CRL and other content more secure. Arguably, serving CRLs over HTTPS is generally a bad idea, since it can cause a chicken-and-egg type of problem. In this case, though, it would have worked, since the host serving the CRL was protected by a certificate from a different CA than the one for which the CRL was being provided.
