2022-04-21 // Keycloak Script Providers
Documentation
See the following resources on how to prepare Keycloak script providers for deployment into a Keycloak instance:
Server Developer Guide - Service Provider Interfaces (SPI) - JavaScript providers
GitHub Keycloak Script Providers Repository
Protocol Mapper
With the so-called protocol mapper, Keycloak offers a method for mapping arbitrary attributes onto a specific authentication protocol (e.g. OpenID Connect) and its protocol-specific attributes (e.g. claims in OpenID Connect).
Script Mapper
If, for a specific use-case, there is no built-in protocol mapper available in Keycloak, it is possible to implement a protocol mapper in JavaScript. Such a script mapper runs within the Keycloak application and is executed via the Java Nashorn scripting engine.
The possibilities for what can be achieved with a script mapper are numerous. The following only shows the general approach with a basic example. It implements a recursive mapper for a role attribute (in this case the attribute policy), or rather its values, onto the OpenID Connect attribute policy.
Unfortunately, both the role attribute policy and the OpenID Connect attribute policy are currently hard-coded in the script mapper, since I couldn't figure out a way to dynamically pass a value from the Keycloak configuration of the script mapper. Such functionality seems to be available only for protocol mappers implemented in Java.
Development and Test
On a Keycloak test or development system edit the file:
vi <PATH_TO_KEYCLOAK>/standalone/configuration/profile.properties
and set the following parameters:
feature.scripts=enabled
feature.upload_scripts=enabled
Then restart the Keycloak application.
After this, the various mapper dialogs within the WebUI of the Keycloak application show an additional entry Script Mapper in the drop-down menu Mapper Type. If the Script Mapper entry is selected, an editor dialog is presented which can be used for the development and testing of the script mapper.
Attention: After finishing the development and testing of the script mapper, the parameter feature.upload_scripts shown above should be disabled again, since it poses a security risk!
Deployment
For the actual deployment of the previously developed script mapper onto a production Keycloak system the following preparation steps are necessary:
Directory creation:
mkdir -p /tmp/script_provider/role_attribute_mapper_policy/META-INF/
Copy the source code of the newly developed script mapper and save it into a file:
vi /tmp/script_provider/role_attribute_mapper_policy/role_attribute_mapper_policy.js
File contents:
```javascript
/**
 * Available variables:
 * user - the current user
 * realm - the current realm
 * token - the current token
 * userSession - the current userSession
 * keycloakSession - the current keycloakSession
 */

/**
 * Change value to "true" in order to enable debugging
 * output to /var/log/keycloak/keycloak.log.
 */
var debug = false;

var ArrayList = Java.type("java.util.ArrayList");
var policies = new ArrayList();

/**
 * The actual debug output function
 */
function debugOutput(msg) {
    if (debug) print("Debug script mapper: " + msg);
}

/**
 * Helper function to determine **all** roles assigned
 * to a given user, even the indirectly assigned ones
 * (e.g. through group memberships). The built-in method
 * "getRealmRoleMappings" does unfortunately not work
 * here, since it only returns the **directly** assigned
 * roles (see: https://www.keycloak.org/docs-api/<version>/javadocs/org/keycloak/models/RoleMapperModel.html#getRealmRoleMappings--
 * for details).
 */
function getAllUserRoles() {
    var userRoles = [];
    realm.getRoles().forEach(function(role) {
        if (user.hasRole(role)) {
            debugOutput('found role "' + role.getName() + '" assigned to user.');
            userRoles.push(role);
        } else {
            debugOutput('role "' + role.getName() + '" is not assigned to user.');
        }
    });
    return userRoles;
}

/**
 * Check all roles assigned to a given user for a "policy"
 * role attribute. If the role attribute is present, split
 * the value into individual elements and insert each element
 * into the array to be returned.
 */
var roles = getAllUserRoles();
roles.forEach(function(role) {
    var policy = role.getAttribute("policy");
    if (policy.length) {
        debugOutput('attribute "policy" found in role ' + role.getName());
        policy.forEach(function(value) {
            var arrayOfValues = value.replace(/, /g, ',').split(',');
            arrayOfValues.forEach(function(value) {
                debugOutput('adding "policy" attribute value ' + value + ' to array');
                if (policies.indexOf(value) < 0) policies.add(value);
            });
        });
    }
});

/**
 * Return the array populated above to Keycloak
 */
debugOutput('final "policy" array ' + policies);
exports = policies;
```
Create a deployment descriptor for the script mapper:
vi /tmp/script_provider/role_attribute_mapper_policy/META-INF/keycloak-scripts.json
File contents:
```json
{
    "mappers": [
        {
            "name": "Role Attribute Mapper - policy",
            "fileName": "role_attribute_mapper_policy.js",
            "description": "Maps the 'policy' role attribute to the OIDC token of a user"
        }
    ]
}
```
Pack the script mapper and the deployment descriptor into a deployable JAR file:
cd /tmp/script_provider/role_attribute_mapper_policy/
zip -r role_attribute_mapper_policy.jar META-INF/ role_attribute_mapper_policy.js
On the Keycloak system edit the file:
vi <PATH_TO_KEYCLOAK>/standalone/configuration/profile.properties
and set the following parameters:
feature.scripts=enabled
Then restart the Keycloak application.
Put the JAR file created above into the directory <PATH_TO_KEYCLOAK>/standalone/deployments/. The deployment of the script mapper into the Keycloak application should happen automatically. This can be verified in the Keycloak log file by checking for log lines like:
```
INFO [org.jboss.as.repository] (DeploymentScanner-threads - 2) WFLYDR0001: Content added at location <PATH_TO_KEYCLOAK>/standalone/data/content/e1/714bbd9b178cd2004d0a2f999584030f06a54c/content
INFO [org.jboss.as.server.deployment] (MSC service thread 1-1) WFLYSRV0027: Starting deployment of "role_attribute_mapper_policy.jar" (runtime-name: "role_attribute_mapper_policy.jar")
INFO [org.keycloak.subsystem.server.extension.KeycloakProviderDeploymentProcessor] (MSC service thread 1-2) Deploying Keycloak provider: role_attribute_mapper_policy.jar
INFO [org.jboss.as.server] (DeploymentScanner-threads - 2) WFLYSRV0010: Deployed "role_attribute_mapper_policy.jar" (runtime-name : "role_attribute_mapper_policy.jar")
```
Within the Keycloak application the availability of the new script mapper can be verified with:
Admin → Server Info → Providers → protocol-mapper → script-role_attribute_mapper_policy.js
or in one of the mapper dialogs within the WebUI of the Keycloak application, in the drop-down menu Mapper Type, as a new entry Role Attribute Mapper - policy.
2019-02-24 // Check_MK Monitoring - SAP Cloud Connector
The SAP Cloud Connector provides a service which is used to connect on-premise systems – non-SAP, SAP ECC and SAP HANA – with applications running on the SAP Cloud Platform. This article introduces a new Check_MK agent and several service checks to monitor the status of the SAP Cloud Connector and its connections to the SAP Cloud Platform.
For the impatient and TL;DR here is the Check_MK package of the SAP Cloud Connector monitoring checks:
SAP Cloud Connector monitoring checks (Compatible with Check_MK versions 1.4.0p19 and later)
The sources can be found in my Check_MK repository on GitHub.
Monitoring the SAP Cloud Connector can be done in two different, not mutually exclusive, ways. The first approach uses the traditional application monitoring features, already built into Check_MK, like:
the presence and count of the application processes
the reachability of the application's TCP ports
the validity of the SSL certificates
queries to the application's health-check URL
The first approach is covered by the section Built-in Check_MK Monitoring below.
The second approach uses a new Check_MK agent and several service checks, dedicated to monitoring the internal status of the SAP Cloud Connector and its connections to the SAP Cloud Platform. The new Check_MK agent uses the monitoring API provided by the SAP Cloud Connector in order to monitor the application-specific states and metrics. The monitoring endpoints on the SAP Cloud Connector currently used by this Check_MK agent are:
“List of Subaccounts” (URL: https://<scchost>:<sccport>/api/monitoring/subaccounts)
“List of Open Connections” (URL: https://<scchost>:<sccport>/api/monitoring/connections/backends)
“Performance Monitor Data” (URL: https://<scchost>:<sccport>/api/monitoring/performance/backends)
“Top Time Consumers” (URL: https://<scchost>:<sccport>/api/monitoring/performance/toptimeconsumers)
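These endpoints can be queried with HTTP basic authentication. The following is a minimal Python sketch using the requests library, which is also a prerequisite of the actual agent plugin; host name, port and credentials are placeholders, and the real agent plugin agent_sapcc handles the details differently:

```python
import requests

# Placeholder connection parameters -- adjust to your environment
SCC_HOST = "scchost.example.com"
SCC_PORT = 8443
SCC_USER = "monitoring"
SCC_PASS = "secret"

BASE_URL = "https://%s:%d/api/monitoring" % (SCC_HOST, SCC_PORT)

# The four monitoring endpoints listed above
ENDPOINTS = [
    "subaccounts",
    "connections/backends",
    "performance/backends",
    "performance/toptimeconsumers",
]

def query_endpoint(endpoint):
    """Query one SCC monitoring endpoint and return the parsed JSON."""
    response = requests.get(
        "%s/%s" % (BASE_URL, endpoint),
        auth=(SCC_USER, SCC_PASS),  # SCC application user, basic auth
        verify=False,               # SCC often uses a self-signed certificate
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Usage (requires a reachable SAP Cloud Connector):
#   data = query_endpoint("subaccounts")
```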
At the time of writing, there is unfortunately no monitoring endpoint on the SAP Cloud Connector for the Most Recent Requests metric. This metric is currently only available via the SAP Cloud Connector's WebUI. The Most Recent Requests metric would be a much more interesting and useful metric than the currently available Top Time Consumers or Performance Monitor Data, both of which have limitations. The application requests covered by the Top Time Consumers metric need a manual acknowledgement inside the SAP Cloud Connector in order to reset the events with the longest request runtime, which limits the metric's usability for external monitoring tools. The Performance Monitor Data metric aggregates the application requests into buckets based on their overall runtime. By itself this can be useful for external monitoring tools and is in fact used by the Check_MK agent covered in this article. In the process of runtime bucket aggregation though, the Performance Monitor Data metric hides the much more useful breakdown of each request into runtime subsections (“External (Back-end)”, “Open Connection”, “Internal (SCC)”, “SSO Handling” and “Latency Effects”). Hopefully the Most Recent Requests metric will in the future also be exposed via the monitoring API provided by the SAP Cloud Connector. The new Check_MK agent could then be extended to use the newly exposed metric in order to gain a more fine-grained insight into the runtime of application requests through the SAP Cloud Connector.
The second approach is covered by the section SAP Cloud Connector Agent below.
Built-in Check_MK Monitoring
Application Processes
To monitor the SAP Cloud Connector process, use the standard Check_MK check “State and count of processes”. This can be found in the WATO WebUI under:
```
-> Manual Checks
   -> Applications, Processes & Services
      -> State and count of processes
         -> Create rule in folder ...

-> Rule Options
   Description: Process monitoring of the SAP Cloud Connector
   Checktype: [ps - State and Count of Processes]
   Process Name: SAP Cloud Connector

-> Parameters
   [x] Process Matching
       [Exact name of the process without arguments]
       [/opt/sapjvm_8/bin/java]
   [x] Name of the operating system user
       [Exact name of the operating system user]
       [sccadmin]
   [x] Levels for process count
       Critical below [1] processes
       Warning below [1] processes
       Warning above [1] processes
       Critical above [2] processes

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
```
Application Health Check
To implement a rudimentary monitoring of the SAP Cloud Connector application health, use the standard Check_MK check “Check HTTP service” to query the Health Check endpoint of the monitoring API provided by the SAP Cloud Connector. The “Check HTTP service” can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Active checks (HTTP, TCP, etc.)
      -> Check HTTP service
         -> Create rule in folder ...

-> Rule Options
   Description: SAP Cloud Connector (SCC)

-> Check HTTP service
   Name: SAP SCC
   [x] Check the URL
   [x] URI to fetch (default is /)
       [/exposed?action=ping]
   [x] TCP Port
       [8443]
   [x] Use SSL/HTTPS for the connection:
       [Use SSL with auto negotiation]

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
```
SSL Certificates
To monitor the validity of the SSL certificate of the SAP Cloud Connector WebUI, use the standard Check_MK check “Check HTTP service”. The “Check HTTP service” can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Active checks (HTTP, TCP, etc.)
      -> Check HTTP service
         -> Create rule in folder ...

-> Rule Options
   Description: SAP Cloud Connector (SCC)

-> Check HTTP service
   Name: SAP SCC Certificate
   [x] Check SSL Certificate Age
       Warning at or below [30] days
       Critical at or below [60] days
   [x] TCP Port
       [8443]

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
```
SAP Cloud Connector Agent
The new Check_MK package to monitor the status of the SAP Cloud Connector and its connections to the SAP Cloud Platform consists of three major parts – an agent plugin, two check plugins and several auxiliary files and plugins (WATO plugins, Perf-o-meter plugins, metrics plugins and man pages).
Prerequisites
The following prerequisites are necessary in order for the SAP Cloud Connector agent to work properly:
A SAP Cloud Connector application user must be created for the Check_MK agent to be able to authenticate against the SAP Cloud Connector and gain access to the protected monitoring API endpoints. See the article SAP Cloud Connector - Configuring Multiple Local Administrative Users on how to create a new application user.
A DNS alias or an additional IP address for the SAP Cloud Connector service.
An additional host in Check_MK for the SAP Cloud Connector service with the previously created DNS alias or IP address.
Installation of the Python requests library on the Check_MK server. This library is used in the Check_MK agent plugin agent_sapcc to perform the authentication and the HTTP requests against the monitoring API of the SAP Cloud Connector. On e.g. RHEL-based systems it can be installed with:
root@host:# yum install python-requests
Installation of the new Check_MK package for the SAP Cloud Connector monitoring checks on the Check_MK server.
SAP Cloud Connector Agent Plugin
The Check_MK agent plugin agent_sapcc is responsible for querying the endpoints of the monitoring API on the SAP Cloud Connector, which are described above. It transforms the data returned from the monitoring endpoints into a format digestible by Check_MK. The following example shows the – anonymized and abbreviated – agent plugin output for a SAP Cloud Connector system:
```
<<<check_mk>>>
Version: 0.1
<<<sapcc_connections_backends:sep(59)>>>
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,subaccount;abcdefghi
<<<sapcc_performance_backends:sep(59)>>>
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,1,minimumCallDurationMs;10
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,1,numberOfCalls;1
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,2,minimumCallDurationMs;20
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,2,numberOfCalls;36
[...]
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,20,minimumCallDurationMs;3000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,21,minimumCallDurationMs;4000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,buckets,22,minimumCallDurationMs;5000
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,name;PROTOCOL/sapecc.example.com:44300
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,protocol;PROTOCOL
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,virtualHost;sapecc.example.com
subaccounts,abcdefghi,backendPerformance,PROTOCOL/sapecc.example.com:PORT,virtualPort;44300
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,sinceTime;2019-02-13T08:05:36.084 +0100
subaccounts,abcdefghi,subaccount;abcdefghi
<<<sapcc_performance_toptimeconsumers:sep(59)>>>
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,requests,0,externalTime;373
subaccounts,abcdefghi,requests,0,id;932284302
subaccounts,abcdefghi,requests,0,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,0,openRemoteTime;121
subaccounts,abcdefghi,requests,0,protocol;PROTOCOL
subaccounts,abcdefghi,requests,0,receivedBytes;264
subaccounts,abcdefghi,requests,0,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,0,sentBytes;4650
subaccounts,abcdefghi,requests,0,startTime;2019-02-13T11:31:59.113 +0100
subaccounts,abcdefghi,requests,0,totalTime;536
subaccounts,abcdefghi,requests,0,user;RFC_USER
subaccounts,abcdefghi,requests,0,virtualBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,1,externalTime;290
subaccounts,abcdefghi,requests,1,id;1882731830
subaccounts,abcdefghi,requests,1,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,1,latencyTime;77
subaccounts,abcdefghi,requests,1,openRemoteTime;129
subaccounts,abcdefghi,requests,1,protocol;PROTOCOL
subaccounts,abcdefghi,requests,1,receivedBytes;264
subaccounts,abcdefghi,requests,1,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,1,sentBytes;4639
subaccounts,abcdefghi,requests,1,startTime;2019-02-13T11:31:59.114 +0100
subaccounts,abcdefghi,requests,1,totalTime;532
subaccounts,abcdefghi,requests,1,user;RFC_USER
subaccounts,abcdefghi,requests,1,virtualBackend;sapecc.example.com:PORT
[...]
subaccounts,abcdefghi,requests,49,externalTime;128
subaccounts,abcdefghi,requests,49,id;1774317106
subaccounts,abcdefghi,requests,49,internalBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,requests,49,protocol;PROTOCOL
subaccounts,abcdefghi,requests,49,receivedBytes;263
subaccounts,abcdefghi,requests,49,resource;/sap-webservice-url/
subaccounts,abcdefghi,requests,49,sentBytes;4660
subaccounts,abcdefghi,requests,49,startTime;2019-02-16T11:32:09.352 +0100
subaccounts,abcdefghi,requests,49,totalTime;130
subaccounts,abcdefghi,requests,49,user;RFC_USER
subaccounts,abcdefghi,requests,49,virtualBackend;sapecc.example.com:PORT
subaccounts,abcdefghi,sinceTime;2019-02-13T08:05:36.085 +0100
subaccounts,abcdefghi,subaccount;abcdefghi
<<<sapcc_subaccounts:sep(59)>>>
subaccounts,abcdefghi,displayName;Test Application
subaccounts,abcdefghi,locationID;Test Location
subaccounts,abcdefghi,regionHost;hana.ondemand.com
subaccounts,abcdefghi,subaccount;abcdefghi
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,connectionCount;8
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,name;abcdefg:hijklmnopqr
subaccounts,abcdefghi,tunnel,applicationConnections,abcdefg:hijklmnopqr,type;JAVA
subaccounts,abcdefghi,tunnel,connectedSince;2019-02-14T10:11:00.630 +0100
subaccounts,abcdefghi,tunnel,connections;8
subaccounts,abcdefghi,tunnel,state;Connected
subaccounts,abcdefghi,tunnel,user;P123456
```
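The comma-separated key paths and semicolon-separated values of this output format can be parsed into a nested structure. The following Python sketch illustrates the principle; the actual check plugins may process the sections differently:

```python
def parse_sapcc_section(lines):
    """Parse 'key,sub,key;value' agent lines into a nested dict.

    The part before the first semicolon is a comma-separated key
    path, the part after it is the value.
    """
    data = {}
    for line in lines:
        path, _, value = line.partition(";")
        keys = path.split(",")
        node = data
        # Walk/create the nested dicts along the key path
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return data

# A few lines from the sapcc_subaccounts section shown above
sample = [
    "subaccounts,abcdefghi,locationID;Test Location",
    "subaccounts,abcdefghi,tunnel,connections;8",
    "subaccounts,abcdefghi,tunnel,state;Connected",
]
parsed = parse_sapcc_section(sample)
# parsed["subaccounts"]["abcdefghi"]["tunnel"]["state"] -> "Connected"
```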
The agent plugin comes with a Check_MK check plugin of the same name, which is solely responsible for constructing the command line arguments from the WATO configuration and passing them to the Check_MK agent plugin.
With the additional WATO plugin sapcc_agent.py it is possible to configure the username and password for the SAP Cloud Connector application user which is used to connect to the monitoring API. It is also possible to configure the TCP port and the connection timeout for the connection to the monitoring API through the WATO WebUI and thus override the default values. The default value for the TCP port is 8443; the default value for the connection timeout is 30 seconds. The configuration options for the Check_MK agent plugin agent_sapcc can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Datasource Programs
      -> SAP Cloud Connector systems
         -> Create rule in folder ...

-> Rule Options
   Description: SAP Cloud Connector (SCC)

-> SAP Cloud Connector systems
   SAP Cloud Connector user name: [username]
   SAP Cloud Connector password: [password]
   SAP Cloud Connector TCP port: [8443]

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
```
After saving the new rule, restarting Check_MK and doing an inventory on the additional host for the SAP Cloud Connector service in Check_MK, several new services starting with the name prefix SAP CC should appear.
The following image shows a status output example from the WATO WebUI with the service checks HTTP SAP SCC TLS and HTTP SAP SCC TLS Certificate from the Built-in Check_MK Monitoring described above. In addition to those, the example also shows the service checks based on the data from the SAP Cloud Connector Agent. The service checks SAP CC Application Connection, SAP CC Subaccount and SAP CC Tunnel are provided by the check plugin sapcc_subaccounts; the service check SAP CC Perf Backend is provided by the plugin sapcc_performance_backends:
SAP Cloud Connector Subaccount
The check plugin sapcc_subaccounts implements the three sub-checks sapcc_subaccounts.app_conn, sapcc_subaccounts.info and sapcc_subaccounts.tunnel.
Info
The sub-check sapcc_subaccounts.info just gathers information on several configuration options for each subaccount on the SAP Cloud Connector and displays them in the status details of the check. These configuration options are:
the subaccount name on the SAP Cloud Platform to which the connection is made.
the display name of the subaccount.
the location ID of the subaccount.
the region host of the SAP Cloud Platform to which the SAP Cloud Connector establishes a connection.
The sub-check sapcc_subaccounts.info always returns an OK status. No performance data is currently reported by this check.
Tunnel
The sub-check sapcc_subaccounts.tunnel is responsible for monitoring each tunnel connection of each subaccount on the SAP Cloud Connector. Upon inventory, this sub-check creates a service check for each tunnel connection found on the SAP Cloud Connector. During normal check execution, the status of the tunnel connection is determined for each inventorized item. If the tunnel connection is not in the Connected state, an alarm is raised accordingly. Additionally, the number of currently active connections over a tunnel as well as the elapsed time in seconds since the tunnel connection was established are determined for each inventorized item. If either the number of currently active connections or the number of seconds since the connection was established is above or below the configured warning and critical threshold values, an alarm is raised accordingly. For both values, performance data is reported by the check.
With the additional WATO plugin sapcc_subaccounts.py it is possible to configure the warning and critical levels for the sub-check sapcc_subaccounts.tunnel through the WATO WebUI and thus override the following default values:
Metric | Warning Low Threshold | Critical Low Threshold | Warning High Threshold | Critical High Threshold |
---|---|---|---|---|
Number of connections | 0 | 0 | 30 | 40 |
Connection duration | 0 sec | 0 sec | 284012568 sec | 315569520 sec |
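The resulting two-sided level check can be sketched in Python as follows, using the default values for the number of tunnel connections from the table above (the function and variable names are illustrative, not the plugin's actual code):

```python
# Check_MK state codes: 0 = OK, 1 = WARN, 2 = CRIT
def check_levels(value, warn_low, crit_low, warn_high, crit_high):
    """Compare a value against two-sided warning/critical levels."""
    if value <= crit_low or value >= crit_high:
        return 2
    if value <= warn_low or value >= warn_high:
        return 1
    return 0

# Default levels for the number of tunnel connections (see table)
TUNNEL_CONNECTION_LEVELS = (0, 0, 30, 40)

# 8 active connections lie between the low and high levels -> OK
state = check_levels(8, *TUNNEL_CONNECTION_LEVELS)
```

With both low levels at their default of 0, a tunnel with no connections at all immediately goes critical, while the high levels of 30 and 40 connections guard against runaway connection counts.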
The configuration options for the tunnel connection levels can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Subaccounts
            -> Create Rule in Folder ...

-> Rule Options
   Description: SAP Cloud Connector Subaccounts

-> Parameters
   [x] Number of tunnel connections
       Warning if equal or below [0] connections
       Critical if equal or below [0] connections
       Warning if equal or above [30] connections
       Critical if equal or above [40] connections
   [x] Connection time of tunnel connections
       Warning if equal or below [0] seconds
       Critical if equal or below [0] seconds
       Warning if equal or above [284012568] seconds
       Critical if equal or above [315569520] seconds

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
   and/or
   Application Or Tunnel Name
   [x] Specify explicit values [Tunnel name]
```
The above image with a status output example from the WATO WebUI shows one sapcc_subaccounts.tunnel service check as the last of the displayed items. The service name is prefixed by the string SAP CC Tunnel and followed by the subaccount name, which in this example is anonymized. For each tunnel connection the connection state, the overall number of application connections currently active over the tunnel, the time when the tunnel connection was established and the number of seconds elapsed since establishing the connection are shown. The overall number of currently active application connections is also visualized in the perf-o-meter, with a logarithmic scale growing from left to right.
The following image shows an example of the two metric graphs for a single sapcc_subaccounts.tunnel service check:
The upper graph shows the time elapsed since the tunnel connection was established. The lower graph shows the overall number of application connections currently active over the tunnel connection. Both graphs would also show the warning and critical threshold values, which in this example are currently outside the displayed range of values for the y-axis.
Application Connection
The sub-check sapcc_subaccounts.app_conn is responsible for monitoring each application's connections through each tunnel connection for each subaccount on the SAP Cloud Connector. Upon inventory, this sub-check creates a service check for each application connection found on the SAP Cloud Connector. During normal check execution, the number of currently active connections for each application is determined for each inventorized item. If the number of currently active connections is above or below the configured warning and critical threshold values, an alarm is raised accordingly. For the number of currently active connections, performance data is reported by the check.
With the additional WATO plugin sapcc_subaccounts.py it is possible to configure the warning and critical levels for the sub-check sapcc_subaccounts.app_conn through the WATO WebUI and thus override the following default values:
Metric | Warning Low Threshold | Critical Low Threshold | Warning High Threshold | Critical High Threshold |
---|---|---|---|---|
Number of connections | 0 | 0 | 30 | 40 |
The configuration options for the application connection levels can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Subaccounts
            -> Create Rule in Folder ...

-> Rule Options
   Description: SAP Cloud Connector Subaccounts

-> Parameters
   [x] Number of application connections
       Warning if equal or below [0] connections
       Critical if equal or below [0] connections
       Warning if equal or above [30] connections
       Critical if equal or above [40] connections

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
   and/or
   Application Or Tunnel Name
   [x] Specify explicit values [Application name]
```
The above image with a status output example from the WATO WebUI shows one sapcc_subaccounts.app_conn service check as the 5th item from the top of the displayed items. The service name is prefixed by the string SAP CC Application Connection and followed by the application name, which in this example is anonymized. For each application connection the number of currently active connections and the connection type are shown. The number of currently active application connections is also visualized in the perf-o-meter, with a logarithmic scale growing from left to right.
The following image shows an example of the metric graph for a single sapcc_subaccounts.app_conn service check:
The graph shows the number of currently active application connections. The graph would also show the warning and critical threshold values, which in this example are currently outside the displayed range of values for the y-axis.
SAP Cloud Connector Performance Backends
The check sapcc_performance_backends is responsible for monitoring the performance of each (on-premise) backend system connected to the SAP Cloud Connector. Upon inventory, this check creates a service check for each backend connection found on the SAP Cloud Connector. During normal check execution, the number of requests to the backend system, categorized into one of the 22 runtime buckets, is determined for each inventorized item. From these raw values, the request rate in requests per second is derived for each of the 22 runtime buckets. Also from the raw values, the following four additional metrics are derived:
calls_total: the total request rate over all of the 22 runtime buckets.
calls_pct_ok: the relative number of requests in percent with a runtime below a given runtime warning threshold.
calls_pct_warn: the relative number of requests in percent with a runtime above a given runtime warning threshold.
calls_pct_crit: the relative number of requests in percent with a runtime above a given runtime critical threshold.
If the relative number of requests is above the configured warning and critical threshold values, an alarm is raised accordingly. Performance data is reported by the check for the number of requests in each of the 22 runtime buckets, for the total number of requests and for the relative number of requests (calls_pct_ok, calls_pct_warn, calls_pct_crit).
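In terms of raw call counts, the derivation of these metrics can be sketched as follows. This is a simplified illustration: the actual plugin derives rates from counter differences between check runs, and the names below are only meant to mirror the metrics listed above:

```python
def derive_backend_metrics(buckets, warn_ms=500, crit_ms=1000):
    """Derive total and relative call counts from runtime buckets.

    'buckets' maps the minimum call duration (in milliseconds) of
    each runtime bucket to the number of calls in that bucket.
    """
    calls_total = sum(buckets.values())

    def pct(count):
        return 100.0 * count / calls_total if calls_total else 0.0

    # Calls between the warning and critical runtime thresholds
    calls_warn = sum(n for ms, n in buckets.items() if warn_ms <= ms < crit_ms)
    # Calls at or above the critical runtime threshold
    calls_crit = sum(n for ms, n in buckets.items() if ms >= crit_ms)
    calls_ok = calls_total - calls_warn - calls_crit
    return {
        "calls_total": calls_total,
        "calls_pct_ok": pct(calls_ok),
        "calls_pct_warn": pct(calls_warn),
        "calls_pct_crit": pct(calls_crit),
    }

# Example: 90 calls below 500 ms, 8 between 500 and 1000 ms, 2 above 1000 ms
metrics = derive_backend_metrics({10: 50, 100: 40, 500: 8, 1000: 2})
```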
With the additional WATO plugin sapcc_performance_backends.py it is possible to configure the warning and critical levels for the check sapcc_performance_backends through the WATO WebUI and thus override the following default values:
Metric | Warning Threshold | Critical Threshold |
---|---|---|
Request runtime | 500 msec | 1000 msec |
Percentage of requests over request runtime thresholds | 10% | 5% |
The configuration options for the backend performance levels can be found in the WATO WebUI under:
```
-> Host & Service Parameters
   -> Parameters for discovered services
      -> Applications, Processes & Services
         -> SAP Cloud Connector Backend Performance
            -> Create Rule in Folder ...

-> Rule Options
   Description: SAP Cloud Connector Backend Performance

-> Parameters
   [x] Runtime bucket definition and calls per bucket in percent
       Warning if percentage of calls in warning bucket equal or above [10.00] %
       Assign calls to warning bucket if runtime equal or above [500] milliseconds
       Critical if percentage of calls in critical bucket equal or above [5.00] %
       Assign calls to critical bucket if runtime equal or above [1000] milliseconds

-> Conditions
   Folder [The folder containing the SAP Cloud Connector systems]
   and/or
   Explicit hosts
   [x] Specify explicit host names [SAP Cloud Connector systems]
   and/or
   Backend Name
   [x] Specify explicit values [Backend name]
```
The above image with a status output example from the WATO WebUI shows one sapcc_performance_backends service check as the 6th item from the top of the displayed items. The service name is prefixed by the string SAP CC Perf Backend and followed by a string concatenated from the protocol, FQDN and TCP port of the backend system, which in this example is anonymized. For each backend connection the total number of requests, the total request rate, the percentage of requests below the runtime warning threshold, the percentage of requests above the runtime warning threshold and the percentage of requests above the runtime critical threshold are shown. The relative numbers of requests in percent are also visualized in the perf-o-meter.
The following image shows an example of the metric graph for the total request rate from the sapcc_performance_backends service check:
The following image shows an example of the metric graph for the relative number of requests from the sapcc_performance_backends service check:
The graph shows the percentage of requests below the runtime warning threshold in the color green at the bottom, the percentage of requests above the runtime warning threshold in the color yellow stacked above and the percentage of requests above the runtime critical threshold in the color red stacked at the top.
The following image shows an example of the combined metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends
service check:
To provide a better overview, the individual metrics are grouped together into three graphs. The first graph shows the request rate in the runtime buckets >=10ms, >=20ms, >=30ms, >=40ms, >=50ms, >=75ms and >=100ms. The second graph shows the request rate in the runtime buckets >=125ms, >=150ms, >=200ms, >=300ms, >=400ms, >=500ms, >=750ms and >=1000ms. The third and last graph shows the request rate in the runtime buckets >=1250ms, >=1500ms, >=2000ms, >=2500ms, >=3000ms, >=4000ms and >=5000ms.
The following image shows an example of the individual metric graphs for the request rates to a single backend system in each of the 22 runtime buckets from the sapcc_performance_backends
service check:
Each of the metric graphs shows exactly the same data as the previously shown combined graphs. The combined metric graphs are actually based on the individual metric graphs for the request rates to a single backend system.
Conclusion
The newly introduced checks for the SAP Cloud Connector enable you to monitor several application-specific aspects of the SAP Cloud Connector with your Check_MK server. The combination of built-in Check_MK monitoring facilities and a new agent plugin for the SAP Cloud Connector complement each other in this regard. While the new SAP Cloud Connector agent plugin for Check_MK utilizes most of the data provided by the monitoring endpoints on the SAP Cloud Connector, a more in-depth monitoring could be achieved if the data from the Most Recent Requests metric were also exposed over the monitoring API of the SAP Cloud Connector. I hope this will be the case in a future release of the SAP Cloud Connector.
I hope you find the provided new check useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2019-01-06 // Installing Anitya Upstream Release Monitoring on RHEL 7
Anitya is a release monitoring project developed and provided by the Fedora project. It is used to monitor upstream software projects for new releases in an automated fashion. It is thus similar to the uscan
command from the Debian devscripts
package. Unlike the uscan
command, Anitya is written in Python and comes with an integrated WebUI. Also unlike uscan
, Anitya uses FedMsg (Federated Message Bus) in order to publish messages indicating a detected upstream release event. These messages can in turn be consumed by other systems in order to trigger various actions upon a new upstream release. Anitya can be used “as a service” through https://release-monitoring.org/ or be installed as a dedicated instance. This article describes the necessary steps to set up your own, dedicated Anitya instance on a RHEL 7 system.
According to the official Anitya installation and configuration documentation there are only two steps – pip install anitya
and editing /etc/anitya/anitya.toml
– needed to set up Anitya. I found this a bit too optimistic for a RHEL 7 system, which was the installation target in my case. Here is a more detailed description of the necessary steps to get Anitya up and running on RHEL 7:
Install some basic Anitya dependencies like Python, the Python installer, the Apache webserver, the Apache WSGI module and some development files:
root@host:# yum install python httpd mod_wsgi python2-pip python-devel zeromq-devel
Install Anitya and its dependencies:
root@host:# pip2 install anitya
Unfortunately, the Python 2 environment is necessary since the Apache WSGI module in RHEL 7 is compiled against Python 2.
Adjust the import of the Anitya classes in the WSGI application. Open the file
/usr/lib/python2.7/site-packages/anitya/wsgi.py
in an editor of your choice:
root@host:# vi /usr/lib/python2.7/site-packages/anitya/wsgi.py
Alter its contents as shown in the following patch:
- wsgi.py
--- /usr/lib/python2.7/site-packages/anitya/wsgi.py.orig	2018-12-05 16:59:26.000000000 +0100
+++ /usr/lib/python2.7/site-packages/anitya/wsgi.py	2018-12-05 16:40:40.000000000 +0100
@@ -14,7 +14,7 @@
 # You should have received a copy of the GNU General Public License
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
-from .app import create
+from anitya.app import create
 
 application = create()
Create and adjust the basic Anitya configuration.
Create the Anitya configuration directory:
root@host:# mkdir /etc/anitya
and download the sample configuration file from the Anitya GitHub repository:
root@host:# wget -O /etc/anitya/anitya.toml https://raw.githubusercontent.com/release-monitoring/anitya/master/files/anitya.toml.sample
Generate a secret key with e.g. the
pwgen
command:
root@host:# pwgen -sncB 32 1
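If pwgen is not available on your system, a random secret key can also be generated with Python's standard library, which is already installed as an Anitya dependency. This is just one possible alternative; any sufficiently long random string works as a secret key:

```python
import base64
import os

# Read 32 random bytes from the OS entropy source and encode them into a
# printable string suitable for the secret_key configuration entry
print(base64.urlsafe_b64encode(os.urandom(32)).decode("ascii"))
```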
Adjust several settings in the Anitya configuration file. Open the file
/etc/anitya/anitya.toml
in an editor of your choice:
root@host:# vi /etc/anitya/anitya.toml
Alter the four configuration entries
admin_email
, secret_key
, db_url
and default_regex
as shown in the following example:
- /etc/anitya/anitya.toml
admin_email = "<YOUR-EMAIL-ADDRESS>"
secret_key = "<SECRET-KEY>"
db_url = "sqlite:////var/www/html/anitya.sqlite"
default_regex = '''(?i)%(name)s(?:[-_]?(?:minsrc|src|source))?[-_]([^-/_\s]+?)(?:[-_](?:minsrc|src|source|asc|release))?\.(?:tar|t[bglx]z|tbz2|zip)'''
Substitute the placeholder
<YOUR-EMAIL-ADDRESS>
with the email address at which you want to receive emails about errors of the Anitya application. This address will also be used as part of the request headers which identify the Anitya HTTP user agent. Substitute the placeholder <SECRET-KEY>
with the secret key generated above.
In case a SQLite database is used, make sure the value of the
db_url
entry points to a location within the filesystem that is reachable from the context of the webserver the Anitya application is running in. If e.g. the Apache webserver is used, the SQLite database should be located somewhere within the DocumentRoot
of the Apache VirtualHost.
Be careful with the syntax of the configuration file and the configuration entries in there. The configuration parsing part of Anitya does not seem to be very robust: a single invalid or syntactically incorrect entry will cause Anitya to skip all other entries as well and fall back to its default values. It took me a while to figure out that the Anitya database was generated and searched in the default location
sqlite:////var/tmp/anitya-dev.sqlite
no matter which value was given for thedb_url
configuration entry, just because another configuration entry – specifically the default_regex
– had a syntactically incorrect value.
Download the Anitya cron job from the Anitya GitHub repository:
root@host:# cd /usr/lib/python2.7/site-packages/anitya
root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/<RELEASE>/files/anitya_cron.py
The Anitya cron job should match the installed Anitya release. To ensure this, substitute the placeholder
<RELEASE>
in the above URL with the installed Anitya version. E.g. for the Anitya version 0.13.2:
root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/0.13.2/files/anitya_cron.py
Make the Anitya cron job executable:
root@host:# chmod 755 /usr/lib/python2.7/site-packages/anitya/anitya_cron.py
and adjust the Python interpreter to the RHEL 7 standard by opening the file
/usr/lib/python2.7/site-packages/anitya/anitya_cron.py
in an editor of your choice:
root@host:# vi /usr/lib/python2.7/site-packages/anitya/anitya_cron.py
Alter its contents as shown in the following patch:
- anitya_cron.py.patch
# diff -uwi anitya_cron.py.orig anitya_cron.py
--- anitya_cron.py.orig	2018-12-23 19:52:31.000000000 +0100
+++ anitya_cron.py	2018-12-23 19:52:37.000000000 +0100
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 # -*- coding: utf-8 -*-
 
 import sys
Download the Anitya database creation and initialization script from the Anitya GitHub repository:
root@host:# cd /usr/lib/python2.7/site-packages/anitya
root@host:# wget https://raw.githubusercontent.com/release-monitoring/anitya/master/createdb.py
Run the downloaded script in order to create and initialize the Anitya database:
root@host:# python createdb.py
To grant access to the webserver the Anitya application is running in, adjust the ownership and the permissions of the Anitya database file:
root@host:# chown apache:apache /var/www/html/anitya.sqlite
root@host:# chmod 644 /var/www/html/anitya.sqlite
Reapply changes to the database from a missing database version.
Configure Alembic by opening the file
/usr/lib/python2.7/site-packages/anitya/alembic.ini
in an editor of your choice:
root@host:# cd /usr/lib/python2.7/site-packages/anitya
root@host:# vi alembic.ini
Insert the contents as shown in the following configuration example:
- alembic.ini
# A generic, single database configuration.

[alembic]
# path to migration scripts
script_location = /usr/lib/python2.7/site-packages/anitya/db/migrations

# template used to generate migration files
# file_template = %%(rev)s_%%(slug)s

# max length of characters to apply to the
# "slug" field
#truncate_slug_length = 40

# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false

# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false

sqlalchemy.url = sqlite:////var/www/html/anitya.sqlite

# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
Initialize the Alembic table in the Anitya database and show the current database version:
root@host:# alembic current
7a8c4aa92678 (head)
Manually change the database version known to Alembic in order to trick it into again applying the missing change to the database:
root@host:# sqlite3 /var/www/html/anitya.sqlite
sqlite> INSERT INTO alembic_version (version_num) VALUES ('a52d2fe99d4f');
sqlite> .quit
Reapply the change to the database from the missing database version
feeaa70ead67
:
root@host:# alembic upgrade feeaa70ead67
Determine the current “head” database version and manually change the database version known to Alembic:
root@host:# alembic history
540bdcf7edbc -> 7a8c4aa92678 (head), Add missing GitHub owner/project pairs
27342bce1d0f -> 540bdcf7edbc, Convert GitHub URL to owner/project
[...]
root@host:# sqlite3 /var/www/html/anitya.sqlite
sqlite> UPDATE alembic_version SET version_num = '7a8c4aa92678';
sqlite> .quit
Setup the Anitya cron job with the previously downloaded script by opening the file
/etc/cron.d/anitya
in an editor of your choice:
root@host:# vi /etc/cron.d/anitya
Insert the contents as shown in the following example:
- /etc/cron.d/anitya
*/5 * * * * root /usr/lib/python2.7/site-packages/anitya/anitya_cron.py 1>>/var/log/anitya.log 2>&1
Adjust the execution interval according to your environment and needs.
Create a VirtualHost configuration for e.g. the Apache webserver, in whose context the Anitya application will be running, and start the Apache webserver.
Create the VirtualHost configuration by opening the file
/etc/httpd/conf.d/anitya.conf
in an editor of your choice:
root@host:# vi /etc/httpd/conf.d/anitya.conf
Insert the contents as shown in the following configuration example:
- /etc/httpd/conf.d/anitya.conf
<VirtualHost <IP>:<PORT>>
    DocumentRoot "/var/www/html/"
    ServerName <FQDN>

    <Directory />
        Options FollowSymLinks
        AllowOverride None
    </Directory>

    WSGIDaemonProcess anitya user=apache group=apache processes=2 threads=25 python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup anitya
    WSGIScriptAlias /anitya /usr/lib/python2.7/site-packages/anitya/wsgi.py process-group=anitya

    <Directory "/usr/lib/python2.7/site-packages/anitya">
        <Files wsgi.py>
            Order deny,allow
            Allow from all
            Require all granted
        </Files>
        Order allow,deny
        Options Indexes FollowSymLinks
        Allow from all
    </Directory>

    Alias "/anitya/static" "/usr/lib/python2.7/site-packages/anitya/static"
    <Directory "/usr/lib/python2.7/site-packages/anitya/static">
        Order allow,deny
        Options Indexes FollowSymLinks
        Allow from all
        Require all granted
    </Directory>
</VirtualHost>
Substitute the placeholders
<IP>
and<PORT>
with the IP address and TCP port on which the virtual host should be listening. Substitute the placeholder <FQDN>
with the fully qualified domain name under which you want to reach the Anitya application with your browser.
Enable and start the Apache webserver:
root@host:# systemctl enable httpd
root@host:# systemctl start httpd
After performing the steps outlined above, your own, dedicated Anitya instance on a RHEL 7 system should be ready. The WebUI of the Anitya instance will be reachable with a browser under the following URL http://<FQDN>/anitya
, where the placeholder <FQDN>
has to be substituted with the fully qualified domain name given in the above Apache VirtualHost configuration.
After logging into Anitya and adding an upstream software project, the next run of the Anitya cron job should discover the initial and current versions of the upstream software project. This should be visible in two places, firstly in the logfile of the Anitya cron job /var/log/anitya.log
and secondly on the projects page of the Anitya WebUI. If this is not the case after the next execution interval of the Anitya cron job, review the logfile /var/log/anitya.log
for possible errors.
I hope you find the provided Anitya installation and configuration guide for a RHEL 7 target system useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with this guide. In future articles I will – hopefully soon – write about the necessary steps to set up the FedMsg (Federated Message Bus) and connect Anitya to it, as well as an enhancement to Anitya itself.
2018-12-01 // Check_MK Monitoring - Brocade / Broadcom Fibre Channel Switches
This article provides patches for the standard Check_MK distribution in order to fix and enhance the support for the monitoring of Brocade / Broadcom Fibre Channel Switches.
Out of the box, there is currently already monitoring support available for Brocade / Broadcom Fibre Channel Switches in the standard Check_MK distribution. Unfortunately there are several issues in the check brocade_fcport
, which is used to monitor the status and several metrics of the fibre channel switch ports. Those issues prevent the check from working as intended and are fixed in the version provided in this article. Up until recently, there also was no support in the standard Check_MK distribution for monitoring CPU and memory metrics on a switch level and no support for monitoring SFP metrics on a port level. This article introduces the new checks brocade_cpu
, brocade_mem
and brocade_sfp
to cover those metrics. With the most recent upstream version 1.5.x of the Check_MK distribution, there are now the standard checks brocade_sys
(covering CPU and memory metrics) and brocade_sfp
(covering the SFP port metrics) available, providing basically the same functionality.
For the impatient and TL;DR here is the enhanced version of the brocade_fcport
check:
Enhanced version of the brocade_fcport check
And the new brocade_cpu
, brocade_mem
and brocade_sfp
checks:
The new brocade_cpu check
The new brocade_mem check
The new brocade_sfp check
The sources to the new and enhanced versions of all the checks can be found in my Check_MK Plugins repository on GitHub.
Additional Checks
CPU Usage
The Check_MK service check brocade_cpu
monitors the current CPU utilization of Brocade / Broadcom fibre channel switches. It uses the SNMP OID swCpuUsage
from the fibre channel switch MIB (SW-MIB
) in order to create one service check for each CPU found in the system. The current CPU utilization in percent is compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the standard WATO plugin for CPU utilization it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values (warning: 80%; critical: 90% of CPU utilization).
The following image shows a status output example for the brocade_cpu
service check from the WATO WebUI:
This example shows the current CPU utilization of the system in percent.
The following image shows an example of the service metrics graph for the brocade_cpu
service check:
The selected example graph shows the current CPU utilization of the system in percent as well as the default warning and critical threshold values.
Memory Usage
The Check_MK service check brocade_mem
monitors the current memory (RAM) usage on Brocade / Broadcom fibre channel switches. It uses the SNMP OID swMemUsage
from the fibre channel switch MIB (SW-MIB
) in order to create a service check for the overall memory usage on the system. The amount of currently used memory is compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values (warning: 80%; critical: 90% of used memory). The configuration options for the used memory levels can be found under:
-> Host & Service Parameters
 -> Parameters for discovered services
  -> Operating System Resources
   -> Brocade Fibre Channel Memory Usage
    -> Create rule in folder ...
       [x] Levels for memory usage
The following image shows a status output example for the brocade_mem
service check from the WATO WebUI:
This example shows the current memory utilization of the system in percent.
The following image shows an example of the service metrics graph for the brocade_mem
service check:
The selected example graph shows the current memory utilization of the system in percent as well as the default warning and critical threshold values.
SFP Health
The Check_MK service check brocade_sfp
monitors several metrics of the SFPs in all enabled and active ports of Brocade / Broadcom fibre channel switches. It uses several SNMP OIDs from the fibre channel switch MIB (SW-MIB
) and from the Fabric Alliance Extension MIB (FA-EXT-MIB
) in order to create one service check for each SFP found in an enabled and active port on the system. The OIDs from the fibre channel switch MIB (swFCPortSpecifier
, swFCPortName
, swFCPortPhyState
, swFCPortOpStatus
, swFCPortAdmStatus
) are used to determine the number, name and status of the switch port. The OIDs from the Fabric Alliance Extension MIB are used to determine the actual SFP metrics. Those metrics are the SFP temperature (swSfpTemperature
), voltage (swSfpVoltage
) and current (swSfpCurrent
), as well as the optical receive and transmit power (swSfpRxPower
and swSfpTxPower
) of the SFP. Unfortunately the SFP metrics are not available on all Brocade / Broadcom switch hardware platforms and FabricOS versions. This check was verified to work with Gen6 hardware and FabricOS v8. Another limitation is that the SFP metrics are not available in real time, but are only gathered by the switch from all its SFPs at a 5-minute interval. This is most likely a precaution so as not to overload the processing capacity of the SFPs with too many status requests.
The current temperature level as well as the optical receive and transmit power levels are compared to either the default or configured warning and critical threshold values, and an alarm is raised accordingly. With the added WATO plugin it is possible to configure the warning and critical levels through the WATO WebUI and thus override the default values:
Metric | Warning Threshold | Critical Threshold |
---|---|---|
System temperature | 55°C | 65°C |
Optical receive power | -7.0 dBm | -9.0 dBm |
Optical transmit power | -2.0 dBm | -3.0 dBm |
The configuration options for the used temperature and optical receive and transmit power levels can be found under:
-> Host & Service Parameters
 -> Parameters for discovered services
  -> Networking
   -> Brocade Fibre Channel SFP
    -> Create rule in folder ...
       [x] Temperature levels in degrees celcius
       [x] Receive power levels in dBm
       [x] Transmit power levels in dBm
Currently, the electrical current and voltage metrics of the SFPs are only used for long-term trends via the respective service metric template and thus are not used to raise any alarms. This is due to the fact that there is little to no information available on which precise electrical current and voltage levels would constitute an indicator of an immediate or impending failure state.
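The optical power thresholds above are given in dBm, a logarithmic scale relative to 1 mW. Should you need to relate them to a linear power reading, the conversion can be sketched as follows; this helper is purely illustrative and not part of the check:

```python
import math

def mw_to_dbm(milliwatt):
    # 0 dBm corresponds to 1 mW; halving the power lowers the value by ~3 dBm
    return 10.0 * math.log10(milliwatt)

print(round(mw_to_dbm(1.0), 1))  # 0.0
print(round(mw_to_dbm(0.2), 1))  # -7.0, the default receive power warning level
```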
The following image shows several status output examples for the brocade_sfp
service check from the WATO WebUI:
This example shows SFP Port service check items for eight consecutive ports on a switch. For each item the current temperature, voltage and current, as well as the optical receive and transmit power of the SFP are shown. The optical receive and transmit power levels are also visualized in the Perf-O-Meter, optical receive power growing from the middle to the left, optical transmit power growing from the middle to the right.
The following image shows an example of the service metrics graph for the brocade_sfp
service check:
The first graph shows the optical receive and transmit power of the SFP. The second shows the electrical current drawn by the SFP. The third graph shows the temperature of the SFP. The fourth and last graph shows the electrical voltage provided to the SFP.
Modified Check
Fibre Channel Port
The Check_MK service check brocade_fcport
has several issues which are addressed by the following set of patches. The patch is actually a monolithic one and is only broken up into individual patches here for ease of discussion.
The first patch is a simple, but ugly workaround to prevent the conversion of OID_END
, which carries the port index information, from being treated as a BINARY
SNMP value:
- brocade_fcport.patch
--- a/checks/brocade_fcport	2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport	2018-11-25 08:43:58.715721271 +0100
@@ -154,8 +135,8 @@
         bbcredits = None
         if len(if64_info) > 0:
             fcmgmt_portstats = []
-            for oidend, dummy, tx_elements, rx_elements, bbcredits_64 in if64_info:
-                if int(index) == int(oidend.split(".")[-1]):
+            for oidend, tx_elements, rx_elements, bbcredits_64 in if64_info:
+                if index == oidend.split(".")[-1]:
                     fcmgmt_portstats = [
                         binstring_to_int(''.join(map(chr, tx_elements))) / 4,
                         binstring_to_int(''.join(map(chr, rx_elements))) / 4,
@@ -477,7 +426,6 @@
     # Not every device supports that
     (".1.3.6.1.3.94.4.5.1", [
         OID_END,
-        "1", # Dummy value, otherwise OID_END is also treated as a BINARY value
         BINARY("6"), # FCMGMT-MIB::connUnitPortStatCountTxElements
         BINARY("7"), # FCMGMT-MIB::connUnitPortStatCountRxElements
         BINARY("8"), # FCMGMT-MIB::connUnitPortStatCountBBCreditZero
This is achieved by simply inserting a dummy value into the list of SNMP OIDs in the snmp_info
variable. I haven't had time to dig out the root cause of this behaviour, but I guess it must be somewhere in the core SNMP components of Check_MK. The patch also implicitly addresses and fixes an issue where two variables with differently typed content are being compared. This is achieved by simply changing the line:
if index == oidend.split(".")[-1]:
to:
if int(index) == int(oidend.split(".")[-1]):
and thus forcing a conversion to integer values.
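The effect of the missing conversion can be demonstrated in plain Python: an integer index compared to the string suffix of a split OID is simply never equal, so the affected branch was silently never taken.

```python
# The last element of a split OID string is always a string. If the port
# index is held as an integer, the equality comparison silently yields False.
oidend = ".1.3.6.1.3.94.4.5.1.16"
index = 16

print(index == oidend.split(".")[-1])            # False: int vs. str
print(int(index) == int(oidend.split(".")[-1]))  # True
```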
The next patch adds support for additional encoding schemes in the FC-1 layer, which are used for fibre channel beyond the speed of 8 GBit. Up to and including a speed of 8 GBit, fibre channel uses an 8/10b encoding. This means that for every 8 bits of data, 10 bits are actually sent over the fibre channel link. The encoding scheme changes for speeds higher than 8 GBit. 16 GBit fibre channel – like 10 GBit ethernet – uses a 64/66b encoding scheme; 32 GBit fibre channel and higher use a 256/257b encoding scheme. Thus the wirespeed calculations of the brocade_fcport
service check are incorrect for speeds above 8 GBit. The following patch addresses and fixes this issue:
- brocade_fcport.patch
--- a/checks/brocade_fcport	2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport	2018-11-25 08:43:58.715721271 +0100
@@ -283,15 +267,8 @@
             output.append(speedmsg)
 
-            if gbit > 16:
-                # convert gbit netto link-rate to Byte/s (256/257 enc)
-                wirespeed = gbit * 1000000000.0 * ( 256 / 257 ) / 8
-            elif gbit > 8:
-                # convert gbit netto link-rate to Byte/s (64/66 enc)
-                wirespeed = gbit * 1000000000.0 * ( 64 / 66 ) / 8
-            else:
                 # convert gbit netto link-rate to Byte/s (8/10 enc)
-                wirespeed = gbit * 1000000000.0 * ( 8 / 10 ) / 8
+            wirespeed = gbit * 1000000000.0 * 0.8 / 8
 
             in_bytes = 4 * get_rate("brocade_fcport.rxwords.%s" % index, this_time, rxwords)
             out_bytes = 4 * get_rate("brocade_fcport.txwords.%s" % index, this_time, txwords)
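The corrected calculation can be restated as a small standalone helper; the numbers below show how much the old blanket 8/10b assumption underestimates the wirespeed of a 16 GBit link:

```python
def wirespeed_bytes_per_sec(gbit):
    # Pick the FC-1 encoding overhead based on the nominal link speed
    if gbit > 16:
        encoding = 256.0 / 257.0  # 256/257b encoding, 32 GBit and higher
    elif gbit > 8:
        encoding = 64.0 / 66.0    # 64/66b encoding, 16 GBit
    else:
        encoding = 8.0 / 10.0     # 8/10b encoding, up to 8 GBit
    # convert the gbit netto link-rate to Byte/s
    return gbit * 1000000000.0 * encoding / 8

print(wirespeed_bytes_per_sec(8))   # 800000000.0 Byte/s
print(wirespeed_bytes_per_sec(16))  # ~1.94e9 Byte/s, not 1.6e9 as with 8/10b
```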
The third patch simply adds the notxcredits
counter and its value to the output of the service check. This information is currently missing from the status output of the check and is just added as a convenience:
- brocade_fcport.patch
--- a/checks/brocade_fcport	2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport	2018-11-25 08:43:58.715721271 +0100
@@ -408,9 +361,6 @@
                     summarystate = max(1, summarystate)
                     text += "(!)"
                     output.append(text)
-            else:
-                if counter == "notxcredits":
-                    output.append(text)
 
     # P O R T S T A T E
     for dev_state, state_key, state_info, warn_states, state_map in [
The last patch addresses and fixes a more serious issue. The details are explained in the comment of the following code snippet. The gist is that the metric notxcredits
(“No TX buffer credits”), gathered from the SNMP OID swFCPortNoTxCredits
, is calculated incorrectly. This is due to the fact that the brocade_fcport
service check treats this metric like all the other error metrics of a switchport (e.g. “CRC errors”, “ENC-Out”, “ENC-In” and “C3 discards”) and puts it in relation to the number of frames transmitted over a link. In reality, the metric behind the SNMP OID swFCPortNoTxCredits
is by definition relative to time. This issue leads to false positives in certain edge cases, for example when a lightly utilized switch port sees an otherwise uncritical number of swFCPortNoTxCredits
due to the normal activity of the fibre channel flow control. In such a case, a relatively high number of swFCPortNoTxCredits
is put into relation to the sum of a relatively low number of frames transmitted over the link and again the relatively high number of swFCPortNoTxCredits
. See the last line of the code snippet below for this calculation. The result is a high value for the metric notxcredits
(“No TX buffer credits”) although the fibre channel flow control was working perfectly fine, the configured switch.edgeHoldTime
was most likely never reached and no frames were ever dropped.
In order to address this issue, the following patch adds a few lines of code for a special treatment of the metric notxcredits
to the brocade_fcport
service check. This is done in the last section of the patch. The second section of the patch is just for the purpose of clarification, as it sets the metric that is being related to, to None
in case of the metric notxcredits
. The first section of the patch adjusts the default warning and critical thresholds to lower levels. The previous levels were rather high due to the miscalculation explained above. With the new calculation added by the third section of the patch, the default warning and critical threshold levels need to be much more sensitive.
- brocade_fcport.patch
--- a/checks/brocade_fcport	2018-11-25 18:06:04.674930057 +0100
+++ b/checks/brocade_fcport	2018-11-25 08:43:58.715721271 +0100
@@ -100,11 +100,11 @@
 factory_settings["brocade_fcport_default_levels"] = {
     "rxcrcs": (3.0, 20.0),          # allowed percentage of CRC errors
     "rxencoutframes": (3.0, 20.0),  # allowed percentage of Enc-OUT Frames
     "rxencinframes": (3.0, 20.0),   # allowed percentage of Enc-In Frames
-    "notxcredits": (1.0, 3.0),      # allowed percentage of No Tx Credits
+    "notxcredits": (3.0, 20.0),     # allowed percentage of No Tx Credits
     "c3discards": (3.0, 20.0),      # allowed percentage of C3 discards
     "assumed_speed": 2.0,           # used if speed not available in SNMP data
 }
@@ -349,7 +327,7 @@
             ("ENC-Out", "rxencoutframes", rxencoutframes, rxframes_rate),
             ("ENC-In", "rxencinframes", rxencinframes, rxframes_rate),
             ("C3 discards", "c3discards", c3discards, txframes_rate),
-            ("No TX buffer credits", "notxcredits", notxcredits, None),
+            ("No TX buffer credits", "notxcredits", notxcredits, txframes_rate),
         ]:
             per_sec = get_rate("brocade_fcport.%s.%s" % (counter, index), this_time, value)
             perfdata.append((counter, per_sec))
@@ -360,31 +338,6 @@
                     (counter, item), this_time, per_sec, average)
                 perfdata.append( ("%s_avg" % counter, per_sec_avg ) )
-            # Calculate error rates
-            if counter == "notxcredits":
-                # Calculate the error rate for "notxcredits" (buffer credit zero). Since this value
-                # is relative to time instead of the number of transmitted frames it needs special
-                # treatment.
-                # Semantics of the buffer credit zero value on Brocade / Broadcom devices:
-                # The switch ASIC checks the buffer credit value of a switch port every 2.5us and
-                # increments the buffer credit zero counter if the buffer credit value is zero.
-                # This means if in a one second interval the buffer credit zero counter increases
-                # by 400000 the link on this switch port is not allowed to send any frames.
-                # By default the edge hold time on a Brocade / Broadcom device is about 200ms:
-                #   switch.edgeHoldTime:220
-                # If a C3 frame remains in the switches queue for more than 220ms without being
-                # given any credits to be transmitted, it is subsequently dropped. Thus the buffer
-                # credit zero counter would optimally be correlated to the C3 discards counter.
-                # Unfortunately the Brocade / Broadcom devices have no egress buffering and do
-                # ingress buffering instead. Thus the C3 discards counters are increased on the
-                # ingress port, while the buffer credit zero counters are increased on the egress
-                # port. The trade-off is to correlate the buffer credit zero counters relative to
-                # the measured time interval.
-                if per_sec > 0:
-                    rate = per_sec / 400000.00
-                else:
-                    rate = 0
-            else:
                 # compute error rate (errors in relation to number of frames) (from 0.0 to 1.0)
                 if ref > 0 or per_sec > 0:
                     rate = per_sec / (ref + per_sec)
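The difference between the frame-relative and the time-relative calculation can be illustrated with a small sketch; the numbers for the lightly utilized port are made up for the example:

```python
def frame_relative_rate(per_sec, ref):
    # old behaviour: error events in relation to the number of transmitted frames
    return per_sec / (ref + per_sec) if (ref > 0 or per_sec > 0) else 0.0

def time_relative_rate(per_sec):
    # new behaviour: the switch ASIC polls the buffer credit value every
    # 2.5us, i.e. 400000 times per second; the rate is the fraction of
    # polling intervals that found zero buffer credits
    return per_sec / 400000.0 if per_sec > 0 else 0.0

# Lightly utilized port: 10 frames/s and 40 buffer-credit-zero events/s
print(frame_relative_rate(40.0, 10.0))  # 0.8 -> reported as 80%, a false positive
print(time_relative_rate(40.0))         # 0.0001 -> 0.01%, entirely unremarkable
```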
Conclusion
Adding the three new checks for Brocade / Broadcom fibre channel switches to your Check_MK server enables you to monitor additional CPU and memory aspects as well as the SFPs of your Brocade / Broadcom fibre channel devices. More recent versions of Check_MK should already include the same functionality through the now included standard checks brocade_sys
and brocade_sfp
.
New Brocade / Broadcom fibre channel devices should pick up the additional service checks immediately. Existing Brocade / Broadcom fibre channel devices might need a Check_MK inventory to be run explicitly on them in order to pick up the additional service checks.
The enhanced version of the brocade_fcport service check addresses and fixes several issues present in the standard Check_MK service check. It should provide you with a more complete and future-proof monitoring of your fibre channel network infrastructure.
I hope you find the provided new and enhanced checks useful and enjoyed reading this blog post. Please don't hesitate to drop me a note if you have any suggestions or run into any issues with the provided checks.
2018-11-18 // SAP Cloud Connector - Configuring Multiple Local Administrative Users
Out of the box, the SAP Cloud Connector does not support the creation of multiple local users for administration purposes through its WebUI. The standard, documented way of supporting multiple administrative users is to use LDAP in conjunction with an external user directory. This in turn introduces unnecessary overhead and an external dependency. It also imposes rather strict limitations on the names of the groups (admin or sccadmin) that can be used to authorize users for administrative tasks in the SAP Cloud Connector. There is a simple workaround for this limitation, which is described in this blog post.
Since the SAP Cloud Connector uses an embedded Tomcat servlet container as an application server, it also uses some of the infrastructure provided by Tomcat. This includes the local file-based user directory. In order to use multiple, distinctly named administrative users, they simply need to be manually added to this local file-based user directory.
The first step, though, is to create a password hash that is compatible with the Tomcat servlet container. This can be accomplished with the Tomcat command line tool digest.sh or digest.bat. Depending on your operating system this tool might have a different name, e.g. /usr/bin/tomcat-digest on Red Hat Enterprise Linux. On any system that has said command line tool available (e.g. a RHEL 7 system with the RPM tomcat installed), run the following command:
user@host:$ /usr/bin/tomcat-digest -a sha-256 '<PASSWORD>'
Substitute the placeholder <PASSWORD> with the actual password that will be used for the new administrative user. The output of the above command will be in the form <PASSWORD>:<HASH>. For the following step, only the <HASH> part of the output is needed.
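If no Tomcat digest tool is at hand, an unsalted SHA-256 hex digest can also be produced with standard tooling. The following is only a sketch and assumes a configuration that accepts plain unsalted hex digests; newer Tomcat versions default to salted, iterated digests, in which case the output of the actual digest tool should be used instead.

```python
import hashlib

def sha256_hexdigest(password):
    """Return the unsalted SHA-256 hex digest of a password string,
    analogous to the <HASH> part of the digest tool output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Example:
# sha256_hexdigest("password")
# -> "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
```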
The next step is to manually add the new administrative user to the local file-based user directory of the embedded Tomcat. This is achieved by editing the file config/users.xml in your SAP Cloud Connector installation. In this example the installation was performed in the standard path for Linux systems, which is /opt/sap/scc. Open the file config/users.xml in an editor of your choice:
root@host:# vi /opt/sap/scc/config/users.xml
Alter its contents by adding a new line, as shown in the following example:
- /opt/sap/scc/config/users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="admin"/>
  <group groupname="initial"/>
  <user username="Administrator" password="********" roles="admin"/>
  <user username="<USERNAME>" password="<HASH>" roles="admin"/>
  [...]
</tomcat-users>
Substitute the placeholder <USERNAME> with the desired username of the new administrative user, and substitute the placeholder <HASH> with the password hash generated above.
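Instead of editing users.xml by hand, the new entry can also be appended programmatically, which is convenient when provisioning several instances. A sketch using Python's xml.etree.ElementTree; the path and the admin role name are taken from the example above, everything else is illustrative:

```python
import xml.etree.ElementTree as ET

def add_admin_user(users_xml_path, username, password_hash):
    """Append a <user> element with the "admin" role to a Tomcat users.xml file."""
    tree = ET.parse(users_xml_path)
    root = tree.getroot()  # the <tomcat-users> element
    ET.SubElement(root, "user", {
        "username": username,
        "password": password_hash,
        "roles": "admin",
    })
    tree.write(users_xml_path, xml_declaration=True, encoding="utf-8")

# Example (path as used in this post, username and hash are placeholders):
# add_admin_user("/opt/sap/scc/config/users.xml", "jdoe", "<HASH>")
```

Note that ElementTree rewrites the whole file and does not preserve comments, so keep a backup of the original users.xml.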
Restart the SAP Cloud Connector.
The newly created user should now be able to log in via the WebUI of the SAP Cloud Connector.
This kind of local file-based user directory is fairly simple to manage and to understand. I keep wondering why SAP does not document, or even integrate, this kind of user management in the WebUI of the SAP Cloud Connector. Another issue is the use of only one authorization role (admin or sccadmin), which is independent of the particular user directory used. This authorization role grants a given user full administrative rights to the SAP Cloud Connector. A slightly more differentiated authorization scheme, with roles for e.g. operations (connector restart and monitoring, no configuration changes) or monitoring (monitoring only, no connector restart, no configuration changes), would be much better suited to enterprise environments.