Wednesday, March 16, 2011
Using correlated situations to make alerts more relevant
I was working with a customer recently who had an interesting requirement. In their shop they wanted an availability alert to track the status of FTP on their z/OS LPARs. Starting out, we created a basic situation alert that used the 'Not Found' function to check whether the FTP task was active on the system. If the task was not found, the situation would fire.
But the customer wanted to add a little more logic. If FTP was down on one system, that was a warning level alert. However, if FTP was down on both systems, that constituted a critical problem, so they wanted to see a critical level alert in that scenario. I considered various ways to accomplish this, then proceeded with the following, which I felt was the easiest way to get the job done.
First, we started with the basic situation check for the FTP task, and distributed that alert to each z/OS managed system (as I show in the example). These situations used the 'Not Found' function to detect when FTPD1 (the FTP task) was not active on the system. We set the level for these situations to warning. The basic situations are shown here in the first part of the example.
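To make the basic check concrete, here is a minimal sketch in Python of what the 'Not Found' test amounts to. This is only a model of the logic, not the situation editor's own syntax; the function name task_not_found and the sample task lists are hypothetical, made up for illustration.

```python
def task_not_found(task_name, active_tasks):
    """Return True when task_name is absent from the list of active tasks.

    This mirrors the idea of the 'Not Found' check: the situation is true
    (and fires) when the expected task is missing.
    """
    return task_name not in set(active_tasks)


# Hypothetical snapshots of active tasks on one LPAR.
tasks_ftp_up = ["JES2", "TCPIP", "FTPD1"]
tasks_ftp_down = ["JES2", "TCPIP"]

# The basic situation is set to warning level when it fires.
for tasks in (tasks_ftp_up, tasks_ftp_down):
    level = "warning" if task_not_found("FTPD1", tasks) else "no alert"
    print(tasks, "->", level)
```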
We then added a second, correlated situation. It checks whether the basic 'Not Found' situation is true on both systems. If so, the correlated situation fires and shows an alert level of critical. Here I show an example of the correlating situation as well. Note that the correlated situation is distributed to the TEMS, where it needs to be evaluated.
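The combined behavior of the basic and correlated situations can be sketched in a few lines of Python. Again, this is an illustration of the logic only, not how the TEMS evaluates it internally; the function name correlated_severity and the system names SYSA and SYSB are assumptions made up for the example.

```python
def correlated_severity(basic_fired):
    """Work out the highest alert level from the basic situation states.

    basic_fired maps a system name to True when the basic 'Not Found'
    situation is currently true on that system (FTP is down there).
    """
    down = [system for system, fired in basic_fired.items() if fired]
    if basic_fired and len(down) == len(basic_fired):
        # Basic situation true on every system: the correlated situation fires.
        return "critical"
    if down:
        # Basic situation true on some systems only: warning level.
        return "warning"
    return None  # FTP is up everywhere, nothing fires


# Hypothetical system names.
print(correlated_severity({"SYSA": True, "SYSB": False}))
print(correlated_severity({"SYSA": True, "SYSB": True}))
```

With two systems, the first call models FTP down on one LPAR only, and the second models FTP down on both.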
There's a little bit more magic that has to happen to make the situation fire, but these are the first steps. I'll provide the additional detail shortly.