Friday, January 29, 2010

System z - Technology Summit

I will be presenting on IBM Tivoli Monitoring and the most effective uses of the TEP to manage z/OS as part of the System z - Technology Summit on Tuesday, February 9th in St. Louis. This technology summit looks to be a nice event with multiple tracks on System z, Database, Application development, and of course Systems Management (that's me). If you're in the St. Louis area, we would love to have you attend.

Here's a link on information, and how to register:

http://www-01.ibm.com/software/os/systemz/summit/

The System z - Technology Summit will also be in the following cities:
February 2, 2010 - Atlanta, GA
February 9, 2010 - St. Louis, MO
February 11, 2010 - New York, NY
February 18, 2010 - Toronto, Ontario
March 4, 2010 - Chicago, IL
March 9, 2010 - Dallas, TX

Thursday, January 28, 2010

Take advantage of custom queries


Earlier I had shown some examples of what I call Management By Exception displays, one example of an IMS view and another that showed an integrated z/OS view. One portion of the z/OS management by exception workspace showed DASD with high MSR times. Because in many shops you may have thousands of DASD devices, you may want to consider your options when displaying DASD information.

The data that appears in the various tiles of the workspace is gathered via a mechanism called a query. When a workspace includes queries that can request potentially large amounts of data, creating and using custom queries can make the workspace render more efficiently. The DASD High MSR portion of the workspace is a good example of this technique.

If you look at the example, you can start with the basic default query, and then click 'Create Another' to make a new query to optimize. You can then specify a filter option, as in this example, to get information only for DASD devices with MSR times higher than 8 ms. You can then click 'Advanced' and get a pop up to specify the data sort sequence, and also in this example we are only asking for the 10 worst devices.

Filtering at the level of the query lets the workspace render more quickly and efficiently. Use this technique for displays that may request large amounts of information.
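
The effect of the filter, sort, and row limit can be sketched in a few lines of Python. This is only an illustration of the logic, not how the TEP query editor works internally. The device names, field names, and sample values are made up; the 8 ms threshold and the limit of 10 rows come from the example above.

```python
from typing import NamedTuple


class Dasd(NamedTuple):
    volser: str     # hypothetical field name for the volume serial
    msr_ms: float   # response time in milliseconds


def worst_devices(devices, threshold_ms=8.0, limit=10):
    """Keep devices above the threshold, worst first, capped at `limit` rows."""
    slow = [d for d in devices if d.msr_ms > threshold_ms]
    slow.sort(key=lambda d: d.msr_ms, reverse=True)
    return slow[:limit]


# Made-up sample of 1,000 devices with response times from 0 to 24.5 ms.
sample = [Dasd(f"VOL{i:04d}", (i % 50) / 2) for i in range(1000)]
rows = worst_devices(sample)
print(len(sample), "devices scanned,", len(rows), "rows returned")
```

The point of the sketch is the ratio: the workspace only has to render the handful of rows that survive the filter, rather than every device in the shop.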

Wednesday, January 27, 2010

The value of persistence


Persistence has value, especially when we are talking about monitoring and situation alerts. When creating situation alerts, different monitored attributes have different characteristics. When choosing attributes to use for creating situations, keep in mind that some attributes may be more volatile than others.

In other words, some monitored information may be more "spikey". For example, certain resources, such as CPU rates, page rates, I/O rates, and DASD MSR times, may fluctuate quite a bit during normal workload processing. It's often useful to monitor, and perhaps alert on, these types of items. But you don't necessarily want to send out a bunch of alerts every time CPU usage or page rate spikes.

That's where the persistence option comes into play. Like in the example I show here, in the situation editor you click on the 'Advanced' button. This will show the persistence option. The default is one interval, but you can click this option to set the persistence option higher. In the example I show a persistence option of 5. What that means is the situation logic has to be true for 5 intervals before the situation will fire. This has the net effect of helping tune out spike alerts, and reducing the occurrence of phony alerts.
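
To make the persistence idea concrete, here is a minimal Python sketch of the logic: the condition has to be true for a given number of consecutive intervals before the alert fires. This is an illustration only, not the situation editor's actual implementation. The function name, the CPU samples, and the 90% threshold are made up; the persistence value of 5 matches the example above.

```python
def fires(samples, threshold, persistence=1):
    """Yield True for each interval in which the alert condition has held
    for at least `persistence` consecutive intervals."""
    consecutive = 0
    for value in samples:
        consecutive = consecutive + 1 if value > threshold else 0
        yield consecutive >= persistence


# Made-up CPU readings, one per interval: one short spike, then a sustained rise.
cpu = [40, 95, 50, 96, 97, 98, 99, 97, 60]

print(list(fires(cpu, threshold=90, persistence=1)))
print(list(fires(cpu, threshold=90, persistence=5)))
```

With a persistence of 1, the single spike in the second interval raises an alert. With a persistence of 5, that spike is ignored and only the sustained run of high readings qualifies.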

Tuesday, January 26, 2010

Check out this upcoming webcast

There is an interesting new webcast coming up shortly, on January 28th. "Exploring your z10 and zOS systems with OMEGAMON XE for zOS" will talk about some of the powerful features of OMEGAMON z/OS.

Among the topics covered will be how to use OMEGAMON to: Monitor processor usage at the system and address space level, Manage resources and workloads across LPAR clusters, Identify service classes that are missing their performance goals, Monitor enqueues across systems and Sysplexes, Identify and ease bottlenecks that prevent a workload from achieving its service goal.

The speaker is Joe Winterton. Joe knows his stuff, so I'm sure this will be an excellent event. Here is the URL to sign up and attend:

http://www-01.ibm.com/software/os/systemz/telecon/28jan/

Thursday, January 21, 2010

More on ITM 6.22

ITM 6.22 is the latest/greatest version of the IBM Tivoli Monitoring (ITM) infrastructure. While I presume you will find quite a few users running at the ITM 6.21 level, ITM 6.22 provides some interesting new features, some changes in how the TEP looks, and some useful new functions.

Here's a list of some of the enhancements provided by ITM 6.22:

The toolbar icons have been moved around in ITM 6.22

Historical data collection has been enhanced
The Historical Collection Configuration window has been redesigned, and looks quite a bit different. There is a new distribution method called Managed System (TEMA) that enables you to specify the managed systems that data collection will occur on. This is a biggie in that it gives you a lot more flexibility and control in terms of how and where history data will be gathered. There are also options for granular data collection with historical configuration object groups. Again, more granularity and control.

Managed system lists renamed to managed system groups

Modeling conditions for situations
Allows you to capture data from a query and use it to model possible threshold scenarios for situations.

Baselining added to charts for trend analysis
Bar charts, plot charts, and area charts have a new baseline tool. You now have support for such things as Statistical baselines, and Historical baselines.

Agent autonomy
Agent autonomy is the ability of a monitoring agent to execute independently of the TEMS. Aspects of this have been around for quite a while in the infrastructure. This just takes it to a new level in terms of having private situations and private history collection that run independently of the TEMS infrastructure.

Send SNMP traps
Send SNMP alerts from a TEMA directly to a receiver, without ever connecting to a monitoring server (pretty cool).

These are some interesting enhancements. If anyone is using ITM 6.22, I would like to hear about your experiences.

Wednesday, January 20, 2010

Integration of IBM System Automation (SA) and ITM

In prior posts I've discussed the Tivoli Enterprise Portal as an integration point for a variety of IBM Tivoli solutions: mainframe monitoring, distributed monitoring, ITCAM, storage solutions, NetView, you name it. One of the more interesting integration points is how IBM SA, System Automation, integrates with IBM Tivoli Monitoring (ITM) infrastructure. In IBM SA the INGOMX command has been enhanced to support sending commands in the form of XML requests to the SOAP server function of the TEMS (Tivoli Enterprise Monitoring Server). What this means is that from SA you can issue commands such as:

CT_Alert - Raise an existing situation
CT_Reset - Close a situation
CT_Acknowledge - Acknowledge a situation event
CT_Resurface - Change an acknowledged event to open
CT_WTO - Send a Universal Message
CT_Activate - Start a situation or policy
CT_Deactivate - Stop a situation or policy
CT_Execute - Execute a file on the TEMS server

This provides a very nice interface that allows SA to interact very directly with the monitoring infrastructure, and allows for more sophisticated and robust automation. The SOAP interface of the TEMS is a powerful and interesting feature that is well worth looking at.
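
As a rough idea of what "commands in the form of XML requests" means, the Python sketch below builds a small XML document whose root element is the method name. Treat it as a hedged illustration: the child element names (name, source) and the system name SYSA are assumptions made for this example, and the sketch does not send anything to a TEMS. Check the ITM SOAP documentation for the actual request layout and the required fields.

```python
import xml.etree.ElementTree as ET


def build_request(method, **fields):
    """Build an XML request string with `method` as the root element.

    The child element names are supplied by the caller; the ones used
    below are assumptions for illustration, not a documented layout.
    """
    root = ET.Element(method)
    for tag, text in fields.items():
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical values: a situation name and a made-up source system.
request = build_request("CT_Alert", name="EW_Demo_DB2_Alert", source="SYSA")
print(request)
```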

Tuesday, January 19, 2010

More on OMEGAMON XE For CICS V4.2

I mentioned a while back that OMEGAMON XE for CICS V4.20 is now GA. With OMEGAMON CICS now at the V4.20 level, all the core OMEGAMONs on z/OS have had new versions released in the past year.

Some of the new features of OMEGAMON XE For CICS V4.2 include:

Dynamic Terminal Integration Support. This is an extension of the Tivoli Enterprise Portal (TEP) client terminal emulator feature that allows you to launch from a Tivoli Enterprise Portal workspace to a related 3270 screen space in the target application. Predefined links have been provided with OMEGAMON XE for CICS on z/OS to OMEGAMON II for CICS screens.

Event forwarding and synchronization for Tivoli Enterprise Console® and Netcool/OMNIbus.

Improved migration. You can combine V3.1.0, V4.1.0 and V4.2.0 agents in your environment during migration.

ICAT Configuration tool improvements.

OMEGAMON XE for CICS on z/OS now contains the capability to monitor CICS Transaction Gateway V7.0.0 and higher for the mainframe. Because CICS TG support is now bundled in the core OMEGAMON CICS tool, you don't need to purchase a separate tool to monitor CICS TG.

Consolidation of SMP/E FMIDs for OMEGAMON CICS installation.

Here's a link to the documentation for OMEGAMON CICS:

http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/index.jsp?topic=/com.ibm.omegamon.cics.doc_4.2.0/welcome_OMXEforCICS.htm

Friday, January 15, 2010

The return of the mainframe

An interesting article in the Economist magazine, of all places, on the IBM mainframe. The title is "The Return Of The Mainframe". It's a short, but interesting read. My only thought is, did the mainframe ever really go away? I've been in this business more years now than I care to admit to, and the mainframe has been purring away the whole time.


http://www.economist.com/businessfinance/displaystory.cfm?story_id=15276714

More on using policies in the TEP


Paul had a very good comment on the example that I used on policies managing the start/stop of situations. "ITM V6 has a new "Activity" located in the "Extensions" tab. It is called "Wait until a situation is False". I would add this to the policy after it starts the situation. Have it wait until situation EW_check_Prime_Time is False. This way, it will start the situation once and then wait. Without this, it will restart the situation each interval. (Interval of the situation EW_Check_Prime_time). This has two benefits. First, less overhead when you don't restart the situation every interval. Second, if the situation EW_Demo_DB2_Alert is true, restarting it would re-drive the event and make it fire True again."

Paul brings up a very valid point. If you referenced my article on policies (link on the right side of the page), I mention the loop aspect of policies and how it works relative to the interval defined on the situation. To control this loop mechanism, there is a feature I haven't really gone into up until now, the "Wait Until xxx False" option. The example I show here demonstrates how it would be set up. In this example, we do the situation check, issue the command, then wait for the situation to go back to false before going back around to the start of the policy again. This wait option is available from the "Extensions" tab in the policy editor.
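
The difference the wait step makes can be shown with a small Python simulation. This is a simplified model of the loop behavior described above, not the policy engine itself; the function name and the sequence of situation states are made up for illustration.

```python
def run_policy(check_states, wait_until_false):
    """Return the intervals at which the policy issues its command.

    check_states: one boolean per interval for the check situation.
    wait_until_false: whether the policy includes the wait step.
    """
    issued = []
    waiting = False
    for interval, is_true in enumerate(check_states):
        if waiting:
            # Parked on the wait step until the check situation goes false.
            if not is_true:
                waiting = False
            continue
        if is_true:
            issued.append(interval)
            waiting = wait_until_false
    return issued


# Made-up history: the check situation is true for intervals 1-3 and 5-6.
states = [False, True, True, True, False, True, True]

print(run_policy(states, wait_until_false=False))  # command issued every true interval
print(run_policy(states, wait_until_false=True))   # command issued once per true period
```

Without the wait step, the command is re-driven on every interval in which the check situation is true. With it, the command is issued once each time the situation becomes true, which is the behavior Paul describes.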

Thursday, January 14, 2010

Another example of management by exception


Here's another example of a management by exception display. Monitoring IMS with the management by exception approach works well. Often when I'm monitoring an IMS system I'm looking for such things as workload that is queuing, and resources that have failed or stopped/abended altogether. The example I show here is a typical example of an IMS display I might create using OMEGAMON IMS and the TEP. In this example I show message regions over a certain percentage busy, transactions that have queued, and PSBs and databases that have stopped or are in a problem status. Just like with the other example, I'm focusing on potential problem scenarios, and tuning out the background 'noise'.

Friday, January 8, 2010

Be sure to set your JRE settings when using the TEP

If you are logging on to the Tivoli Enterprise Portal (TEP) and you get this informational pop-up, don't tune it out and just click 'Continue'. It's telling you that you need to set some of your Java JRE settings for optimal use of the TEP. If you click on 'Help' it will take you to the online documentation that will spell out exactly what you need to do with the Java JRE settings. While it may not seem like a big deal, it can make a big difference in how well the TEP runs on your desktop.


Thursday, January 7, 2010

Use Response Time (RTA) groups with OMEGAMON IMS and CICS

OMEGAMON XE for IMS and OMEGAMON XE for CICS provide powerful and useful Response Analysis and Bottleneck Analysis features. OMEGAMON XE for IMS and OMEGAMON XE for CICS each provide a mechanism to group together related transactions and programs, based upon the needs of the user and the nature of the applications being monitored. What this means is the user is able to optimize OMEGAMON to monitor and analyze CICS and IMS workload from the application perspective.
For example, the picture shows how OMEGAMON IMS Bottleneck Analysis data is shown for transactions and programs related to the user's ATM application. This means that when there is an issue with the ATM application, the user is able to determine the source of the potential bottleneck more quickly and precisely.

Definition of groups is driven by macro mechanisms, such as KOIGBLxx in OMEGAMON IMS, or may be specified in real time using OMEGAMON commands for 'on the fly' analysis.

The recommendation is to take advantage of this feature of the product to add more application relevance to the performance data. Many users simply do not exploit this feature.


Tuesday, January 5, 2010

Exploit policies to manage OMEGAMON situations


Policies extend alert command and control concepts established with OMEGAMON situations and add additional functionality to the Tivoli Enterprise Portal. While situations remain the essential starting point for alerts and automation, policies add essential function and flexibility to situation capabilities. Policies are potentially one of the most powerful, and yet most underutilized, features of OMEGAMON and the Tivoli Enterprise Portal.

Policies are not a replacement for an automation engine, such as System Automation or AF/Operator. But they do provide a valuable extension to the command functions of situations, and enable the user to do things like issue multiple commands, or sequences of commands, based upon a situation being true. And what is nice is that this can be done using a GUI, and does not require coding things such as REXX to get the job done.

One very interesting use of policies is as an overseer mechanism to manage the start/stop of situations within the Tivoli monitoring environment. Some situation alerts are sensitive to certain times of day or day of week considerations, due to operational or off-hours processing issues. In addition, some issues may be critical during prime time and less critical during off hours. You can reduce monitoring overhead and eliminate unnecessary alerts by running situations only when needed. The picture shows an example of an ‘Overseer’ policy, which manages situation start and stop. Using an overseer policy can simplify the coding and maintenance of the underlying situations, because the policy will be able to handle the time sensitivity logic.
If you want to find out more about policies, you can check out my article on policies (see the link on the right side of the blog page).
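
The time sensitivity logic that an overseer policy takes on can be sketched in Python as follows. This is only a model of the decision, not something you would code in the policy editor. The prime-time window (weekdays, 08:00 to 18:00) and the function names are assumptions chosen for the example.

```python
from datetime import datetime, time

# Assumed prime-time window for illustration: weekdays, 08:00 to 18:00.
PRIME_START = time(8, 0)
PRIME_END = time(18, 0)


def is_prime_time(now):
    """True on Monday through Friday between PRIME_START and PRIME_END."""
    return now.weekday() < 5 and PRIME_START <= now.time() < PRIME_END


def overseer_action(now, running):
    """Return 'start', 'stop', or None for the managed situation."""
    prime = is_prime_time(now)
    if prime and not running:
        return "start"
    if not prime and running:
        return "stop"
    return None


print(overseer_action(datetime(2010, 1, 5, 9, 0), running=False))   # Tuesday morning
print(overseer_action(datetime(2010, 1, 5, 22, 0), running=True))   # Tuesday night
```

Keeping the time-of-day check in one place is what lets the underlying situations stay simple: they only test the monitored condition, and the overseer decides when they should be running at all.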