Thursday, April 15, 2010

Optimizing the cost of monitoring on z/OS

One of the things I commonly work on with my customers is optimizing the cost of monitoring. When you think about it, there is an inherent balancing act between the cost of monitoring and the analytic value of the information collected and presented by the monitoring suite. If you've ever read the classic 1966 Robert Heinlein novel "The Moon Is A Harsh Mistress" (one of my favorite books when I was a kid), you'll remember the saying "Tanstaafl" - "there ain't no such thing as a free lunch". Monitoring and performance management are no exception to that rule.

I often get asked questions like "what is the monitoring overhead for the tool?", or "what is the cost if I enable a given function of the tool?". The answer is the eternal running joke of all technology vendors: "it depends". While that answer is often a bit of a dodge, it is actually a truism. The cost and overhead of the tool correspond directly to how the user chooses to employ the tool.

Now that I've gotten the "it depends" statement out of the way, there are some general rules of thumb that may be applied. Having a long-standing DB2 and relational database background, there are certain analogies I sometimes draw between database optimization and monitoring optimization. One of the key ones is this: the more data I request, and the more data I store and/or act on, the higher the cost of collection will often be, with potentially greater overhead. If I code a DB2 SELECT that results in a large tablespace scan and pulls back a multi-million-row result set, that will usually run longer and cost more than a singleton select that uses index access and pulls back a single row.

You can apply the same logic to monitoring. From a real-time perspective, monitoring the MSR (millisecond response time) for thousands of DASD volumes on an ongoing basis will be more expensive than looking at a single volume in detail, as needed. From a history perspective, the more history data I gather and store, the higher the cost of history collection. And let's not forget alerting: the more alerts I define, the more information I alert on, and the larger the number of managed systems I alert on, the higher the potential cost of alerting.

What I plan to do over the next few weeks is a series of postings on this balancing act - in essence, to help you answer the question: "what is the cost of monitoring, versus the diagnostic value of the data?". I will be covering all the core OMEGAMON tools on z/OS, and we will look at real-time collection, historical collection, and alert management.
