Exploring the Wonder: monitoring

Tuesday, January 15, 2013

Writing PowerCLI for Performance

I was building a piece of PowerCLI code to check my vCD-deployed vApps for certain errors and, because it’s me, I was curious what performance difference there was between using array-based operations versus piped operations in PowerShell. Here’s what I found out.

Here is the array based operation – time to completion 2 minutes 30 seconds.

# Fetch every vApp once, keeping only the properties we need
$myVApps = @(Get-CIVApp | Select-Object Org, Name, Status)

# Statuses that indicate a problem
$badStatuses = 'FailedCreation', 'Unresolved', 'Unknown'

foreach ($vApp in $myVApps)
    {
        # Report only the vApps whose status is in the list above
        if ($badStatuses -contains $vApp.Status)
            {
                Write-Host "vAPP $($vApp.Name) - Status is $($vApp.Status)"
            }
    }

Here is the pipe-based operation, which is recommended for reasons that I can now testify to firsthand.

Get-CIVApp -Status "Unresolved", "FailedCreation", "Unknown", "Unrecognized" | Select-Object Org, Name, Status

This operation returned the same results as the array-based operation in 14 seconds. That’s about 9% of the time the array version took…
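To put the two timings side by side, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration; the variable names are my own and the timings are the ones quoted in this post.

```python
# Timings quoted in the post, converted to seconds.
array_seconds = 2 * 60 + 30   # 2 minutes 30 seconds
pipeline_seconds = 14

# Fraction of the original run time that the pipeline version needs.
fraction = pipeline_seconds / array_seconds

# How many times faster the pipeline version is.
speedup = array_seconds / pipeline_seconds

print(f"pipeline takes {fraction:.1%} of the array time")  # 9.3%
print(f"pipeline is about {speedup:.1f}x faster")          # 10.7x
```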

Friday, January 11, 2013

vFabric Hyperic Server Heartbeat

I wrote a quick safety feature for my Hyperic server that monitors the Hyperic Server process every minute and if the Server stops or dies the script will attempt to restart it. Nothing fancy but it’s a nice little feature… plus you can change a couple lines and make one to monitor your Hyperic Agents as well…

#!/usr/bin/perl
# ---------- Hyperic 5.0.0 Server Heartbeat Check ------------
# This script is used to verify that the Hyperic 5.0.0 Server is running and
#    restarts it upon failure. To schedule it to run automatically every
#    minute on Linux, run "crontab -e" and add the following line:
#         * * * * * <Path to this script>

# NOT SUPPORTED OR PROVIDED BY VMWARE AND HAS NO GUARANTEES

use strict;
use warnings;

# Ask the server for its status and capture the output
my $cmd = "/opt/hyperic/server-5.0.0-EE/bin/hq-server.sh status";
my $out = `$cmd` // '';
# print "Output $cmd\n";
# print "Output Check 1 - $out\n";
if (index($out, "HQ Server is not running") != -1)
        {
                # print "\nHQ Server is dead\n";
                # Restart the HQ Service
                $cmd = "/opt/hyperic/server-5.0.0-EE/bin/hq-server.sh start" ;
                $out = `$cmd`;
                # print "Restarting the Service\n $out \n ";
        }
        else
        {
                # print "\nServer is running\n";
        }
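The same check-then-restart pattern can be sketched with the status command, start command and "dead" message passed in as parameters, which is one way to reuse it for an agent. This Python version is only an illustrative sketch: the function names and the `run` parameter are my own, and any agent path or status message you pass in is an assumption you would need to verify against your own install.

```python
import subprocess


def run_command(cmd):
    """Run a shell command and return its standard output as text."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout


def heartbeat(status_cmd, start_cmd, dead_marker, run=run_command):
    """Restart the service if its status output contains dead_marker.

    Returns True when a restart was attempted, False otherwise.
    """
    if dead_marker in run(status_cmd):
        run(start_cmd)
        return True
    return False


if __name__ == "__main__":
    # Path and message taken from the Perl script above.
    hq = "/opt/hyperic/server-5.0.0-EE/bin/hq-server.sh"
    heartbeat(f"{hq} status", f"{hq} start", "HQ Server is not running")
```

Passing `run` as a parameter keeps the decision logic separate from the shell calls, so it can be exercised without a Hyperic install.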

Getting Started with vFabric Hyperic Monitoring and Alerting

During the past couple of days I have been working with Hyperic to set up basic alerting functionality for things like disk space thresholds, Windows Services and memory usage by process. Here is a quick getting-started guide to setting up basic disk monitoring, as well as an intro to the Hyperic Escalation Schemes, which allow tiered alerting.

First off let’s create an Escalation Scheme. To do this go to Administration > Escalation Schemes Configuration.

Next, let’s go ahead and build the process of who gets alerted and when.
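Conceptually, an escalation scheme is an ordered list of "notify this person, then wait this long" steps that keeps advancing while the alert goes unacknowledged. The Python sketch below models that idea only; it is not Hyperic's API, and the recipients and wait times are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Step:
    notify: str        # who gets alerted at this step (hypothetical names)
    wait_minutes: int  # how long to wait before moving to the next step


def notified_so_far(steps, minutes_unacknowledged):
    """Return everyone alerted once the alert has gone unacknowledged
    for the given number of minutes."""
    notified = []
    elapsed = 0
    for step in steps:
        if elapsed > minutes_unacknowledged:
            break
        notified.append(step.notify)
        elapsed += step.wait_minutes
    return notified


scheme = [
    Step("on-call admin", 15),
    Step("team lead", 30),
    Step("manager", 0),
]

print(notified_so_far(scheme, 0))   # ['on-call admin']
print(notified_so_far(scheme, 20))  # ['on-call admin', 'team lead']
print(notified_so_far(scheme, 60))  # ['on-call admin', 'team lead', 'manager']
```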

Now that we have set up the notification process, let’s actually set up some alerts… For this example I will be setting up test disk space alerts on a subset of my servers. To get here you need to understand a few things:

1. The Resources Tab is where you will find everything that can be monitored. Here comes the confusing part:

a. Platforms are the machines that your Hyperic Agents run on.

b. Servers are things like .NET, Apache Tomcat and MSSQL Server.

c. Services are things that you monitor like HTTP, Disks, CPU, RAM and Windows Services.

d. Compatible Groups/Clusters are groups of the same thing (i.e., Disks only).

e. Mixed Groups are groups that contain a mix of things like Disks and RAM.
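The difference between the two group types comes down to whether every member has the same resource type. The following Python sketch is purely illustrative (the class and function names are mine, not Hyperic's, and the resource names are made up):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    resource_type: str  # e.g. "FileServer Mount", "CPU"


def group_kind(members):
    """Classify a group as 'compatible' (one resource type) or 'mixed'."""
    types = {m.resource_type for m in members}
    return "compatible" if len(types) == 1 else "mixed"


disks = [Resource("web01 C:", "FileServer Mount"),
         Resource("db01 D:", "FileServer Mount")]
disks_and_cpu = disks + [Resource("web01 CPU", "CPU")]

print(group_kind(disks))          # compatible
print(group_kind(disks_and_cpu))  # mixed
```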

Once you grasp this it will make your experience with Hyperic much easier.

Ok, so I have navigated to the Services Tab and filtered by “FileServer Mount” aka Disk information. Now I want to select my subset of disks and click on the Group button.

Next I click on “Add to a New Group” to create a new group. Because this group contains all like items it will create a new Compatible Group.

Now you should see your new group.

From there, click on the Alerts tab and click on Configure.

Now I want to create a new alert against this group. My usual condition is “IF more THAN 0 of the Resources”, because I want an alert if any one of them has a problem.

Here’s an important note: if you select Total Bytes Avail, or several other metrics, your alert will NOT work by default. Here’s why:

Go to Administration > Monitoring Defaults and find FileServer Mount and click Edit Metric Template.

This should bring up a screen like the one below. Notice that Default On is set to No for Total Bytes Avail… If you build your alert on that metric and don’t check that valid data is coming in, you might be lulled into a false sense of security… Word to the wise: make sure your monitors are green after you create them.
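The pitfall is easy to see in a small model. The Python sketch below is not how Hyperic is implemented; it only assumes that a metric which is not being collected produces no data points, so a threshold on it can never be evaluated. The threshold, names and readings are made up for illustration.

```python
def resources_in_alert(readings, threshold_bytes):
    """Count resources whose latest 'bytes available' reading is below the
    threshold. A reading of None means no data is being collected."""
    return sum(
        1 for value in readings.values()
        if value is not None and value < threshold_bytes
    )


def group_alert_fires(readings, threshold_bytes):
    """Mirror the 'IF more THAN 0 of the Resources' condition."""
    return resources_in_alert(readings, threshold_bytes) > 0


ONE_GB = 1024 ** 3

# Metric collection enabled: one disk is nearly full, so the alert fires.
collected = {"web01 C:": 50 * ONE_GB, "db01 D:": ONE_GB // 2}
print(group_alert_fires(collected, ONE_GB))      # True

# Metric left at Default On = No: no data arrives, so the alert stays
# silent even though a disk may be full.
not_collected = {"web01 C:": None, "db01 D:": None}
print(group_alert_fires(not_collected, ONE_GB))  # False
```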

Congrats, you are now ready to use vFabric Hyperic to start basic monitoring in your environment.

Tuesday, December 18, 2012

Agentless / Network Monitoring with Hyperic

So… I’ve installed Hyperic and am using it to monitor a bunch of stuff (Disk Space, RAM, CPU, Services) on boxes that have the Hyperic Agent installed, but I really want to just do a simple ping against a bunch of network devices for network monitoring… Looking at the UI, it is not at all obvious how I would accomplish such a task. The answer is that you have to run the job against an agent, and here’s how you do that:

1. Install a Hyperic Agent that is going to do the actual pings for you. In my case I just installed it on my Hyperic Server. NOTE: You must install the agent as root.

2. Browse to that “Platform”, i.e. the server you installed the agent on.

3. In the little “Tools Menu” dropdown select “New Platform Service”

4. In the next window give this monitor a name and select the Service Type of “InetAddress Ping”:

5. The next screen is going to display a little banner that says “The resource has not been configured”. Click on the “Configuration Properties” hyperlink.

6. On this last step provide the FQDN or IP of the device that you want your agent to ping.

There you go, you are now ready to monitor your network connectivity using Hyperic. One last piece of advice: you might want to change the test interval to a smaller number than the default. You can do that from Administration > Monitoring Defaults > InetAddress Ping (click on Edit Metric Template on the right side).
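For a sense of what a check like this boils down to, here is a hedged Python sketch that shells out to the system `ping` command once and reports whether it succeeded. It is not Hyperic's implementation; the function name, the injectable `run` parameter and the Linux-style `-c`/`-W` flags are assumptions for illustration.

```python
import subprocess


def is_reachable(host, timeout_seconds=2, run=subprocess.run):
    """Send a single ICMP echo request using the system ping command.

    Returns True when ping exits with status 0. The -c and -W flags are the
    Linux (iputils) spellings; other platforms use different flags.
    """
    cmd = ["ping", "-c", "1", "-W", str(timeout_seconds), host]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

A call such as `is_reachable("192.0.2.10")` (a documentation-only placeholder address) would be repeated on whatever interval you choose, which is the role the metric template's interval plays in Hyperic.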

Wednesday, March 14, 2012

Using VMware vCenter Operations Manager

As part of my day-to-day routine this morning I ran into a quick use case that offers a perfect introduction to VMware vCOPS and what it can do for your environment, both to help detect issues before they become problems and to find the root causes of performance problems. If you have never seen vCOPS before, here is a funny video that pretty much explains what the product is all about: http://www.youtube.com/watch?v=mwYjwrE81eg

This example is a real life use case that happened to me this morning and demonstrates the value of real time, intelligent monitoring of dynamic environments.

As you can see from the screenshot below, my vCOPS instance is monitoring 1,700+ VMs. Some are in a production environment and some are in a lab environment. The really important part here is that it took 3 seconds to recognize that one of those 1,759 VMs had an issue... just 3 seconds...

Ok, so obviously red is not cool so let's click on it and see what information we are presented with...

A single click shows me exactly where the VM is located and also shows that this is an issue that is only affecting a single VM:

And I click on it again and I get all the information that matters to me: what is wrong, when it went wrong and what it is affecting. In this case I see that 85 Anomalies were detected and the biggest indicators that something is wrong are that CPU usage and Memory usage are both up. It also tells me that this machine has been working fine in the past and this is a new occurrence.

Ok, that's all nice and everything, but what is vCOPS actually looking at? Let's click on the Orange Anomalies Badge and see what comes up:

As you can see it has symptoms that it is alerting on and you can click on individual symptoms to get more details.

Interested in more details? How about letting you choose the metrics you want and getting them on a timeline? Sure! Just click on the All Metrics tab and you are presented with a list of metrics that are alerting, and you can select the ones you want to get a pretty sweet datasheet like the one below:

So there you have it: how one piece of software can tell you what is wrong, what is affected and give you an idea of what needs to be done to fix it, all in a real-time, efficient and intelligent manner. The entire exercise took about 5 minutes to do a complete health check on 1,700+ VMs and figure out what was wrong with the one that I have covered here. If I did the math right, that means I did a complete health check on my environment at a rate of 586 VMs a second (totally ignoring the hosts and storage, which were also checked) to figure out if I had an issue... and within a minute knew what was wrong with the VM having an issue... now that is pretty awesome!
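The 586 figure comes from dividing the 1,759 VMs by the 3 seconds it took to flag the unhealthy one, not by the 5 minutes the whole investigation took. A quick sketch of that arithmetic, in Python for illustration only (the variable names are my own):

```python
vm_count = 1759          # VMs monitored, as quoted in this post
detection_seconds = 3    # time to spot the unhealthy VM

rate = vm_count / detection_seconds
print(f"{rate:.0f} VMs checked per second")  # 586
```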