August 27, 2010
Vol. 32 Issue 18
Page(s) 38 in print issue
Solve Server Monitoring Problems
Challenges To Keeping An Eagle Eye On Data Center Assets
Administrators must assess how their equipment is running in order to make improvements, troubleshoot problems, or budget priorities for repairs or replacements. Because servers comprise the lion’s share of data center equipment, it stands to reason that server monitoring is a key area for administrators who need to keep a watchful eye on data center assets.
But monitoring is not as simple as capturing data on a few parameters and responding to alarms when they occur. Administrators must ensure server monitoring is effective and provides relevant, useful information. A key part of this effort is mitigating problems that can arise and interfere with effective server monitoring.
One of the problems associated with server monitoring is that many tools provide a flood of data but not much information. Without actionable information, administrators can waste precious time attempting to separate pertinent information from “noise.”
A key to solving this server monitoring problem is trending. Mark Hinkle, community vice president at Zenoss (www.zenoss.org), says administrators only using server monitoring to handle “break-fix” situations are unnecessarily affecting end users. Monitoring disk usage can point to capacity issues before outages occur, Hinkle says. For example, some monitoring solutions include trend analysis tools that use usage patterns to forecast points at which storage capacity limits are reached.
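As a rough illustration of the kind of forecast Hinkle describes, the following Python sketch fits a straight line to daily disk usage readings and estimates how many days remain before the volume fills. The function name, the sample readings, and the assumption of linear growth are all hypothetical; commercial trending tools may use more sophisticated models.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Fit a least-squares line to daily usage samples and estimate days until full.

    Returns None when usage is flat or shrinking, since no fill date is projected.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(used_gb) / n
    # Ordinary least-squares slope: growth in GB per day.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line reaches capacity, measured from the last sample.
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - (n - 1))


# Hypothetical readings: one sample per day, growing by 10 GB a day.
samples = [400.0, 410.0, 420.0, 430.0, 440.0]
print(days_until_full(samples, capacity_gb=500.0))  # 6.0
```

A result like this is only as good as the assumption that recent growth continues, so it is better treated as an early warning than as a deadline.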
Steve Francis, founder and CEO of LogicMonitor (www.logicmonitor.com), says many systems rely solely on threshold-based monitoring and have few trending capabilities. Everything that is monitored, Francis emphasizes, should be trended. In fact, many things should be trended, without alerts, to provide information that can help resolve issues. Francis cites the example of a new application release: If application performance slows and triggers a monitoring alert, administrators should be able to assess whether the new release is causing a sudden drop in application performance, or whether the application has gradually slowed with increasing load.
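The release scenario Francis describes can be sketched with trended response times. The Python example below compares the readings just before and just after a release, then falls back to comparing the start and end of the whole series. The function name, window size, and 1.3 ratio are illustrative assumptions, not settings from any particular product.

```python
from statistics import mean
from typing import Sequence


def classify_slowdown(
    latency_ms: Sequence[float],
    release_index: int,
    window: int = 5,
    jump_ratio: float = 1.3,
) -> str:
    """Label a slowdown as a sudden change at release or a gradual drift."""
    before = latency_ms[max(0, release_index - window):release_index]
    after = latency_ms[release_index:release_index + window]
    if not before or not after:
        raise ValueError("need samples on both sides of the release")
    # A sharp jump right at the release point suggests the release itself.
    if mean(after) >= jump_ratio * mean(before):
        return "sudden change at release"
    # Otherwise compare the start and end of the series for slow drift.
    if mean(latency_ms[-window:]) >= jump_ratio * mean(latency_ms[:window]):
        return "gradual slowdown"
    return "no clear slowdown"


# Hypothetical trended data: a step at the release versus steady growth.
step = [100.0] * 10 + [180.0] * 10
drift = [100.0 + 5.0 * i for i in range(20)]
print(classify_slowdown(step, release_index=10))
print(classify_slowdown(drift, release_index=10))
```

Without the trended history, both cases would look identical at the moment the threshold alert fires.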
Select Appropriate Monitoring
Administrators and data center personnel are well aware that monitoring tools can flood users with data. Some of this data may be useful; some may not. Resolving this problem requires not only trending but also selecting appropriate monitoring metrics.
A big consideration for efficient and productive server management is to ensure monitoring tools only report on key indicators that provide the best information about server health, says Mike Alley, practice director for outsourcing solutions at Logicalis (www.logicalis.com). Most tools generate too many events out of the box, which can flood a monitoring console and take away focus from truly critical events, Alley says.
Good places to start, he adds, include performance metrics for CPU, memory, network, and storage. Administrators should also monitor events detected by hardware-level management products that come with a server, system log files, and system processes. Also, administrators should periodically review reported events and winnow out those that neither require remediation nor have a negative impact on users. Of course, any undetected events that impact users should be reviewed, and specific indicators that detect these events should be monitored.
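The periodic review Alley recommends could start from a simple tally of event history. This Python sketch flags event types that fire often but have never led to remediation, which makes them candidates for review. The event names, the tuple layout, and the minimum count are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suppression_candidates(
    events: Iterable[Tuple[str, bool]], min_count: int = 10
) -> List[str]:
    """Return event types seen at least min_count times that were never remediated.

    Each event is a pair of (event_type, was_remediated).
    """
    total: Counter = Counter()
    acted_on: Counter = Counter()
    for event_type, was_remediated in events:
        total[event_type] += 1
        if was_remediated:
            acted_on[event_type] += 1
    return sorted(
        event_type
        for event_type, count in total.items()
        if count >= min_count and acted_on[event_type] == 0
    )


# Hypothetical history pulled from a monitoring console.
history = (
    [("fan_speed_info", False)] * 12
    + [("disk_full", True)] * 3
    + [("cpu_spike", False)] * 2
)
print(suppression_candidates(history))  # ['fan_speed_info']
```

The output is a list for a human to review, not an automatic mute list; an event that was never remediated may still have affected users.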
Kenneth Cheung, a solutions architect at Uptime Software (www.uptimesoftware.com), says it is very common to have a flood of metrics streaming out of various tools. The key is to find a monitoring solution that rapidly correlates outages and incidents to infrastructure and applications. In addition, he adds, the monitoring tool should point toward issues and/or infrastructure that require priority attention. With this capability, administrators can instantly identify which issue needs immediate attention.
A monitoring tool that doesn’t use automation to streamline the alerting process can waste time by introducing manual processes that take time to complete. This can prolong an outage or make it worse.
Sending a page or other alert when a failure occurs usually results in the following chain of events: An administrator receives a page, logs on to the server, and then diagnoses the problem, Hinkle says. This process may take several minutes or longer. In many cases, he adds, the monitoring tool could be used to start a process that fixes the problem automatically. For example, he says, a monitoring tool could detect a server failure and use an automation tool to reboot the server, resulting in a faster recovery time.
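The idea of letting the monitoring tool trigger a fix can be sketched as a small dispatcher: try a registered automatic action first and page a person only when no action exists or the action fails. Everything in this Python example is a hypothetical stand-in, including the alert names and the reboot and paging functions, which here only record what they would have done.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a real deployment would call its own automation
# and paging tools instead of appending to this list.
actions_taken: List[str] = []


def reboot_server(host: str) -> bool:
    """Pretend to reboot a host and report success."""
    actions_taken.append(f"reboot:{host}")
    return True


def page_admin(host: str, alert_type: str) -> None:
    """Pretend to page the on-call administrator."""
    actions_taken.append(f"page:{host}:{alert_type}")


def handle_alert(
    alert_type: str,
    host: str,
    remediations: Dict[str, Callable[[str], bool]],
) -> str:
    """Try a registered automatic fix first; page a human only if none succeeds."""
    fix = remediations.get(alert_type)
    if fix is not None and fix(host):
        return "auto-remediated"
    page_admin(host, alert_type)
    return "paged"


remediations = {"server_unresponsive": reboot_server}
print(handle_alert("server_unresponsive", "web01", remediations))
print(handle_alert("disk_failure", "db01", remediations))
```

Keeping the mapping explicit makes it easy to limit automation to failures where an automatic action such as a reboot is known to be safe.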
“Unless your monitoring system automatically detects changes to servers, applications, and infrastructure, you don’t have monitoring,” says LogicMonitor’s Francis. The reason is that so many changes are made to servers and systems, often in the heat of a crisis, that not all of them will make it into monitoring if administrators rely on a manual process.
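One simple way to catch changes that never made it into monitoring is to compare what a discovery scan finds with what the monitoring system is configured to watch. The Python sketch below assumes both lists are already available as host names; the function name and the sample hosts are hypothetical.

```python
from typing import Dict, Iterable, List


def monitoring_drift(
    discovered: Iterable[str], monitored: Iterable[str]
) -> Dict[str, List[str]]:
    """Compare discovered hosts with monitored hosts and report the differences."""
    found = set(discovered)
    watched = set(monitored)
    return {
        # Present on the network but missing from monitoring.
        "unmonitored": sorted(found - watched),
        # Still configured in monitoring but no longer found.
        "stale": sorted(watched - found),
    }


# Hypothetical inventory after an emergency change added web03 and retired db01.
report = monitoring_drift(
    discovered=["web01", "web02", "web03"],
    monitored=["web01", "web02", "db01"],
)
print(report)
```

Run on a schedule, a comparison like this surfaces servers added during a crisis before their absence from monitoring becomes a problem.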
Relate Monitoring To End Users
At the end of the day, the goal of server monitoring is to ensure critical business applications continue to run without problems. This means server monitoring, albeit indirectly, is related to the end-user experience.
Uptime Software’s Cheung says administrators should start relating server and software metrics to end-user applications. Administrators need to have visibility into how servers are performing and how software is running on those servers, but the most important aspect is to connect those metrics to what the users care about, which is whether their applications are running. Taking an application-centric perspective, Cheung says, allows problem solvers to focus attention on what users are saying and allows for the creation of alerting and automated actions that are more relevant and targeted.
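An application-centric view can begin with nothing more than a map of which servers each application depends on. Given that map, a server-level alert can be translated into the list of applications users might notice. The application and server names in this Python sketch are hypothetical.

```python
from typing import Dict, List, Set


def affected_applications(
    app_servers: Dict[str, Set[str]], failing_servers: Set[str]
) -> List[str]:
    """Return applications that depend on at least one failing server."""
    return sorted(
        app for app, servers in app_servers.items() if servers & failing_servers
    )


# Hypothetical dependency map maintained alongside the monitoring configuration.
app_servers = {
    "payroll": {"db01", "app01"},
    "intranet": {"web01"},
    "email": {"mail01", "db01"},
}
print(affected_applications(app_servers, {"db01"}))  # ['email', 'payroll']
```

Alerts phrased in terms of the affected applications are easier to prioritize than a bare list of server names.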
by Sixto Ortiz Jr.
Best Tip: Share The Wealth
Mark Hinkle, community vice president at Zenoss (www.zenoss.org), says administrators should share the wealth and avoid becoming the only ones who see monitoring data. For example, providing application developers with feedback showing how their products function can help them improve their applications. Showing resource utilization and other points of failure can help improve the design of hosted applications. For example, a Web application that leaks memory may work fine but negatively affects hardware resources by increasing utilization.
Most Practical Tip: Use Reporting
Kenneth Cheung, a solutions architect at Uptime Software (www.uptimesoftware.com), says administrators should use tools that can report and alert on all monitored data. This kind of tool can overcome the problems that arise when an event occurs and administrators are frantically fishing for problems in the infrastructure stack. A useful monitoring tool, he adds, should provide reports that facilitate ongoing initiatives across all different stacks (physical, virtual, cloud). Reporting saves time for administrators, preventing the need to run point tools to get specific metric streams and reducing the fishing needed to find application-specific, capacity-planning, or incident-prioritization issues.
Bonus Tips:
Use mobile monitoring solutions. Administrators should take advantage of mobile network monitoring solutions so they can keep an eye on their network at all hours, even while on the go, says Dirk Paessler, CEO of Paessler (www.paessler.com).
Use Windows Management Instrumentation. WMI is a mighty tool for system administrators to monitor and manage a Windows network from one central point, Paessler says. WMI allows administrators to start processes, read the Event Log history, and even send commands to reboot a system for all computers in a network.