Say the word "downtime," and most people think of weak wireless signals, network snafus, or equipment trouble. Yet according to research firm Gartner Group, 80% of all unplanned downtime is caused by people and process issues, not hardware and software failures.
For SMEs (small and medium-sized enterprises) that have limited budgets and overtaxed staff, throwing money at infrastructure improvements with products and services might solve the immediate problem but usually doesn't present a long-term solution.
Instead, when downtime seems rampant, it's important to take a step back and understand that downtime is about more than a company's network.
"You have to have a holistic view," says Troy Dixler, director of network solutions for Forsythe, a technology and business consultancy. "Instead of stovepiping the network guys and making them responsible for small, specific tasks, you have to look at the connections among all the folks involved in making sure there's no downtime."
To prevent downtime, Dixler recommends a reorganization of IT departments to make technology professionals think more like a service group. That means that rather than assigning one person to network maintenance, another to storage, and another to tech support, an IT team should be unified so that communication can be shared more readily.
As a team, IT will be more effective at capacity planning, says Dixler. "You have to think strategically rather than tactically," he notes. "That requires strong management, and it's not easy. People are resistant to change, and it takes some time to make them see the importance of breaking old habits."
Dixler points to a model that's working well in Europe: the ITIL (Information Technology Infrastructure Library), created by the British government. A set of best practices for managing and delivering IT services, the ITIL currently boasts more than 100,000 certified professionals, mainly in Europe, Australia, and Canada.
The ITIL frames all IT activity under two areas: service support and service delivery, without a focus on technology itself. Some large U.S. companies have adopted the practices, including IBM, Caterpillar, and the IRS.
Even if a company doesn't want to jump into a full-scale model such as the ITIL, creating a more service-oriented IT department can help prevent downtime because it addresses the larger needs of a company, rather than just its component parts. "Downtime is as much about business as it is about IT," says Dixler. "Understanding links within a company helps to create a reliable infrastructure."
On a more nitty-gritty tactical level, SMEs can combat downtime by understanding how their networks are being used and by documenting network flows. If certain individuals within the company seem to be plagued by downtime, IT should examine not just each specific case, but how the cases might be related to each other and to the larger company. Establishing a reporting procedure that links these incidents can keep bigger problems from appearing in the future.
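A reporting procedure that links incidents could be sketched along the following lines. This is only an illustration, not a method described by anyone quoted here: the report fields (`user`, `network_segment`, `application`) and the function name `link_incidents` are hypothetical, and a real help desk system would record far more detail.

```python
from collections import defaultdict

def link_incidents(incidents, fields=("network_segment", "application"), min_count=2):
    """Group downtime reports that share a value in any of the given fields.

    Returns a dict mapping (field, value) to the list of affected users,
    keeping only groups with at least `min_count` reports.
    """
    groups = defaultdict(list)
    for incident in incidents:
        for field in fields:
            value = incident.get(field)
            if value:
                groups[(field, value)].append(incident["user"])
    return {key: users for key, users in groups.items() if len(users) >= min_count}

# Hypothetical reports: three users, each filing a separate complaint.
reports = [
    {"user": "alice", "network_segment": "floor-2", "application": "email"},
    {"user": "bob", "network_segment": "floor-2", "application": "crm"},
    {"user": "carol", "network_segment": "floor-3", "application": "crm"},
]

linked = link_incidents(reports)
for (field, value), users in linked.items():
    print(f"{field}={value}: reported by {', '.join(users)}")
```

Viewed one at a time, the three reports look unrelated. Grouped this way, two share a network segment and two share an application, which gives IT a place to start looking.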
Also vital for ensuring network health is establishing what's normal, as opposed to what's wrong, according to Douglas Smith, president of Network Instruments. He suggests examining networks and processes when everything is flowing smoothly.
He says, "Asking how you can prevent network downtime is like saying, 'How can I prevent auto accidents?'" Although such incidents can't be completely eliminated, they can be examined to find out what went wrong and what can be improved to prevent them from happening again.
"Even if you use the proper processes and tools, you can't prevent downtime," says Smith. "But you can make it easy to see a small problem that could potentially become a larger problem." He suggests implementing a regular maintenance procedure that uses trending tools to create a baseline, which can be compared to a network that seems to be causing problems. If everything looks the same, then technology can be eliminated as a downtime factor.
Establishing a solid process for monitoring, with appropriate schedules, is necessary for network health because it takes a company's specific network needs into consideration. "Every network is different, just like every company is different," Smith notes.
He adds that implementing infrastructure-monitoring techniques can give an IT department a good view of when downtime is caused by technology and when it's not. He says, "Putting relevant processes into place is just as important as making sure your equipment is up-to-date."
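The baseline comparison Smith describes can be illustrated with a short sketch. It is a simplified example under stated assumptions, not how any particular trending tool works: the metric names, the sample values, and the three-standard-deviation threshold are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize readings taken while the network was healthy.

    `samples` maps a metric name to a list of readings; the result maps
    each metric to its (mean, sample standard deviation).
    """
    return {name: (mean(values), stdev(values)) for name, values in samples.items()}

def find_deviations(baseline, current, threshold=3.0):
    """Return metrics whose current reading lies more than `threshold`
    standard deviations from the baseline mean (assumed cutoff)."""
    flagged = {}
    for name, value in current.items():
        if name not in baseline:
            continue  # no healthy-period data to compare against
        avg, spread = baseline[name]
        if spread == 0:
            # Baseline never varied, so any change at all is a deviation.
            if value != avg:
                flagged[name] = float("inf")
            continue
        score = abs(value - avg) / spread
        if score > threshold:
            flagged[name] = score
    return flagged

# Hypothetical readings collected when everything was flowing smoothly.
healthy = {
    "latency_ms": [20, 22, 19, 21, 20, 23, 21],
    "utilization_pct": [35, 40, 38, 37, 42, 39, 36],
}
baseline = build_baseline(healthy)

# Hypothetical readings from a day when users reported trouble.
problem_day = {"latency_ms": 95, "utilization_pct": 38}
flagged = find_deviations(baseline, problem_day)
print(sorted(flagged))
```

An empty result corresponds to the case where everything looks the same as the baseline, which suggests the cause lies with people or process rather than with the technology.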
As a company works to streamline its management and processes, some products can also come in handy. A tool that's growing in popularity is ITVerify from Innovativ Systems Design, which provides infrastructure monitoring designed to help companies reduce unplanned downtime.
The product gives companies a centralized view of system, network, and application information and provides change surveillance and asset management, as well as policy-based compliance. Innovativ CTO Dave Nocera says, "Our purpose is to simplify someone's job. We took the concept of agentless technology and configured it to improve how IT can solve its problems."
Infrastructure monitoring services such as ITVerify can help to automate downtime prevention because they bring together what used to be separate applications, such as asset control and environment virtualization. Several options exist depending on a company's needs and budget. Some, such as BMC Software Infrastructure Management's services, target specific areas; BMC addresses mainframes and distributed systems. Others are designed for certain industries, such as HP's OpenView TeMIP, a set of assurance applications for mobile providers.
Some products, such as IBM's Tivoli, include such features as part of a larger suite of services, but Nocera notes that SMEs usually skip such high-end offerings due to cost.
That could be a risky move, Nocera adds, with the abundance of regulatory issues that have been cropping up. "Making sure a network is always up is simply good business," he says. "Beyond making a company compliant with regulations, it reduces business risk and improves security."
by Elizabeth Millard
Operator Error
According to IT research firm Gartner, unplanned downtime can be the result of poor IT strategy. The firm suggests these steps to reduce the problem:
1. Mature operations to a more process-oriented and documented approach that doesn't require that specific people be available to perform tasks.
2. Hire competent people and train them, as well as vendors, on the company's specific IT processes and procedures.
3. Automate the process wherever possible to reduce the chance of errors.
4. Improve change and problem management processes related to IT infrastructure and facilities.