After Y2K, all is not A-OK

By Beverley Head

A year ago, the developed world was preparing for its greatest downtime challenge. The prospect of computer and communications networks failing because of a year-2000 software glitch galvanised big business and government into action - overhauling software, updating systems and creating contingency plans.

Hindsight shows that the year-2000 problem was overcome, but that success should not induce complacency among computer users.

Companies, governments and individuals are more reliant than ever on the uninterrupted operation of connected computers. They need reliable power, reliable telecommunications networks, reliable computer networks, reliable hardware and software, reliable security to thwart hackers and others, and reliable staff.

It is a long chain of reliance, easily snapped by one weak link. Analysts like to talk of information technology taking on the mantle of a utility. In some respects, this seems to be the case, particularly as more large corporations and governments develop outsourcing arrangements with IT vendors, and smaller organisations turn to application service providers (ASPs) for many of their computing needs. The utility model can, however, leave customers even more vulnerable to downtime.

In November, 10,000 electricity users in the eastern suburbs of Sydney were without power after a fire in a substation. Two days later, EnergyAustralia was still experiencing problems in some areas of the city and there was a possibility that it would take "some weeks until permanent repairs are carried out and the network is in its most stable position".

Not only did domestic electrical appliances such as refrigerators and freezers stop working, leaving suburban garbage bins filled with melted ice-cream and evil-smelling chicken, but so did mains-connected computers and other electronic devices. Had there been a broader spread of suppliers of electrical services to the suburbs, the problem might have been reduced as neighbours shared fridges and hung power leads over back fences.

As information technology and telecommunications (IT&T) becomes more of a utility, there will be a greater chance of similarly widespread downtime affecting entire regional or national economies, unless the computer and communications networks underpinning these utility IT&T services are resilient and replicated.

The extent to which EnergyAustralia will compensate its customers is not known, but last month it was revealed in Federal Parliament that IBM Global Services Australia (IBM GSA) was forced to pay a service credit of $200,000 to one of its clients, the Health Insurance Commission, after a 20-hour service interruption earlier this year. IBM GSA is essentially the IT&T utility for the commission. A spokeswoman confirmed that the service credit was made, even though the disruption had not led to "business failure" because the commission simply reverted to manual processes for the 20 hours.

But manual processes cannot be used when a business relies on the internet for communications with other companies and consumers. It was not business as usual for many internet users in November, when the SEA-ME-WE3 (South-East Asia-Middle East-Western Europe) undersea telecommunications cable was damaged. The problem was fixed, and much traffic re-routed to other cables, but not before access had been disrupted for many users.

On the same day that the telecommunications cable was damaged, the Australian Stock Exchange (ASX) suffered its own, unrelated, systems problems for more than three hours. The previous month, chairman Maurice Newman, in his address to the annual general meeting, had said that in 1999 "markets were maintained at levels of approximately 99.8% availability for the year".

November's downtime saw ASX equities trading suspended between 12.16pm and 3.25pm because of a computer hardware fault. With the exchange having also suffered a communications glitch in September, which halted usual trading for about an hour, it is not clear whether it can match that 99.8% performance this year.
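As a rough check, here is a back-of-the-envelope sketch in Python of the downtime budget a 99.8% target allows. The trading hours and day count are assumptions, not figures from the article, and it is assumed availability is measured against trading hours only:

    # Back-of-the-envelope downtime budget for 99.8% availability.
    # Assumptions (not from the article): roughly 10am-4pm trading,
    # about 253 trading days a year, availability measured against
    # trading hours only.
    TRADING_HOURS_PER_DAY = 6
    TRADING_DAYS_PER_YEAR = 253
    TARGET = 0.998

    budget_hours = (1 - TARGET) * TRADING_HOURS_PER_DAY * TRADING_DAYS_PER_YEAR

    # Reported 2000 incidents: November's hardware fault ran from
    # 12.16pm to 3.25pm (3 hours 9 minutes); September's glitch
    # halted trading for about an hour.
    known_downtime = (3 + 9 / 60) + 1.0

    print(f"Downtime budget at 99.8%: {budget_hours:.2f} hours a year")
    print(f"Reported downtime so far: {known_downtime:.2f} hours")

On those assumptions the budget works out at about three hours a year, and the November fault alone ran to three hours and nine minutes.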

Hardware also caused a problem for Westpac the previous weekend, when a glitch during an upgrade forced it to close its internet banking site. The shutdown had the potential to affect all of the bank's 470,000 internet banking customers.

All these events took place in a single month, and represent only a fraction of the problems experienced across business and government. Many hours of downtime go unreported and over a year the effects are much greater and costlier than generally believed.

How much one weekend's internet shutdown cost Westpac will probably never be known. It is possible, although unlikely, that the bank will lose some customers to rivals over the incident. Repeated failures, however, would assuredly cost it loyalty and, ultimately, revenue.

Although Westpac only had to shut down its internet site, Contingency Planning Research (CPR), a firm that specialises in disaster recovery, puts the average cost of downtime to a retail bank at $US1 million an hour; for a retail brokerage, an hour's downtime can cost a staggering $US6.5 million. It would not take long to go out of business at those rates.
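To put those rates in perspective, a minimal sketch: the hourly figures are CPR's as quoted, while the outage durations are illustrative, including a 20-hour interruption like the Health Insurance Commission's.

    # Illustrative cost of downtime at CPR's quoted hourly rates.
    # The rates come from the article; the durations are hypothetical.
    HOURLY_COST_USD = {"retail bank": 1.0e6, "retail brokerage": 6.5e6}

    for business, rate in HOURLY_COST_USD.items():
        for hours in (1, 8, 20):  # an hour, a working day, a HIC-length outage
            print(f"{business}: {hours:>2}h down -> ${rate * hours:,.0f}")

At the brokerage rate, a 20-hour outage works out at $US130 million.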

Business had a bellyful of contingency planning in 1999 as it prepared for the new millennium, but there is a continuing need for vigilance. Admittedly, computer hardware is increasingly fault-tolerant and communications networks have a high degree of redundancy built in, but nothing boasts a cast-iron guarantee.

Before hiring an application service provider, signing up an outsourcing partner, or even installing a new server, organisations need to recognise the tightrope they tread and explore their safety-net options.

http://www.brw.com.au/stories/20001208/8256.htm

-- Martin Thompson (mthom1927@aol.com), December 08, 2000

