Embedded System Failures

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Examples of Failures - Embedded Systems

While most embedded systems suppliers are working to produce compliant systems, the sheer number of different systems any one manufacturing concern must deal with, in a short time, is daunting. Find them all and fix them in time.

--------------------------------------------------------------------------------

Keywords: Embedded Systems, Y2K, embedded chips, microcontrollers, microchips, chips, data acquisition, SCADA, programmable logic controller (PLC), process control, manufacturing automation

--------------------------------------------------------------------------------

Chemical Safety and Hazard Investigation Board - PIPELINE SAFETY ADVISORY BULLETIN, July 7, 1999

The Office of Pipeline Safety, U.S. Department of Transportation, has issued a Pipeline Safety Advisory Bulletin following a June incident in Washington State which claimed three lives.

Background: During an Office of Pipeline Safety (OPS) investigation of a recent pipeline incident, OPS inspectors identified inadequate SCADA performance as an operational safety concern. Immediately prior to and during the incident, the SCADA system exhibited poor performance that inhibited the pipeline controllers from seeing and reacting to the development of an abnormal pipeline operation.

Preliminary review of the SCADA system indicates that the processor load (a measure of computer performance utilization) was at 65 to 70 percent during normal operations. Immediately prior to an upset condition occurring on the pipeline, the SCADA encountered an internal database error. The system attempted to reconcile the problem at the expense of other processing tasks. The database error, coupled with the increased data processing burden of the upset condition, hampered controller operations. In fact, key operator command functions were unable to be processed immediately prior to and during the abnormal operation. It is possible that post installation modifications may have hampered the system's ability to function appropriately.

The combination of the database error, the inadequate reserve capacity of the SCADA processor, and the unusually dynamic changes that occurred during the upset condition, appear to have combined and temporarily overburdened the SCADA computer system. This may have prevented the pipeline controllers from reacting and controlling the upset condition on their pipeline as promptly as would have been expected. For further information, contact Chris Hoidal, Director, OPS Western Region at 303-231-5701.

-- G Bailey (glbailey1@excite.com), November 21, 1999

Answers


The Institution of Electrical Engineers (IEE.org.uk) The Millennium Problem in Embedded Systems - Casebook

Much has been written about Year 2000 problems in embedded systems, but the emphasis has been principally on the process of investigation, with little information about real cases of failure. While the incidence of Year 2000 problems in embedded systems has been found to be relatively low, the impact of the problems has in some cases been business threatening. Action 2000, in conjunction with the Institution of Electrical Engineers, has undertaken a data collection initiative to collate facts about actual Year 2000 failures in a wide range of embedded systems. Through the IEE, Action 2000 asked leading consulting engineering companies in the UK to list the actual faults they had found in equipment. Because of the range of specialisms and industries these companies work in, a good representative sample is thought to have been obtained. The participating companies were AEA Technology, BSC Consulting, ERA, IBM, ICS, Real Time Engineering and The Houndscroft Partnership.

The equipment categories used in collecting data for the non-computer entries (60% of the total) were:
- Logging / monitoring
- Other
- PLC
- SCADA
- Smart instruments
- Stand-alone instrument

The areas reported as having the most problems with non-computer-based systems were, in decreasing order:
- Calibration, monitoring, data logging, detectors, analysers
- Building management, including HVAC, fire and security systems
- Manufacturing and process systems (SCADA, PLC, DCS)
- Telecommunications and networking
- Other

The dates that caused the problems were:
- millennium rollover: 71%
- leap year: 9%
- multiple dates: 6%
- other dates or unknown: 14%

Tava Technologies: A White Paper that Discusses the Significance of the Effect of the Millennium Bug (Y2K) on Process Control, Factory Automation & Embedded Systems in Manufacturing Companies. Feb 98. (pdf)

"To date, with plant floor Y2K experience at over 400 sites, the company has yet to find a single site that did not require some degree of remediation; and, to date, having researched tens of thousands of manufacturing automation systems and components for Y2K readiness, the company has found more than 20% to be either non-compliant or 'suspect', that is, non-compliant under certain circumstances." "Problems range from major operational nuisances to erratic production shortages to complete plant shutdowns. But perhaps the worst case of all will be systems that continue to work but make bad decisions affecting product yields. It may be on January 1, 2000, or it may be days or even months later."

Industry Wakes Up to the Year 2000 Menace (Fortune article)

Ralph J. Szygenda is chief information officer at General Motors, whose staff is now feverishly correcting what he calls "catastrophic problems" in every GM plant. In March the automaker disclosed that it expects to spend $400 million to $550 million to fix year 2000 problems in factories as well as engineering labs and offices. Rob Baxter is Honeywell's vice president in charge of making his company's line of industrial control products "year 2000 compliant." From what he has seen among Honeywell customers, Baxter fears that "some plants will have trouble operating and will have to shut down. Some will run at a reduced scope. I expect considerable system outages during December 1999 through February 2000."

Manufacturing's task is compounded by the multiplicity of its computer programs. Below the layers of more or less standard software is a vast range of equipment run directly by built-in chips and programs, which outnumber those in the rest of business by a factor of ten. At General Motors, "At each one of our factories there are catastrophic problems," says the blunt-talking executive. "Amazingly enough, machines on the factory floor are far more sensitive to incorrect dates than we ever anticipated. When we tested robotic devices for transition into the year 2000, for example, they just froze and stopped operating."

Only a few companies offer software that can deal with factory problems. Among them are Raytheon Engineers & Constructors, Fluor Daniel, and Peritus Software Services of Billerica, Mass., as well as the service operations of companies that sell industrial controls, such as Foxboro and Honeywell. Another is Tava Technologies, whose Plant Y2kOne software includes a database on 10,000 microprocessors, related control devices, and software from more than 1,000 vendors that is used on the factory floor. Among other things, Plant Y2kOne can check out software in robots, PCs, and PLCs; operating systems such as Unix, DOS, and Windows NT; and embedded software such as a program used to guide automated vehicles.

Leap-year snafus damaged production lines when programmers failed to account for the extra day in February 1996. At a small U.S. manufacturer of industrial solutions that prefers to remain unnamed, production ground to a halt on Jan. 1, 1997. Before workers could remedy the situation, the liquids hardened in the pipelines, which had to be replaced at a cost of $1 million. That caused late deliveries and the loss of three customers. A similar leap-year oversight caused $1 million of damage at Comalco's aluminum refinery in Tasmania, when controls at all smelting-pot lines shut down, damaging five pot cells beyond repair.

Year 2000 Problem Sightings (http://info.cv.nrao.edu/y2k/sighting.htm): excellent source for general Y2K failures

- Anesthesia machines non-compliant - supplier tries to sell new systems
- Congressional Subcommittee survey
- Phillips Petroleum Y2K test - an oil rig hydrogen sulfide detector system stopped working
- Chrysler plant lock out
- NORAD Y2K - total system blackout
- Cara Corporation embedded systems specialist David C. Hall stated that there are over 40 billion microprocessors worldwide, and anywhere from one to ten percent may be affected by the date change. Hall described an oil company that has determined the need to replace thousands of chips controlling an oil dispensation system. The chips, he said, do not fit on the existing motherboards, and new motherboards do not fit into the existing valves. As a result, the valves themselves will have to be replaced, Hall said.
- Users Demand Y2K Lemon Aid (Control Magazine): Y2K failure rate in semiconductor plants - of 3.3 billion microcontrollers embedded in the automation infrastructure, 50 million will have Y2K anomalies.

As a reference point, Woll reviewed the Dept. of Defense Year 2000 project inventory report. He said that of 3,962 applicable systems, 582 were OK, 623 were being renovated, 628 were retired, and the balance of 1,900 was being assessed. The numbers suggested that about 25% of all the systems would require some level of fixing.

Patrick Meehan, Y2K program manager, DuPont Operations, presented the large-user perspective. "Let's face it, there's not much upside and a lot of downside," he offered. He sees that 50% of DuPont's work will be with process control devices and systems, and his current estimate is that, while 100% will be examined, 10-15% will need remediation. "Towards the end of 1998, those who haven't yet worried about Y2K will find themselves forced to. If they don't, Y2K becomes the best thing that happened to lawyers since divorce."

http://www.xs4all.nl/~zooko/Y2k-real-life.html

- General Motors tested robotic devices - they "just froze and stopped working"
- Control valve for generator cooling integrated over time for smoothing
- Chrysler plant test locks the doors on testers

"We're pretty sure our first tier will work," Chrysler President Thomas Stallkamp said of his company's largest suppliers. "It's the second and third and fourth tier who supply not just our industry but others. As you get further down the food chain, you've got a guy making widgets for us as well as for Boeing and Maytag, and those guys are the ones we're worried about."

"We got lots of surprises," said Chrysler Chairman Robert Eaton. "Nobody could get out of the plant. The security system absolutely shut down and wouldn't let anybody in or out. And you obviously couldn't have paid people, because the time-clock systems didn't work."

http://www.euy2k.com/reallife.htm

- A power plant in the United Kingdom - control valve for generator cooling is integrated over time for smoothing
- ITRON meter reader decks and associated upload/download equipment fail on 2000
- NRC-NEI meeting (if a plant can be shut down because flooding prevents proper emergency response, then Y2K failures of emergency procedures could require shutdowns)
- Hawaiian Electric Company
- Western Power - "Many of the control systems in power systems have dates associated with them. These could be reclosers, voltage regulators, governors, PLCs, etc. The list is endless. You then have a swathe of actual 'applications' involved in the delivery of electricity, such as your Distributed Control Systems and your SCADA (Supervisory Control and DATA (e.g. dates) Acquisition) systems, all of which have dates associated with them. Much of what happens throughout the process of generating and delivering electricity is 'DATE AND TIME STAMPED'."

http://www.sysmod.com/embexamp.htm

- North Sea Expro (Shell-Exxon JV) platform, pipeline and gas plants - 12% failure rate
- Alcoa steel plants: 50% of control systems will fail
- BP refinery - vendor not found for 20, 3 will fail, 2 will cause shutdown
- Capelrig Millennium Test Centre for Shell demonstrates how a failing system controlling an oil rig pump would float the platform; an oil rig typically has 8,000-10,000 embedded systems
- Hawaiian Electric Company energy management system (EMS) failure would have resulted in HECo's transmission network crashing and, by default, a major power outage and loss of all generating capacity
- Programmable thermostats fail, one cannot be restarted
- Chip failure would cut off cooling system and cause explosion in chemical plant
- Fossil power plant control and downstream PLC clock mismatch would trip plant
- Gas pipeline metering failure
- PLCs locking up due to year field overflow
- Sewage controls fail to track tide tables properly

http://www.year2000.com/archive/similar.html (Computer problems similar to Y2K)

- Telephone outage that occurred in New York on September 17, 1991
- Gulf War Patriot missile system had an unrecognized clock drift over a 100-hour period - tracking error of 687 meters
- The software for the F-16 fighter would cause the plane to fly upside down whenever it crossed the equator
- Berlin 1993: two trains collided - the track was set on the holiday two-way traffic setting
- Cement factory chip failure drops rocks on cars
- 99-year-old man's blood count judged by infant norms
- In Colorado Springs one child was killed and another injured: the traffic light systems failed to get the time transmitted to them from the atomic clock in Boulder, stayed in weekend mode and ignored the school schedule
- Several leap year problems noted, including an aluminum smelter

http://www.granite.ab.ca/year2000/incidents.htm

- The Tiwai Point (New Zealand) aluminium smelter
- PCMH Biomedical Department - Hamilton ventilator failure
- UK National Health Service problems
- Credit card failure
- "a major, catastrophic problem" in ICBM launch controls
- Bank merges due to Year 2000 problem
- Robot has the wrong date
- Therac-25 radiation therapy machine overdoses six patients (details on a non-Y2K "software" problem)
- More Visa card problems
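Leap-year faults turn up repeatedly in this compilation: the Comalco smelter in Tasmania, the unnamed U.S. manufacturer, the aluminium smelter entries above, and 9% of the IEE casebook entries. The Python sketch below is written for illustration only and is not taken from any of the affected systems. It shows two plausible mechanisms: a leap-year test that forgets the 400-year exception (and so treats 2000 as a common year), and a day-of-year range check that hard-codes 365 days and therefore rejects December 31 of a leap year, which is day 366. The function names are hypothetical.

```python
import datetime


def is_leap_gregorian(year: int) -> bool:
    """Full Gregorian rule: divisible by 4, except centuries, except every 400th year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_century_bug(year: int) -> bool:
    """Hypothetical faulty rule: keeps the century exception but drops the 400-year one."""
    return year % 4 == 0 and year % 100 != 0


def day_of_year(d: datetime.date) -> int:
    """1-based ordinal day within the year (January 1 is day 1)."""
    return d.timetuple().tm_yday


def day_counter_ok(day: int, days_in_year: int = 365) -> bool:
    """Hypothetical range check; the default hard-codes a 365-day year."""
    return 1 <= day <= days_in_year


if __name__ == "__main__":
    # The two leap rules disagree only on century years divisible by 400, such as 2000.
    for year in (1900, 1996, 2000):
        print(year, is_leap_gregorian(year), is_leap_century_bug(year))

    # December 31 of a leap year is day 366, which a hard-coded 365 limit rejects.
    day = day_of_year(datetime.date(1996, 12, 31))
    print(day, day_counter_ok(day))
    # Deriving the limit from the calendar instead accepts the same date.
    print(day_counter_ok(day, 366 if is_leap_gregorian(1996) else 365))
```

A fault of the second kind stays silent for 365 days and only fires on the last day of a leap year, which is one way a problem could go unnoticed until a year-end rollover.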

--------------------------------------------------------------------------------

Embedded SystemS Problem (ESSP) Ltd

Embedded systems are used extensively to control and monitor engineering and manufacturing processes. They underpin the whole of the world's manufacturing and engineering base: energy (oil, coal, gas, nuclear), planes, ships, the pharmaceutical industries, food, drink and clean water, car manufacturing, national and international defence, railway networks, telecommunications, medical equipment and broadcast media. They are found in washing machines, microwave ovens, video recorders, alarm/intruder detection systems and central heating controllers. They control temperature, lighting, air conditioning and security access in many offices, and they support point-of-sale equipment, cash dispensers and traffic management in a typical High Street. During 1995, more than 200 million PCs were shipped worldwide; in the same period, the number of embedded systems shipped exceeded 3 billion. According to research conducted over the past year, around 5% of simple embedded systems were found to fail Millennium Bug tests. For more sophisticated embedded systems, failure rates of between 50% and 80% have been reported (Action 2000, UK Government Taskforce). In our own experience, however, we have found it closer to 15%-20% in processor-intensive industries.

http://www.ccta.gov.uk/mill/embed.htm

http://www.state.id.us/y2k/solemb.htm
Process control software
http://www.compinfo.co.uk/y2k/scada.htm
Embedded Industrial Control Systems
http://www.iee.org.uk/2000risk/toc.htm
Chip, Microcontroller, Microprocessor - Hitex Resources (not Y2K)
Golden Y2K Immunization Rules by ICONOCLAST, CORUM Research Group, Geneva - 16 February 1999
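Several entries in this compilation, such as "PLCs locking up due to year field overflow" and the simple embedded systems that fail Millennium Bug tests, come down to how a device stores the year. One plausible mechanism, assumed here for illustration, is a real-time-clock or controller register that holds the year as two packed-BCD digits: incrementing 99 wraps to 00, and firmware that supplies a fixed "19" century then reads the date as 1900. The helper names are hypothetical and are not taken from any particular PLC or clock chip.

```python
def bcd_to_int(bcd: int) -> int:
    """Decode one packed-BCD byte (two decimal digits) to an integer 0-99."""
    return (bcd >> 4) * 10 + (bcd & 0x0F)


def int_to_bcd(value: int) -> int:
    """Encode an integer 0-99 as one packed-BCD byte."""
    if not 0 <= value <= 99:
        raise ValueError("value does not fit in two BCD digits")
    return ((value // 10) << 4) | (value % 10)


def increment_year_bcd(year_bcd: int) -> tuple[int, bool]:
    """Advance a two-digit BCD year register; returns (new_register, carry_out).

    Only two decimal digits are stored, so 99 wraps to 00 and the century
    is lost unless the firmware keeps the carry somewhere else.
    """
    next_year = bcd_to_int(year_bcd) + 1
    return int_to_bcd(next_year % 100), next_year >= 100


if __name__ == "__main__":
    register, carry = increment_year_bcd(0x99)
    print(hex(register), carry)  # the register now holds 00 and a carry was produced
    # Hypothetical firmware that assumes a fixed 19xx century:
    print(1900 + bcd_to_int(register))  # 1900, not 2000
```

Whether such a wrap merely mislabels a log entry or locks up a controller depends on what the firmware does with the resulting date, which is why the same underlying storage choice shows up as very different failures in the lists above.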



-- G Bailey (glbailey1@excite.com), November 21, 1999.


http://www.y2k-status.org/EmbeddedFailures.htm

-- G Bailey (glbailey1@excite.com), November 21, 1999.

One bit of information I recall from the Marine Corps discussion was that so many out of so many had incorrect dates, and thus could fail at an unknown date, though still rollover related.

Are we waiting for Dec 31-Jan 1? Isn't the answer to that question a part of The Big Unknown?

-- Paula (chowbabe@pacbell.net), November 21, 1999.


U.S. Dept. of Transportation Office of Pipeline Safety PIPELINE SAFETY ADVISORY BULLETIN

ADVISORY BULLETIN: ADB-99-03 Date: July 7, 1999

To: Owners and Operators of Hazardous Liquid and Natural Gas Pipeline Facilities

Subject: Potential Service Interruptions in Supervisory Control and Data Acquisition Systems

Purpose: Inform pipeline system owners and operators of potential operational limitations associated with Supervisory Control and Data Acquisition (SCADA) systems and the possibility of those problems leading to or aggravating pipeline releases.

Advisory: Each pipeline operator should review the capacity of its SCADA system to ensure that the system has resources to accommodate normal and abnormal operations on its pipeline system. In addition, SCADA configuration and operating parameters should be periodically reviewed, and adjusted if necessary, to assure that the SCADA computers are functioning as intended. Further, operators should assure system modifications do not adversely affect overall performance of the SCADA system. We recommend that the operator consult with the original system designer.

Background: During an Office of Pipeline Safety (OPS) investigation of a recent pipeline incident, OPS inspectors identified inadequate SCADA performance as an operational safety concern. Immediately prior to and during the incident, the SCADA system exhibited poor performance that inhibited the pipeline controllers from seeing and reacting to the development of an abnormal pipeline operation.

Preliminary review of the SCADA system indicates that the processor load (a measure of computer performance utilization) was at 65 to 70 percent during normal operations. Immediately prior to an upset condition occurring on the pipeline, the SCADA encountered an internal database error. The system attempted to reconcile the problem at the expense of other processing tasks. The database error, coupled with the increased data processing burden of the upset condition, hampered controller operations. In fact, key operator command functions were unable to be processed immediately prior to and during the abnormal operation. It is possible that post installation modifications may have hampered the system's ability to function appropriately.

The combination of the database error, the inadequate reserve capacity of the SCADA processor, and the unusually dynamic changes that occurred during the upset condition, appear to have combined and temporarily overburdened the SCADA computer system. This may have prevented the pipeline controllers from reacting and controlling the upset condition on their pipeline as promptly as would have been expected. For further information, contact Chris Hoidal, Director, OPS Western Region at 303-231-5701.
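The bulletin's capacity concern is easy to quantify. If normal operation already uses 65 to 70 percent of the processor, the system can carry only about 1.43 to 1.54 times its normal workload before it saturates, with nothing left over for error recovery such as the database reconciliation described in the bulletin. The Python sketch below makes that arithmetic explicit. It assumes utilization scales linearly with work, which real SCADA hosts only approximate, and the 2x upset multiplier is a hypothetical value, not a figure from the OPS investigation.

```python
def saturation_factor(baseline_load: float) -> float:
    """Multiple of the normal workload at which the processor reaches 100% utilization.

    Assumes utilization scales linearly with work (a simplification).
    """
    if not 0.0 < baseline_load < 1.0:
        raise ValueError("baseline_load must be a fraction strictly between 0 and 1")
    return 1.0 / baseline_load


def load_during_upset(baseline_load: float, upset_multiplier: float) -> float:
    """Projected utilization if an upset multiplies the normal processing burden."""
    return baseline_load * upset_multiplier


if __name__ == "__main__":
    for load in (0.65, 0.70):
        print(f"{load:.0%} baseline -> saturates at {saturation_factor(load):.2f}x normal work")
    # Hypothetical upset that doubles the processing burden (not a figure from the bulletin):
    print(f"{load_during_upset(0.70, 2.0):.0%}")  # above 100%: not all work can be done on time
```

Running it prints 1.54x for a 65% baseline and 1.43x for 70%, and a projected 140% load for the hypothetical doubled burden. The point of the advisory's capacity review is to keep that margin large enough for the worst combination of upset traffic and internal error handling.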

-- G Bailey (glbailey1@excite.com), November 21, 1999.


With fewer than 20 federal workdays left, it ain't gonna be pretty.



-- K. Stevens (kstevens@ It's ALL going away in January.com), November 21, 1999.



Fact mixed with fiction and old speculation. Overall a gross exaggeration of Y2K in embedded systems. If you see David Hall, Tava-Beck (Y2K services for hire), and the UK IEE "casebook" in a Y2K post, it's gonna be "doomsday".

Regards,

-- FactFinder (FactFinder@bzn.com), November 21, 1999.


OK, so subtract the speculation and fiction. There are still quite a few real problems that happened. Interestingly (why am I not surprised), I've seen articles specifically noting that several (and probably all) of these problems have been corrected, and *demonstrated* to be now working properly. Of course, G Bailey doesn't bother showing us any of those updates. Maybe he didn't consider them newsworthy enough to collect? Maybe he feels that documenting the reality of remediation will give the "wrong" impression?

Nonetheless, the actual problems show that embedded failures were not mythical -- they're real. I would expect nearly all significant embedded problems that have been discovered have been addressed, on the grounds that even the dumbest engineer would see the necessity. How many significant embedded problems haven't been found, or even looked for? We have almost no basis for speculation in most industries except power generation and distribution, where even Rick Cowles now concedes that even the *potential* for problems has been reduced to manageable levels, much less their actual incidence.

The increasing reports of planned orderly shutdowns and startups spanning rollover seem to show that many are very nervous, probably for good reason. Again, I would expect this strategy to be applied where it makes sense: where the calculations look at intervals, and only the single interval spanning the rollover will be wrong. Best to skip that one, very carefully.
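The point about intervals can be illustrated with a short Python sketch; the function names and sample times are hypothetical. A device that keeps only a two-digit year and assumes the 1900s logs a sample every 10 seconds across midnight on December 31, 1999. Every interval comes out correct except the single one that spans the rollover, which is roughly 100 years negative. A plausibility filter that skips non-positive or over-long intervals discards exactly that one. This illustrates the skip-the-bad-interval idea only; it is not a description of any plant's actual remediation.

```python
import datetime

# Hypothetical raw samples: (two-digit year, month, day, hour, minute, second),
# logged every 10 seconds across the 1999/2000 rollover.
RAW_SAMPLES = [
    (99, 12, 31, 23, 59, 40),
    (99, 12, 31, 23, 59, 50),
    (0, 1, 1, 0, 0, 0),
    (0, 1, 1, 0, 0, 10),
]


def naive_timestamp(yy: int, month: int, day: int,
                    hour: int, minute: int, second: int) -> datetime.datetime:
    """Hypothetical device logic that assumes every two-digit year is 19yy."""
    return datetime.datetime(1900 + yy, month, day, hour, minute, second)


def intervals(stamps: list[datetime.datetime]) -> list[float]:
    """Seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


def plausible_intervals(stamps: list[datetime.datetime], max_gap: float) -> list[float]:
    """Keep positive intervals no longer than max_gap; skip anything else."""
    return [dt for dt in intervals(stamps) if 0 < dt <= max_gap]


if __name__ == "__main__":
    stamps = [naive_timestamp(*sample) for sample in RAW_SAMPLES]
    print(intervals(stamps))                  # middle value is about -100 years in seconds
    print(plausible_intervals(stamps, 60.0))  # only the two 10-second intervals survive
```

A value integrated over time, like the generator-cooling valve signal mentioned earlier in the thread, would absorb that one huge negative interval as a massive step unless it is filtered out; shutting down across the rollover avoids computing it at all.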

-- Flint (flintc@mindspring.com), November 21, 1999.

