An Engineer's Observations: Ongoing Y2K-Related Problems

A cross-posting of mine, of possible interest, posted 1/30/2001 at

http://pub5.ezboard.com/fyourdontimebomb2000.showMessage?topicID=21122.topic

1/30/2001 Summary of an Engineer's Observations Regarding the Status of Ongoing Y2K-Related Embedded Systems and Complex Integrated Systems Problems

Preface: I am attaching a summary of observations that I have drafted. The summary is based on observations that an engineer has shared with me. I am sharing these summarized observations for several reasons:

1) because of the relative absence of first hand accounts concerning what is actually going on regarding Y2K-related embedded systems and complex integrated systems problems;

2) because I have heard some similar off the record accounts from other engineers; and

3) because I feel that right now such off the record observations provide the best information and roadmap to further inquiry that we have.

Perhaps those who are in a position to do so will come forward, at least off the record, and help enlighten the public and those in positions of public and private sector trust and responsibility concerning the significant role that Y2K-related embedded systems and complex integrated systems problems are playing in a variety of sectors, including the energy sector.

*******

During the last week of January 2001, I received some information from a seasoned engineer who has been working "on the frontlines". The identity of the engineer cannot be disclosed since the individual's job security could be jeopardized.

The individual shared information concerning the many Y2K-related problems that he is continuing to see. (I have not met the engineer in person and do not know his or her real name and will refer to "him" as "he" in this summary.)

Also, rather than quote the individual directly, I am summarizing most of the information that he shared with me.

~ Several of the companies that he has worked with have had extremely serious data corruption problems. After much effort and temporary successes in dealing with these problems, the data becomes corrupted again.

~ With respect to the grid, he feels certain that the energy crisis will become increasingly apparent this summer. In his view there have been large numbers of failures involving energy systems. In these instances, he says that workarounds are often not possible. He notes that turning clocks back and going to manual have resulted in some cascading failures and time delays.

~ He notes increasing reports of problems with dirty power and low power and instances involving the total failure of electrical equipment.

~ He also talks about what he feels is a direct correlation between solar storms and hardware failures.

~ He says that those working "on the frontlines" are being threatened with the loss of their jobs if they speak up about what they know.

~ I had told him that it was my sense that people at the top of private sector organizations do not seem to comprehend the extent of their Y2K-related embedded systems and complex integrated systems problems. He said that of the persons he comes across, fewer than 20% of those who work with complex systems understand the systems and keep up with changes, and that only a small percentage are able to address problems effectively. The others don't really understand what is going wrong with their systems.

~ I asked him how large a role he thought Y2K-related embedded systems and complex integrated systems problems were currently playing in the evolving energy crisis. He estimated that 70% of the failures involving the energy sector and communications (among others) are directly the result of Y2K. He estimated that 20% of the failures could be due to human error on the part of those trying to deal with the problems. He said that those individuals often only have enough ability to deal with normal activities and that they have insufficient understanding to deal with anything that departs from the norm. He estimates that the other 10% of the problems are owing to normal hardware failure, user problems, and environmental issues.

~ He said that manual override and date resetting have been used when automated production systems and SCADA systems have failed. He said that it is not uncommon when he is replacing a system to be told by the client that he has to put in an old date or the application will not run. He added that many of these applications are old and that large networks over the past decade can be composed of a mix of upgrades, networks, and applications that are out of sync. Owing to these problems, he estimates that the country is running at 65% to 70% of last year's production rates on the average.

~ I asked him about problems in all of the high hazard sectors: oil rigs, refineries, oil and gas pipe lines, nuclear power plants, nuclear reactors, chemical plants, hazardous material facilities and sites, electric power plants, water purification plants, waste treatment plants, trains, planes. He responded that most of these have fixed what they could; fixed the rest on failure when possible; or, if the expertise is missing, attempted to make the failing system work manually. In situations where a system is run 24 by 7 and where there is an apparent problem, he says that there is only a narrow window of time during which the system can be analyzed and repaired. Sometimes when there is an apparent problem, but where no hard errors have occurred, he has been asked to replace hardware. When new hardware does not fix the problem, going to partial manual override becomes the only remaining option. He also noted that in many networked environments, date/time is sent in packets and when there are systems broadcasting an old date along with current dates, the data can be corrupted or miscalculated.

~ He said that he has not found anyone who is willing to talk about what is happening, even off the record. He said that some of his more aware customers are asking him what he is seeing and asking questions about the power crisis. He thinks that they are beginning to catch on.

~ I asked him if he knew of any cases involving high hazard sectors where the problems are being publicly recognized AND linked to Y2K. He said that Y2K is never mentioned in explanations as a cause of problems. Instead "silly" explanations are offered and most people take these explanations as fact.

~ I asked him what his prognosis was for nuclear power plants. He said that he was told prior to the rollover, by someone in a position to know, that in the instances his information source knew about, clocks were turned back where there was a possibility of problems and failures. He said that this only works for a time, as the interconnectedness of these systems runs too fast for individuals to keep them going. In his view, the production task has become very costly, negating most, if not all, profit. In addition, mechanical/electronic failures are extremely costly. He said that he felt that many nuclear power plants were running well below capacity owing to the failures and to manual operations. He feels that they do not seem to be making much progress getting back to normal and that in the end those plants will become too expensive to run.

~ I said that I have been hearing about shortages in the pharmaceutical industry and asked him if he thought this might be related to problems with manufacturing processes. He said that there are manufacturing problems and that too many bugs have slowed manufacturing processes. He added that there is a major shortage of computer components and that the parts that are available are often parts that have been put back in stock even though they do not work. He said he has found the same to be the case when it comes to other technology companies and parts vendors.

~ Regarding health care system problems, he said that they are having all kinds of issues, including claims that are getting rejected for no valid reason, accounts that are coming up blank, or billing where charges and services are being doubled.

~ Regarding air travel, he said that it is having its share of Y2K issues. He also feels that solar storms are having an impact on air travel and that Y2K coupled with solar storms has triggered many of the problems that have been occurring.

~ I asked him what he thought about the possibility that manhole cover explosions might be caused by irregularities in transmission. He said that the manhole issue is a very interesting one and that he feels that it is due to electrical power cables overheating and creating a gas that results in an explosion. He thinks that this is probably due to the use of manual power overrides.

~ He said that every time there are major solar flares, he notes an increase in CPU, memory and disk drive failures. He notes that the incidence of failing modules is very high owing to their density, a factor that makes them more sensitive to the effects of solar storms.

~ I asked him if he knew of any cases where problems involving data degradation were being publicly recognized AND linked to Y2K. He said that not one company is going public. The usual explanation is that the company is having "computer problems" and that "the system is new."
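The point above about networks in which some systems broadcast an old date alongside current dates can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the node names, the 1972 set-back date, and the one-hour plausibility limit are hypothetical and do not come from the engineer's account. It only shows how a calculation that trusts both timestamps can go wrong when one clock has been turned back.

```python
from datetime import datetime


def elapsed_seconds(start: datetime, end: datetime) -> float:
    """Naive duration calculation that trusts both timestamps."""
    return (end - start).total_seconds()


def is_plausible(seconds: float, max_seconds: float = 3600.0) -> bool:
    """Reject negative or implausibly long durations (hypothetical limit)."""
    return 0.0 <= seconds <= max_seconds


# Node A reports the real date. Node B's clock was turned back to 1972
# as a workaround (an assumed value, chosen only for this example).
start_from_a = datetime(2001, 1, 30, 12, 0, 0)
end_from_b_actual = datetime(2001, 1, 30, 12, 5, 0)    # true time of the event
end_from_b_reported = datetime(1972, 1, 30, 12, 5, 0)  # what node B sends

true_elapsed = elapsed_seconds(start_from_a, end_from_b_actual)
reported_elapsed = elapsed_seconds(start_from_a, end_from_b_reported)

print("true elapsed seconds:    ", true_elapsed)
print("reported elapsed seconds:", reported_elapsed)
print("reported value plausible?", is_plausible(reported_elapsed))
```

A receiving system that applies a range check like `is_plausible` can at least flag the mismatch rather than store a miscalculated value; whether any given plant system does so is not something this summary can establish.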

-- Paula Gordon (pgordon@erols.com), January 30, 2001

Answers

Paula, thanks for the update. I was on the Y2K remediation team for my company in 97-00. After the rollover our company (multiple locations) experienced significant downtime. IST period we saw > 3.2mm in losses. Example: 10 Digital DECnet servers controlling an entire process in one of the plants went down, screwing production for two weeks. Official response was "a welder hit one of the PLCs during maint." I was there, and there were no welders even close! Production to this day is compromised... Things go down hard and suddenly, without apparent reason... I've been at this location 32 years and haven't seen such craziness since we started up in 1969. Nobody talks... I can't... retirement's too close. Our kids will read about the cover-up, and wonder.

-- inthesameboat (cantsay@dot.com), January 30, 2001.

Dear "In the same boat"

Thanks so much for sharing your first hand experiences. I think these kinds of first hand reports will be helpful to others.

Regards,

-- Paula Gordon (pgordon@erols.com), January 30, 2001.


To me it seems obvious. Think back, when did oil prices suddenly explode, when did the drug companies suddenly start having shortages, when did electricity suddenly become short in supply?

Remember all the refinery explosions and fires that occurred immediately after rollover? Train accidents were rare before 1/1/2000. Prior to this the media always reported them. After rollover they became so common they were no longer reported.

Remember the surge in airline problems? Think back, remember all the problems. But as with pre-rollover, things were not reported, they were explained away, blamed on something else. It couldn't be Y2K. Thirteen months later and people are still afraid to even think about what could have happened. Instead the survivalists are mocked and shamed into silence.

Can you really believe that a sudden explosion in the population occurred in one year? That the population grew enough in one year to create the problems and shortages we are seeing today?

Remember how we were warned about cascading problems. Yes, Y2K did cause problems, perhaps not as dramatic as expected but significant problems nonetheless. Perhaps things didn't cascade as fast as feared but they are cascading, and they will continue to cascade. Why will they continue to cascade? Because to this day, nobody will admit that it was, and continues to be, a serious problem. Instead it's a welder's torch.

If you don't believe it is cascading, read the news. Numerous states that did not deregulate utility companies are being warned that they too may face fuel and electricity shortages this summer. Without fuel and electricity, food shortages are not far behind. We do exist within a very fragile system.

Just a little prediction on my part: it's going to be a hot summer with brownouts becoming a common occurrence in many if not most regions. We will not be able to build new plants by then. We will not be able to fix the plants we have by then. Next summer will be very chilly indeed.

Always be prepared.

-- Tom Flook (tflook@earthlink.net), January 30, 2001.


You are all right about what is going on, but you are too narrow in scope. All this is a warning for those who can see. What is coming you cannot physically prepare for. Oh, you might be materially prepared for a little while, but in the end it won't matter.

-- Phil Maley (maley@cnw.com), January 30, 2001.

As a cautionary addition to this thread, I repost below a submission to the Computer Risks listserv about a problem that might have appeared to be a "Y2K bug," but wasn't. (Let me hasten to add that I have no doubt--including based on experiences at work which I do not feel free to discuss--that many more truly Y2K-related misadventures are occurring than we will ever know, but I thought this example would be of interest.)

***************************************************

Source: RISKS DIGEST 21.21, 25 January 2001 [see source info at end of this post]

Date: Thu, 18 Jan 2001 14:59:50 -0800
From: "George C. Kaplan"
Subject: Another Y2K+1 glitch -- sorta

The Extreme Ultraviolet Explorer (EUVE) satellite was launched in Jun 1992 to do astronomical observations in the extreme ultraviolet (100 – 1000 angstroms). Its primary mission was planned for something like 18 months, but a series of extensions has kept the satellite running ever since, operated by UC Berkeley and NASA. Money is finally running out, and it's scheduled to shut down on 31 Jan 2001.

On 1 Jan 2001, a planning system that checks observing plans against operational constraints suddenly failed. A Y2K+1 bug? Not quite. Many of the constraints are based on the relative positions of the sun, moon, and planets. (e.g. "Don't point the telescopes at the sun.") A solar/lunar/planetary (SLP) ephemeris file which provides this information to the planning system was valid only through 31 Dec 2000.

OK, someone forgot to do the annual update, right? Nope. Solar system motions are well-known and predictable over long time periods. The SLP file covered a 10-year period; it was the only one ever used by the mission. No provision was made for updating the file, since at the time EUVE was launched, nobody expected the mission (even with extensions) to last through 2000.

So it's a classic problem of legacy software and data. The original programmers are long-gone. Nobody knows quite where the original file came from, and the (binary) format is different from SLP data used on more recent missions operating with similar constraints.

At this point it's unlikely that an updated file will be available before the mission shuts down, so the operations team at UC Berkeley is just bypassing the SLP checks. That's a risky choice, but reasonable, given that they have only a couple of more weeks of operations. You have to wonder what they would have done if the mission had been extended for another year, though.

George C. Kaplan, Communication & Network Services, University of California at Berkeley 1-510-643-0496 gckaplan@ack.berkeley.edu

------------------------------

Date: 26 Dec 2000 (LAST-MODIFIED)
From: RISKS-request@csl.sri.com
Subject: Abridged info on RISKS (comp.risks)

The RISKS Forum is a MODERATED digest. Its Usenet equivalent is comp.risks.

=> SUBSCRIPTIONS: PLEASE read RISKS as a newsgroup (comp.risks or equivalent) if possible and convenient for you. Alternatively, via majordomo, SEND DIRECT E-MAIL REQUESTS to with one-line, SUBSCRIBE (or UNSUBSCRIBE) [with net address if different from FROM:] or INFO [for unabridged version of RISKS information] .MIL users should contact (Dennis Rears). .UK users should contact .

=> The INFO file (submissions, default disclaimers, archive sites, copyright policy, PRIVACY digests, etc.) is also obtainable from http://www.CSL.sri.com/risksinfo.html ftp://www.CSL.sri.com/pub/risks.info The full info file will appear now and then in future issues. *** All contributors are assumed to have read the full info file for guidelines. ***

=> SUBMISSIONS: to risks@CSL.sri.com with meaningful SUBJECT: line.

=> ARCHIVES are available: ftp://ftp.sri.com/risks or ftp ftp.sri.com; login anonymous; [YourNetAddress]; cd risks [volume-summary issues are in risks-*.00] [back volumes have their own subdirectories, e.g., "cd 20" for volume 20] http://catless.ncl.ac.uk/Risks/VL.IS.html [i.e., VoLume, ISsue]. http://the.wiretapped.net/security/info/textfiles/risks-digest/ .

=> PGN's comprehensive historical Illustrative Risks summary of one liners: http://www.csl.sri.com/illustrative.html for browsing, http://www.csl.sri.com/illustrative.pdf or .ps for printing

-- Andre Weltman (aweltman@state.pa.us), January 31, 2001.



Andre,

A solar/lunar/planetary (SLP) ephemeris file
which provides this information to the planning
system was valid only through 31 Dec 2000.

This too may have been a Y2K bug. If the programmer
was using 2-digit dates he would have written the program
for that time period thinking that it would be replaced
before the end of the millennium. BTW I used to do
solar and planetary programs in BASIC in the ‘80s,
i.e., 1980s ::::-§

-- spider (spider0@usa.net), January 31, 2001.


Sorry, my mistake ::::-§

-- spider (spider0@usa.net), January 31, 2001.

spider,

The way I read the original post about the EUVE satellite, I thought the ephemeris file was a "library" of information that ended after 31 Dec 2000, rather than a calculation. As if, say, I needed to look up the entry for "volcano" in a multi-volume encyclopedia but I only had the books through "ukulele." But I could be mis-reading the post.
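To make the "library" reading concrete, here is a minimal sketch of a lookup table with a fixed coverage window. The class name, the 1991 start date (inferred only from "a 10-year period" ending 31 Dec 2000), and the placeholder values are all assumptions for illustration; the real SLP file's format and contents are unknown to me. The point is only that a precomputed table fails at the edge of its coverage regardless of how dates are stored, which is a different failure from a 2-digit-year calculation bug.

```python
from datetime import date


class EphemerisTable:
    """Hypothetical precomputed lookup table with a fixed coverage window."""

    def __init__(self, first_day: date, last_day: date):
        self.first_day = first_day
        self.last_day = last_day

    def covers(self, day: date) -> bool:
        """True if the table holds an entry for this day."""
        return self.first_day <= day <= self.last_day

    def sun_position(self, day: date) -> float:
        """Look up a stored value; fail if the day is outside the table."""
        if not self.covers(day):
            raise LookupError(f"no ephemeris entry for {day}")
        # Placeholder value (day index within the table), not real astronomy.
        return float((day - self.first_day).days)


# Assumed coverage: ten years ending 31 Dec 2000.
table = EphemerisTable(date(1991, 1, 1), date(2000, 12, 31))

print(table.covers(date(2000, 12, 31)))
print(table.covers(date(2001, 1, 1)))
```

Under this reading, extending the mission past the table's last day needs new data, not a code fix, much like needing the next volume of the encyclopedia.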

-- Andre Weltman (aweltman@state.pa.us), January 31, 2001.


Disregard my post, I was day-dreaming ::::-§

-- spider (spider0@usa.net), January 31, 2001.

1/30/2001 Summary of an Engineer's Observations Regarding the Status of Ongoing Y2K-Related Embedded Systems and Complex Integrated Systems Problems (With minor edit 2/1/2001)

(I first posted this Summary on 1/30/2001. The word "module" has now been added in the eighth bulleted item. ~ PG)

Preface:

I am attaching a summary of observations that I have drafted. The summary is based on observations that an engineer has shared with me. I am sharing these summarized observations for several reasons:

1) because of the relative absence of first hand accounts concerning what is actually going on regarding Y2K-related embedded systems and complex integrated systems problems;

2) because I have heard some similar off the record accounts from other engineers; and

3) because I feel that right now such off the record observations provide the best information and roadmap to further inquiry that we have.

Perhaps those who are in a position to do so will come forward, at least off the record, and help enlighten the public and those in positions of public and private sector trust and responsibility concerning the significant role that Y2K-related embedded systems and complex integrated systems problems are playing in a variety of sectors, including the energy sector.

**********************************************************************

During the last week of January 2001, I received some information from a seasoned engineer who has been working "on the frontlines". The identity of the engineer cannot be disclosed since the individual's job security could be jeopardized.

The individual shared information concerning the many Y2K-related problems that he is continuing to see. (I have not met the engineer in person and do not know his or her real name and will refer to "him" as "he" in this summary.)

Also, rather than quote the individual directly, I am summarizing most of the information that he shared with me.

~ Several of the companies that he has worked with have had extremely serious data corruption problems. After much effort and temporary successes in dealing with these problems, the data becomes corrupted again.

~ With respect to the grid, he feels certain that the energy crisis will become increasingly apparent this summer. In his view there have been large numbers of failures involving energy systems. In these instances, he says that workarounds are often not possible. He notes that turning clocks back and going to manual have resulted in some cascading failures and time delays.

~ He notes increasing reports of problems with dirty power and low power and instances involving the total failure of electrical equipment.

~ He also talks about what he feels is a direct correlation between solar storms and hardware failures.

~ He says that those working "on the frontlines" are being threatened with the loss of their jobs if they speak up about what they know.

~ I had told him that it was my sense that people at the top of private sector organizations do not seem to comprehend the extent of their Y2K-related embedded systems and complex integrated systems problems. He said that of the persons he comes across, fewer than 20% of those who work with complex systems understand the systems and keep up with changes, and that only a small percentage are able to address problems effectively. The others don't really understand what is going wrong with their systems.

~ I asked him how large a role he thought Y2K-related embedded systems and complex integrated systems problems were currently playing in the evolving energy crisis. He estimated that 70% of the failures involving the energy sector and communications (among others) are directly the result of Y2K. He estimated that 20% of the failures could be due to human error on the part of those trying to deal with the problems. He said that those individuals often only have enough ability to deal with normal activities and that they have insufficient understanding to deal with anything that departs from the norm. He estimates that the other 10% of the problems are owing to normal hardware failure, user problems, and environmental issues.

~ He said that manual override and date resetting have been used when automated production systems and SCADA systems have failed. He said that it is not uncommon when he is replacing a system module to be told by the client that he has to put in an old date or the application will not run. He added that many of these applications are old and that large networks over the past decade can be composed of a mix of upgrades, networks, and applications that are out of sync. Owing to these problems, he estimates that the country is running at 65% to 70% of last year's production rates on the average.

~ I asked him about problems in all of the high hazard sectors: oil rigs, refineries, oil and gas pipe lines, nuclear power plants, nuclear reactors, chemical plants, hazardous material facilities and sites, electric power plants, water purification plants, waste treatment plants, trains, planes. He responded that most of these have fixed what they could; fixed the rest on failure when possible; or, if the expertise is missing, attempted to make the failing system work manually. In situations where a system is run 24 by 7 and where there is an apparent problem, he says that there is only a narrow window of time during which the system can be analyzed and repaired. Sometimes when there is an apparent problem, but where no hard errors have occurred, he has been asked to replace hardware. When new hardware does not fix the problem, going to partial manual override becomes the only remaining option. He also noted that in many networked environments, date/time is sent in packets and when there are systems broadcasting an old date along with current dates, the data can be corrupted or miscalculated.

~ He said that he has not found anyone who is willing to talk about what is happening, even off the record. He said that some of his more aware customers are asking him what he is seeing and asking questions about the power crisis. He thinks that they are beginning to catch on.

~ I asked him if he knew of any cases involving high hazard sectors where the problems are being publicly recognized AND linked to Y2K. He said that Y2K is never mentioned in explanations as a cause of problems. Instead "silly" explanations are offered and most people take these explanations as fact.

~ I asked him what his prognosis was for nuclear power plants. He said that he was told prior to the rollover, by someone in a position to know, that in the instances his information source knew about, clocks were turned back where there was a possibility of problems and failures. He said that this only works for a time, as the interconnectedness of these systems runs too fast for individuals to keep them going. In his view, the production task has become very costly, negating most, if not all, profit. In addition, mechanical/electronic failures are extremely costly. He said that he felt that many nuclear power plants were running well below capacity owing to the failures and to manual operations. He feels that they do not seem to be making much progress getting back to normal and that in the end those plants will become too expensive to run.

~ I said that I have been hearing about shortages in the pharmaceutical industry and asked him if he thought this might be related to problems with manufacturing processes. He said that there are manufacturing problems and that too many bugs have slowed manufacturing processes. He added that there is a major shortage of computer components and that the parts that are available are often parts that have been put back in stock even though they do not work. He said he has found the same to be the case when it comes to other technology companies and parts vendors.

~ Regarding health care system problems, he said that they are having all kinds of issues, including claims that are getting rejected for no valid reason, accounts that are coming up blank, or billing where charges and services are being doubled.

~ Regarding air travel, he said that it is having its share of Y2K issues. He also feels that solar storms are having an impact on air travel and that Y2K coupled with solar storms has triggered many of the problems that have been occurring.

~ I asked him what he thought about the possibility that manhole cover explosions might be caused by irregularities in transmission. He said that the manhole issue is a very interesting one and that he feels that it is due to electrical power cables overheating and creating a gas that results in an explosion. He thinks that this is probably due to the use of manual power overrides.

~ He said that every time there are major solar flares, he notes an increase in CPU, memory and disk drive failures. He notes that the incidence of failing modules is very high owing to their density, a factor that makes them more sensitive to the effects of solar storms.

~ I asked him if he knew of any cases where problems involving data degradation were being publicly recognized AND linked to Y2K. He said that not one company is going public. The usual explanation is that the company is having "computer problems" and that "the system is new".



-- Paula Gordon (pgordon@erols.com), February 01, 2001.


