Am I right? Do embedded systems mean near certain TEOTWAWKI??

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

Can someone tell me if my latest assessment is correct? Here it is in a nutshell...

The embedded system/chip problem was originally thought to be VERY serious, as the rate of chip failures was initially thought to be very high. Then around late 1998 to mid-1999, we began to hear reports that embeddeds actually have a much lower failure rate than we initially thought, something less than 1%. However, in late November of this year, we learned that while overall embedded systems failures were low, the embedded systems that are most important, namely those in utilities, chemical plants, oil refineries, etc., do in fact have a high rate of failure. Not just high, but incredibly high: a failure rate somewhere between 5 and 10 percent. This being the case, come Jan 1st, 5% or more of chips in the above mentioned facilities will fail. Now if these chips do fail, then does not the company in question have to find, replace (if there is a replacement), test, etc., in order to get their systems back up and running? Also, isn't it safe to say that a 5% failure would almost certainly mean that the utility/refinery/chemical plant/etc. would cease to function until at least a majority of critical chips could be replaced?

Unless the above assessment is flawed (if it is, please tell me), then it seems that the only logical conclusion is that TEOTWAWKI is not only a possibility, or even a probability...it is a certainty.

-- Orson Wells (wells@whitebulb.com), December 22, 1999

Answers

I second that emotion. I predict it's going to be a lot worse than the majority of GI's here think. No Y2K welcoming committee, folks, but a sledgehammer across the knuckles....

-- wondering (had@to.ask), December 22, 1999.

The embedded problem was originally no problem at all. I was working on Y2k for a couple years before I even HEARD of embedded systems, and the embedded problem was at first dismissed as some kind of a hoax or bad joke.

Nothing in this can of worms is a certainty, except that in 200 hours and change, we're gonna be entering a new year. My guess is that we're facing a depression at best. But how many embedded systems will actually become dysfunctional or nonfunctional? I don't know. And that's likely to have no relation to the number that have clocks, or compute dates.

I still think the info systems failures will do more damage than the embedded failures. Look at the totally bollixed railroad mergers in the last few years, leaving food rotting on sidings, etc. But that's just my view because I've got a big iron batch background (although I'm a PC guru and OO whiz now, you betcha), and don't know much about embedded devices.

"the only logical conclusion is that TEOTWAWKI ... is a certainty"? Nope, no certainties at all.

-- bw (home@puget.sound), December 22, 1999.


C'mon you guys

Orson has a legitimate thread. Give him a break; take your cynical, emotional outbursts outside and duke it out.

Please, TB2000 community, this is too important to let a little fist fight destroy its intent and focus.

I agree with Mr. Wells, if Paula and R.C. and Shakey (are you around, Shakey?). By the way folks, this week Mr. Shakey responded to a question by mad monk about what will happen when plants that shut down for the rollover start up again. Mr. Shakey compounded Mr. Wells's question by stating that some embeddeds, once taken off line over the date change, will not come back up when the plants come back on line.

So please somebody...... what is the conclusion ...any quod erat demonstrandum...

-- d----- (dciinc@aol.com), December 22, 1999.


Interesting that you say systems failures will do more damage than the embeddeds. Unfortunately for us, it's not EITHER the systems failures OR the embedded systems failures, but rather BOTH. Kind of a bummer...

-- Orson Wells (wells@whitebulb.comf), December 22, 1999.

Thanks for nothing BW. Now, I'm no puter geek, but I can read. And I can smell a lie from here to there. Let's see. Hmmmmm, power systems go out across entire cities, no apparent reason. Could it be the embeddeds? Could it be the programming. Who Cares! The fact is the power went out. And its going to go out, out, out. All you computer experts really don't know diddly, which seems to make things worse than better.

-- wondering (had@to.ask), December 22, 1999.


Sorry bw, wrongo dude, ya haven't done yer homework laddie :o)

Orson - I hear you - the latest I'm hearing is also a 5-10% failure rate which is quite astounding - but verifiable, just ask RC...

Check this out. Remember that your average oil platform in the Gulf of Mexico has up to 10,000 embedded chips - even a 1% fail rate means 100 chips down.

Same in the North Sea - remember these rigs in the North Sea were built 25 years ago and many chips are inaccessible without costly submersibles being used - $500,000 per 8 hrs...
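The arithmetic behind the oil platform example can be sketched in a few lines of Python. This is only an illustration: the 10,000-chip count and the 1%, 5% and 10% rates are the figures quoted in this thread, and the function name `expected_failures` is made up for the example. It says nothing about which chips are critical or how failures cluster.

```python
def expected_failures(chip_count, failure_rate):
    """Expected number of failed chips, assuming one uniform failure rate."""
    return chip_count * failure_rate

# Figures quoted in the thread: 10,000 chips per platform, rates of 1% to 10%.
for rate in (0.01, 0.05, 0.10):
    count = expected_failures(10_000, rate)
    print(f"{rate:.0%} of 10,000 chips -> about {count:.0f} failures")
```

A uniform rate is a simplification; the thread itself argues that rates differ sharply between ordinary chips and those in utilities and refineries.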

"Dr. Paula Gordon Responds to R.C. ----- ----------------------------

R.C. I wanted to respond to a few of the very thoughtful comments you made.

Quoting you:

"My point is that if I, as a layperson inquiring and educating myself in my spare time can figure this out (and I did so way back in time) surely those whose job called for such an understanding should have done so way back then. It is nearly inconceivable to me that those tasked with the issues of Y2K would have not aggressively pursued a thorough education on this important aspect of Y2K. I know if I were in that position of responsibility I would exhaustively research the issue. In my case as a layman, I didn't have to get exhausted to pick up the necessary elements to realize that embeddeds needed special monitoring and that self-reporting was untrustworthy."

Even comments about embedded systems that Bill Gates has made reflect a failure on his part to comprehend the problems that embedded systems failures can pose, including multiple, near simultaneous failures. One must spend at a bare minimum ten to twenty minutes with someone who not only knows about the nature and scope of the problem, but who is also able to communicate in understandable terms. If a person with Bill Gates' intelligence has not applied himself to learning about embedded systems, it becomes less of a surprise that persons in the government who had little or no expertise in technology to begin with would fail to do so as well. They might not even know what questions to begin to ask beyond: "What's an embedded system?" Indeed, I talked with several people in roles of responsibility in the Federal government in 1998 who asked me that very question. One, in fact, was in the Office of the Vice President.

As for a sense of responsibility, I certainly agree that one just assumes that surely everyone who serves in government has such a sense of responsibility. It is a tragedy for our country that there are persons serving in government today who do not have such a sense of responsibility.

Quoting you again:

"It is hard for me to conceive that presidential advisors would not have had intelligence personnel extensively research all aspects of embeddeds issues and that the results would have been so benign in their conclusions. IF this is so, then it speaks volumes concerning the levels of incompetence within the Intelligence community. I find that a little hard to believe that they didn't know. I can accept the notion that intelligence sources knew and simply failed to educate top leadership personnel effectively. I can see that political minds would attempt to ignore it because they prefer to focus upon "happy" thoughts and thus ignore vital issues that might confuse and derail their "positive" approach to solutions."

I can personally attest to the fact that there are people who are knowledgeable concerning embedded systems who have either tried or volunteered to "educate" the President and the Vice President concerning the entire Y2K problem, including embedded systems. To my knowledge, such efforts began as long ago as 1995. That the would-be educators were not successful may say far more about the capacity or interest of the President and the Vice President to grapple with this subject than it does about the competency of those who tried to educate them.

I personally know that the closest advisors of both the President and the Vice President have been provided materials on the subject of embedded systems since July of 1998. I have no idea whether those materials were ever read, or if read, if they were understood. I know that offers of technical briefings and invitations to panels on embedded systems were given to Presidential advisors beginning in the early summer of 1998 and again in December 1998, as well as several times in early 1999, including as late as May. The specific offers and invitations that I know about were not accepted.

In May of 1999 I learned that the President's Council had asked the National Institute for Standards and Technology to provide the Council with a kind of definitive review of embedded systems issues. Mr. Koskinen was seeking clarification concerning some specific issues. Several of these same issues turned out to be ones that I had brought up with him. Mr. Koskinen released a statement summarizing the concerns discussed at the November 9th meeting. That summary covered many of the issues that I, among others, had raised with him.

Quoting you again:

"In looking at your historical recap (and its very good, I take no issue with your work) I am amazed that leadership didn't call these people into the loop until last month. These people should have been identified by intelligence sources, contacted and brought forward by 1995 at the first Presidential briefing."

The President's Council's decision to seek clarification on embedded systems concerns did not start in November. The November 9th meeting was the culmination of efforts begun in April or May. I would add, however, that such efforts were long overdue even in April or May, let alone November.

Again quoting you:

"I guess the bottom line here Paula is this:

If the top levels of the Administration and Congress were unaware of the embeddeds problems until 1999 then there has been gross dereliction of duty and a complete failure by top leaders including the President. They were unworthy of the responsibilities entrusted to them. This applies equally to members of both parties. This is really a case of gross negligence and incompetence from the top all the way down to the bottom."

Your conclusions are quite similar to my own. Senator Bennett seemed to understand the embedded issue for a time between June and July of 1998 and early in 1999. Then in early 1999 the Senator became convinced by corporate leaders that he spoke with that embedded systems problems were not as great as he had previously been led to believe. For a variety of reasons, including, apparently, the political riskiness of holding onto such an unpopular point of view, Senator Bennett accepted the more sanguine appraisal and at times, seems to have all but declared a premature victory. Meanwhile the President does not seem to have comprehended the problem of embedded systems and the Vice President and his highest level staff seem to understand the significance of embedded systems even less.

Quoting you:

"Frankly, I suspect that the buck passing started long ago. If so, as it seems by Koskinen's statements, then that these people really have known all along but refused to admit it and have been spinning excuses for their failures.

If the gov't is just now finding out the full scope of embedded systems issues it only points to gross negligence on their part. They deserve the blame. IF, however, they knew long ago and even then realized it was hopeless and suppressed that information then we have another instance of gross incompetence and mismanagement in regards to preparations."

From my explorations of relevant background material here, I think that we are looking less at the suppression of information and far more at the failure to both gather the most pertinent information and assess its full significance. One needs to first grasp the significance of information before making a decision to suppress it. They truly did not grasp the significance of the information. True, this is unbelievable. It is nonetheless the conclusion that my information, knowledge, and experience compel me to believe.

Quoting you:

"No matter how I look at it. The government is at fault here as much if not more so than industry. It's really the case of what happens when a government gets corrupted by big business in a bi-partisan manner."

I think that the failure to understand the complexities of the threats and challenges that face the nation and the world is more attributable to simple ignorance and lack of effort to try to understand. There has been a widespread failure on the part of numerous people in key roles of responsibility to learn about the nature and scope of the technological aspects of the problems facing us. They have not gathered persons around them who have adequate technical expertise. The strategies that they have developed for addressing those problems are consequently less than adequate. The problem definition is simply an inadequate one. Even now the level of comprehension of technological complexities, let alone organizational, sociological, psychological, and managerial aspects of the Y2K and embedded systems crisis, is greatly wanting. Perhaps, if the problem being faced were likely to have only a level 1 or 2 impact on the impact scale, the approach they have advocated would have been appropriate. Perhaps, too, their optimism might have some near-term justification if there were no embedded systems issues, and if all the remediation and testing of all information systems, mission critical and non-mission critical, in both the public and private sectors were complete and tested. Such a sanguine view, however, would not be justified over the long run since the rest of the world has dealt with the remediation of information systems far less effectively than we have. The impacts that will be felt in other parts of the world will have major ripple effects that will have the most profound impacts on the U.S.

You wrote:

"Based upon the way this government has conducted itself so far, I am not optimistic that they will provide correct leadership to solve the problems that are about to descend upon us."

In Part 5 of my White Paper, I describe some alternatives in the event that the current leadership of the Federal government does not rise to the occasion. There is still an outside chance that they could. It would require incredible integrity to admit their failure to understand the nature and scope of the problem to date. It would require including persons in the decisionmaking process who possessed needed kinds of expertise. It would require applying adequate resources to addressing the set of problems facing us now and for the coming months and years.

You wrote:

"On a different note: What was your opinion of John Koskinen's recent comments at a National Press Club appearance regarding the embeddeds memo he recently made?"

If you are referring to the questions that were asked of Mr. Koskinen on November 10th (during and after the Press Conference), I felt that his comments demonstrated that he had taken a major step in the right direction. However, this newfound realization that the embedded problem is far more serious than previously recognized does not as yet appear to have been incorporated into any plan of action or change in overall strategy. Such a plan of action is going to have to be undertaken sooner or later if potential future impacts are to be significantly prevented or minimized over the coming months and years. There are embedded systems that will continue to be "ticking timebombs" until they are remediated. The high hazard areas have to be addressed proactively and head-on. There are still many opportunities to avoid having to "fix on failure". In fact, such opportunities will continue to exist for months, if not years to come.

A major task right now is that of getting the powers that be to recognize that we are not facing a single period of time when there are apt to be disruptions, unless you consider the rollover period as a period of time that could last for months or years.

I hope to have a chance to address many of the issues discussed here in Part 7 of my White Paper. I will likely use there as well some of the thoughts developed here.

Best wishes,

-- Paula Gordon (pgordon@erols.com), December 21, 1999."

-- Andy (2000EOD@prodigy.net), December 22, 1999.


Wondering, you are not thinking things out properly here - if power goes out there is a CRUCIAL aspect to the power going out... I'll tell you why...

"Thanks for nothing BW. Now, I'm no puter geek, but I can read. And I can smell a lie from here to there. Let's see. Hmmmmm, power systems go out across entire cities, no apparent reason. Could it be the embeddeds? Could it be the programming. Who Cares! The fact is the power went out. And its going to go out, out, out. All you computer experts really don't know diddly, which seems to make things worse than better.

-- wondering (had@to.ask), December 22, 1999."

you said...

"Could it be the embeddeds? Could it be the programming. Who Cares!"

I CARE!

If it's the embeddeds, for example in a power station, they could cause SERIOUS damage to turbines - think about it - even if the embeddeds are fixed 3 days later the turbines or generators or whatever may well have been destroyed. Same goes for nuke plants. Some equipment, valves, pipelines, may be destroyed.

Same goes very much so for refineries - ever looked at a refinery, wondering??? - it's a freakin' nightmare of pipes and embedded control systems - almost completely automated.

If embeddeds fail in this environment you will have KABOOOOM situations.

Refineries may have to be rebuilt.

Think about it.

Orson is dead on right!

-- Andy (2000EOD@prodigy.net), December 22, 1999.


I disagree. The embedded problem points to worse than advertised, but not necessarily TEOTWAWKI. In some cases there are workarounds and "jumpering" that can be done. This will limit control and lower the safety, but there are always risks. The truth is this is a worldwide operational live test. The results will be a surprise, but it is not an all-or-nothing question for every system.

-- Squid (ItsDark@down.here), December 22, 1999.

"This being the case, come Jan 1st, 5% or more of chips in the above mentioned facilities will fail."

One problem with that scenario -- sort of a minor paradox.

The only chips that will fail on Jan 1 (at the implied strike of midnight) will be those that are *known* to exist -- and regularly calibrated to accurate date/time, to compensate for the inevitable drift.

Among those destined to fail, those that are *unknown* -- or, known, but *ignored* -- will fail at random dates and times in a bell curve distribution leading up to, and after, 1/1/0.

The paradox? Those most likely to fail at the precise stroke of midnight on The Day are *also* those most likely to be remediated, as they're the *known* systems, and receiving regular attention.

The unknown and the ignored are the *real* problem, and they'll strike whenever they damn well feel like striking.

-- Ron Schwarz (rs@clubvb.com.delete.this), December 22, 1999.
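The clock-drift argument in the post above can be illustrated with a small simulation. This is a rough sketch under loud assumptions, not a measurement: it assumes accumulated drift is normally distributed around zero (the "bell curve" in the post), and the 30-day standard deviation, the chip counts and the function name `rollover_offsets_days` are all invented for the example.

```python
import random

def rollover_offsets_days(n_chips, drift_sd_days, calibrated, seed=0):
    """For each chip, days before (-) or after (+) real midnight at which
    its internal clock reaches 1 Jan 2000."""
    rng = random.Random(seed)
    if calibrated:
        # Assumption: regularly reset clocks carry no accumulated drift.
        return [0.0] * n_chips
    # Assumption: drift on unattended clocks is normally distributed.
    return [rng.gauss(0.0, drift_sd_days) for _ in range(n_chips)]

known = rollover_offsets_days(1000, drift_sd_days=30.0, calibrated=True)
ignored = rollover_offsets_days(1000, drift_sd_days=30.0, calibrated=False)

at_midnight = sum(1 for d in known if d == 0.0)
within_one_day = sum(1 for d in ignored if abs(d) <= 1.0)
print("known chips rolling over at midnight:", at_midnight)
print("ignored chips rolling over within a day of midnight:", within_one_day)
```

Under these assumptions the calibrated chips all roll over together, while only a small fraction of the unattended ones land near midnight; the rest are spread over the weeks before and after.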


Lying? Thinking that info systems will be worse than embeddeds is lying? Wow, Wondering, don't know what buttons I pushed for you, but I don't tell lies for anybody. And yes, we experts each have our areas of ignorance, as you probably do, too.

Orson, you are correct, it's both of them. And it really won't make much difference to the average person, when something goes belly up, which of the two it was that broke. Bummer, fer sure. It's like the autopsy results -- you care what killed him because you can learn from it, but he's still dead dead dead.

Andy - what was it that you found wrong in my homework, or lack thereof? Was it the historical note, when I said we started out by thinking that embedded were no problem? Or was it that I think they will be less of a problem than info systems? Maybe it was just sort of a breezy greeting, on your way to talk to Orson?

-- bw (home@puget.sound), December 22, 1999.



bw, Thanks for your inputs - I see nothing wrong with your logic.

If there is a failure in mainframe systems, how long might repair take? My understanding is that, given the source code, sufficient documentation and the original author of the program on hand, the problem can be located and fixed fairly quickly, with the system put back into service provided no other problems develop.

If there is a failure in an embedded control system, failure could mean a loss of control or safety which causes/mandates a shutdown, possibly with collateral physical damage. Fixing an embedded failure may require redesign or repair of the system in question.

Based on my limited understanding, I suspect an embedded failure with collateral damage would be the worse event.

-- Bill P (porterwn@one.net), December 22, 1999.


The real question is: how extensive will be the damage caused by embedded chips, and how long will it take to fix them? If the chips cause widespread damage across the board, and no workarounds are possible, then it's a matter of whether or not existing inventories of supplies will last long enough while chip replacements are secured.

I still think Infomagic was right all along,....it's going to be a 10.

-- Sure M. Worried (SureMWorried@bout.Y2K.coming), December 22, 1999.


From the 1st time I read Infomagic last year I have looked at every possibility in the context of his devolution theory. I become more convinced daily that he was right.

-- Porky (Porky@in.cellblockD), December 22, 1999.

just a correction to the idea that source code can be repaired quickly even if the author is on hand at the time that it breaks. Many of the enterprise class of software constructed over the last 5 years is distributed. These systems span OS's and platforms. The result of the failure may well be in front of you (on the client) but the offending source code may be who knows where. And after it is fixed, then comes the testing.....

let me inform as one who has suffered, repairs to mature code never fail to cause ripples, side effects or cascades of other errors.

then there is the testing. and then the data changes. and all this assumes that the programmer sitting on his ass slaving away in some compiler ide is in his usual environment. What if she/he is cold, spotty electricity, thirsty......et cetera ad infinitum

and finding the programmer to begin with... a lot i know are relocated from where they were doing their y2k work.

so, embedded may jackie chan us in the face, but the large dbms and enterprise code will be waiting to suck us into hell when we try and stand up.

-- pliney the younger (pliney@vallier.com), December 22, 1999.


I wonder, if the embedded problem did not exist, whether it could still be TEOTWAWKI. I mean, if the problem of cascading cross defaults is real, then we are screwed regardless of the embeddeds!

-- Orson Wells (wells@whitebulb.com), December 22, 1999.


Hi Ron,

It's quite possible for there to be "hidden" date-sensitive chips, when they're networked. There are many tiny network protocols that are now used (and have been used for a decade or more) where a date/time is passed over the network. Chips that supposedly don't need a date/time could pick it up from the network without any hint in the specs.

What that means is that even though the control computer and "known" chips have been made compliant, the hidden chips could pick up the date, now maybe remediated into the wrong format. (I know of several networks in production environments, which combine PC's, PLC's and embeddeds, that could have this problem, as a date is passed between the PC's and PLC's -- but supposedly not the embeddeds.)
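A minimal sketch of the "remediated into the wrong format" case described above, in Python. Everything here is hypothetical: the 6-character YYMMDD field, the 8-character YYYYMMDD replacement and the parser `legacy_parse_yymmdd` are assumptions made for illustration, not any real protocol or device firmware.

```python
def legacy_parse_yymmdd(field):
    """Hypothetical embedded-device parser that still expects YYMMDD."""
    yy, mm, dd = int(field[0:2]), int(field[2:4]), int(field[4:6])
    if not (1 <= mm <= 12 and 1 <= dd <= 31):
        raise ValueError(f"invalid date field: {field!r}")
    return 1900 + yy, mm, dd

old_message = "991231"    # assumed broadcast format before remediation
new_message = "20000101"  # assumed format after remediation (YYYYMMDD)

print(legacy_parse_yymmdd(old_message))
try:
    legacy_parse_yymmdd(new_message)
except ValueError as err:
    # The device reads "20" as the year and "00" as the month.
    print("legacy device rejects remediated message:", err)
```

In this sketch the device at least rejects the message. A parser with no range check would silently accept a nonsense date, which is closer to the hidden-failure scenario in the post.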

-- Dean -- from (almost) Duh Moines (dtmiller@midiowa.net), December 23, 1999.


Pliney the Younger. LOL! That name kills me.

PS--Calling bw a liar sounds a little harsh whether the statements are right or not. "Can't we just all get along?" (LOL again.)

-- Dave (aaa@aaa.com), December 23, 1999.


Bill, the big questions on fixing a mainframe system (or anything that is not an embedded system, I suppose) are (1) is the source code available or even determinable, and (2) is this a data design problem as opposed to a logic issue.

If a program messes up, it might take a couple days just to figure out that the problem is in a certain assembler subroutine called by a single module that is in turn called by every program in the company. This multi-level nesting makes programs easy to maintain, since key pieces of logic have to be coded only once. It also makes that piece of logic critical yet paradoxically hard to find. Typically, it also means that piece of logic is old, and the programmer who wrote it is gone, and no one uses that language any more because it's no longer cool. New programmers like GUI stuff and don't want to do bit-flipping ASM. So once you figure out where the error is, it's possible that you won't even recognize the source code when you find it, because nobody onsite can read the language. Hence the "determinable" issue - where is it and what does it look like.

Second, if the problem is logic, it can often be fixed fairly quickly. Patch a program, try it out with real cursory testing, slap it into production. When the company is bleeding to death, this is how you fix it. Before the disaster (say 1998 and early 1999) we set up parallel test rigs, do rigorous testing, all that good stuff. But now we shove it in and see if it works. That's the 3-day fix. On the other hand, if the error is a 2-digit year in a sort key of a terabyte database, you have to unload and reload the data. That means (in an oldstyle non-SQL database) writing and testing several unload/reload programs, which takes a few days or a week. Then you do the unload/reload, which might take a week or two depending, and then you have to modify/test every program that uses that database. Massive effort that can take months.
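The sort-key case is easy to show in Python. This is an illustrative sketch only: the record keys are invented, and the pivot-year "windowing" rule in `expand` (years 50 to 99 treated as 19xx, 00 to 49 as 20xx) is one assumed repair convention, not necessarily what any given shop used.

```python
# Hypothetical records keyed by a YYMMDD date, as in a legacy sort key.
records_yymmdd = ["991230", "991231", "000101", "000102"]

# With a 2-digit year, January 2000 sorts BEFORE December 1999.
sorted_2digit = sorted(records_yymmdd)
print(sorted_2digit)

def expand(yymmdd, pivot=50):
    """Expand YYMMDD to YYYYMMDD using an assumed pivot-year rule."""
    yy = int(yymmdd[:2])
    century = "19" if yy >= pivot else "20"
    return century + yymmdd

# After the unload/reload with 4-digit years, the order is chronological.
sorted_4digit = sorted(expand(k) for k in records_yymmdd)
print(sorted_4digit)
```

Because the fix changes the stored key itself, every record has to be rewritten and every program that reads the key has to agree on the new width, which is why this is a data problem rather than a quick logic patch.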

Embeddeds are a big danger, no doubt. But (let me get a little elliptical here) the fear of embeddeds is a separate issue from the embeddeds themselves. It seems that Embedded Fear is sort of the Fear Of The Day, and people are dismissing the risks posed by the big iron system failures. Read Cory Hamasaki's stuff - he's old-time big-iron, and every word he says rings true for me. Companies will go under simply because their billing system has a 2-digit year.

As Orson said, we have to survive both of these problems.

-- bw (home@puget.sound), December 23, 1999.

