NIST *****EMBEDDED SYSTEMS AND THE YEAR 2000 PROBLEM*****
greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread
Embedded Systems and the Year 2000 Problem
National Institute of Standards and Technology
NIST program questions: Public Inquiries Unit,
(301) 975-NIST, TTY (301) 975-8295.
NIST, 100 Bureau Drive, Gaithersburg, MD 20899-0001.
NIST is an agency of the U.S. Department of Commerce's Technology Administration
EMBEDDED SYSTEMS AND THE YEAR 2000 PROBLEM
Gary E. Fisher
Computer Scientist
National Institute of Standards and Technology
Gaithersburg, Maryland
Michael Cherry
President
CenturyCorp.com
Century City, California
INTRODUCTION
Embedded systems control much of the infrastructure of the industrialized world. If not properly addressed, problems related to the Year 2000 transition will degrade embedded systems and therefore have negative effects in the infrastructure. The most commonly recognized problem occurs when the date supplied by the real-time clock calendar (RTCC) contains a 2-digit year. This may cause a problem on or after midnight, December 31, 1999, when the date rolls over to January 1, 2000.
In those systems with this date problem, any software layer in control of the embedded system may interpret the 00 in the date as 1900, not 2000, which can potentially throw off time or date computations. 00 may also be rejected as an invalid year. In these cases, embedded systems with such problems may not operate as designed and thus cause control problems or malfunctions. If the embedded system is involved in a mission-critical or safety-critical system, there may be severe repercussions due to an embedded system failure. Proper testing is the most reliable way to identify potential embedded system Year 2000 problems. This article describes issues and priorities for testing these systems.
BACKGROUND
There are two primary areas where embedded systems typically have date problems: 1) the calculation of elapsed times, and 2) date information transmitted from one embedded system to another or to an external system.
The elapsed time problem centers on two methods for performing these calculations. One may use date information and the other may not. In method one, elapsed time is computed by subtracting a start time from an end time, e.g., 12:15 p.m. - 12:00 noon = 15 minutes. If the elapsed time rolls over midnight, then the start and end dates are also required to complete the calculation, e.g., 12/30/99 12:15 a.m. - 12/29/99 12:00 midnight = 15 minutes. At midnight on December 31, 1999, the rules change. In a 2-digit year, 99 becomes 00 and the computation is now 01/01/00 12:15 a.m. - 12/31/99 12:00 midnight = ? One of several answers may result depending on how the date computation was carried out. On a system programmed to recognize 00 as 2000, the computation may be performed correctly. On many embedded systems, this may not be the case. It is impossible to say what the answer will be since the program controlling the embedded device may not be available.
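The date-based subtraction described above can be sketched in a few lines of Python. This is only an illustration, not code from any real device: the YYMMDDhhmm stamp layout and the function names (parse_stamp, elapsed_minutes, always_1900, knows_2000) are assumptions made for the example.

```python
from datetime import datetime

def parse_stamp(stamp, century_for_year):
    """Parse a YYMMDDhhmm string (assumed layout).

    century_for_year maps the 2-digit year to a 4-digit year.
    """
    yy, mo, dd, hh, mi = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    return datetime(century_for_year(yy), mo, dd, hh, mi)

def elapsed_minutes(start, end, century_for_year):
    """Elapsed time computed by subtracting the start date/time from the end."""
    delta = parse_stamp(end, century_for_year) - parse_stamp(start, century_for_year)
    return delta.total_seconds() / 60

def always_1900(yy):
    # Unremediated behavior: every 2-digit year is placed in the 1900s.
    return 1900 + yy

def knows_2000(yy):
    # Mirrors the text's "recognize 00 as 2000"; a real fix would cover more years.
    return 2000 if yy == 0 else 1900 + yy

start = "9912312355"   # 12/31/99 23:55
end = "0001010010"     # 01/01/00 00:10, fifteen minutes later

good = elapsed_minutes(start, end, knows_2000)   # 15.0
bad = elapsed_minutes(start, end, always_1900)   # large negative number
```

With the 1900 interpretation the end stamp lands almost a century before the start stamp, so the "elapsed" time comes out hugely negative instead of 15 minutes. What a given device does with such a value depends on its own program.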
The second method of elapsed time calculation relies on an epoch, or base date, and a time counter, which is added to the epoch to arrive at a particular date and time. An example of this method is presented elsewhere in this report.
Another problem of Year 2000 testing stems from three areas where date information is transmitted between devices. The first area concerns proprietary data encoding used in many embedded devices built as single units operating with other embedded devices from the same manufacturer. In these cases, the data passed from one embedded device to another cannot be read by testing instruments that were not specifically made by the same manufacturer for testing the devices in question. Third party testing instruments often do not detect the presence of dates in data transmissions that are encoded in proprietary codes. Hence, if a date is not detected, the embedded device may not be tested according to the testing policies used by some large organizations.
The second area involves the windowing solution used in remediation. In windowing, a pivot year is defined to express the interpretation of 2-digit years that belong in the 20th or 21st centuries. For example, a pivot year of 1950 states that all 2-digit years in the range 50 to 99 belong in the 20th century and all 2-digit years in the range 00 through 49 belong in the 21st century. If a sending device uses 1950 as the pivot year, but a receiving device uses a different pivot year, say 1990, then problems arise in the interpretation of the 2-digit years transmitted as part of a date. The sending device may see 89 as belonging in the 20th century, but the receiving device may decide that it belongs in the 21st century, thus throwing any date and time calculations off by a wide margin. The effects of this problem may be exhibited immediately as a failure if future dates are involved in the computations.
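A minimal sketch of the pivot-year mismatch, using the 1950 and 1990 pivots and the 2-digit year 89 from the paragraph above. The function name expand_year is hypothetical, and real devices may implement windowing differently.

```python
def expand_year(yy, pivot):
    """Windowing: 2-digit years at or above the pivot's last two digits
    are placed in the 1900s, all others in the 2000s."""
    cutoff = pivot % 100
    return 1900 + yy if yy >= cutoff else 2000 + yy

sender_year = expand_year(89, pivot=1950)     # 1989
receiver_year = expand_year(89, pivot=1990)   # 2089
mismatch = receiver_year - sender_year        # 100 years apart
```

Both devices apply the same rule correctly, yet they disagree by a century because the pivot is not part of the transmitted data.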
The third area is based on the premise that repairs may have been made to an embedded device to correct its date and time processing, but the repairs may not have been made or may not have been made in a compatible way to a device receiving information from the repaired device. This can happen in systems where one embedded system controls or synchronizes the operation of other embedded devices, even if not all of the embedded devices have real-time clock calendars.
The effect is that data properly formatted to correct for the Year 2000 problem do not line up with the format expected by the receiving embedded device. For example, an embedded device may transmit a 4-digit year to a device that only understands 2-digit years. A corollary to this problem is the data offset problem whereby the last 2 digits of the year may appear to be valid to the receiving device, but the rest of the information is pushed off alignment by the 2 extra digits representing the century in the expanded date. This situation may not be caught immediately and may have long-term consequences weeks or months after the data becomes corrupted.
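The data offset problem can be illustrated with a fixed-width record. The layout here (MMDDYY followed by a 4-digit reading) and the name parse_legacy_record are assumptions made for the sketch, not taken from any particular device.

```python
def parse_legacy_record(record):
    """Receiver that expects MMDDYY followed by a 4-digit reading
    (hypothetical layout). Anything past position 10 is ignored."""
    return {
        "month": int(record[0:2]),
        "day": int(record[2:4]),
        "year2": int(record[4:6]),
        "reading": int(record[6:10]),
    }

old_record = "011500" + "0042"     # MMDDYY + reading, as the receiver expects
new_record = "01152000" + "0042"   # sender repaired to MMDDYYYY + reading

before = parse_legacy_record(old_record)
after = parse_legacy_record(new_record)
# before: month 1, day 15, year2 0,  reading 42
# after:  month 1, day 15, year2 20, reading 0
```

In this sketch every field the receiver reads from the repaired record still looks plausible, but the year is now the century digits and the reading has silently changed from 42 to 0. Nothing in the record itself signals the corruption, which is why it may go unnoticed for some time.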
There are several layers in which date and time can come into play in an embedded system. These might include the application software controlling the embedded device, the interface between the real-time clock calendar and the operating system on the embedded device, and data transfers between embedded systems and other devices or external systems. An exhaustive check would include each standalone embedded device and all connected embedded devices or external sources of dates in an on-line end-to-end test. In some cases, such as in the interface between the real-time clock calendar and the operating system, special-purpose testers may be required.
THE PROBLEMS OF TESTING EMBEDDED SYSTEMS
Embedded systems testing is not an easy task to accomplish. Various factors play into this including the following:
- Unknown embedded devices located in sealed units and components within components.
- Devices with known problems that have not yet been remediated.
- Difficulty in working on embedded devices because of the environment, such as those located within hazardous areas.
- Hard-wired embedded components that cannot be replaced due to design issues or lack of replacement parts.
- Firmware or software that has been patched, but not documented.
- Lack of source code for software used in the embedded device.
- Lack of a means of setting date and time, i.e., no apparent real-time clock calendar or no data entry mechanism.
- Date usage that is not apparent and consequently overlooked.
This last factor is especially pernicious since many embedded devices use real-time clock calendars that were developed during the 1960s, 70s, 80s, and early 90s when the date and time were used as a single string consisting of year, month, day, hours, minutes, seconds, etc. in the form YYMMDDhhmmss. Using this type of embedded device, calculating elapsed times required the use of the date in addition to time.
Later devices provided the date in the form of an epoch or base date, and a counter of elapsed units of time, typically seconds, since the epoch. For example, with a base date of 01/01/80 and a counter with the value 31,536,000 seconds, we could compute the date as 01/01/81. Elapsed time calculations using the counter were performed with a straightforward subtraction of the start count from the end count. The date played no part in elapsed time calculations.
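The epoch and counter method mentioned in this section can be sketched as follows. The 01/01/80 base date comes from the article's example; the helper names (to_counter, to_datetime) and the use of whole seconds are assumptions for illustration only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1980, 1, 1)   # base date taken from the article's example

def to_counter(moment):
    """Seconds elapsed since the epoch (what the device would store)."""
    return int((moment - EPOCH).total_seconds())

def to_datetime(counter):
    """Recover a calendar date and time by adding the counter to the epoch."""
    return EPOCH + timedelta(seconds=counter)

# Two readings that straddle the century rollover.
start = to_counter(datetime(1999, 12, 31, 23, 55))
end = to_counter(datetime(2000, 1, 1, 0, 10))

elapsed_seconds = end - start   # 900: plain subtraction, no date involved
```

Because the elapsed time is a difference of two counters, the rollover from 1999 to 2000 is invisible to the calculation. A calendar date is needed only when the counter is converted for display or transmission.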
Embedded devices that do not apparently use dates in elapsed time calculations are being ignored in some embedded systems testing. This is a major oversight in the testing process since there are still embedded devices in existence that do not use the epoch and counter method, but the older date and time method.
TESTING EMBEDDED SYSTEMS
The elapsed time problem and the data transmission problems often cannot be detected in standalone device testing. Unless the tester has designed cases to test specifically for these situations, there is no guarantee that experimental end-to-end testing will detect these problems. The most accurate way to find Year 2000 problems in embedded systems is to perform on-line end-to-end testing. This is not likely to happen for several reasons.
Because there are so many embedded systems in existence, not every system can be tested before January 1, 2000. In addition, testing individual or connected embedded systems and external sources of dates is a very complex proposition. The fear of damaging systems in an on-line test is probably the greatest deterrent to performing embedded systems tests, but there are methods that can be used to provide a sense of the risk involved in not testing. A suggested method entails triage by assigning a very high priority to those embedded systems in mission-critical or safety-critical applications.
A confusing aspect of the embedded system problem is that embedded systems with real-time clocks are often used to monitor and control other embedded systems that may or may not have their own RTCC. If the controlling system fails because of a Year 2000 problem, then the controlled devices fail by definition. The question remains, did the controlled device fail due to the controlling device failure, or did it fail due to a problem in its own RTCC? There is no way to know without testing.
Some guidelines to use in testing embedded devices and sources for dates include the following:
- Test embedded systems individually and also in concert with other embedded systems and external sources of dates in on-line end-to-end tests. In end-to-end tests, be aware of synchronization issues where multiple embedded devices are controlled by an external device or other embedded system. If complete end-to-end testing is not possible, individual subsystems can be tested to minimize any risk of system downtime.
- Physically check for the existence of a real-time clock calendar. We recommend that manufacturers' statements of Year 2000 compliance or readiness be used only as a last resort to make any determinations since a manufacturer's definition of compliance or readiness may not meet the requirements of a particular environment.
- Physical testing can be accomplished through several means appropriate to the device in question. This may be accomplished through external testing instruments, signal analyzers, or test software designed specifically to look for date problems.
- An indirect method of end-to-end testing involves setting machine test parameters and observing how the functioning of the machine changes after the embedded system senses these settings. If unexpected results occur, one or more devices may have problems.
- Embedded systems can communicate or interoperate with other devices or with external date sources, such as PCs, workstations, databases, user input, or LANs and WANs. The data transfers between the embedded system and the external devices, users, or systems must be checked to determine if dates are being sent to or from the device in question.
- Both mainframe and embedded systems should be tested including devices with identical model numbers, even if they were manufactured recently.
- Mainframe and embedded system date length compatibility should be tested through 10/10/2000. This is a primary situation in which modifying an embedded system by moving from a 2-digit to 4-digit year may cause problems in alignment of the data read by an application program. Experience in conducting embedded systems repairs found this situation to be one of the major causes of problems after fixes were effected.
- Different platforms may use different time and date formats and different methods of computing date/time measurements. Therefore, interactions between different types of platforms should be tested.
In any of these cases, if a date or real-time clock calendar, or access to either, is found, the next step is to proceed to remediation.
CONCLUSION
The task of finding, testing, and fixing embedded systems with Year 2000 problems is a complex issue. If an organization waits to perform these tasks after December 31, 1999, then the costs can be much greater. These costs can include repairing collateral damage to systems and equipment from cascading problems and the expense in time and resources needed to find the real cause of the problem. Since there is no way to determine what combinations of factors will actually cause a failure, it may be difficult to determine when a failure has actually occurred. If a determination can be made, it may be possible to fix the problem if repair parts and technicians can be located, and the environment is amenable to making the repairs.
Testing embedded systems can be costly and time consuming, but it must be done. Not all systems have to be tested immediately. The priority should be placed on mission-critical and safety-critical systems. Each embedded system should be tested individually and in concert with interoperating systems and external sources of dates. Testing can be accomplished by looking for and physically testing existing real-time clock calendars, date processing routines in application software, device drivers that process dates, and dates from other external sources that may be communicating with the device under test through local and wide area networks. Applying the guidelines described in this article may give organizations a means of achieving a high degree of confidence in their systems.
Mike Cherry can be reached by e-mail at mc@CenturyCorp.com.
Gary Fisher can be reached by e-mail at gary.fisher@nist.gov.
-- Alerting The Crowd (WOW@POLLIES.SILLYPEOPLE), November 24, 1999
This is a damn fine piece of work. The most realistic, accurate, fully descriptive and easily understandable summary of the problem I have seen in my 18 months of Y2K research. Although it is not likely to change the approach of those responsible for the readiness of their systems, it may help to wake up some of the public who still don't get it. Any sane person with half of a brain who honestly dedicates 5 minutes of their full attention to reading this should be able to understand why we still have a very high likelihood of many system failures coming soon. The hard part is trying to find people who are still sane, have at least half of a brain, and are willing to spend 5 minutes reading this. Nonetheless, it is a good one to print and give it one last shot with those who are at least willing to read it. Thanks.
-- Hawk (flyin@high.again), November 24, 1999.
This report was just released (NIST reports to the Secretary of Commerce) and is the reason that Secretary of Commerce Daley JUST came out and stated the potential problem with embedded systems.
He asked folks to check into their systems and make any corrections before 1/1/2000 as if it was a simple problem to correct.
Ray
-- RAy (ray@totacc.com), November 24, 1999.
It is a terrible piece of misinformation and cut and paste from outdated websites that has been posted in the recent past.
It is a case of a politician jumping on the Y2K bandwagon and having his IT write about systems he obviously has no background in. Most of it is familiar, been around for years, and even seems to have a hint of Bruce Beach's 3rd clock involved.
The methods for finding and fixing the embedded systems are not realistic, or even reasonable.
I'm afraid there are going to be a bunch of "new" websites popping up now that every politician or entity jumps on the "awareness" bandwagon and wants to show that they are "hip" to Y2K.
Be prepared for lots of "new" reports that look familiarly like the stuff on old websites, even the stuff doomers have debunked.
Geeze, RTCC...Real Time Clock Calender? No such animal.
-- Cherri (sams@brigadoon.com), November 24, 1999.
Hawk,
"This is a damn fine piece of work. The most realistic, accurate, fully descriptive and easily understandable summary of the problem I have seen in my 18 months of Y2K research."
It would help to have at least half a brain to see that it is cut and pasted and not understood in the least by the guy who wrote it.
-- Cherri (sams@brigadoon.com), November 24, 1999.
Hmmm, this information doesn't square with Factfinder's assessment. I guess it goes to show us that Factfinder just doesn't have all the facts.
Cherri,
In the oil biz... I've got remediators out there telling me that people like you are just blowing smoke, because they've been out there doing the job. Lots of them will fail. No one knows just what the ramifications of failure will be. They expect at least some systems (maybe most or all) to shut down. One thing is for sure. You don't know what you're talking about. The above article is yet another example to prove it. Was this the article and info that prompted the Sec of Commerce to make his plea for more new testing??? I don't know for sure, but something caused him to do so. Meanwhile, Cherri have a nice rollover.
-- R.C. (racambab@mailcity.com), November 24, 1999.
Cherri,
I have sent the following e-mail to Gary Fisher & Mike Cherry. If and when I receive a reply I'll post it here.
I am writing in regards to your article on NIST EMBEDDED SYSTEMS AND THE YEAR 2000 PROBLEM located at the URL site http://www.nist.gov/y2k/embeddedarticle.htm.
This article was pasted on a thread on the Ed Yourdon bulletin board. The following was an answer in regards to this article by a woman who is a proclaimed expert on embedded chips and systems. I would be interested if you could clarify anything for those of us who would like to know what the actual facts are on this complicated and highly debatable subject. It is a "wild card" in the Y2k scene and has either been over-hyped or under-played. Any light that you can shed on this subject would be greatly appreciated.
Anxiously awaiting your reply
Sincerely,
Cary McXXXXXXXX
Copy of Cherri's reply was attached
-- Cary Mc from Tx (Caretha@compuserve.com), November 24, 1999.
The report states: "We recommend that manufacturers' statements of Year 2000 compliance or readiness be used only as a last resort to make any determinations since a manufacturer's definition of compliance or readiness may not meet the requirements of a particular environment."
Many folks out there are using vendor statements as their total embedded system remediation effort. Pretty scary.
-- Brian Bretzke (bretzke@tir.com), November 24, 1999.
Cary:
Please let us know what their response is. If they debunk Cherri's criticism, then it will let us all know that Cherri is someone not to be listened to. If not, then Cherri has credibility.
-- haha (haha@haha.com), November 24, 1999.
It would help to have at least half a brain to see that it is cut and pasted and not understood in the least by the guy who wrote it.
-- Cherri (sams@brigadoon.com), November 24, 1999.
Do you have any idea what the NIST is? No, I suppose not. That's okay, though; all a reader has to do is read the article and then your response to see who is missing more than half a brain.
-- Steve Heller (stheller@koyote.com), November 24, 1999.
HaHa,
I hope you realize what a dim-witted polly you are. If you don't have anything of substance to add please refrain. For all we know Cherri rides on the back of a garbage truck collecting trinkets.
National Institute of Standards and Technology
NIST program questions: Public Inquiries Unit, (301) 975-NIST, TTY (301) 975-8295. NIST, 100 Bureau Drive, Gaithersburg, MD 20899-0001. NIST is an agency of the U.S. Department of Commerce's Technology Administration
EMBEDDED SYSTEMS AND THE YEAR 2000 PROBLEM
Gary E. Fisher Computer Scientist National Institute of Standards and Technology Gaithersburg, Maryland
"Here is all the Credibility you need.......
-- kevin (innxxs@yahoo.com), November 24, 1999.
Thanks Alerting...
Question: Why is Cherri even acknowledged after all the info. we've read to the contrary of this denial polly position?
This is just another confirmation that embedded systems are a huge threat and the situation will take infinitely longer to fix than three days.
-- PJC (paulchri@msn.com), November 24, 1999.
Cherri,
for us to take your above statement with any seriousness at all, you must provide us with your background bio, and write a critique essay of this NIST paper. With facts to back up your claims, not simply "this is old news, this has been debunked" etc. If it's old news and has been debunked, show us by whom and why.
-- (PutUP@r.shut.up), November 24, 1999.
All --
First, let me state that *I* thought it was a pretty good article. Well written, and at a level that the 'non-geek' segment of society could understand. Wasn't loaded up with a lot of jargon.
Second, the term "Real Time Clock Calendar" may not be one used generally in the industry. This is apparently an attempt by the authors to include all the myriad devices that serve this sort of function.
Third, in response to Cherri's comments, I don't know if Cary Mc will get a response to his letter, but here is a little experiment you may try on your own. Just as a way for you to take your own measure of Cherri's worth as a 'self-proclaimed' expert.
Go to your favorite web browser. Find your favorite search tool. Using that tool's methods for searching for *Exact matches including all terms given*, do a search for "Real Time Clock Calendar". Just using this, looking for an item that Cherri claims
"Geeze, RTCC...Real Time Clock Calender? No such animal."
I found over 500 entries for companies that include this term as an *EXACT MATCH* on their website. Most of the entries (I didn't read all, just 3 or 4 selected at random), are for companies that make chips.
This is a simple test, and you don't have to be a 'Real-time embedded expert' to do it. However, in my opinion, it absolutely gives the lie to Cherri's claims of 'expertise' in the area.
In other words, IMHO, she's blowing smoke up our butts, for reasons known only to herself. I sure fail to see why someone would do this. Or, I suppose, it could be a case of true ignorance. Sort of, "Well, *I've* done two projects and *I* never used no such thing, which, since *I* am *GOD*, means that no such thing ever got done, anywhere by anybody." Of course, that pretty much shoots down her claims of 'expertise' too.
-- just another (another@engineer.com), November 24, 1999.
PutUP,
You said to Cherri, "for us to take your above statement with any seriousness at all, you must provide us with your background bio, and write a critique essay of this NIST paper."
I couldn't agree with you more. I'm very tired of Cherri's terse posts that provide zero facts and only arrogant innuendos of her "expertise". It's way past time that she begin to let everyone know why she is the "expert" and we should listen to her.
So Cherri, what are your credentials? Do they supersede NIST or the IEEE fields of knowledge and their IT industry prestige? If I hear no reply from you substantiating your credentials, I will be forced to come to the conclusion that you are only inflated by an ego that has no basis in reality, and that further posts regarding the Y2k embedded chip problem by you must be totally discounted by all.
As I said before, if and when I receive a reply from Gary Fisher and/or Mike Cherry, I will post it to the board.
-- Cary Mc from Tx (Caretha@compuserve.com), November 24, 1999.
I think the problem is that Cherri has been brainwashed by the no-problem crowd at deBUNGy. <:)=
-- Sysman (y2kboard@yahoo.com), November 24, 1999.
Here's an example of a real-time clock calendar (RTCC) from a manufacturer at http://www.rigelcorp.com/51 5ctk.htm
"The R-515JC also has CAN, Controller Area Network with the physical layer on-board. The board comes standard with 32K RAM and 32K EPROM, but will accept up-to 512K RAM and 128K EPROM. There is a Y2K Real Time Clock / Calendar (RTCC) which may be populated as well as a battery back-up system for both the RAM and RTCC. The processor supports 10-bit A / D, as well as having 48 I / O lines, 3 16-bit timers and a watchdog timer."
Cherri has wasted enough of our time. Let's move on.
-- (PutUP@r.shut.up), November 24, 1999.
You want Cherri to do a 'critique essay'??
Give me a break.....what a double standard. Less than 1% of any of the stuff posted here by the extremist doomers in any way could be labelled 'critique essay' quality. Most of it is stuff along the lines of 'Y2k cannot be fixed.....we're all gonna die!!'
The fact is, most chips do NOT care about what calendar date it is. They do not calculate time differences by means of a calendar date. Any chip that did use this method would have to have an external means of setting the time or else it would be useless anyway.
Without an external interface to change the time, what good would it be? Would the chip's internal clock be set to the correct time as it was being made in the factory?.....wouldn't make sense......the factory does not know what time zone the chip would be used in when they make it. Also, during down times, the chip would not function, leading the internal clock to stop. Therefore it would not roll over to '00 until possibly a few months or even many years after the calendar roll-over date.
Very few chips or systems care about the actual dates. Some that do care are in things such as traffic lights and elevators that actually have some use in knowing the difference between a weekday and a weekend.
Certainly there will be some surprises and some exceptions that have to be dealt with. However in plants and factories where this could cause great danger they have always had to deal with such potential problems......we've already seen a lot of explosions and fires for example at chemical and oil processing facilities so far this year.....and we will see a few more as a side-effect of Y2K. However it will not ALL collapse.
I've said it before and I'll say it again.......by and large, the professionals and engineers that know these systems are not freaking out about the rollover....they have some concerns and they are prepared to manage any problems just as they have done for decades. This extreme fear of embedded chips is not based on fact but rather is a smokescreen used by those that would champion fear-mongering as a way to try and 'prove' their end-of-the-world belief system one way or another.
Now rather than sulk off in the corner shouting 'Polly' to try and silence me, can anyone give me some specific examples of chips or systems in a chemical or oil facility that WILL fail because they use the calendar date in their calculations?? Good luck, you might as well try to find the elusive unicorn!!
-- Craig (craig@ccinet.ab.ca), November 24, 1999.
Cherri,
Are you ever going to write the essay for Cory Hamasaki that you promised us?
Sincerely,
Stan Faryna
Ready for Y2K? Got 14 days of water, food, way to keep warm and cook?
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001o cf
Cooperative Preps : Have you checked out the deals we can get on preps?
One time deal on an inexpensive grain mill
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001q Sw
Water filters for less than suggested retail
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001q T8
Gas masks, potassium iodide, solar ovens, etc
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001q TK
Aladdins: the kerosene lamp for readers
http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001p 1v
-- Stan Faryna (faryna@groupmail.com), November 24, 1999.
Bottom line: Do NOT relax about preping, not yet.
-- Lilly (homesteader145@yahoo.com), November 24, 1999.
Cherri? Any non-pejorative, substantive rebuttals to the responses above? I, for one, am curious and still open-minded on this issue... -scott-
-- Scott Johnson (scojo@yahoo.com), November 24, 1999.
Cherri NEVER offers ANYTHING of substance!!! NEVER!!! NEVER-EVER!!!! NEVER-EVER-NEVER-NEVER!!!!!!!! Only terse posts equivalent to, "It ain't so. 'Cause I don't want it to be so."
Plus, she appears to be the model for the expression, "ditsy broad".
-- King of Spain (madrid@aol.cum), November 24, 1999.
Kevin you dumbass:
I'm looking to debunk Cherri, idiot. Read my post.
-- haha (haha@haha.com), November 24, 1999.
Uhm Jackass Kevin:
By the way, where in my post did you see anything to imply that I'm a polly or I don't believe in preparing for possible Y2k problems? Where is that stated in my post? And don't try to change the subject. You're caught reading your own opinion into my post. Is asking a valid question indicative of a polly mentality? Or can't someone ask a valid question on this board, buttwipe?
Bottom line dear butthole Kevin: Don't put words into my mouth, jackass.
-- haha (haha@haha.com), November 24, 1999.
While the more pessimistic-minded are asking for Cherri's credentials, I will ask, politely, once again for R.C.'s credentials and that he "prove" them (in the same way that Dan the Power Man was asked for and provided proof).
And R.C., I don't believe I had an answer to my query asking for clarification on your early August 1999 post about potential problems with the 9/9/98 date issue.
-- Johnny Canuck (j_canuck@hotmail.com), November 24, 1999.
HaHa........... cool your heels a little. The mere suggestion of Cherri having credibility at the expense of National Institute of Standards and Technology left you hanging. The majority of us here know the extent of Cherri's experience and knowledge can easily be copied on the back of a matchbook. by the way i'll take back the polly notion earlier............ ;-)
-- kevin (innxxs@yahoo.com), November 24, 1999.
Kevin:
Ok, sorry. For the record, I found Cherri's post lacking in facts. I WANT to believe that things will go well, but Cherri certainly didn't prove her position too well. Hope that clears it up. Sorry for the name-calling everyone. And everyone have a great thanksgiving holiday.
-- haha (haha@haha.com), November 24, 1999.
[OK, I'll take a crack at this.
Executive summary: Misuse of the date by embedded systems poses potential problems. The practical extent of this potential must be determined through proper testing. Some systems are difficult to test properly.]
INTRODUCTION
Embedded systems control much of the infrastructure of the industrialized world. If not properly addressed, problems related to the Year 2000 transition will degrade embedded systems and therefore have negative effects in the infrastructure. The most commonly recognized problem occurs when the date supplied by the real-time clock calendar (RTCC) contains a 2-digit year. This may cause a problem on or after midnight, December 31, 1999, when the date rolls over to January 1, 2000.
In those systems with this date problem, any software layer in control of the embedded system may interpret the 00 in the date as 1900, not 2000, which can potentially throw off time or date computations. 00 may also be rejected as an invalid year. In these cases, embedded systems with such problems may not operate as designed and thus cause control problems or malfunctions. If the embedded system is involved in a mission-critical or safety-critical system, there may be severe repercussions due to an embedded system failure. Proper testing is the most reliable way to identify potential embedded system Year 2000 problems. This article describes issues and priorities for testing these systems.
[Translation: If the system has a RTC AND if the code uses the date from that RTC AND if the code uses that date incorrectly AND if the result is a functional (not cosmetic) problem AND if the system is critical AND if the functional problem is not easily addressed (depending on the details of the system and the failure), THEN we have a problem. OK, we know this.]
BACKGROUND
There are two primary areas where embedded systems typically have date problems: 1) the calculation of elapsed times, and 2) date information transmitted from one embedded system to another or to an external system.
[This might be misleading. Incorrect calculations might cause trouble. If date information is simply transmitted, that shouldn't be a problem. If the code receiving the transmission doesn't use it properly, that's just a miscalculation by someone else.]
The elapsed time problem centers on two methods for performing these calculations. One may use date information and the other may not. In method one, elapsed time is computed by subtracting a start time from an end time, e.g., 12:15 p.m. - 12:00 noon = 15 minutes. If the elapsed time rolls over midnight, then the start and end dates are also required to complete the calculation, e.g., 12/30/99 12:15 a.m. - 12/29/99 12:00 midnight = 15 minutes. At midnight on December 31, 1999, the rules change. In a 2-digit year, 99 becomes 00 and the computation is now 01/01/00 12:15 a.m. - 12/31/99 12:00 midnight = ? One of several answers may result depending on how the date computation was carried out. On a system programmed to recognize 00 as 2000, the computation may be performed correctly. On many embedded systems, this may not be the case. It is impossible to say what the answer will be since the program controlling the embedded device may not be available.
[Yes, very true. This is typically a one-time bug, since it happens only on the single calculation spanning the rollover *from the perspective of the embedded clock*. However, in most such systems the actual date is irrelevant, and never set. This all depends on the system. Many such elapsed-time systems don't even have a battery for the RTC, since the real time is not important, only the interval. The failed calculation can still happen, but *when* it will happen could be anytime within a 100-year period. So this isn't a y2k problem UNLESS the date is ALSO used for some other purpose that requires it to remain synchronized with the real time and date (such as logging).]
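As a rough illustration of the one-time rollover bug described above, here is a minimal Python sketch. It assumes a hypothetical routine that blindly reads every 2-digit year as 19YY, and compares it with one that applies an explicit window (the pivot of 50 is an arbitrary choice for the example). The function names are made up for illustration and are not taken from any real device.

```python
from datetime import datetime

def naive_timestamp(yy, month, day, hour, minute):
    """Hypothetical buggy expansion: every 2-digit year is taken as 19YY."""
    return datetime(1900 + yy, month, day, hour, minute)

def windowed_timestamp(yy, month, day, hour, minute, pivot_yy=50):
    """Explicit windowing: yy >= pivot_yy is 19YY, otherwise 20YY (pivot is an assumption)."""
    century = 1900 if yy >= pivot_yy else 2000
    return datetime(century + yy, month, day, hour, minute)

def elapsed_minutes(start, end):
    """Elapsed time computed as end minus start, in minutes."""
    return (end - start).total_seconds() / 60

# An interval of 15 minutes that spans the rollover: 12/31/99 23:50 to 01/01/00 00:05.
naive = elapsed_minutes(naive_timestamp(99, 12, 31, 23, 50),
                        naive_timestamp(0, 1, 1, 0, 5))
windowed = elapsed_minutes(windowed_timestamp(99, 12, 31, 23, 50),
                           windowed_timestamp(0, 1, 1, 0, 5))

print("naive:", naive)        # negative: the end time appears to precede the start
print("windowed:", windowed)  # the intended 15 minutes
```

Only the single interval that straddles the rollover is affected; intervals entirely before or entirely after it come out the same under both routines.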
The second method of elapsed time calculation relies on an epoch, or base date, and a time counter, which is added to the epoch to arrive at a particular date and time. An example of this method is presented elsewhere in this report.
[But this approach doesn't really use the date, so doesn't necessarily present a danger.]
Another problem of Year 2000 testing stems from three areas where date information is transmitted between devices. The first area concerns proprietary data encoding used in many embedded devices built as single units operating with other embedded devices from the same manufacturer. In these cases, the data passed from one embedded device to another cannot be read by testing instruments that were not specifically made by the same manufacturer for testing the devices in question. Third party testing instruments often do not detect the presence of dates in data transmissions that are encoded in proprietary codes. Hence, if a date is not detected, the embedded device may not be tested according to the testing policies used by some large organizations.
[Confusing! Yes, date data can be encoded in proprietary protocols, difficult to interpret. But the problem doesn't lie with the SOURCE of the date, it lies in the USE of the date. We don't really care WHERE the date came from, since that's not where the error lies. There is no functional date error until the date is misused in a calculation or decision. Even if we can read the encoded date data, this is STILL no guarantee that the receiving device makes any use of it at all. So if there is a problem, it lies with the receiver's use of the date, not with the source or encoding of the date.]
The second area involves the windowing solution used in remediation. In windowing, a pivot year is defined to express the interpretation of 2-digit years that belong in the 20th or 21st centuries. For example, a pivot year of 1950 states that all 2-digit years in the range 50 to 99 belong in the 20th century and all 2-digit years in the range 00 through 49 belong in the 21st century. If a sending device uses 1950 as the pivot year, but a receiving device uses a different pivot year, say 1990, then problems arise in the interpretation of the 2-digit years transmitted as part of a date. The sending device may see 89 as belonging in the 20th century, but the receiving device may decide that it belongs in the 21st century, thus throwing any date and time calculations off by a wide margin. The effects of this problem may be exhibited immediately as a failure if future dates are involved in the computations.
[No, this doesn't stand up to examination. I think the writer is lost. Windowing is a technique for expanding a 2-digit year to a 4-digit year by applying a pivot year. In the above example it's explicitly stated that the transmitted date is 2 digits. Yes, it's possible that the transmitter expanded the date one way, and the receiver another. Since the transmitter is only sending a 2-digit year, there is no reason for the transmitter to window that year unless it USES that year for something. The receiver does his own expansion, and uses the year for something else. If either (or both) sides use an incorrect pivot year and expand improperly, whichever side does so risks an error. But the communication between devices is not involved in any way! If the transmitter windows the year and transmits a *4-digit* year, then the receiver expects a 4-digit year and there is no confusion (though the year may be incorrect). If only the receiver does the windowing, then the transmitter has no window, which perforce cannot be different!]
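To make the pivot-year discussion concrete, here is a small Python sketch of windowing using the article's own numbers (pivots of 1950 and 1990, transmitted year 89). The helper name `expand_year` is hypothetical. It shows that the two expansions are independent local calculations: the wire carries only "89", and any disagreement arises in whichever device uses its own expanded value.

```python
def expand_year(yy, pivot_year):
    """Window a 2-digit year against a 4-digit pivot year.

    Years at or above the pivot's last two digits fall in the pivot's century;
    years below it fall in the following century.
    """
    century = pivot_year - pivot_year % 100
    pivot_yy = pivot_year % 100
    if yy >= pivot_yy:
        return century + yy
    return century + 100 + yy

transmitted_yy = 89  # the only thing that actually crosses the interface

sender_view = expand_year(transmitted_yy, 1950)    # sender's local interpretation
receiver_view = expand_year(transmitted_yy, 1990)  # receiver's local interpretation

print(sender_view, receiver_view)
```

With these pivots the sender reads 89 as 1989 and the receiver reads it as 2089, while a year such as 95 or 05 expands identically on both sides.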
The third area is based on the premise that repairs may have been made to an embedded device to correct its date and time processing, but the repairs may not have been made or may not have been made in a compatible way to a device receiving information from the repaired device. This can happen in systems where one embedded system controls or synchronizes the operation of other embedded devices, even if not all of the embedded devices have real-time clock calendars.
[I think I'd need an example to know what that paragraph is talking about. Certainly if the sending device is modified to send a larger year size, the receiving device must be modified to *accept* this new size. Otherwise NOTHING will be communicated. It's a well known truism that an interface MUST be agreed on by both sides.]
The effect is that data properly formatted to correct for the Year 2000 problem do not line up with the format expected by the receiving embedded device. For example, an embedded device may transmit a 4-digit year to a device that only understands 2-digit years. A corollary to this problem is the data offset problem whereby the last 2 digits of the year may appear to be valid to the receiving device, but the rest of the information is pushed off alignment by the 2 extra digits representing the century in the expanded date. This situation may not be caught immediately and may have long-term consequences weeks or months after the data becomes corrupted.
[But not very likely! Most embedded devices are acting in real time, and corrupting an interface (such that communications are garbled) is almost always instantly obvious. Also, give some credit for at least minimum competence to those involved. Even a child understands that teamwork implies that everyone follow the same rules.]
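The data offset problem can be sketched in a few lines of Python. The fixed-width record layout below (YYMMDD followed by a 4-digit reading) is invented purely for illustration; real devices use whatever layout their interface specifies.

```python
def parse_legacy_record(record):
    """Receiver that still expects YYMMDD followed by a 4-digit reading (hypothetical layout)."""
    return {
        "yy": int(record[0:2]),
        "month": int(record[2:4]),
        "day": int(record[4:6]),
        "reading": int(record[6:10]),
    }

old_format = "9912310427"    # YYMMDD + reading 0427, as the receiver expects
new_format = "199912310427"  # sender was "fixed" to send YYYYMMDD + reading

good = parse_legacy_record(old_format)
shifted = parse_legacy_record(new_format)

print(good)
print(shifted)  # every field after the first two characters is pushed off by two positions
```

In this layout the shifted record yields a month of 99, which a receiver that validates its input would reject at once; the slower-burning case the article worries about needs a receiver that accepts whatever lands in each field.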
There are several layers in which date and time can come into play in an embedded system. These might include the application software controlling the embedded device, the interface between the real-time clock calendar and the operating system on the embedded device, and data transfers between embedded systems and other devices or external systems. An exhaustive check would include each standalone embedded device and all connected embedded devices or external sources of dates in an on-line end-to-end test. In some cases, such as in the interface between the real-time clock calendar and the operating system, special-purpose testers may be required.
[Whoa! Beyond the truism that all parties must agree on a common interface, whatever code uses the date in a calculation or decision must window that date, either explicitly (by expanding it to 4 digits) or implicitly (with logic that says IF current time is before prior time, do something reasonable). The task is to find where the date is USED, and make sure rollover won't lead to a wrong calculation or decision. The on-line end-to-end test can be used both to FIND who uses the date, and to make sure all such uses are done properly.]
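The "implicit" windowing mentioned above might look something like the following Python sketch. It is only one possible reading of "IF current time is before prior time, do something reasonable": the code assumes real intervals are non-negative and far shorter than a century, so an apparently backwards clock is taken to mean a single century rollover. The function name and tuple layout are hypothetical.

```python
from datetime import datetime

def elapsed_minutes_implicit(start, end):
    """Elapsed minutes between two (yy, month, day, hour, minute) tuples from a 2-digit RTC.

    No pivot year is used. Both years are placed in one arbitrary century,
    and if the end then appears to precede the start, the start is assumed
    to belong to the previous century.
    """
    s = datetime(2000 + start[0], *start[1:])
    e = datetime(2000 + end[0], *end[1:])
    if e < s:
        # Clock appears to have run backwards: assume exactly one rollover.
        s = s.replace(year=s.year - 100)
    return (e - s).total_seconds() / 60

across_rollover = elapsed_minutes_implicit((99, 12, 31, 23, 50), (0, 1, 1, 0, 5))
ordinary = elapsed_minutes_implicit((99, 12, 30, 23, 50), (99, 12, 31, 0, 5))
print(across_rollover, ordinary)
```

The trade-off is that a clock that really was set backwards is indistinguishable from a rollover, which is acceptable only where the interval, not the absolute date, is what matters.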
THE PROBLEMS OF TESTING EMBEDDED SYSTEMS
Embedded systems testing is not an easy task to accomplish. Various factors play into this including the following:
Unknown embedded devices located in sealed units and components within components.
Devices with known problems that have not yet been remediated.
Difficulty in working on embedded devices because of the environment, such as those located within hazardous areas.
Hard-wired embedded components that cannot be replaced due to design issues or lack of replacement parts.
Firmware or software that has been patched, but not documented.
Lack of source code for software used in the embedded device.
Lack of a means of setting date and time, i.e., no apparent real-time clock calendar or no data entry mechanism.
[While this is all true, no effort has been made here to address the incidence of such issues. How many systems are unknown? How much date functionality lies in hazardous areas? And so on. Listing all possible problems doesn't tell us how common they are. You could make a HUGE list of all the things you could run into if you fell asleep at the wheel. But making that list longer doesn't increase the odds of falling asleep!]
Date usage that is not apparent and consequently overlooked.
This last factor is especially pernicious since many embedded devices use real-time clock calendars that were developed during the 1960s, 70s, 80s, and early 90s when the date and time were used as a single string consisting of year, month, day, hours, minutes, seconds, etc. in the form YYMMDDhhmmss. Using this type of embedded device, calculating elapsed times required the use of the date in addition to time. Later devices provided the date in the form of an epoch or base date, and a counter of elapsed units of time, typically seconds, since the epoch. For example, with a base date of 01/01/80 and a counter with the value 31,536,000 seconds, we could compute the date as 01/01/81. Elapsed time calculations using the counter were performed with a straightforward subtraction of the start count from the end count. The date played no part in elapsed time calculations.
[Well, no. The actual RTC hardware has individual registers containing year (2 digits), month, day of month, hours, minutes, and seconds. The operating system or application decides what to do with the values in these registers. The OS or app can create a string, or a count since the epoch, or anything else it damn well pleases. HOW it does so is hardware-independent, a function SOLELY of the software. If the hardware year source ONLY provides 2 digits, and if the year is used in a calculation then that year must be windowed explicitly or implicitly. And even if the epoch and a counter are used, the counter must be STARTED based on the 2-digit year from the hardware. Windowing cannot be avoided if the hardware only supplies a 2-digit year.]
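A short Python sketch of the epoch-and-counter representation, using the article's base date of 01/01/80. The function names and the pivot of 80 used to seed the counter are assumptions made for the example. Worth noting: 1980 was a leap year, so the 31,536,000-second counter quoted above (365 days) actually lands on 12/31/80; reaching 01/01/81 takes 31,622,400 seconds.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1980, 1, 1)  # base date from the article's example

def date_from_counter(seconds):
    """Derive a calendar date from a count of seconds since the epoch."""
    return EPOCH + timedelta(seconds=seconds)

def counter_from_rtc(yy, month, day, hour, minute, second, pivot_yy=80):
    """Seed the counter from 2-digit RTC registers.

    The 2-digit year still has to be windowed here; pivot_yy=80 is an
    assumption for this sketch (80-99 -> 19YY, 00-79 -> 20YY).
    """
    year = 1900 + yy if yy >= pivot_yy else 2000 + yy
    moment = datetime(year, month, day, hour, minute, second)
    return int((moment - EPOCH).total_seconds())

print(date_from_counter(31_536_000))  # 365 days after the epoch
print(date_from_counter(31_622_400))  # 366 days after the epoch

# Elapsed time is plain subtraction of counts; no date arithmetic is involved.
start = counter_from_rtc(99, 12, 31, 23, 50, 0)
end = counter_from_rtc(0, 1, 1, 0, 5, 0)
print((end - start) // 60)
```

The subtraction at the end is rollover-safe only because `counter_from_rtc` windowed the year when the counts were created, which is the point made in the comment above.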
Embedded devices that do not apparently use dates in elapsed time calculations are being ignored in some embedded systems testing. This is a major oversight in the testing process since there are still embedded devices in existence that do not use the epoch and counter method, but the older date and time method.
[As I said, the epoch method requires windowing to set the original second count when the device is powered up and the OS initializes. Second, the REPRESENTATION of the time/date is totally irrelevant to the USE of the time/date. How obvious it is that a given device uses the date has NOTHING to do with the internal bit representation used for the date. And third, whether overlooking devices that make nonobvious use of the date is a *major* problem depends on how common such "hidden functionality" is. And it isn't very common. The author here really doesn't understand what's going on under the hood.]
TESTING EMBEDDED SYSTEMS
The elapsed time problem and the data transmission problems often cannot be detected in standalone device testing. Unless the tester has designed cases to test specifically for these situations, there is no guarantee that experimental end-to-end testing will detect these problems.
The most accurate way to find Year 2000 problems in embedded systems is to perform on-line end-to-end testing. This is not likely to happen for several reasons.
[Very true. If a device is connected to other devices, testing must be performed on all of the connected devices. This is what Gartner called a Large Scale Embedded System. Fixes may only apply to a small part of the system, but the entire system must be tested.]
Because there are so many embedded systems in existence, not every system can be tested before January 1, 2000. In addition, testing individual or connected embedded systems and external sources of dates is a very complex proposition. The fear of damaging systems in an on-line test is probably the greatest deterrent to performing embedded systems tests, but there are methods that can be used to provide a sense of the risk involved in not testing. A suggested method entails triage by assigning a very high priority to those embedded systems in mission-critical or safety-critical applications.
[Yes, this has been done. And not all embedded systems are anywhere near this complicated or interconnected. And type-testing widely used devices is an effective method, and need not be done in situ (which may be dangerous).]
A confusing aspect of the embedded system problem is that embedded systems with real-time clocks are often used to monitor and control other embedded systems that may or may not have their own RTCC. If the controlling system fails because of a Year 2000 problem, then the controlled devices fail by definition. The question remains, did the controlled device fail due to the controlling device failure, or did it fail due to a problem in its own RTCC? There is no way to know without testing.
[If ALL the controlled devices fail at once, I'd start looking at the controller. If ANY of the controlled devices continues to operate correctly, I'd look at the controlled devices that failed.]
Some guidelines to use in testing embedded devices and sources for dates include the following:
1. Test embedded systems individually and also in concert with other embedded systems and external sources of dates in on-line end-to-end tests. In end-to-end tests, be aware of synchronization issues where multiple embedded devices are controlled by an external device or other embedded system. If complete end-to-end testing is not possible, individual subsystems can be tested to minimize any risk of system downtime.
2. Physically check for the existence of a real-time clock calendar. We recommend that manufacturers' statements of Year 2000 compliance or readiness be used only as a last resort to make any determinations since a manufacturer's definition of compliance or readiness may not meet the requirements of a particular environment.
[Not my reading, exactly. Read the available failure reports or the warnings at the websites of the manufacturers. They explain what goes wrong under what circumstances. They don't try to define "compliance" or "readiness". They say, "This device does THIS". It's up to you to decide whether the documented failure mode is critical to you.]
3. Physical testing can be accomplished through several means appropriate to the device in question. This may be accomplished through external testing instruments, signal analyzers, or test software designed specifically to look for date problems.
4. An indirect method of end-to-end testing involves setting machine test parameters and observing how the functioning of the machine changes after the embedded system senses these settings. If unexpected results occur, one or more devices may have problems.
5. Embedded systems can communicate or interoperate with other devices or with external date sources, such as PCs, workstations, databases, user input, or LANs and WANs. The data transfers between the embedded system and the external devices, users, or systems must be checked to determine if dates are being sent to or from the device in question.
[Vendors are, despite what's said here, an excellent source of such information.]
6. Both mainframe and embedded systems should be tested including devices with identical model numbers, even if they were manufactured recently.
[Easy for them to say. And maybe, somewhere, there may be a case where there is a real bug and no way to distinguish the device from apparently identical devices. The industry practice is to clearly label every firmware version. Software gets slipstreamed.]
7. Mainframe and embedded system date length compatibility should be tested through 10/10/2000. This is a primary situation in which modifying an embedded system by moving from a 2-digit to 4-digit year may cause problems in alignment of the data read by an application program. Experience in conducting embedded systems repairs found this situation to be one of the major causes of problems after fixes were effected.
[I can well believe it. You don't change interfaces lightly.]
8. Different platforms may use different time and date formats and different methods of computing date/time measurements. Therefore, interactions between different types of platforms should be tested.
[Well, these are being tested just by normal operation, if formats differ. I guess this applies to changing whole platforms that communicate with other platforms. Changeouts at that level should NEVER be tested on-line, and test coverage must be complete.]
In any of these cases, if a date or real-time clock calendar, or access to either, is found, the next step is to proceed to remediation.
[Huh? You remediate if you find errors. The mere presence of a RTC is no guarantee of error.]
CONCLUSION
The task of finding, testing, and fixing embedded systems with Year 2000 problems is a complex issue. If an organization waits to perform these tasks after December 31, 1999, then the costs can be much greater. These costs can include repairing collateral damage to systems and equipment from cascading problems and the expense in time and resources needed to find the real cause of the problem. Since there is no way to determine what combinations of factors will actually cause a failure, it may be difficult to determine when a failure has actually occurred. If a determination can be made, it may be possible to fix the problem if repair parts and technicians can be located, and the environment is amenable to making the repairs.
Testing embedded systems can be costly and time consuming, but it must be done. Not all systems have to be tested immediately. The priority should be placed on mission-critical and safety-critical systems. Each embedded system should be tested individually and in concert with interoperating systems and external sources of dates. Testing can be accomplished by looking for and physically testing existing real-time clock calendars, date processing routines in application software, device drivers that process dates, and dates from other external sources that may be communicating with the device under test through local and wide area networks. Applying the guidelines described in this article may give organizations a means of achieving a high degree of confidence in their systems.
[First, most of what's said here is common sense. Testing is important. Start with the most critical systems. The actual bugs can be hard to find in Large Scale Embedded Systems.
However, NOTHING in this article addresses the scope of the problem, only the nature of the problem for the worst cases (most complex systems). We can't logically conclude that we're in Big Trouble because hard things are hard, without knowing:
1) How numerous are these hard problems?
2) How severe are the failure modes we're talking about? This depends on both the nature of the failure and the use to which the system is put.
3) How many such problems have already been resolved? We know GM's assembly lines are working now, for example. A lot has been accomplished.
The article makes some mistakes, but it's pretty good. Just bear in mind that the purpose of this article is to address the issue of testing embedded systems. A clearer understanding of the *nature* of (part of) the problem doesn't contribute toward any understanding of the *size* of the problem. Which is not addressed here.]
-- Flint (flintc@mindspring.com), November 24, 1999.
37 days.
-- Jack (jsprat@eld.~net), November 24, 1999.
Jack,
Fortunately, we don't have to fix Y2K. The calendar is SUPPOSED to change to the year 2000 after December of this year. It's the computers and programs and chips we have to fix. Y2K isn't broken.
-- walt (walt@lcs.k12.ne.us), November 24, 1999.
Cherri,
This report is very coherent and consistent and it appears to be based on scientific research of the problem, not just cut and pasted from other sources. Even if it is taken from other sources, does that make the information any less accurate? I don't think so!
I don't understand how you can so easily dismiss these facts without even attempting to prove them incorrect. It looks like you are trying to protect your own interests, or maybe you are being paid by someone else to debunk these facts.
Flint,
Of course you're right that no one really knows the size of the problem, but I think this statement says an awful lot...
"The fear of damaging systems in an on-line test is probably the greatest deterrent to performing embedded systems tests.."
..and it's why I believe a large percentage of companies chose to cross their fingers and wait, rather than disrupt their normal flow of business by trying to find and test the embeddeds.
-- Hawk (flying@high.again), November 24, 1999.
Flint,
Thanks for the excellent post! I really appreciate you taking the time to try to explain a very complex and confusing subject...
-- Deb M. (vmcclell@columbus.rr.com), November 24, 1999.
walt, I thought everyone understood that the term "Y2K" -- especially on a forum such as this -- generally means The Year 2000 Computer Problem. However, recognizing that even at this late date there might be some really clueless people, let me re-phrase it Just For You:
Thirty-seven (37) days left in 1999.
-- Jack (jsprat@eld.~net), November 24, 1999.
Hawk:
You make a good point here. Still, you need to understand some of the other facets of what we're dealing with.
I agree that all too many are crossing their fingers about some or all of their systems. But imagine that you get a notification from a manufacturer saying "You purchased N widgets model XXX from us. These WILL FAIL on rollover. Replacements/upgrades are available. Please contact us."
NOW, let's say you know those widgets are critical to your operations. Are you going to assume the manufacturer is kidding you? Manufacturers really are testing their systems AND hearing about test results. Believe me, I hear about failures, I report failures. There are lines of communication. No, these don't cover every system by any means. And there are customized implementations of embedded systems, so you need to know how you used the system, and whether you reprogrammed it.
But the point is, if a manufacturer has shipped 10,000 systems all of which have the same problem, we do NOT need to do 10,000 tests. We just need to get the word out. Firmware versions are usually clearly marked and mentioned in the word that gets put out.
The testing of complex systems in line and in place isn't nearly so common as this article might be interpreted to be saying. It applies mostly to unique or highly customized systems, which are both quite rare and usually critical. I think you overestimate the number of damnfools out there who haven't had any curiosity about whether their critical systems will stop working and maybe explode.
What bothers me is the philosophy I've seen that "hey, we're running 24x7 here, and downtime costs $5,000 per minute! Testing alone means downtime for days, because it takes a day to do the shutdown, a day to do the restart, and at least a day to do comprehensive testing. That kills the whole quarter right there. The manufacturer says there's no problem or the problems are minor. Skip it."
Sometimes, when you can't afford the downtime, you do have to cross your fingers unless the manufacturer tells you there's no hope.
-- Flint (flintc@mindspring.com), November 24, 1999.
Flint,
Just noticed a problem with your critique. The report said this:
"The second method of elapsed time calculation relies on an epoch, or base date, and a time counter, which is added to the epoch to arrive at a particular date and time. An example of this method is presented elsewhere in this report."
And you responded:
[But this approach doesn't really use the date, so doesn't necessarily present a danger.]
-- Hawk (flyin@high.again), November 24, 1999.
Hawk:
Yes, I didn't go into enough detail there (and I'm long-winded as it is!).
What we're dealing with is the calculation of an elapsed time. However, the RTC device (as I wrote) has registers containing seconds, minutes, hours etc. To avoid problems when minutes, then hours, then days (etc.) roll over to the next one, some code uses the entire date and time all at once, as one great big number (put together in various ways, but derived from these RTC registers). When code does that, and the year register goes from 99 to 00, the big number is MUCH smaller than the one you used a second ago! This can lead to problems.
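A quick Python sketch of the "one great big number" approach, assuming the registers are simply concatenated as YYMMDDhhmmss (one of the "various ways" such a number might be put together; the function name is made up for the example):

```python
def rtc_to_big_number(yy, month, day, hour, minute, second):
    """Concatenate 2-digit RTC register values into a single YYMMDDhhmmss integer."""
    return int(f"{yy:02d}{month:02d}{day:02d}{hour:02d}{minute:02d}{second:02d}")

before = rtc_to_big_number(99, 12, 31, 23, 59, 59)
after = rtc_to_big_number(0, 1, 1, 0, 0, 0)

print(before, after)
print(after > before)  # any "has time moved forward?" comparison now fails
```

Minute, hour, day and month rollovers all keep the number increasing; only the year register going from 99 to 00 breaks the ordering.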
Now, let's say you want to do something once a second. You don't really care what the date is, so long as all rollovers are properly handled. If your code is reading the RTC directly, all it needs to do is look at the seconds register, and keep a count of seconds. You don't really need to look at the minutes register when the seconds register changes from 59 to 00, because you're not using the *value* in the seconds register, you're only looking for a *change* in that value. It changes once per second, so you're cool.
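The change-detection idea can be sketched like this in Python. The list of reads stands in for polling a hypothetical seconds register, and the sketch assumes the register is polled more often than once per second so that no change is missed.

```python
def count_ticks(seconds_register_reads):
    """Count elapsed seconds by watching for CHANGES in the seconds register.

    The value itself is never used in arithmetic, so 59 -> 00 is just
    another change, exactly like 58 -> 59.
    """
    ticks = 0
    previous = None
    for value in seconds_register_reads:
        if previous is not None and value != previous:
            ticks += 1
        previous = value
    return ticks

# Simulated polls across a minute boundary (and equally across a year boundary,
# since the year register is never consulted).
reads = [58, 58, 59, 59, 0, 0, 1]
print(count_ticks(reads))
```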
All embedded operating systems I've seen, however, try to keep real time, including the date. To prevent rollover problems, they read ALL the RTC registers when they initialize, and convert them to a count of seconds since some arbitrary date (like Jan 1, 1970). When you ask the OS for the date, it *calculates* the date from this big count of seconds, working backwards. Each time the seconds register changes, the OS adds 1 second to this big count, and never reads the year register again. So when the year register changes from 99 to 00, the OS never notices. So the century change causes no problems in this case, since the "year" doesn't come from the RTC, but from a calculation based on total seconds.
The danger with this approach is if some kind of historical logging is taking place, and the system is powered down after rollover. In that case, the OS might become confused in calculating total seconds since 01/01/70, since it reads a year value of 00. In that case, the seconds count becomes scrambled, the calculated date becomes undefined (that is, depends on the nature of the OS confusion), and the logged values suddenly jump from 12/31/99 to God Only Knows. Hopefully, this logging information is used by human beings who can figure out what happened.
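Both behaviors described above, the harmless rollover while running and the trouble after a power cycle, can be sketched in Python. The `SoftClock` class is hypothetical; in particular, its initialization deliberately assumes 19YY to show one way an OS "might become confused". Real operating systems will differ in how, or whether, they go wrong.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

class SoftClock:
    """Toy model of an OS clock: seeded once from the RTC, then counted in software."""

    def __init__(self, yy, month, day, hour, minute, second):
        # Hypothetical buggy init: the 2-digit year register is read as 19YY.
        boot_time = datetime(1900 + yy, month, day, hour, minute, second)
        self.seconds = int((boot_time - EPOCH).total_seconds())

    def tick(self):
        """Called once per second; the year register is never read again."""
        self.seconds += 1

    def now(self):
        """The date is calculated from the count, not read from hardware."""
        return EPOCH + timedelta(seconds=self.seconds)

# Powered up before rollover and left running: the count simply keeps going.
running = SoftClock(99, 12, 31, 23, 59, 59)
running.tick()
print(running.now())

# Powered down and restarted after rollover: the year register now reads 00.
rebooted = SoftClock(0, 1, 1, 0, 0, 1)
print(rebooted.seconds, rebooted.now())
```

In this toy model the restarted clock ends up with a negative count and a date in 1900, which is the kind of jump a log reader would see after 12/31/99.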
Bottom line: In such systems, the year is *derived* from a count of seconds, rather than read directly from 2-digit hardware. Is that clear now?
-- Flint (flintc@mindspring.com), November 24, 1999.
Flint --
In your last post you stated something to the effect that all operating systems for embeddeds calculate the date. Are these the commercial operating systems, like Vertol or RTOS?
-- just another (another@engineer.com), November 25, 1999.
to the top
-- Old (timer@helping.out), November 25, 1999.
just another:
What I said was, all embedded OS's I'm familiar with try to track real time (including date) by some method. I don't know exactly how they all do this internally. However, they cannot guarantee access to an RTC (which may not even exist in the system). Several (VxWorks, QNX, MQX) that I know of require that you (the programmer) initialize the time and date in your board-specific init routine. The default is that it's 01/01/1970, and seconds (actually, milliseconds) start counting from there. So long as the system is powered up, a "date" is maintained by the OS. The accuracy (as opposed to the precision) of this time/date of course depends on the accuracy of the crystal providing the clock signal the OS uses for all timing purposes -- clock updates, time slicing, timer routines, etc.
-- Flint (flintc@mindspring.com), November 25, 1999.
Flint --
Thanks. The question arose because most of the systems I worked on did not use a commercial operating system. (These were proprietary, custom deals which used basic 'loop' type of OS where the timing of critical operations determined the timing.) The ones I have worked on where there was a commercial system, someone else always did the clock stuff. I just used it. Unfortunately, it appears that you are in the same boat. Oh well. It was worth asking.
-- just another (another@engineer.com), November 25, 1999.