On Dale Way and interfaces, a question...

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

I've been chewing on the Dale Way essay and related threads, and this question is uppermost in my mind right now. Some from the polly side argue that where remediated, interfacing systems have been put back into production, live testing in effect is taking place. The argument is that if the systems are talking to each other pre-rollover, they should continue to do so post-rollover. The protocols, or whatever you want to call 'em, are established and not going to change on Jan. 1.

But I'm a non-techie and a journalist who still believes in getting both sides of the story (no, I'm not going to use this anywhere; this is for my own consumption). So I'm asking any non-pollies out there to give their best argument, understandable to a non-techie, as to why the above argument doesn't hold water. Thanks in advance.

-- Thinman (thinman38@hotmail.com), November 08, 1999

Answers

Thinman,

".....where remediated, interfacing systems have been put back into production, live testing in effect is taking place."

Absolutely correct. Live testing of the systems is taking place. If you're testing interfaces, you're testing them with a 1999 date. Whatever you're testing, you're testing with a 1999 current date.

If everything has -- indeed -- been fixed correctly, then there should be no problem in 2000. And, if it hasn't.........?

-- fatman (gotto@lose.some), November 08, 1999.


You might ask Mr. Way the details of his thinking. If you're a journalist, you might obtain his phone number and do it via phone.

-- Mara (MaraWayne@aol.com), November 08, 1999.

Picking up on fatman's comment, the assumption as I understand it is that once a system has been changed to handle dates in a new way, it will use that new way on both sides of rollover.

OK, I'll get out of the way now.

-- Thinman (thinman38@hotmail.com), November 08, 1999.


for the record/reference:

IEEE Y2K Chair Dale Way's original writeup http://ourworld.compuserve.com/homepages/roleigh_martin/end_game_critique.htm

IEEE Y2K Chairman's Personal, Pessimistic Take on Y2K http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001fqh

Mr. Dale Way (IEEE)! Gary North and others are on to you Sir! http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001hHC

Mr. Way's IEEE article and the myth of Y2k compliancy http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001hM0

Interpretation of Dale Way's commentary on Yourdon's End Game article http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001gqd

Dale Way's Response http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001iAT

Dale Way http://www.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=001iXs

-- alan (foo@bar.com), November 08, 1999.


Thinman,

As one of the folks who made the argument about the ongoing live testing of interfaces between systems (I don't think Mr. Way ever commented on it one way or the other), I'd say the argument does hold water.

BUT... it's not the whole issue. A working interface is no guarantee that the systems on either end have been properly remediated. If your concern is that different systems have been remediated in different ways (expansion vs. windowing...), however, THAT is being tested now.

-- RC (randyxpher@aol.com), November 08, 1999.



Thinman and RC,

My level of technical system knowledge is zip, zilch.

"Absolutely correct. Live testing of the systems is taking place. If you're testing interfaces, you're testing them with a 1999 date. Whatever you're testing, you're testing with a 1999 current date."

Does this mean systems, essentially, are not being tested for 2000, and we won't know until 2000 how well the remediation performs?

"As one of the folks who made the argument about the ongoing live testing of interfaces between systems (I don't think Mr. Way ever commented on it one way or the other), I'd say the argument does hold water."

RC, I have looked, but can't find your argument in the archives. Are you saying that testing interfaces with 1999 dates is or is not effective testing? If you give me a link, I will read for myself what you have previously said about this.

I'm very glad Thinman posted this question. I've been curious about it too, but was too embarrassed to ask.

-- (new@this.com), November 08, 1999.


I meant to address my question to "Fatman & RC." (oops)

-- (new@this.com), November 08, 1999.

I am not a programmer, but the answer to the question seems to depend on how the program has been modified. I would think that if all dates in the database have been expanded to eight digits, the program has been modified to handle 8-digit dates, and the program interfaces with all other programs using 8-digit dates, then it could be assumed that it is being tested in production (assuming it is in production).

-- Dave (dannco@hotmail.com), November 08, 1999.

There are different levels of testing going on... lemme try to explain by example.

Company A and Company B each have their own independent computer systems, which communicate with each other via an interface (which is just the definition of the data layout being passed from one system to the other).

Both companies are internally remediating their software for Y2K - meaning making changes AND testing them in post-Y2K mode - but their test systems are not hooked up to each other (logistically tough to do in a lot of cases). As the software is remediated, the updated software is put back into live use (no one's waiting until Dec. 31 to install remediated software).

So: 1) Each company has independently tested their software in post-2000 mode as best they can in the time they have.

2) Each company has installed some or all of their remediated software, so if during the process of remediation, somebody fouled up an interface, it would be apparent now, not in January.

3) LITTLE OR NO testing of both companies' systems together in post-2000 mode has been done. This is why even the most optimistic folks know there WILL be some Y2K glitches. My opinion is they will be at a manageable level. Others disagree.
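To make the "interface" idea concrete, here is a minimal sketch (in Python, with a made-up record layout - not any real company's format) of the kind of fixed-width data one system might pass to another. The layout itself doesn't change at rollover; whether each end *interprets* the two-digit year correctly is the remediation question.

```python
# Hypothetical fixed-width interface record:
#   8 chars part number, 6 chars quantity, 6 chars ship date (yymmdd).
def parse_record(line: str) -> dict:
    return {
        "part": line[0:8].strip(),
        "qty": int(line[8:14]),
        "ship_yymmdd": line[14:20],
    }

# The layout parses identically on both sides of rollover...
before = parse_record("WIDGET-1000025991231")  # shipped Dec 31, 1999
after = parse_record("WIDGET-1000025000101")   # shipped Jan 1, 2000
# ...but nothing here tells you whether the receiving system will
# treat "00" as 2000 or as 1900. That's the remediated logic, not
# the interface, and live 1999 traffic never exercises it.
```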

-- RC (randyxpher@aol.com), November 08, 1999.


Thinman, all you need to do to bring your thread back up the list is to add another post to it. It won't "die" as long as you keep posting to it, and you can always retrieve it from the archives.

-- (not@now.com), November 08, 1999.


OK, RC, we're getting close; bear with me. Regardless of whether Company A and Company B are remediating in the same way, we can assume that internally, each company's system will handle pre- and post-rollover dates in the same way, right? So if those systems are both live now and successfully interacting with each other, how could it fall apart after rollover? Are we getting into the area of data that's corrupted internally before it enters that smoothly operating pipeline?

-- Thinman (thinman38@hotmail.com), November 08, 1999.

Thinman (reposting to be sure you see this) --

It depends on how the system(s) have been modified. Where I work, some systems are in yyyymmdd format; others are in yymmdd, with a 500000 offset applied to the existing date fields. Our software company went with this approach since they won't support this system for more than 10 years anyway. They want all their users up on client-server applications in the future (if there is one).

In the case of yymmdd format, 990105 (Jan 5th, 1999) looks like 490105 in our databases, but when the date is displayed to the user, 500000 is added to give us 990105. Pretty nifty as long as birthdate fields or something like that aren't being used. So in this instance, yes, we've been testing since January of this year. We don't expect any problems. It must be mentioned that we had to put this entire system up all at once. A few bad dates would have corrupted things. We had many keys that had to be rebuilt, since many keys had a date embedded within them. It was a real pain.
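For illustration, here's a minimal sketch (in Python, not Larry's actual code) of that offset scheme, assuming the two-digit year is shifted back by 50 (mod 100) on the way into the database and shifted forward again on display:

```python
def store(yymmdd: str) -> str:
    """Shift the 2-digit year back by 50 (mod 100) before storing,
    so stored dates sort correctly across the century boundary."""
    yy = (int(yymmdd[:2]) - 50) % 100
    return f"{yy:02d}{yymmdd[2:]}"

def display(stored: str) -> str:
    """Add the 50-year offset back when showing the date to a user."""
    yy = (int(stored[:2]) + 50) % 100
    return f"{yy:02d}{stored[2:]}"

store("990105")    # Jan 5, 1999 is stored as "490105"
store("000105")    # Jan 5, 2000 is stored as "500105"
display("490105")  # shown to the user as "990105"
```

The payoff is that plain string comparison on the stored field keeps working across the boundary ("490105" < "500105"), which is why this variant has been live-testable since January 1999 - though, as Larry notes, fields like birth dates that fall outside the 100-year window would break it.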

Other systems are in yyyymmdd format. Any data that is accepted still comes into my system in yymmdd format; however, the program that accepts this data has been modified to expand it to yyyymmdd (windowing), so that it reflects the proper format for my system. So again, we tested this two years ago and it has been running ever since. We expect nothing new in 2000 either.

The problem comes into play when you're doing windowing exclusively (i.e. data still looks like 990105 in the database). Your subroutines, or whatever you're using, won't take the post-2000 branch until a date like 000105 shows up, because the condition hasn't been met (if yy > 50 then 19yy else 20yy, for example). That branch only gets exercised if you roll dates forward and actually test everything using 2000 dates. So in this example, if you haven't rolled the dates forward and tested everything, then you certainly can't say everything is fine. A missed subroutine call can produce bad data after 2000 but not before then.
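A sketch of that distinction (in Python, purely illustrative): with pivot windowing, live 1999 production traffic only ever exercises one branch of the date logic.

```python
PIVOT = 50  # assumed pivot: years above 50 are 19xx, at or below are 20xx

def expand_year(yy: int) -> int:
    """Pivot windowing: interpret a 2-digit year via a fixed pivot."""
    if yy > PIVOT:
        return 1900 + yy  # every 1999-dated record takes this branch
    else:
        return 2000 + yy  # never runs until a 2000 date actually arrives

expand_year(99)  # -> 1999, exercised constantly by today's live data
expand_year(0)   # -> 2000, a branch that live 1999 traffic cannot reach
```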

I think a good question to ask now is what method is being used by most companies out there. I keep hearing windowing. But which method of windowing? Adding 500000 or testing for > 50? If the answer is the latter, then as you can see we aren't testing this live.

Perhaps this is why programmers can't agree on much. There are many ways of doing things. Cripe, I got a Java programmer here who thinks Y2K is overblown. But you know what he does? He's a young multimedia programmer and he uses few dates. So he sees only his world. I'm not a multimedia programmer. I'm a business application programmer and I use dates everywhere. So I see things entirely differently.

I hope this has helped.

-- Larry (cobol.programmer@usa.net), November 08, 1999.


Re: "The argument is that if the systems are talking to each other pre-rollover, they should continue to do so post-rollover. The protocols, or whatever you want to call 'em, are established and not going to change on Jan. 1."

This is false.

First, real-time systems (embedded systems) are fundamentally different from information systems, so let's dispose of the simpler-to-discuss real-time systems first.

Real-time systems operate in the "now" and a small amount of time (seconds or milliseconds) before now and predicted now. When they fail, in general, they fail at the rollover or sometime after the rollover in weird and wonderful ways. Some stop, some destroy their hardware, some have buffer problems, some stop on Feb. 28, some cause the processes they control to do the wrong thing. Generally, you do date startup and rollover testing to find out what they do.

Information systems are different. While they have real-time components (hardware, operating systems, uninterruptible power supplies), they also have information, which includes time information. My experience in "event horizon" analysis is that the failure distribution for these systems is usually a normal (bell-curve) distribution centered immediately after rollover.

Information systems do computations and comparisons on time and then perform actions based on those computations and comparisons. They have past dates (transaction date, birth date, request date, etc.), current dates (cycle date, clock date, etc.), future dates (required date, reorder date, etc.), and time intervals (days receivable, order-to-ship time, etc.). So they compute and compare on all this stuff and then make decisions, like issuing cancellation transactions across interfaces, reordering, or rejecting transactions. The results of that logic before Y2K are often different from the results after Y2K.
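For a concrete (and hypothetical, Python) example of logic whose results differ before and after Y2K: an aging calculation done on raw 2-digit years gives sensible answers right up until the century rolls.

```python
def invoice_age_years(invoice_yy: int, current_yy: int) -> int:
    """Naive 2-digit-year arithmetic of the kind buried in old date logic."""
    return current_yy - invoice_yy

def is_overdue(invoice_yy: int, current_yy: int) -> bool:
    """A decision based on that computation -- e.g. chase invoices
    more than a year old."""
    return invoice_age_years(invoice_yy, current_yy) > 1

is_overdue(97, 99)  # -> True: a 1997 invoice in 1999 is correctly flagged
is_overdue(97, 0)   # -> False: in 2000 the same invoice "ages" -97 years,
                    #    so the overdue check silently stops firing
```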

So, You have the following problems:

Non-expanded dates can handle the before-Y2K problem but not the after-Y2K problem in the same piece of logic.

Expanded dates can cause problems at interfaces: as we come down to the wire, one side does its data conversion and goes live, but the other side of the interface doesn't make it in time or misprocesses the expanded data.

One system as a result of Y2K bugs sends a valid but improper transaction to the other (e.g. cancel all my purchase orders).

One last one. They fix the applications but forget to fix one of the associated real time systems (HW, OS, UPS, etc.) and they take down the system catastrophically.

Many more cases exist. Expect a big bump at rollover and then a whole bunch of nasty surprises for 3 months to a year.

-- ng (cantprovideemail@none.com), November 08, 1999.


"Regardless of whether Company A and Company B are remediating in the same way, we can assume that internally, each company's system will handle pre- and post-rollover dates in the same way, right?"

That's the goal. That, in fact, is the very definition of "remediation". The reality is, given the time constraints and the 90-10 rule (the last 10% of bugs take 90% of the time to fix), nobody will have fixed *every* bug.

"So if those systems are both live now and successfully interacting with each other, how could it fall apart after rollover? Are we getting into the area of data that's corrupted internally before it enters that smoothly operating pipeline?"

Yes, more or less. When a bug hits System A, the best case scenario is that an internal error occurs, and System A kicks the transaction out. The worst case is that the bug causes a subtle calculation change that gets passed on to System B, which treats the transaction differently due to that error, and passes it on to System C...

Fortunately, in my experience, Y2K bugs almost always lead to internal errors, not computational errors.

-- RC (randyxpher@aol.com), November 08, 1999.


"Non-expanded dates can handle the before-Y2K problem but not the after-Y2K problem in the same piece of logic."

This is the stuff remediation is meant to fix.

"Expanded dates can cause problems at interfaces: as we come down to the wire, one side does its data conversion and goes live, but the other side of the interface doesn't make it in time or misprocesses the expanded data."

This is the stuff that won't wait until rollover to go haywire.

"One system as a result of Y2K bugs sends a valid but improper transaction to the other (e.g. cancel all my purchase orders)."

These are the bad ones - but again, in my experience, Y2K bugs don't tend toward this kind of error. I mean, I could see it playing hell with interest calculations and that sort of thing, but companies doing that kind of work had to deal with post-2000 dates a long time ago.

"One last one. They fix the applications but forget to fix one of the associated real time systems (HW, OS, UPS, etc.) and they take down the system catastrophically."

I would say most companies have testbed versions of their real time systems hooked up to their testing regions.

You've missed what I think will be the most common effect of Y2k bugs. My feeling is that most systems will have some percentage of input that will just get kicked out as data errors. 5%, say, of their input data just will not go through the system, and manual workarounds will be needed to take up the slack.

Your actual percentage of errors, and your success at working around them, will determine how well you survive the rollover.

-- RC (randyxpher@aol.com), November 08, 1999.



To Thinman,

To begin, there is a lot of remediated code running today that will continue to run fine into the next century. I don't believe that anyone with a good grasp of computing will argue with that point. However, the real rub comes in when it is impossible to find out whether everyone has indeed really remediated their programs. I wish I could find out the truth; it would greatly help me either work harder or sleep better.

Over the past thirty-two years I have worked in almost every part of the computer spectrum. I have done single board computers, networks, mainframes, minicomputers, etc.... you get the picture I'm sure. I've been there and done that. Now with that said I will also say that it doesn't make me an EXPERT in any way. Computer technology has changed so much in the past thirty-two years that what I knew ten years ago is out of date today.

Now to the subject being discussed. Many years ago I was working on a computerized production control and monitoring system for a large firm. The monitoring part of the project was to scan 440 sensors approximately every two seconds and keep track of production, operator efficiencies, and machine down time. As you can imagine, the computer we were using (an IBM System/7) needed to be extremely fast and flexible. In order to actually do all of these functions we used both the date and time to track events from the system. Seem unreasonable to you? Well, when a machine crosses the threshold between today and tomorrow (midnight), and one scan was yesterday and the next is today, you just might need to know that. All of our scans were queued, since we couldn't actually keep up with that scan rate when other functions of the computer might need to be used (time & attendance recording and reporting, etc.). We processed these queued events as rapidly as possible to make them as near "real-time" as we could. Therefore the date was used in a "real-time" system. Many systems have to do it this way because of the midnight threshold crossing. Therefore dates can affect many things that you might not expect. In our case no lives were at stake, but the company's production and tracking would be messed up. Not a light thing for the company, as they rely on that information to bill their customers for their products.

Remediation needs to be done for all systems, whether they control something big or small. Remediated code that is running and has been tested is great. But computers fail in strange and totally unexpected ways. My greatest fears lie in my inability to find out how much remediation has been done and how good it has been. A known problem can be worked on, but an unknown problem cannot be. Many people are working like crazy to make sure it is a non-event. I respect these people because I do know what it is like to do that type of work and am doing it myself. But the one thing my years of experience have taught me is that these workers are probably not getting the support from management to adequately address the problem.

Good luck to you in your search for answers.

wally wallman

-- Wally Wallman (wally_yllaw@hotmail.com), November 08, 1999.


Thinman --

I'll weigh in here with my $.02 worth. Each of the above responses has merit, particularly wally's.

As to the efficiency of 'on-line testing' of interfaces, it depends on the test methodology. If they are doing what I expect they are, the software is just running 'as is', that is, they aren't really 'testing', just operating.

What this means is that they are *not* exercising the code. Thus, there will be many, many lines of code which are not 'tested'. That means that any failure will likely be going through legs of the code that have not been exercised within the constraints of the 'total interface' including the rolled-over date.

On the brighter side, the remediation methodology on both sides of the interface *will* have been validated by running production. This means that at least the remediation methods are known to be compatible. (Ensuring that there aren't problems such as one side using 'windowing' with the >50 method, while the other expects 4-digit years.)

(As usual, squarely on both sides. ;-))

-- just another (another@engineer.com), November 08, 1999.


Just checking in before staggering off to bed. Thanks, everyone, for a civil, thoughtful thread. I even understand some of it. Just another, your post makes a lot of sense, especially the part about "exercising" the code.

-- Thinman (thinman38@hotmail.com), November 08, 1999.

just another says:

"As to the efficiency of 'on-line testing' of interfaces, it depends on the test methodology. If they are doing what I expect they are, the software is just running 'as is', that is, they aren't really 'testing', just operating.

What this means is that they are *not* exercising the code. Thus, there will be many, many lines of code which are not 'tested'. That means that any failure will likely be going through legs of the code that have not been exercised within the constraints of the 'total interface' including the rolled-over date."

Which is absolutely true - but which also brings up another application of the 90-10 rule: 90% of your input runs thru 10% of your code. This is why it's possible that a system which is only 80% remediated could successfully process 98% of its inputs. Typically, much of the size and complexity of a given program/system is there to handle the unusual situations, not the usual ones.

-- RC (randyxpher@aol.com), November 09, 1999.


RC --

Right you are. And a good point. But my point on that is that if there are problems, this is one spot where they will crop up.. in the stuff that is supposed to handle exceptions. Which may make the spotting of, and treatment of, those exceptions a bit dicey.

Thinman --

Glad it helped.

-- just another (another@engineer.com), November 11, 1999.

