Implications of the Bell Curve - Event Horizon Analysis Misleading the "Fix on Failure" Crowd


Over the last few years of doing event horizon analysis on business systems, we've plotted events on a timeline, and it normally looks like a bell curve, a bit "spiky" at the rollover where the real-time things fail. Sort of like this (in quantities):

1 2 4 8 25 50 25 8 4 2 1

It occurs to me that this is misleading the "fix on failure" crowd. You see, there are three classes of problems (other than anomalies like the Feb 29 leap year):

* There are problems that occur before 2000 because of dates in systems after 2000.

* There are things which fail at the rollover itself - we're mostly talking about the real-time world here.

* There are things which fail after 2000 as a result of dates in the system before 2000.

Interestingly enough, lots of code causes one problem before 2000 and a completely different problem after 2000.
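To make the classes concrete, here's a rough sketch in Python (the functions, fields, and rules are invented for illustration, not taken from any particular system):

    # Class 1: fails BEFORE 2000 because of a date AFTER 2000.
    # A card issued in 1998 that expires in 2001 is stored as 98 / 01,
    # and a naive two-digit comparison rejects it on the spot.
    def card_valid(today_yy, expiry_yy):
        return today_yy <= expiry_yy

    print(card_valid(98, 1))   # False - wrongly rejected in 1998

    # Class 3: fails AFTER 2000 because of a date BEFORE 2000.
    # An age computed from a two-digit birth year goes negative in "00".
    def age(today_yy, birth_yy):
        return today_yy - birth_yy

    print(age(0, 60))          # -60 - nonsense age in 2000

    # Same storage decision, two completely different symptoms on
    # opposite sides of the rollover.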

So what the FOFers have been seeing is a gradual run-up of failures, which they have often been able to fix as they went along: 1 2 4 8. They expect the same approach to continue to work after 2000. I contend this expectation has two errors:

1. The spike at 2000 is normally significantly higher than what we experience in the run-up.

2. Most of the logic errors persist over a period of time, as opposed to firing at periodic milestones (such as file purges). As a result, if you plot when you first experience each event, as opposed to when the events start and stop, the curve looks "different". Something like this:

1 2 4 8 25 85 1 2 2 1
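If you want to see why the two curves differ, here's a sketch (Python; the event windows are invented, the shape is the point): count each problem once in the period where it is first experienced, then count how many are active in each period.

    from collections import Counter

    # (start, end) periods, inclusive; period 0 is the rollover.
    problems = [(-3, -3), (-2, -1), (-1, 2), (0, 4), (0, 4), (0, 1), (1, 3)]

    first_seen = Counter(start for start, end in problems)

    active = Counter()
    for start, end in problems:
        for period in range(start, end + 1):
            active[period] += 1

    for period in range(-3, 5):
        print(period, first_seen[period], active[period])

    # first_seen spikes at the rollover and drops to almost nothing;
    # active stays high for periods afterwards. The small tail in the
    # curve above is new arrivals, not the outstanding workload.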

The problems are going to hit almost all at once, at and immediately after the rollover. Let's say that 40 of the items in the original curve (1 2 4 8 25 50 25 8 4 2 1) were the infrastructure things which constituted the "low hanging fruit" lots of organizations fixed: PCs, operating systems, UPSs, desktop applications, etc. That still gives you a curve that looks like this:

1 2 3 8 25 45 1 1 1

So the FOFers are dealing with the 25 now, and are about to get hit with nearly twice as many before the 25 are sorted out.
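You can check those two curves against each other - the figures are round numbers, so they don't reconcile exactly, but the shape is the point:

    first_experience = [1, 2, 4, 8, 25, 85, 1, 2, 2, 1]
    after_fixes      = [1, 2, 3, 8, 25, 45, 1, 1, 1, 0]   # padded with 0

    fixed = [a - b for a, b in zip(first_experience, after_fixes)]
    print(fixed)        # [0, 0, 1, 0, 0, 40, 0, 1, 1, 1]
    print(sum(fixed))   # 44 - close to, though not exactly, the 40 above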

Interesting times.

-- ng (cantprovideemail@none.com), December 13, 1999

Answers

Good point on the purges. Worst code I ever worked on was purge code. Got so bad at one job that we didn't actually purge when we said we did - we just detached the pointers, left the records in place, and ran that way for a month or two after the "purge", listening for outcries. When we were SURE the users were happy, we deleted the data.
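In code terms that two-stage purge looks roughly like this (a sketch; the record store and names are invented):

    # Stage 1: detach the pointers, keep the data.
    # Stage 2: delete for real, later.
    records = {1: "order-1997", 2: "order-1998", 3: "order-1999"}
    index = {1, 2, 3}              # the "pointers" the users actually see

    def soft_purge(keys):
        # Users stop seeing the records, but nothing is destroyed yet,
        # so an outcry in the next month or two can still be answered.
        index.difference_update(keys)

    def hard_purge(keys):
        # Only once the users are demonstrably happy.
        for k in keys:
            records.pop(k, None)

    soft_purge({1})    # the "purge" the users are told about
    # ...run for a month or two, listening for outcries...
    hard_purge({1})    # the purge that actually happens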

When two-digit years mess up a purge, you're gonna see some fast, fast coding on those purge programs. Watch the apps that are space-tight, and decide for yourself how much testing those fixes are going to get. And perhaps they won't have any way to retrieve the data once they realize they deleted not only everything before 1998, say, but also all the records from, oh, January to June of 2000.
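Here's that failure mode in miniature (a sketch with invented dates): a purge that drops everything "older than 98" compares year "00" as 1900 and throws the 2000 records out with the 1997 ones.

    # Purge everything with a two-digit year below the cutoff.
    records = {"97-12-01": "old", "99-06-15": "recent",
               "00-01-10": "current", "00-06-30": "current"}

    cutoff_yy = 98
    purged = [k for k in records if int(k[:2]) < cutoff_yy]
    print(purged)   # ['97-12-01', '00-01-10', '00-06-30']
    # January-June 2000 ("00") sorts below "98" and is deleted along
    # with 1997 - exactly the unretrievable mess described above.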

Yeah, FOF is going to be messy messy messy.

-- bw (home@puget.sound), December 13, 1999.


Hey Bw, do you live near the Puget Sound? I am in this area and it would be great to talk to someone in person who GI. My e-mail is false but I can post a real one if you would contact me. Thanks.

-- Vincent (Vincent@manof.god), December 13, 1999.
