Questions - Ed Yourdon's New Article - Data Corruption

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

I read Ed Yourdon's article "Data Corruption: The Silent Y2K Killer" in the October 20, 1999 issue of Computerworld. http://www.yourdon.com/articles/articlesummary.html

I find this Y2K issue somewhat mystifying (I'm non-tech), so I was glad to see this article. It served as a good primer.

Ed Yourdon says "we'll probably eliminate most, if not all, of the visible bugs." I presume this means among companies that have remediated/tested.

But I'd be interested in hearing inspired guesses from anyone with Y2K technical expertise: How prevalent do you think critical (massive, sudden, visible) data corruption problems might be among companies that have not verified/tested the remediation status of the companies with whom they share information? 10% are likely to have problems? 20%? More? Less?

Also, a couple of questions...

In his article Ed Yourdon writes: "I'm not talking about data corruption that's massive and sudden -- e.g., a payroll system that runs amok and sets every employee's salary to zero. That kind of data corruption falls into the "visible" category, and we can assume that such problems will be attacked in the same way as a program whose Y2K bug causes it to abort suddenly."

Am I correct to think that bad data corrupts other data, but does not cause a system to abort?

How difficult and complex are the immediate fixes for critical (massive, sudden) bad data problems? How involved is it to put code through IV & V analysis? Is it a long laborious process?

Thank you for any answers,

S.O.

-- S.O. (inquiring@minds.nom), September 23, 1999

Answers

* * * 19990923 Thursday

S.O.:

A kind of "hidden" data corruption that Ed refers to is bound to occur.

A simple example should do for non-tech types: ;-)

Different entities--say finance/credit (f/c) and university/finance (u/f)--share financial information via electronic data transmission:

Entity 'A' (f/c) uses ( system wide ) "window" range of 1975-to-2074. ( ergo, mm/dd/62 = mm/dd/2062! )

Entity 'B' (u/f) uses "window" range of 1951-2050. ( ergo, mm/dd/62 = mm/dd/1962! )

Entity 'C' (f/c) uses "window" range of 1950-2049. ( ergo, mm/dd/62 = mm/dd/1962, but mm/dd/50 = mm/dd/1950 where 'B' would read 2050! )

Unless Entity 'B' (u/f) knows that entities 'A' and 'C' (f/c's)--or any others they interface with, for that matter--are using incongruent "window" dating techniques, some dates from 'A' and 'C' will be misread by 'B' without any error being raised.

Entity 'B' MUST KNOW EVERY POSSIBLE EXTERNAL DATE 'window' used and program their systems/applications to re-interpret the INCOMING DATES CORRECTLY!!
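For readers who want to see the mechanics, here is a minimal sketch of the windowing mismatch, assuming each entity expands a two-digit year into a fixed 100-year window. The window start years for 'A' and 'B' come from the example above; the function names `expand_year` and `read_incoming` and the `WINDOWS` table are hypothetical, made up purely for illustration.

```python
def expand_year(yy: int, window_start: int) -> int:
    """Expand a two-digit year into the 100-year window that begins at window_start."""
    if not 0 <= yy <= 99:
        raise ValueError(f"expected a two-digit year, got {yy}")
    century = window_start - window_start % 100  # e.g. 1975 -> 1900
    year = century + yy
    if year < window_start:  # falls before the window, so roll forward a century
        year += 100
    return year


# Window start years from the example in this thread; the table itself is illustrative.
WINDOWS = {"A": 1975, "B": 1951, "C": 1950}


def read_incoming(yy: int, sender: str, receiver: str) -> tuple[int, int]:
    """Return (year the sender meant, year the receiver gets with its own window)."""
    return expand_year(yy, WINDOWS[sender]), expand_year(yy, WINDOWS[receiver])


if __name__ == "__main__":
    for yy in (50, 62, 74):
        meant, got = read_incoming(yy, sender="A", receiver="B")
        flag = "MISMATCH" if meant != got else "ok"
        print(f"yy={yy:02d}: A means {meant}, B reads {got} -> {flag}")
```

The point of the sketch is that the receiver cannot detect the mismatch from the two digits alone. It has to know the sender's window and apply that window, not its own, to incoming dates.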

Now expand this by magnitudes and you have an interesting DATE "management" problem for ALL EXTERNAL DATA INTERCHANGES in the "UNIVERSE."

Is this a mess -- or what?!? It's going on today.

Regards, Bob Mangus

* * *

-- Robert Mangus (rmangus1@yahoo.com), September 23, 1999.


99 days.

Y2K CANNOT BE FIXED!

-- Jack (jsprat@eld.net), September 23, 1999.

The typical solution is a "date sniffer" that runs through the entire database, perhaps in chunks so that you hit the whole database once a week, something like that. It knows the format for each record, and validates the dates it finds there. In many database systems, each record carries the transaction-identifier of the transaction or batch process that last updated it. When the date sniffer finds a bad date (implausible year, month or day) it reports it, along with the transaction-id. You manually fix the bad records, and yank out whatever program caused the corruption.
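A rough sketch of such a sniffer, assuming records are plain dictionaries with dates stored as "YYYY-MM-DD" strings. The field names (`id`, `txn_id`), the plausible range, and the functions `check_date` and `sniff` are all assumptions for illustration, not any particular product's API.

```python
from datetime import date


def check_date(raw, earliest: date, latest: date):
    """Return a description of the problem with raw, or None if it looks fine."""
    try:
        year, month, day = (int(part) for part in raw.split("-"))
        value = date(year, month, day)  # rejects month 13, Feb 30, and so on
    except (ValueError, AttributeError, TypeError):
        return "not a valid calendar date"
    if not earliest <= value <= latest:
        return "outside plausible range"
    return None


def sniff(records, date_fields, earliest=date(1900, 1, 1), latest=date(2050, 12, 31)):
    """Scan records and report every suspicious date with its last-updating transaction."""
    findings = []
    for rec in records:
        for field in date_fields:
            problem = check_date(rec.get(field), earliest, latest)
            if problem:
                findings.append((rec["id"], rec["txn_id"], field, rec.get(field), problem))
    return findings


if __name__ == "__main__":
    sample = [
        {"id": 1, "txn_id": "T100", "due": "1999-12-31"},
        {"id": 2, "txn_id": "T101", "due": "1999-02-30"},  # no such day
        {"id": 3, "txn_id": "T101", "due": "1800-01-01"},  # before the plausible range
    ]
    for finding in sniff(sample, ["due"]):
        print(finding)
```

Grouping the findings by transaction-id is what points you at the offending program, which is why the report carries it alongside each bad record.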

I expect this to be SOP for most companies - the programming is simple and the payback is immense. If you don't catch those errors, they really will mess you up.

-- bw (home@puget.sound), September 23, 1999.


In case of a big visible problem you just restore from your most recent backup, which in most cases runs nightly. It's the hidden error, where you keep running and backing up bad data, where you have a real problem. A "date sniffer" could help I suppose but it would be hard to make one that can catch every possible subtle error. If you can figure out what the right result is in every case, why not just code that logic into the application itself so the error doesn't happen? Mostly I think you are stuck with finding obvious errors, like birthdates in the future.
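To make that limit concrete, here is a small sketch of the kind of "obvious error" check that is feasible. The function name `birthdate_is_obviously_wrong`, the 120-year age cap, and the sample dates are assumptions chosen for illustration.

```python
from datetime import date


def birthdate_is_obviously_wrong(birthdate: date, as_of: date, max_age_years: int = 120) -> bool:
    """Flag birthdates in the future or implying an implausibly old person."""
    if birthdate > as_of:
        return True
    return as_of.year - birthdate.year > max_age_years


if __name__ == "__main__":
    today = date(1999, 9, 23)
    # A 1962 birthdate misread through the wrong window as 2062: caught.
    print(birthdate_is_obviously_wrong(date(2062, 3, 15), today))
    # A hypothetical corruption to 1926: wrong, but still plausible, so it slips through.
    print(birthdate_is_obviously_wrong(date(1926, 3, 15), today))
```

A check like this only knows what is impossible, not what is correct, so a corrupted date that stays inside the plausible range passes unnoticed.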

-- Shimrod (shimrod@lycosmail.com), September 23, 1999.
