What do you think of Cory's thesis that month-end, quarter-end, and year-end batch processing are the real places for y2k to begin to wreak havoc, visible or not?

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

As I understand it, Cory's position is (and always has been) that the mainframes do the heavy lifting for the Fortune 1000, that batch processes on these mainframes are likely to suffer y2k degradation, that the problems will begin to surface around the significant processing points of month-end, quarter-end, year-end, that the problems may or may not be detected and/or hidden from the public, and that a percentage of the Fortune 1000 will merge or go out of business as a result.

Is this a real possibility? If so, why? If not, why?

Answers like, "No (or Yes). Because I say so.", though often interesting to read, will not be considered helpful.

George

-- George Valentine (georgevalentine@usa.net), January 11, 2000

Answers

George

It is most plausible. Consider the % of remediation work done and the amount that was adequately tested.

When you finally figure something out, it has changed and so have you. I suspect that the view that Y2K is a non-factor could change dramatically when it is least expected.

-- Ishkabibble (ishman@home.com), January 11, 2000.


What about Brazil and Italy, both of which supposedly did very little remediation?

-- Earl (earl.shuholm@worldnet.att.net), January 11, 2000.

Before anyone goes too far out on a limb with any predictions, remember:

New York and a couple other states rolled over to FY 2000 back in April of 1999;

46 more states rolled over to FY 2000 July 1 of last year; and

Da Fedz have (presumably) been working in FY 2000 since last October 1...

-- I'm Here, I'm There (I'm Everywhere@so.beware), January 11, 2000.


Accurate. Cory's supposition that mainframes were/are the problem, that embedded chips/systems were less an issue, may be correct, but has yet to be proven. Embeddeds are still failing. They may not manifest for some time. Software can fail, but manifest no 'problems' until exercised in the real world. It has to do with buffers, behavior, latency, and the scope of the program.

Now, mainframes on the other hand, are batch critters. And the critter watchers deep in the bowels of gov't or fortune 5000 companies will see the problems.

There is one other factor indicating that Mssr. Hamasaki is accurate. It has to do with reality checks and statistics. It turns out, just on calculations alone, that greater than 60% of all the business calcs are still done in batch mode on big iron. So statistically that is where the problems are most likely to surface.

As the gov't and media are currently demonstrating,
there are three kinds of people in this world,
those who can do math,
and those who cannot.


-- pliney the younger (pliney@puget.sound.snowing), January 11, 2000.


Cory's position is the one I began with in 1993 or 1994, when I started Y2k remediation. It was 1997 before I ever HEARD of things like lights-out at rollover, and I thought it was some kind of hoax.

The two key points are: Big iron runs the world, and big mistakes take time to surface.

People think PCs do more than they do, but many PCs are just GUI front ends for massive big iron systems. A data migration on a mainframe can take months; proper testing can take the same. There is just no way the big systems have all been fixed, though the ones I've been exposed to are in good shape. I've been lucky enough to work for GI companies in this process, but I believe the horror stories because they sound like things I've seen in my 25+ years as a programmer.

Big mistakes take time to surface. There are two main kinds to watch for in mainframes: Batch runs that blow up, and data corruption. Runs blowing up can be delayed and rerun the next day, or the day after. Batch mainframe systems use a data file to hold the "current" date, and all the batch runs get their as-of date from that file. You can delay batch runs by days or weeks, waiting to finish aborted runs for a given day, and you manually advance the as-of date when you are happy with that day's runs. Then you go on to the next day's work.
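To make the as-of mechanism concrete, here is a minimal Python sketch. It is an illustration only: the `BatchCalendar` class, the job names, and the dates are made up for the example, and a real shop would keep the date in a control file or dataset rather than in memory.

```python
from datetime import date, timedelta


class BatchCalendar:
    """Holds the "current" business date that every batch job reads.

    Hypothetical stand-in for the control file described above; jobs use
    this date, not the system clock.
    """

    def __init__(self, as_of: date) -> None:
        self.as_of = as_of

    def advance(self) -> date:
        """Operator action: move to the next day once today's runs look good."""
        self.as_of += timedelta(days=1)
        return self.as_of


def run_day(calendar: BatchCalendar, jobs: dict) -> list:
    """Run each job against the as-of date and return the names that failed."""
    return [name for name, job in jobs.items() if not job(calendar.as_of)]


# Two made-up jobs: billing chokes on one particular processing date.
jobs = {
    "ledger": lambda d: True,
    "billing": lambda d: d != date(2000, 1, 4),
}

calendar = BatchCalendar(date(2000, 1, 3))
first_failures = run_day(calendar, jobs)   # nothing fails for Jan 3
calendar.advance()                         # operator is happy, move to Jan 4
second_failures = run_day(calendar, jobs)  # billing fails for Jan 4
# The as-of date stays at Jan 4 until someone fixes billing and reruns,
# no matter how many real days go by in the meantime.
```

The point of the design is that the processing date is decoupled from the wall clock, which is why a failed run can sit for days without anything visibly breaking.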

This means batch people don't work in "real" time, like embeddeds and the rollover-related problems. So a low-level manager can delay a set of runs for a day or two on his own, and as the problem gets more serious he can get buy-in from his boss to delay a bit more. That is, the longer the daily reports are delayed, the higher the mucky-muck has to be who ok's the delay. Batch runs can be totally hosed for a week or more before the CIO or CEO hears about it.

After this delay, the CEO can delay publicity for another period of time, while all hope for a miracle from the struggling programmers. So we are just now entering the window where we would expect to hear trouble reports from companies that had batch failures in the first few days of January. That's why I say we won't know how bad the damage is until March or so - that's how long it will take for the January end-of-month processes to become undeniable.

Data corruption is potentially worse, because the victims may not know they have been hit until they are fatally wounded. To keep track of this corruption takes some fairly sophisticated data-sniffing programs, running and rerunning through databases, looking for bad fields or for totals that suddenly stop cross-footing.
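A cross-footing check is simple to sketch. The Python below is a toy version under stated assumptions: the field names (`parts`, `labor`, `total`) and the amounts are invented, and amounts are whole cents so the comparison is exact.

```python
def row_mismatches(rows, columns, total_field="total"):
    """Return indexes of rows whose stored total differs from the sum of its columns."""
    return [
        i for i, row in enumerate(rows)
        if sum(row[c] for c in columns) != row[total_field]
    ]


def cross_foots(rows, columns, total_field="total"):
    """True when the column totals and the row totals reach the same grand total."""
    grand_from_columns = sum(sum(row[c] for row in rows) for c in columns)
    grand_from_rows = sum(row[total_field] for row in rows)
    return grand_from_columns == grand_from_rows


COLUMNS = ["parts", "labor"]  # hypothetical detail fields, amounts in cents

clean = [
    {"parts": 1200, "labor": 800, "total": 2000},
    {"parts": 500, "labor": 250, "total": 750},
]
corrupted = [
    {"parts": 1200, "labor": 800, "total": 2000},
    {"parts": 500, "labor": 250, "total": 75000},  # stored total no longer matches
]

clean_ok = cross_foots(clean, COLUMNS)
corrupted_ok = cross_foots(corrupted, COLUMNS)
suspects = row_mismatches(corrupted, COLUMNS)
```

A real sniffer would run something like this over every file on a schedule and keep history, so you know the first day the totals stopped agreeing.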

If the corrupted data is extracted once a month, say, and you do backups every night using a set of 7 tapes, then by the time you find out you're corrupted, you've over-written all the backup tapes that had any valid data on them. Your company is now doomed, with no chance to recover. (Good backup strategies roll tapes over on a series of intervals, to avoid this. But then, good programming would have caught the problem, too.)
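The arithmetic is easy to check with a small simulation. This Python sketch assumes one backup per night, corruption starting on day 10, and discovery on day 30; the weekly tape set is one example of a longer-interval rotation, not a claim about what any particular shop does.

```python
def surviving_backups(days_elapsed, daily_tapes=7, weekly_tapes=0):
    """Return the day numbers (1-based) whose backups have not been overwritten.

    Daily tapes are reused round-robin. If weekly_tapes is nonzero, every
    seventh day's backup also goes to a separate round-robin weekly set.
    """
    daily = {}
    weekly = {}
    for day in range(1, days_elapsed + 1):
        daily[day % daily_tapes] = day
        if weekly_tapes and day % 7 == 0:
            weekly[(day // 7) % weekly_tapes] = day
    return sorted(set(daily.values()) | set(weekly.values()))


CORRUPTION_STARTED = 10  # assumed: first day bad data was written
DISCOVERED = 30          # assumed: the monthly extract finally exposes it

seven_tapes = surviving_backups(DISCOVERED)
with_weeklies = surviving_backups(DISCOVERED, weekly_tapes=5)

# Backups taken before the corruption began are the only usable ones.
clean_seven = [d for d in seven_tapes if d < CORRUPTION_STARTED]
clean_weeklies = [d for d in with_weeklies if d < CORRUPTION_STARTED]
```

With seven tapes alone, only days 24 through 30 survive and none predates the corruption. Adding five weekly tapes keeps the day 7 backup, which is clean.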

We are hitting the iceberg now. All we know so far is that the lights stayed on.

-- bw (home@puget.sound), January 11, 2000.



I wrote in dozens of posts throughout last year that I believed this to be correct and that the remediation criticality with mainframes dwarfed all others. I even said, on occasion, that I would happily trade ALL PC remediation for mainframe remediation (to indicate the relative importance).

"You can look it up" and regulars will remember. This AIN'T Monday Morning quarterbacking and it has been the over-riding reason I still expect an 8.5, as well as why it is very premature to declare Y2K victory.

Mike Mulligan and his mainframe steam shovel still run the world. It may well be they're doing just dandy -- if they are, we wouldn't hear. If they aren't, we won't hear until news from the basement percolates upward over the next two to three months.

-- BigDog (BigDog@duffer.com), January 11, 2000.


George:

I have no doubt that there will be some problems with week-end, month-end, year-end processing. However, batch jobs have been running nightly for a week now. The great majority of the clients at which I've worked had daily runs, weekly runs, monthly runs, etc., and the weekly runs were compared to the daily runs, the monthly runs compared to the weekly runs, etc. They all "build on each other". In this way, if data is important to the workings of a firm, someone from the user department is looking at these reports and verifying accuracy.

Of course I've seen data get corrupted on mainframes many times before. It isn't always easy to determine when the bad data entered the system, but each day the data is backed up before batch processing begins. If day one's nightly run was fine, but errors were noticed on day 5, the batch jobs are rerun starting with the first good day (day 1 in this example), and run again including the 2nd day, etc. until the error is located. When the error is located, the original problem is fixed and a quick one-time program written to correct the data.

Of course this assumes that firms have done the appropriate backups, but firms don't get to be Fortune 1000 companies by taking chances on loss of data.

Bottom-line is that I disagree with Cory on this one. I DO think there will be problems, but hours of hard work by IT team members will correct them. In respect for fellow team members, I'd encourage all programmers called in to fix a problem off-hours to bring along a toothbrush, toothpaste, soap, deodorant, a change of clothes, and a can of vanilla air-freshener. Also, leave the kids food and money...even if you think you'll be back in an hour.

-- Anita (notgiving@anymore.com), January 11, 2000.


The company I work for had a job with a 7 day lookahead that started the data corruption thing on 12/25. An omission of a return code check on a subsequent program in the job kept the issue from being seen until the month end jobs were run on or about 1/5. The month end job blew and ultimately the culprit was identified as the other job. We were able to resolve the almost 2 weeks of 'bad data' without having to do a post rollover restore from a pre rollover backup (a thought that made our DBAs cringe). That one episode is now resolved, but are there any (many) more lurking in the periodic reports and jobs? At this point, we haven't a clue....
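For anyone who hasn't seen how a missing return code check hides a failure, here is a rough Python sketch. The step names and the return code of 8 are invented for illustration; on a mainframe the same idea is usually expressed as condition-code tests in the job's JCL rather than in a program like this.

```python
def run_job(steps, check_return_codes=True):
    """Run steps in order and return a list of (step name, return code).

    With checking on, the job stops at the first nonzero return code.
    With checking off, later steps run anyway on whatever the failed
    step left behind.
    """
    completed = []
    for name, step in steps:
        rc = step()
        completed.append((name, rc))
        if check_return_codes and rc != 0:
            break
    return completed


# Hypothetical two-step job: the lookahead step fails, the posting step
# itself "works" and reports success.
steps = [
    ("lookahead", lambda: 8),
    ("post_to_ledger", lambda: 0),
]

checked = run_job(steps, check_return_codes=True)
unchecked = run_job(steps, check_return_codes=False)
job_looks_clean = unchecked[-1][1] == 0  # last step returned 0
```

In the unchecked run the last step reports 0, so the job looks clean to anyone glancing at the final status, even though it posted data from a step that failed.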

-- BH (bh_silentvoice@hotmail.com), January 11, 2000.

Programs blow up, programmers fix them. Data gets corrupted, somebody notices it, it gets fixed. Haven't we all been saying that if the iron triangle stays up we can fix the rest? The crisis is over.

-- Amy Leone (leoneamy@aol.com), January 11, 2000.

Right, Amy, boats leak a little, get patched by a guy sticking in a plug, life goes on. Boats leak a little more, get patched by a crew welding on a plate, life goes on. Boat leaks too much for the available crew to fix, boat sinks.

There's no question that programmers fix problems. (Cause them, too.) The unknown is the number of problems which are going to surface in the next few weeks, and the number of programmers needed, and the economic hit that results. We are still facing a potential depression, here.

-- bw (home@puget.sound), January 11, 2000.



Amy:

I'm so envious that you were able to succinctly state in 3 sentences what took me 4 paragraphs. Geez...I'm suffering from the verbose "meme."

-- Anita (notgiving@anymore.com), January 11, 2000.


George, Mike, I want you both to know I am a computer expert. I know where the on button is. I know where the off button is. There! You see, I am qualified, right?

Seriously, I truly appreciate this information from each of you, in language even I can understand without difficulty, since you have both stated your opinions in a fashion that is logical and makes sense.

This only goes to reinforce my "gut" feeling that a great deal of quick-fix was done to keep the lights on and the wheels turning, the basic problems haven't been taken care of and are still lurking.

No doubt you're both aware of the hundreds of reports being posted of minor problems, none of which will bring the world to an end, but are none the less difficult for the people experiencing double billings on their credit cards, credit erased, etc.,etc. All of which indicates to me that there are not only superficial problems, but major problems underlying the systems.

I thank you both for these observations. Now, I'll just sit back and wait for the "flames."

-- Richard (Astral-Acres@webtv.net), January 11, 2000.


Regarding the "other countries did nothing and had no problems" argument... I had a very interesting conversation with my girlfriend last night. Tina is originally from Athens, Greece (she's lived in the States for ten years). Her dad is a physician, and she said that until just a COUPLE OF YEARS ago, he used to come home with a big stack of cash every payday. Yep, that's right: he, a respectable doctor, was paid in cash. She said that, when she left Greece in 1989, almost nothing was computerized... certainly very few of the things that people deal with in everyday life. It has only recently begun to change.

So, I am more and more of the opinion that the examples of Second and Third World countries have very little to do with the Y2K "story" here. That's not to say I expect any sort of catastrophe (although I think the "big iron" issue is probably still an open question)... I just mean that it seems clear the remediation work in developed countries was crucial, even if it wasn't so vital in some other places, or perhaps didn't have to be carried out to nearly the same degree. It's baffling to me that all the big brains at Gartner, the State Department, and elsewhere didn't understand this... but then again, hardly anyone got this issue exactly right.

-s-

-- Scott Johnson (scojo@yahoo.com), January 11, 2000.


Something most of us said when nothing big happened during rollover is "How'd they get it all fixed in time?". One thing to consider is the priority level given to different applications and different systems. Some of the programs within a system are used daily, some weekly, some monthly, and some yearly. If you are managing a project to remediate a large number of programs, you are going to prioritize your list of modifications so that the most used programs get done first. If, as some of us suspect, some companies didn't get all their programs fixed by January 1st, it is likely that the month-end and year-end programs were the ones left until last. From that perspective, Cory's premise that problems will tend to crop up at month-end and year-end is likely correct.

-- Coder (Coder@Work.Now), January 11, 2000.

George, bw, Big Dog,

Some six or eight threads below, I ask Ed Yourdon (and others, including Big Dog) a couple of questions concerning February 29, world wide shut-down requirements, etc.

Would you please input your thoughts to that thread? Thank you

BTW, the argument that second and third world countries didn't need as much remediation (and testing!) is not true. Seventy to eighty per cent of the world's code is outside the US. South America alone has more than one thousand mainframes. If big iron is the problem you can expect problems EVERYWHERE, not just the US.

Take care

-- George (jvilches@sminter.com.ar), January 11, 2000.



Sorry guys. The thread asking Ed Yourdon and others about February 29 concerns, etc. is not 8 threads below but rather 18 (eighteen) threads below.

Sorry

-- George (jvilches@sminter.com.ar), January 11, 2000.


Lots of mainframes are out of the US, and in many cases those are hand-me-downs. I remember more than once that when a new mainframe came in our door we shipped the old one (including whatever manuals we could find) to a reseller who shipped them down to Venezuela. We had a good laugh over it, but it was all they could afford. Hope they're more current, now.

I don't think 2/29 calls for any shut-down, because it's not an embedded/rollover situation. The only danger on 1/1/2000 was embeddeds and real-time systems (mainframe or PC) that have to make real-time decisions. I just don't see the risks for 2/29, but I have an open mind on the issue. There will be batch systems that fail, but the numbers will be low and I suspect the problems will be quickly fixed.
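For reference, the 2/29 worry comes down to one rule. The Python sketch below contrasts the full Gregorian leap-year rule with a shortcut that drops the 400-year exception; the `buggy_is_leap` function is a made-up example of that shortcut, not code from any real system.

```python
import calendar


def is_leap(year):
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def buggy_is_leap(year):
    """Hypothetical shortcut that forgets the 400-year exception."""
    return year % 4 == 0 and year % 100 != 0


correct_2000 = is_leap(2000)
buggy_2000 = buggy_is_leap(2000)

# Sanity check against the standard library over a couple of centuries.
agrees_with_stdlib = all(
    is_leap(y) == calendar.isleap(y) for y in range(1900, 2101)
)
```

A program using the shortcut would treat 2000 as a non-leap year and go straight from 2/28 to 3/1. In a batch system that shows up as a bad date in a run, which can be fixed and rerun.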

The big dangers are in database corruption (which might be happening now) and in the EOY, EOM, EOQ jobs that will run in the coming weeks. As someone said elsewhere, many of these can be rerun as bits and pieces get fixed. That's true, and it illustrates the best-case batch failure.

The worst-case is when your EOM run shows you that you have a bad YY-related sort sequence in a database, which will take a couple months to unload/restructure/reload. In that case, the January EOM runs might be your first notification that your computer is going to be out of action until the end of, say, March. That's how companies die.
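Here is the sort-sequence problem in miniature, as a Python sketch. The YYMMDD keys are invented, and the pivot of 50 in `expand_year` is an assumption chosen for the example; windowing is one common remediation, not necessarily the one a given shop used.

```python
def expand_year(yy, pivot=50):
    """Windowing: two-digit years below the pivot become 20xx, the rest 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy


def windowed_key(yymmdd):
    """Sort key that orders by the expanded four-digit year, then month and day."""
    return (expand_year(int(yymmdd[:2])), yymmdd[2:])


# Hypothetical record keys stored as YYMMDD text.
keys = ["991231", "000103", "981115"]

naive_order = sorted(keys)                    # plain text order
fixed_order = sorted(keys, key=windowed_key)  # chronological order
```

In the naive order the January 2000 record sorts ahead of everything from 1998 and 1999. Fixing the key in a program is quick; fixing it in a database that is physically stored in key order is the unload/restructure/reload job described above.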

It might not take many companies (each hitting this wall over a short timespan) to spook the market, popping the bubble, starting the crash that seems overdue. Once the infrastructure stayed up, if we were otherwise healthy, we'd have been fine. But we are very fragile, in this bubble, way deep in debt, and even a small Y2k downturn can be enough to destabilize the economy.

The risks remain very high.

-- bw (home@puget.sound), January 11, 2000.


Oops. The embeddeds and real-time systems weren't the ONLY thing that could break on 1/1, but it was what the $40M command centers were watching. They announced success about 20 minutes into the new year, and the only thing that had succeeded at that point was embeddeds and real-time stuff.

-- bw (home@puget.sound), January 11, 2000.

Oh probably. But why speculate? No really, why? Unless you've got a lot invested in the stock market (which has inexplicably recovered from its slump, just like it inexplicably does at the start of every year), or are contemplating going self-sufficient (not just "prepped"), what - really - does it matter?

-- Servant (public_service@yahoo.com), January 11, 2000.

I think that because the economy is so healthy, a little push will not push it over the edge. If it were an ailing economy, it would be different.

-- Amy Leone (leoneamy@aol.com), January 11, 2000.

This is healthy? We are in a classic bubble, and even the bulls are starting to say this feels wobbly. People are withdrawing savings to invest in the market; we have more debt than we ever had before. Companies with no earnings (and no prospect of earnings) are worth more (on paper) than General Motors and England combined. Our heavy industry is gone overseas, and we are fully employed selling fries and soft drinks.

Y2k is over and the economy is healthy. Uh-huh. As Doctor Jim would say, "memorize that".

-- bw (home@puget.sound), January 11, 2000.


OK. This looks like the thread to post this to. I can't tell you what it means, because I know zilch about computers. Maybe someone can interpret it for me. In my job, I categorize product based on quality. When I'm finished with it (piece by piece) I download it on a disk and walk it over to DP (I know, I know, antiquated). When I want to change something because of more info about a particular piece, I must submit a manual record to be input over in DP. Today, I went to check on some changes I requested yesterday (would normally take 24-36 hours) and the clerk told me she couldn't input anything, and couldn't run a "maintenance" on the inventory because of a Y2K glitch. I didn't ask, but it appeared the disks I've been carrying over to her are able to be uploaded, because the results have appeared in shipping, but with some delay. Near as I can tell, a maintenance is just that, it updates the inventory to be as current as possible. This includes everything, such as production, sales, consignments, work in progress, the whole thing.

If some of you guys can glean anything from this rambling, maybe you can figure out its impact on the big picture. It's my first glitch spotted.

BTW, medium sized manufacturing company, about 400 employees.

Cheers!

-- margie mason (mar3mike@aol.com), January 11, 2000.


The largest utility company in California has delayed announcing its earnings. They claim they are waiting to announce the earnings after the CPUC rules on their pending rate case. Also, this same company has problems with their new billing system: some customers are not receiving their utility bills, and it's a system-wide problem. Supposedly, they installed a new computer system that isn't working. We may not hear what goes on behind closed doors, but I have my suspicions.

-- stock holder (stockholder@stockholderrr.xcom), January 11, 2000.
