
Joe Chang

Air New Zealand outsourcing problems with IBM Global Services

Has anyone recently encountered a case of SQL Server being migrated to DB2 on a mainframe, whether Linux or z/OS? Whose fault is this? The data center operator? Or Air NZ for not having functioning DR?

http://www.betanews.com/article/Amateur-Linux-IBM-mainframe-failure-blamed-for-stranding-New-Zealand-flyers/1255360352

http://www.stuff.co.nz/national/2953054/System-crash-creates-airline-chaos

http://www.stuff.co.nz/travel/new-zealand/2955289/Air-New-Zealand-boss-criticises-IBM-over-outage

http://www.stuff.co.nz/travel/new-zealand/2954151/Air-New-Zealand-to-meet-with-IBM-over-computer-crash

http://www.stuff.co.nz/business/industries/2957307/Air-NZ-launches-into-IBM/

Was all this a test? The last URL says: "the power outage which appeared to have been caused by a failed oil pressure sensor on a back-up generator." I am assuming that the data center is normally on the power grid. So was the DC testing the procedures that would be followed in the event of a failure of outside power?

Potentially a circuit breaker could trip, and instantaneously outside power is gone. The battery backup system should last long enough for the backup generators to come on line. 

It is also possible that the public utility power grid becomes overloaded on a hot day and voltage starts to drop, damaging things like electric motors, which depend on a certain voltage. The data center would detect the undervoltage, start its backup generators, bring them online in parallel with the grid, transfer the load, and then trip the breaker to the grid.

Recall that the Chernobyl disaster occurred because the operators were testing a failure scenario (http://en.wikipedia.org/wiki/Chernobyl_disaster).

Anyway, keep these URLs handy should anyone ever suggest moving from SQL Server to an IBM Global Services solution.

http://www.datacenterknowledge.com/archives/2009/10/12/ibm-generator-failure-causes-airline-chaos/

 

Published Monday, October 12, 2009 1:33 PM by jchang

Comments

 

Larry Hilibrand said:

Not here to defend IBM. But wasn't that a stupid data center failure? It had nothing to do with platform migration.

October 12, 2009 2:35 PM
 

jchang said:

Apparently it is "IBM's Newton data center in Auckland", so it's not about IBM's ability to do a platform migration, but their ability to host a data center. Still, some blame goes to the CIO for not having a proper DR site.

October 12, 2009 6:14 PM
 

Nick said:

Remember this is New Zealand. Auckland went without power for a full 3 weeks just ten years ago: http://www.skmconsulting.com/to-do-news/Archive/The-Auckland-CBD-Power-Failure.aspx

If you'd been through that you'd think DR would be your top priority. Maybe it was too long ago?

October 12, 2009 7:42 PM
 

jchang said:

Two years ago there was a power outage incident in Florida. There was some kind of accident that caused fluctuations on the power grid, and as a consequence the local nuclear power plant was shut down. I think there may be a regulation that requires two reliable emergency power sources to operate the coolant pumps in case of an emergency shutdown: one is the backup diesel generator, and the other is the power grid. Because the power grid was no longer reliable, the plant had to shut down. So basically the regulation aggravated a minor problem into a bigger problem.

Sometime in 2003 there was a power disruption across the US northeast; I think it's because our power grid is on the verge of collapse.

And then there is the matter of sunspot cycles. Are we heading into a peak? There will be more solar flares, which, if they hit the earth, could cause massive disruptions. How many of us could go one day without the internet?

October 12, 2009 8:50 PM
 

Jonathan Kehayias said:

That outage across FL wasn't even two years ago, Joe:

http://www.cnn.com/2008/US/02/26/florida.power/index.html

It never affected where I live down here, but my wife's family was without power for a while because of it.

October 12, 2009 10:23 PM
 

Jax said:

You're missing the key information. Firstly, the datacentre is a relic - run by IBM, sure, but owned by Air NZ and running systems designed and built in the 70s - and they were just plain unlucky. They were doing scheduled maintenance on the UPSs, which meant they were offline when a sensor malfunctioned (no way to test for such a thing until it happens) and triggered an automatic shutdown of a multimillion-dollar generator that had been used as recently as a day or so before in a real-life power outage and had worked perfectly. So good luck? No. Unlucky? Yes, in the extreme. I mean, what are the alternatives - a second generator as backup to the first for the one day per year (or whatever) you're running maintenance? What would be smarter is a hot-hot replicated site and migrating the load to it when you're doing maintenance - but when you're dealing with an airline that won't spend money, there is no chance you will have a duplicate mainframe somewhere - if the 1970s app is even capable of such things.
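(To sketch the hot-hot idea in this blog's native dialect - SQL Server database mirroring, not anything Air NZ actually runs - here is a minimal sketch, assuming the mirroring endpoints already exist; the database name and server addresses are purely illustrative:)

  -- On the mirror server: restore the latest full backup (and any log backups) WITH NORECOVERY
  RESTORE DATABASE Bookings
    FROM DISK = N'\\backupshare\Bookings_full.bak'
    WITH NORECOVERY;

  -- On the mirror: point the database at the principal's mirroring endpoint
  ALTER DATABASE Bookings
    SET PARTNER = N'TCP://principal.example.com:5022';

  -- On the principal: point the database at the mirror's endpoint to start mirroring
  ALTER DATABASE Bookings
    SET PARTNER = N'TCP://mirror.example.com:5022';

  -- Planned maintenance: manually fail over to the mirror, do the work, then fail back
  ALTER DATABASE Bookings SET PARTNER FAILOVER;

The manual failover in the last statement is the point: you move the load to the standby before taking the primary site down for maintenance, instead of hoping the generator behaves.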

October 13, 2009 5:30 AM
 

jchang said:

JK: yes, that's the incident I was referring to. OK, so only 1.5 years ago; I am pretty far down the road of failing memory.

Jax: if the data center is a relic, why would IBM be dumb enough to put their name on it? That's just asking for trouble. The betanews URL says this was a 1999 and 2002 deployment, and part of it was Linux on zSeries, so we're not talking 360-era stuff.

Still, let's wait for the report, and the counter-claims.

Also, The Register

http://www.channelregister.co.uk/2009/10/12/sidekick_hitachi/

is saying that the MS/Danger outage and data loss are due to work contracted to Hitachi Data Systems. It does not say anything about an HDS SAN, so perhaps they were migrating to an HDS SAN.

I am thinking that it's not that the data was wiped, but rather that the DB file was corrupted, and hence the database cannot be started, and there are no real backups. It seems that when people buy a really, really expensive SAN because it "saves money", so much money gets spent that they do not have adequate disk space for a local disk backup, because disk space is so expensive on the SAN that "saved so much money".
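(For what it's worth, even a cheap local drive is enough for a basic safety net. A minimal sketch - the database name and path are invented for illustration:)

  -- Full backup to an inexpensive local disk; CHECKSUM catches page corruption
  -- at backup time rather than at restore time
  BACKUP DATABASE Danger
    TO DISK = N'D:\LocalBackup\Danger_full.bak'
    WITH CHECKSUM, INIT, STATS = 10;

  -- Confirm the backup can actually be read, without restoring it
  RESTORE VERIFYONLY
    FROM DISK = N'D:\LocalBackup\Danger_full.bak'
    WITH CHECKSUM;

The RESTORE VERIFYONLY at the end at least tells you the backup you are counting on is readable before you need it.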

October 13, 2009 9:16 AM


About jchang

Reverse engineering the SQL Server Cost Based Optimizer (Query Optimizer), NUMA System Architecture, performance tools developer - SQL ExecStats, mucking with the data distribution statistics histogram - decoding STATS_STREAM, Parallel Execution plans, microprocessors, SSD, HDD, SAN, storage performance, performance modeling and prediction, database architecture, SQL Server engine
