British Airways cancels all flights from Gatwick and Heathrow due to IT failure

https://www.theguardian.com/world/2017/may/27/british-airways-system-problem-delays-heathrow


British Airways has cancelled all flights from Heathrow and Gatwick on Saturday due to a major IT failure that is causing very severe disruption to its global operations.

The airline said that its terminals at Heathrow and Gatwick had become “extremely congested” due to the computer problems. It decided to cancel all flights from both airports before 6pm UK time on Saturday. “Please do not come to the airports,” BA said.

A later statement said the airline had been forced to cancel all remaining flights scheduled to depart from the UK’s largest two airports on Saturday. “We are extremely sorry for the inconvenience this is causing our customers and we are working to resolve the situation as quickly as possible.”

It is believed hundreds of flights at the two airports have been affected, and more around the world have suffered major delays.

Travellers have been told to check ba.com and its Twitter account for updates about the situation.
 
A very sad day there today; this will take a few days to get back to a normal schedule.
 
Hopefully they will sort it soon as there must be thousands of people affected (and checking their insurance documents).
 
That's what the manager who ordered someone to cut corners said.

Critical systems can fail. But you prepare for just such cases with multiple fallbacks.
Which is great, but it's not beyond possibility that all systems, including their redundancy options, fail simultaneously.
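
To put rough numbers on that: if the fallbacks really are independent, the chance of everything failing at once is tiny, but a shared dependency (one power feed, one network, one support provider) can take the lot out together. A toy calculation with made-up figures, nothing to do with BA's actual systems:

```python
# Toy figures, purely illustrative -- not BA's actual reliability numbers.
p_single = 0.01       # chance any one system fails in a given window
n_systems = 3         # primary plus two redundant fallbacks

# Truly independent systems: all three must fail at the same time.
p_all_independent = p_single ** n_systems
print(f"independent failures: {p_all_independent:.6%}")        # 0.000100%

# Shared dependency (same power feed, same support provider, etc.):
# if that one thing fails, everything fails with it.
p_shared = 0.01
p_all_correlated = p_shared + (1 - p_shared) * p_all_independent
print(f"with a shared dependency: {p_all_correlated:.6%}")     # ~1.0%
```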

Nobody seems to be saying what the issue is yet so let's leave the finger pointing for now.
 
Sure. We don't know the cause. But we can see the effect. Misery for so many people, which I think was avoidable.
 
One thing is for sure: this will provide plenty of fodder down at the Daily Fail, the Sun, etc.
 
The Delta data centre outage in January cost Delta Airlines $100 million. I guess BA are going to get a hefty bill too. And I guess they don't want it to happen again, so they will likely improve their procedures.
 
Well it's either that, or I was stating the obvious, or I'm the evil super hacker behind it all.
 

From what I can gather, it's actually a major power failure in India, where their main offshore support provider (Tata) is based. I don't know any more details than that, but from experience, a lot of offshore providers rely on pretty shaky power grids, so unless Tata can run their support infrastructure on generators for days, there's not a lot BA can do.
 

Is that the Indian IT support provider that BA recently started using, making UK IT staff redundant?
 
Risking their major operations on known unreliable power supplies, without an adequate switch-over to an alternative location, would then be the area where they'll be making big changes, in that case.

I suspect they have lots of contingencies in place, but could have skimped on the testing and quality, like Delta did: their backup site took over smoothly, but not everything could connect to it.
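
For what it's worth, the "backup took over but not everything could connect" failure mode is exactly what a regular end-to-end check against the standby site is meant to catch. A minimal sketch of that kind of check; the service names, hostnames and ports are hypothetical, not anything BA actually runs:

```python
# Minimal sketch of an end-to-end check against a standby site.
# Hostnames and ports are hypothetical placeholders.
import socket

STANDBY_SERVICES = {
    "check-in-db": ("standby-db.example.internal", 5432),
    "booking-api": ("standby-api.example.internal", 443),
    "departure-boards": ("standby-fids.example.internal", 8080),
}

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

failures = [name for name, (host, port) in STANDBY_SERVICES.items()
            if not can_connect(host, port)]

if failures:
    print("Standby site is NOT ready to take over:", ", ".join(failures))
else:
    print("All checked services reachable on the standby site")
```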
 
"I'm sorry sir. Supplies of Champagne have been disrupted by the computer outage. They sent Red Bull instead"
 
A few very short-sighted responses in here. Unfortunately this is likely to be an outcome of the general public's obsession with lower prices. As with most things, when you try to reduce the cost, something has to give. I've no idea what the cause of this was, but as someone who works in the aviation industry I do know that margins are now next to nothing, and I'm afraid those who drive the prices down are those who suffer the consequences. But don't worry, you can claim your money back... but, oh, that will mean someone has to pay... oh no, higher prices... so we need to cut costs, and the cycle continues.
 
I suspect some data centre mega switch has gone pop and the 3rd party that subcontracted the support contract to the 4th party has been told they don't stock the part, so it's on 24-hour order from Cisco.
 
I wonder if they've tried the power cycle routine failsafe?

(i.e. switch it off, switch it back on).
 
That was close to the Australian census computer cock-up:
"Computer giant IBM has conceded the issues surrounding the census website outage could have been avoided if it had turned one of its routers off and on again beforehand"
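
Daft as the IBM line sounds, scheduled or watchdog-driven reboots really are used to clear leaked state in routers and appliances. A rough sketch of the idea; the health URL and the power-cycle command below are placeholders, not a real device API:

```python
# Dumb "turn it off and on again" watchdog -- a sketch only.
import subprocess
import time
import urllib.request

HEALTH_URL = "http://192.0.2.1/health"                   # hypothetical device health page
RESTART_CMD = ["echo", "power-cycle command goes here"]  # placeholder, not a real command
CHECK_INTERVAL = 60                                      # seconds between checks

def healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

while True:
    if not healthy(HEALTH_URL):
        print("Health check failed; power-cycling the device")
        subprocess.run(RESTART_CMD, check=False)
        time.sleep(300)   # give it time to boot before checking again
    time.sleep(CHECK_INTERVAL)
```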
 
Carnage in T5 this morning, glad I'm not having to re-book. Business class check in is 1hr, normal is at least double that. Be early.
 
Did you mean half rather than double?
Or words to that effect anyway; I've not slept, so it might be my failing logic circuit...
Unless you mean the carnage has somehow made check-in shorter, but then why be early?... Headache, lol.
 
As more and more systems are loaded onto the Internet, more and more systems will fail.

The more complex a system is, the more room there is for failure, whether accidental or deliberate.
Again, it's driven by the consumer: people want express check-in via the Internet, live flight details, the ability to check and change their bookings, and all of the other frilly little features.

However, like I said, with all technology sometimes poop happens, no matter how simple it is or how many redundant failsafes you have.
 
Outsourcing is not an issue - in fact it's normally a good thing. OK, this applies less to a company the size of BA, but by outsourcing, companies can get enterprise-grade IT for an affordable price, and I have seen first hand (admittedly in smaller businesses) how pretty pants a lot of in-house IT is.

A decent data centre should have 72+ hours of diesel generator power available if a power cut happens, and a business of this size should have at least one, if not two, data centres as a failsafe. Outsourcing to India, TBH, would not be my first choice, however.
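
As a back-of-envelope check on that 72-hour figure, runtime is just stored fuel divided by burn rate at load. The numbers below are invented for illustration, not any real site's:

```python
# Back-of-envelope generator runtime -- all figures invented for illustration.
tank_litres = 40_000       # on-site diesel storage
burn_rate_lph = 450        # litres per hour at the data centre's typical load
target_hours = 72

runtime_hours = tank_litres / burn_rate_lph
print(f"Runtime on stored fuel: {runtime_hours:.0f} hours")   # ~89 hours

if runtime_hours >= target_hours:
    print("Meets the 72-hour target")
else:
    print("Needs a refuelling contract kicking in before the tanks run dry")
```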
 
I'd be very surprised if an IT system running two major U.K. airports didn't have failsafes but, as per the post above, sometimes sh1t just happens in the real world.


BA's datacentre is at Heathrow, in their Waterside HQ. IT support was indeed outsourced to Tata, with some smart hands on site. It appears they have power to the site but, due to a major fault, cannot get it to the affected parts of the building housing the datacentre rooms. It's not known whether this is affecting the servers themselves, or whether they are running and it's the network connectivity, as there is a full clampdown on staff speaking to anyone.

Now this smacks of complacency and incompetence, and it really will cost them far more in lost revenue, compensation and, let's face it, embarrassment to the brand than a decent DR site would have. There's no technical barrier to having a DR site; it's just down to cost and planning, and too many times people gamble on this.
 
I'd be amazed if they didn't have a secondary datacentre along with the one at Heathrow. A colo rack and VPLS aren't exactly expensive.

One theory being bandied around is that some displays (departure boards etc.) were apparently showing muddled information before the main issues. It's possible that when one DC went down and rolled over to the second, the data was out of sync/incomplete/corrupt. Alternatively, the failover of the databases had issues and/or didn't complete, or the rollover happened but the primary came back up too fast and the data wasn't synced back, etc.

I'd not be surprised if they're pulling databases back from their backups.
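
That theory is at least consistent with how database failover usually goes wrong: the standby gets promoted while it's still behind, or the old primary comes back and accepts writes before it's been fenced off. A rough sketch of the decision logic; the helper functions are hypothetical stand-ins for whatever the real monitoring exposes, not BA's actual tooling:

```python
# Sketch of a failover decision for a replicated database -- the helpers below
# are hypothetical stand-ins, not a real monitoring or clustering API.
MAX_ACCEPTABLE_LAG_SECONDS = 5.0

def replication_lag_seconds() -> float:
    # Hypothetical: in reality this would query the replication monitoring.
    return 2.3

def fence_old_primary() -> None:
    # Hypothetical: stop the old primary accepting writes if it comes back
    # (pull it from the load balancer, revoke its virtual IP, etc.).
    print("old primary fenced")

def promote_standby() -> None:
    # Hypothetical: promote the standby to be the new primary.
    print("standby promoted")

def fail_over() -> None:
    lag = replication_lag_seconds()
    if lag > MAX_ACCEPTABLE_LAG_SECONDS:
        # Promoting now would serve stale or incomplete data -- the
        # "muddled departure boards" failure mode described above.
        print(f"standby is {lag:.1f}s behind; not promoting automatically")
        return
    fence_old_primary()   # avoids split-brain if the primary comes back too fast
    promote_standby()

fail_over()
```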
 