What IT Can Learn from Delta’s Recent Flight Delays
One glitch can have a huge ripple effect, as Delta Air Lines recently discovered. The company suffered a computing issue at its headquarters that paralyzed the entire network, resulting in delayed flights and infuriated passengers across the country.
The problem was identified as a power outage; however, that doesn’t tell the whole story. There is no reason a power outage should cause such a massive disruption or take so long to recover from. The real underlying problem was likely that the airline’s old and complex IT systems had reached a breaking point and were difficult to bring back up once power was lost. They weren’t set up for the business agility and digital experience consumers now expect; rather, they were a combination of legacy IT systems from various acquisitions over the past several years. The cost of not being prepared was a loss of revenue and a hit to the airline’s reputation. But the worst part? It was all preventable, if only the airline had used the sort of powerful performance management technologies available today.
Computer Weekly has reported that the cost of poor application performance at the enterprise level has been estimated at well over $1 million per year. With Delta’s glitch last week forcing the airline to cancel more than 2,000 flights, the costs to the airline were undoubtedly substantial, but costs can’t be measured in dollars alone. The airline’s reputation was also tarnished as the media reported endlessly on the story. Angry customers took to social media with their complaints and criticisms as flight delays and cancellations went on for days on end.
According to the airline, the outage was caused by a small fire that was quickly extinguished, but experts speaking to the media weren’t so sure. “Typically, almost all companies, especially if you have credit card data, are required to be spread out, sometimes across different countries, to make sure that basically [their network] never goes down,” said Rick Seaney, creator of FareCompare.com, speaking to the Washington Post. An actionable disaster recovery plan seems to have been lacking at Delta, an airline that has reportedly spent $150 million upgrading its computer systems in recent years. According to The Economist, airlines like Delta are stuck with multiple legacy systems (at least for now) simply because it’s not economically feasible, or logistically possible, to replace their aging infrastructure all at once. To avoid outages like those we’ve seen in the past week, airlines are going to have to get really good at application performance management.
When you’re dealing with a network as large as Delta’s, it’s critically important to be able to spot problems before they turn into disasters. Riverbed SteelCentral is a great way to manage apps with continuous transaction capture, code-level metrics, dependency mapping and advanced root-cause analysis. SteelCentral is also one of the best ways to get complete network performance monitoring with end-to-end visibility, analytics and integrated troubleshooting capabilities, plus planning and configuration management. But even with powerful tools like these, disasters like the fire that supposedly caused Delta’s problems do happen. That’s why faster data replication between data centers is so important for improved disaster recovery and business continuity. Riverbed SteelHead allows enterprises to meet service levels, recover rapidly, and maintain business continuity by enabling data replication and transfer with greater visibility and control.
Like many traditional industries, airlines have a lot of legacy components in their networks that increase the likelihood of failure, and, like many businesses, they can’t feasibly replace all of those components. Riverbed helps prevent large-scale errors by bringing clarity to all this complexity and making unified visibility and disaster recovery easier.