Mike Canney (@mikecanney) doesn’t want to play “The Network Cop.” But after 23 years in the business of finding what’s broken with a networked application, he knows you need to gather the evidence and build your case. Only, this evidence isn’t meant for the courtroom; it’s meant for the war room.
If you’re in IT operations, you know the war room I’m talking about. During his presentation at the Riverbed APM roadshow stop in San Mateo last week, Canney described the “blame-storming” sessions that engulf IT operations teams. These high-stress, high-stakes troubleshooting situations can be painfully unproductive.
The problem is that applications are becoming more complex, more distributed, and more important to the business than ever before. Troubleshooting crosses many domains; in fact, it’s one of many skills in modern IT organizations that are breaking down the traditional silos of The Server Guy/Gal, The Network Guy/Gal, The Developer Dude/Chick, and The Storage Guy/Gal. There’s a lot you can learn from your peers, as well as from experts like Canney, who bring years of experience parachuting into client environments and wearing the detective hat. Here are a few of my favorite tips from the couple of hours I spent with Canney in his master class workshop:
Gather “emotionless data”
This follows from the first of Canney’s three rules of performance troubleshooting: no guessing. This is troubleshooting, not Jeopardy. Sure, a good detective has good instincts for where to look and for when something doesn’t smell right, but you never want to throw around blame (or worse, start implementing fixes) until you know beyond a doubt what is causing the problem.
Canney called it “definitive data,” but one Riverbed customer I spoke to recently nailed it by calling it “emotionless data.” Too often, the human factor complicates the troubleshooting problem further. People get defensive, old grudges die hard, and the room gets fogged over like a bad episode of The People’s Court. No one’s ego has to get hurt (much); let’s just see what the data tells us.
Get to know TCP
Yes, good ole’ TCP. Unfortunately, not enough developers understand the nuances of TCP, and certain default behaviors, such as delayed acknowledgments, can lead to 200 ms delays between ACKs. Steve Niemczyk (@steveisles) gave a great technical overview of TCP from a troubleshooting perspective at OPNETWORK. A recording of session 1106, “Troubleshooting and Predicting the Impact of TCP on Application Performance with AppTransaction Xpert,” will be coming soon to the OPNETWORK 2013 Proceedings site (Riverbed support login required).
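One widely documented example of such a default (our illustration; the talk itself isn’t quoted on this point) is the interaction between Nagle’s algorithm on the sender and delayed ACKs on the receiver: a small write can sit in the sender’s buffer until the peer’s delayed-ACK timer fires. The Python sketch below shows one common mitigation, setting the standard `TCP_NODELAY` socket option. The helper name `make_low_latency_socket` is hypothetical.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With Nagle enabled, a small write may be held back until earlier data
    # is ACKed; if the peer is delaying its ACK, the two timers compound.
    # TCP_NODELAY tells the stack to send small segments immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    # A nonzero value means the option is set (the exact value is OS-dependent).
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```

Disabling Nagle is not always the right fix: coalescing a request into a single `sendall()` call avoids the write-write-read pattern that triggers the stall in the first place. Either way, a packet capture is the emotionless data that tells you which behavior you’re actually seeing.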
If you’re a developer, you’ll want to brush up on TCP landmines that can cripple your application’s performance. If you’re in operations, you’ll want to learn how to spot TCP curveballs. Either way, if you are able to attend one of the upcoming live workshops with Canney, you may want to read up on RFC 1122. There will be a pop quiz.
Ghostbust the bottlenecks
With so many places where applications can go wrong, answering the question “Who you gonna call?” gets pretty challenging. What you want is empirical data on every point of delay so you can quickly find the biggest offender. But any tool or dashboard that looks at only one component of the application or its delivery is too myopic, and manually correlating data from multiple sources can take days or weeks. With automation and a holistic view, you can quickly pinpoint the bottleneck by finding the largest delay. THAT is who you’re going to call first.
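The “largest delay first” idea can be sketched in a few lines of Python, assuming you have already collected and correlated per-transaction delay breakdowns by component. The component names and millisecond figures here are hypothetical, illustrative inputs, not measurements, and `rank_bottlenecks` is a hypothetical helper that averages each component’s delay and sorts the result so the biggest offender comes first.

```python
from collections import defaultdict


def rank_bottlenecks(samples: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Average each component's delay (ms) across samples, largest first."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for sample in samples:
        for component, delay_ms in sample.items():
            totals[component] += delay_ms
            counts[component] += 1
    averages = {c: totals[c] / counts[c] for c in totals}
    # Sort by average delay, descending: index 0 is "who you're going to call first".
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical breakdowns for two transactions (illustrative numbers only).
    samples = [
        {"network": 120.0, "app_server": 40.0, "database": 300.0},
        {"network": 80.0, "app_server": 60.0, "database": 260.0},
    ]
    ranking = rank_bottlenecks(samples)
    print("Call first:", ranking[0][0])
```

A plain average is the simplest choice; when delays are spiky, a percentile such as p95 may better reflect what end users actually feel.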
Canney’s list of “usual suspects” for application performance problems is honed from years of client engagements. If you can, attend one of the upcoming live (free) workshops with Mike Canney. You’ll also hear from Peco Karayanev (@bproverb) on his eight steps for application performance triage.
Riverbed delivers the most complete platform for Location-Independent Computing, turning location and distance into a competitive advantage. The Riverbed Application Performance Platform™ allows IT to have the flexibility to host applications and data in the most optimal locations while ensuring applications perform as expected, data is always available when needed, and performance issues are detected and fixed before end users notice. At more than $1 billion in annual revenue, Riverbed has 25,000+ customers, including 97% of both the Fortune 100 and the Forbes Global 100.