Dead Packets Society: Carpe Diem
Finding value in packet captures for high-speed networks beyond 10G
Capturing or sniffing packets from networks with lightweight probes and monitoring tools provides a basic level of analysis, and it has long been one of the most common ways to detect and uncover network issues affecting application performance.
But is it dying off, as if it were some obscure practice like the prep-school boys of Dead Poets Society reading verses in a candle-lit cave in the middle of the night?
Many packet capture tools are still free and widely supported, but they might not get to the root cause of issues as effectively as they once did.
The packet capture challenge in today's high-speed networks
What's changing rapidly is the ability to do high-speed packet capture in today's complex physical and virtual network architectures.
The next challenge is storing those packets effectively and sifting through the volumes of data from 10G, now 40G, and soon 100G lines with any kind of fidelity. That's if you believe, as most people do, that packets don't lie: if you have the packets, you can find out anything that happened on the network. But is it worth it?
Traditional packet capture tools cannot keep up unless you know exactly where to look. In today's high-speed networks, using them alone is like trying to cut down a tree with a scalpel. What you really need is a chainsaw to get the tree down first; then, if you need more precision, you reach for the scalpel. Even a few seconds of packet capture can generate millions of packets, and they will be meaningless unless you can get to the packets you need quickly.
A funnel approach
So the key is using a funnel approach: start with the bigger picture of user experience and response times from a combination of flow data and passive monitoring on taps or SPAN ports. You can then find out whether specific users, links, or applications are consuming more bandwidth and crowding out others. For the delays, you can decode the TCP/IP conversations to break down the composition of delay: server delay, retransmission delay, connection setup delay, and payload transfer delay will all help you decide where to look. As you work down the funnel, you narrow in on where to capture and analyze what's going on at the packet level.
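To make the delay breakdown concrete, here is a minimal sketch of how those components fall out of packet timestamps. The packet list is hand-made, illustrative data, not output from any particular capture tool:

```python
# Sketch: decomposing a transaction's response time into delay components
# from packet timestamps. The events below are synthetic, illustrative data.

# Each entry: (timestamp in seconds, event label)
packets = [
    (0.000, "SYN"),         # client opens connection
    (0.040, "SYN-ACK"),     # server accepts
    (0.041, "ACK"),         # handshake complete
    (0.042, "HTTP GET"),    # client sends request
    (0.190, "first data"),  # server starts responding
    (0.450, "last data"),   # response finishes
]

times = {label: ts for ts, label in packets}

connection_setup_delay = times["ACK"] - times["SYN"]
server_delay = times["first data"] - times["HTTP GET"]
payload_transfer_delay = times["last data"] - times["first data"]
total = times["last data"] - times["SYN"]

print(f"connection setup: {connection_setup_delay * 1000:.0f} ms")
print(f"server delay:     {server_delay * 1000:.0f} ms")
print(f"payload transfer: {payload_transfer_delay * 1000:.0f} ms")
print(f"total:            {total * 1000:.0f} ms")
```

In this example the server delay dominates, which would point you toward capturing on the server segment rather than the WAN link.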
One example is during microbursts from high-frequency transactions, such as when market data is released to financial trading institutions that rely on trading at very high speeds and frequencies based on that data. In this case, you find that trades are not being executed as expected, yet bandwidth utilization from flows and TCP/IP conversations doesn't look like anything is wrong. That's when packet capture comes in: drill down to your packet capture engine and view the packets, which will reveal the microbursts at millisecond resolution. When you see a wall of saturation where trading has stopped, as shown in the following figure, you'll know quickly whether you need to upgrade your multiple 40G links to a 100G link.
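The reason flow-level stats miss microbursts is averaging: a link can be saturated for one millisecond and still show modest utilization over a one-minute interval. A rough sketch of the detection idea, using synthetic arrival times and frame sizes rather than a real capture file:

```python
# Sketch: spotting microbursts by binning packet arrivals into 1 ms windows
# and comparing each window's traffic to what the line can carry.
# Timestamps and sizes below are synthetic, illustrative data.
from collections import defaultdict

LINE_RATE_BPS = 10_000_000_000            # 10G link
WINDOW_S = 0.001                          # 1 ms buckets
CAPACITY_BITS = LINE_RATE_BPS * WINDOW_S  # bits one window can carry

# (arrival time in seconds, frame size in bytes):
# a dense burst around t = 5 ms, then quiet background traffic
packets = [(0.0051 + i * 0.000001, 1500) for i in range(800)]
packets += [(0.050 + i * 0.001, 1500) for i in range(20)]

bits_per_window = defaultdict(int)
for ts, size in packets:
    bits_per_window[int(ts / WINDOW_S)] += size * 8

# A window above ~90% of line capacity is a candidate microburst.
bursts = [w for w, bits in bits_per_window.items()
          if bits > 0.9 * CAPACITY_BITS]
print("microburst windows (ms):", bursts)
```

Averaged over a full second, the burst above amounts to well under 2% utilization, which is exactly why it hides in flow-level reports.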
Another fairly recent security issue that scared almost everyone was the Heartbleed vulnerability, where the key was having a history of packets to inspect at a later date. For example, you could use a Berkeley Packet Filter (BPF) to isolate the relevant traffic and find out whether your data was compromised. Without the packet capture, you would not know if your data was exposed to hackers who could exploit the information at your expense.
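A BPF expression would pull the TLS heartbeat records (content type 0x18) out of the capture; the Heartbleed check itself is then comparing the payload length a heartbeat claims against what the record actually carries. A minimal sketch of that check on hand-built record bytes, per the layout in RFC 6520 (1-byte message type, 2-byte payload length, payload, then at least 16 bytes of padding):

```python
# Sketch: the check behind Heartbleed detection on captured TLS records.
# A record is suspicious when the heartbeat claims a payload larger than
# the record holds; RFC 6520 requires discarding such messages, while
# vulnerable OpenSSL echoed back adjacent memory instead.
import struct

TLS_HEARTBEAT = 24  # TLS content type 0x18

def heartbeat_is_suspicious(record: bytes) -> bool:
    """True if a TLS heartbeat record declares more payload than it holds."""
    if len(record) < 8 or record[0] != TLS_HEARTBEAT:
        return False  # not a heartbeat record
    (record_len,) = struct.unpack(">H", record[3:5])
    (claimed_payload,) = struct.unpack(">H", record[6:8])
    # message = 1-byte type + 2-byte length + payload + >=16 bytes padding
    return claimed_payload + 3 + 16 > record_len

# Benign request: 4-byte payload, 16 bytes padding, record length 23.
benign = (bytes([24, 3, 2]) + struct.pack(">H", 23) + bytes([1])
          + struct.pack(">H", 4) + b"ping" + b"\x00" * 16)
# Heartbleed-style probe: tiny record claiming a 16 KB payload.
probe = (bytes([24, 3, 2]) + struct.pack(">H", 3) + bytes([1])
         + struct.pack(">H", 0x4000))

print(heartbeat_is_suspicious(benign))  # False
print(heartbeat_is_suspicious(probe))   # True
```

Run against a stored history of packets, a check like this tells you after the fact whether anyone actually probed your servers, which is exactly the value of retrospective capture.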
The best combination of monitoring and analysis tools will give you the context to determine whether a packet capture is warranted, and then help you get to the exact slice of data you need for analysis.
Seize the packets
In some cases, an 'on demand' approach is sufficient, such as in a branch office location or for mobile users, where capturing all the packets all the time for every user might not be worth it. With an intelligent packet capture approach, you can configure alerts that trigger an on-demand packet capture when pages load slowly, so you can identify the root cause of the issue.
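The trigger logic can be very simple. Here is a minimal sketch, where `start_capture` is a hypothetical hook (in practice it might launch a capture on the affected segment; here it just records that a capture was requested) and the threshold and measurements are invented for illustration:

```python
# Sketch: alert-triggered, on-demand packet capture.
# start_capture() is a hypothetical stand-in for a real capture hook.
SLOW_PAGE_THRESHOLD_S = 3.0  # illustrative threshold

captures_started = []

def start_capture(client_ip: str) -> None:
    """Stand-in for kicking off a real packet capture on demand."""
    captures_started.append(client_ip)

def on_page_load(client_ip: str, load_time_s: float) -> None:
    """Alert hook: trigger a capture only when a page load is slow."""
    if load_time_s > SLOW_PAGE_THRESHOLD_S:
        start_capture(client_ip)

# Simulated page-load measurements from branch-office users.
for ip, t in [("10.1.1.5", 0.8), ("10.1.1.9", 7.2), ("10.1.1.5", 1.1)]:
    on_page_load(ip, t)

print(captures_started)  # ['10.1.1.9']
```

The point of the design is that packets are only captured while the problem is actually happening, which keeps storage costs down in locations that can't justify continuous capture.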
You could have mobile users in delivery trucks, or even law enforcement personnel, who suddenly see slow pages because someone at headquarters updated the news feed with a picture of a staff member's newborn baby or a tribute to someone who did outstanding work. The size of that JPEG image could clobber the performance of the apps used in the branch or on the road almost instantly, but you might not catch it without doing a packet capture.
In other cases, where mission-critical apps run your business and slow time is really more like downtime because you're losing thousands of dollars by the second, keeping all the packet captures in a relevant history available for inspection might make more sense. The key is that you cannot diagnose and fix what you don't know; after all, slow is the new down!