Challenges
- Lack of network visibility led to ineffective troubleshooting
- Major email application upgrade was delayed and jeopardized by performance issues
- Finger-pointing between IT groups hampered root cause analysis
Solution
- Riverbed SteelCentral NetExpress all-in-one box
- Riverbed SteelCentral NetShark 2100 continuous packet capture appliances
- Riverbed SteelCentral Packet Analyzer analysis console
- “Forensic fly-away kits” that combined all of the above products
Benefits
- Reduced mean time to resolution from an average of 6-12 months to one week or less
- Improved collaboration and communication between IT staff
- Cut anticipated IT travel costs by $1 million annually
Tiburon Associates provides the defense community and the federal government with program expertise and technical innovation in manufacturing, engineering, technical knowledge transfer, base operations, research and development, and acquisition management.
Challenge: Intermittent email disconnects; network implicated as culprit
As is typical of companies that provide services to the government, Tiburon is not allowed to disclose the names of its clients. The situation described in this study involves a Tiburon client, code-named Wellfleet, which has hundreds of remote offices located throughout the U.S. and around the world. Wellfleet was in the process of an enterprise-wide email application upgrade when users started experiencing problems. “The migration was going along, 50 users at a time, and everything was going okay until they got several hundred users switched over,” explains Larry Shabe, a systems engineer and Tiburon’s CEO. “Then they started having random disconnects. Someone would be typing a message and the email client would suddenly disconnect.”
The consultants involved with the email migration pointed to the network, including the local-area network (LAN), the storage-area network (SAN), and the wide-area network (WAN), as the culprit, but could not identify a precise cause of the problem. “After much time and analysis, seven days a week and lots of overtime, they were no closer to finding the problem,” says Shabe.
Wellfleet put the email migration on hold and the problem persisted for several months. Eventually, management decided it had had enough of the intermittent email disconnects and ordered the IT team to continue working until the issue was resolved. “When the director of the organization says, ‘You’re not going home until you fix it,’ and it is a random, intermittent problem, as you can imagine, there is a lot of finger-pointing.” At that point, Wellfleet asked Tiburon to come in and see if they could diagnose the problem.
Solution: SteelCentral—an “MRI” for the system
Tiburon was well-equipped with the tools and expertise to take on this challenge. The company had already gained plenty of experience with network performance management (NPM) tools such as NetScout, HP OpenView, OPNET, and SolarWinds. Shabe did not choose any of these for the Wellfleet job, however, because none of them would give him the holistic understanding of the situation he knew was needed. “I compare those tools to stethoscopes,” he says. “They give you some information about what is going on, but they do not let you see what the problem actually is.”
Continuing with Shabe’s medical analogy, what he needed for Wellfleet was akin to an MRI. For that he chose the SteelCentral application-aware NPM solution from Riverbed Technology. “By integrating end-to-end monitoring with deep packet capture and packet analysis, SteelCentral can tell if it is a network problem, an application problem, or in some cases, a storage issue,” Shabe explains. “SteelCentral looks at the environment holistically. This is because it is the only network performance management solution that can combine network flow data and packet data into a single logical data store, and share information between the observation points.”
Tiburon has experience with other Riverbed products as well, including SteelHead WAN optimization appliances, SteelApp Traffic Manager, and SteelHead Interceptor appliances. The company’s confidence in these products contributed to Shabe’s decision to choose the SteelCentral solution for the Wellfleet job. “Riverbed has great products. We’ve done very well with them,” Shabe says.
Benefits: Quick and convincing determination of the problem; $1 million travel cost savings
SteelCentral NetProfiler rapidly pointed Tiburon in the direction of the problem, showing quickly that Wellfleet was experiencing 120,000 resets from their email servers during a couple of hours each day. However, the consultants working on the email migration remained unconvinced that the email configuration was responsible for the problem, even after seeing this information from SteelCentral. This is where the SteelCentral solution’s ability to capture and store packet data became important. Tiburon installed two SteelCentral NetShark 2100 continuous packet capture appliances near the email servers to capture all email traffic. Then, using the SteelCentral Packet Analyzer and Wireshark techniques learned at a Riverbed “Power User Training Session,” Tiburon was able to analyze the specific packet behavior and see what was happening at a more detailed level.
NetProfiler identified the symptom and the NetShark appliance pinpointed the culprit. Apparently, a virtual software load balancer between the users and the clustered mail servers was causing reset problems. No one could refute the empirical evidence that the SteelCentral tools provided. “We captured and analyzed the packet data, put it into a concise PowerPoint presentation and then brought everyone involved into a room,” explains Shabe. “This data was irrefutable. We had the actual IP address of what was causing the resets. This convinced the email migration consultants that it was, in fact, their software and servers that were causing all the resets, and in turn causing the performance problem.” Tiburon made this information available after working on the problem for only four hours. The email migration team accepted this conclusion, made the appropriate changes, and the problem immediately disappeared.
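The core of the packet-level evidence described above is a tally of TCP resets by the address that sent them. The following is only a minimal sketch of that idea in Python, not a depiction of how the SteelCentral tools or Wireshark work internally. The `PacketSummary` record, the `count_resets_by_source` helper, and the sample addresses are all hypothetical, and the sketch assumes the capture has already been decoded into source address, destination address, and TCP flags.

```python
from collections import Counter
from typing import Iterable, NamedTuple

TCP_RST = 0x04  # RST bit in the TCP flags byte


class PacketSummary(NamedTuple):
    """Hypothetical, simplified record for one captured TCP segment."""
    src_ip: str
    dst_ip: str
    tcp_flags: int


def count_resets_by_source(packets: Iterable[PacketSummary]) -> Counter:
    """Count segments with the RST flag set, keyed by the sending IP address."""
    resets = Counter()
    for pkt in packets:
        if pkt.tcp_flags & TCP_RST:
            resets[pkt.src_ip] += 1
    return resets


# Made-up sample data; the addresses come from the documentation ranges.
sample = [
    PacketSummary("192.0.2.10", "198.51.100.7", 0x18),  # PSH+ACK
    PacketSummary("192.0.2.50", "198.51.100.7", 0x04),  # RST
    PacketSummary("192.0.2.50", "198.51.100.8", 0x14),  # RST+ACK
    PacketSummary("192.0.2.10", "198.51.100.8", 0x10),  # ACK
]

# The address that sent the most resets is the first place to look.
top_talkers = count_resets_by_source(sample).most_common(1)
print(top_talkers)  # [('192.0.2.50', 2)]
```

In a real investigation the input would come from the decoded capture rather than a hand-written list, and a high reset count from one address is a lead to examine, not proof on its own.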
Without the SteelCentral solution, Tiburon estimated that it could have taken six months before the customer diagnosed the root cause of the problem. “SteelCentral helped provide clarity,” Shabe adds. “NetProfiler told us there was a reset problem, but because there were competing ideas, we used SteelCentral’s packet capture functionality to take it to the next level and provide irrefutable evidence of where it was occurring.” Prior to using SteelCentral solutions, problem resolution took Tiburon 6-12 months, on average. “Typically we had to set up tiger teams and subteams,” he adds. “With SteelCentral, resolution now takes less than a week, even for the type of random, intermittent problems that Wellfleet was experiencing.”
Tiburon was so excited about the solution that they decided to build several “forensic fly-away kits” consisting of SteelCentral NetExpress 360, Packet Analyzer, NetShark 2100 appliances, and Wireshark. Rather than send their own personnel to customers having performance issues, they plan to simply ship the kit. “It’s cost-inhibitive to put network monitoring gear in each office and we know we will have some remote offices with problems,” Shabe explains. “The Riverbed products are easy enough to install that local personnel can do it on their own by following instructions Tiburon provides with the equipment.”
“With these kits, we can just sit back here in headquarters, log in and see what is going on,” Shabe adds. “Historically, we would have to put engineers on a plane and they would spend several weeks at a site.” He estimates travel cost savings alone could run as high as $1 million per year thanks to the SteelCentral kits.
Part way through an enterprise-wide email upgrade, a Tiburon Associates customer started experiencing random, intermittent email client disconnects. The problem grew serious enough that the migration was put on hold while the customer, along with the consultants managing the email upgrade, tried to find the cause. The consultants implicated the network, but were unable to point to a specific cause. Tiburon was called in to help after the problem had persisted for several months. They came equipped with the full SteelCentral solution, which provided both broad and deep visibility into application performance on the network.
With SteelCentral NetProfiler, Tiburon could see within 15 minutes that there were 120,000 resets at the email servers during a couple of hours each day. The consultant team was still not convinced, so a few hours later Tiburon prepared a report showing the packet-level details, including the exact IP address of the equipment causing the resets, using SteelCentral NetShark continuous packet capture appliances. Tiburon is so excited about the full SteelCentral functionality that they have created “forensic fly-away kits” that they will ship to remote sites having problems instead of sending their engineers, with estimated savings of as much as $1 million in travel costs annually.