National Instruments is a producer of automated test equipment and virtual instrumentation software.
Riverbed SteelCentral AppInternals Reduces Time Spent on App Performance Issues by 90%
National Instruments’ external-facing website, ni.com, gets three quarters of a million page requests each day and 1.7 million unique visitors each month. Customers and potential customers use the site to learn about the company’s products, configure and purchase complex test and measurement solutions, and download programs, sample code, and development tips.
There are also forums in which NI employees answer users’ questions.
Behind the scenes, ni.com is a conglomeration of some 300 Java-based applications, more than 90% of which were custom-developed by NI programmers located around the world. These applications run on 37 servers supporting 181 Java virtual machines (JVMs). In addition to the servers in the production environment, 33 servers in the development and test environment run 228 JVMs, for a total of over 400 JVMs.
Frequent releases to a business-critical public website introduced problems
Complex conglomeration of Web services (300 Java applications running on nearly 200 JVMs) is difficult to debug and troubleshoot
Reduced application troubleshooting time by 90%
Improved website stability and customer satisfaction with the site
Reduced issues introduced into production by 20% to 30%
Supported six-fold increase in website updates
Challenge: Business-critical website gets 120 scheduled updates each year
Eric McCraw, global web systems manager for IT at NI, directs the team that keeps ni.com functioning optimally. His job is complicated by the fact that the site is frequently updated with new features and functionality. “We have an enormous amount of change that goes live all the time,” McCraw explains. “That includes quarterly, monthly, and biweekly releases for a minimum of 120 scheduled change windows each year, and that doesn’t include unscheduled emergency fixes.”
Not every release runs perfectly the minute it hits the production environment, and in the past McCraw’s Web Systems team spent thousands of hours each year troubleshooting website issues caused by newly released applications. “When we were on call, we got paged multiple times during the day and night. In fact, the support team spent 90% of its time trying to figure out what was going wrong with certain applications,” says Mark Osborne, an NI web systems engineer. “It was so bad that when people were on call, any projects they were working on were basically put on hold.”
The situation caused friction between the Web Systems team and the developers. “It was the natural amount of tension that arises when the developers’ first reaction is to blame anything but their own code,” says McCraw.
Application performance problems also had drawbacks for the business as a whole. “Once we identified someone’s code as being problematic, those developers could not move on to other things. Rather than adding new functionality that might be revenue-generating, they had to work on solving the problem,” McCraw adds. Site downtime caused by application issues meant that customers had to come back to ni.com later. When a problem affected the configurator application on the site, users could lose the 20 to 30 minutes they spent entering their information into the system.
Solution: Ability to look “under the hood” of problematic applications
McCraw’s team practiced “the old-school way of troubleshooting applications, where they would try to refactor the code, doing things like sending portions of the applications out to logs to see where the application got before it started throwing errors,” he explains. This took too much time and didn’t always identify the cause of the problem. “We had no insight about what was going on under the hood,” Osborne adds. “A lot of the time it was a guessing game about what was needed to make the application run more smoothly.”
That all changed when they purchased SteelCentral AppInternals software, which lets them see deep inside the applications to diagnose the root causes of performance problems. AppInternals combines end-user experience monitoring, code-level transaction tracing, and deep application component monitoring in a single integrated product, so NI gets all three capabilities from one tool.
Developers use AppInternals after deploying their code into one of the test environments. “AppInternals is fantastic for letting them see what’s going on when they’re having some unexpected behavior,” explains McCraw.
The Web Systems team uses AppInternals similarly. “Mostly we use AppInternals when we’re having issues with a certain application that makes up part of our site,” McCraw notes. “That’s when we do a deep dive into the AppInternals output to take a look at what’s going on with the developer’s code and then work with him and his team to correct it. We’re no longer flying blind or having to take the developer’s word. Even though those of us on the Web Systems team are not Java developers, the tool allows us to trace out exactly what’s going on inside the application.”
In addition, the operations people on the Web Systems team use AppInternals daily to get a health check on the JVMs in the production environment.
According to Osborne, one of the features of AppInternals that is extremely useful is that “it follows every Java transaction as it goes in and out of different JVMs and across the wire. It will trace every Java method call that’s made so you can see exactly where the delays are in each application, almost down to the line of code. I haven’t seen this functionality in any other tool.”
Benefits: Troubleshooting time reduced by 90%, up to 30% fewer production issues, lower MTTR, and an end to finger-pointing
One of the most important benefits AppInternals brings to the Web Systems team is a reduction in time spent on application performance problems. “I’d estimate that we spend one-tenth the time we used to spend,” says Osborne.
Part of that can be attributed to the fact that newly released applications go into production with fewer issues than they had in the past, thanks to the use of AppInternals in the development and test environment. When there are problems with applications in production, NI’s practice is to create a production issue resolution (PIR). McCraw estimates that they open 20% to 30% fewer PIRs since deploying AppInternals.
The ability to fully understand application behavior has also decreased mean time to resolution (MTTR). Not only are the causes of issues identified more quickly, but fewer people are also involved in the search. “There were times in the past when we’d spend days with Web systems people, developers, UNIX guys, and database administrators in a room,” recalls Osborne. “That just doesn’t happen anymore.”
Another change following the deployment of AppInternals is less tension between developers and the Web Systems team. As McCraw explains, “It is kind of a black eye or embarrassment to get a PIR around your application. In the past, the reaction was to blame the environment or servers for the application behavior. That went away for lots of reasons, but one has to do with how we approach fixing the problem now. Tools like AppInternals let us remove the guesswork and emotional finger-pointing and get down into facts about how the application is actually performing. Facts get rid of the emotion very quickly.”
Since the adoption of AppInternals, ni.com has become more stable, particularly the configurator, which is a critical component of the site. The website also benefits from more frequent improvements now that developers spend less time fixing problems and more time creating new functionality. Before deploying the SteelCentral solution, ni.com had only monthly and quarterly updates, a total of 16 over the course of a year, compared to the 120 scheduled updates it gets now.
McCraw sums up NI’s experience with Riverbed this way: “We’re now able to look inside of the developers’ code—without having to modify the code—while it’s running in our production environment. That’s fantastic. I can’t imagine someone running a site of any real size without this capability.”
National Instruments’ public-facing website, ni.com, gets updated 120 times a year. The Web Systems team, which is charged with keeping the site running optimally, used to spend thousands of hours each year troubleshooting issues caused by newly released applications. The inability to quickly find the cause of application performance problems caused tension between the Web Systems team and the developers, and sometimes negatively affected the site’s users as well. The Web Systems team now uses Riverbed SteelCentral AppInternals to quickly diagnose the root causes of application performance problems.
Since deploying the software, the team has seen application troubleshooting time go down by 90%. MTTR has also gone down, and fewer people need to be pulled in to resolve issues. Developers use AppInternals after deploying their code into a test environment. As a result, 20% to 30% fewer issues are introduced into production, making it possible for the number of scheduled updates to grow from 16 a year to 120 a year. The use of AppInternals has made this business-critical website more stable and contributed to a better customer experience.