Troubleshooting Virtualized Applications Is Like Searching for a Needle in a Haystack
For many enterprise organizations, troubleshooting performance issues in virtualized applications is like searching for the proverbial needle in a haystack. With so many layers of technology to analyze, and so many domain-specific management tools to use, resolving a performance issue can feel more like searching a field full of haystacks, each inside its own silo. Is the issue related to database performance? Is it related to the application? Maybe it's poor network connectivity. To make matters worse, the end users experiencing the problem rarely give a clear picture of what is really happening.
Searching silos fails to find the problem
I recently worked with a large financial services company facing a performance problem in its call center application, which is delivered as a virtualized application through Citrix XenApp. The backend database is Oracle, and contact center agents access the application from their desktops using a local Citrix Client. Agents in satellite and home offices across the country use the application to look up customer accounts and execute basic transactions. Then users in several locations began reporting severe performance degradation when interacting with the application.
Multiple IT teams scrambled to investigate the incident. Server administrators checked the health and resource utilization of the backend application and database servers. The Citrix administrators did the same on the Citrix servers. Oracle DBAs collected deep-dive tracing data on the queries that ran at the time of the user complaints. The application team reviewed its code to make sure it was rendering properly. And the Desktop Services team contacted end users to confirm that nothing running on their local machines might have contributed to the issue.
After dozens of hours of troubleshooting and plenty of finger-pointing, the teams came up empty. Unfortunately, this happens all too often in IT organizations. So how can IT shorten the time it takes to troubleshoot these types of problems?
First, find the right haystack
Troubleshooting problems in a virtualized environment is especially complicated because virtualization disrupts the traditional relationship between applications, physical hardware, operating systems, and presentation layers. For Citrix XenApp monitoring and virtual desktop monitoring, Aternity helps IT quickly identify which haystack to search first by providing insight into the four major technology tiers of a virtualized environment:
- Remote Display Latency
- Host Resource Utilization
- Application Performance
- Infrastructure Latency
Ruling out the application and the infrastructure
The financial services company used Aternity to monitor all of the critical business activities executed by agents within the call center application.
As the chart shows, average response time for most business activities was relatively stable and within acceptable performance thresholds. Because these business activities depend on the backend servers and databases, they could not consistently meet their thresholds unless those tiers were also performing well. As a result, this chart immediately ruled out the application and the infrastructure as root causes of the problem.
Zeroing in on the network
The next troubleshooting step was to look at Remote Display Latency. Citrix ICA Latency measures the time taken to stream content from the Citrix server back to the end user. As the dashboard shows, contact center agents experienced a significant spike in latency (1 to 3 seconds) during the period in question. In other words, every time end users moved or clicked their mouse, they waited an additional 1 to 3 seconds for the updated screen to be streamed back to their client device.
Digging deeper, we found 77 endpoints experiencing major incidents at the time of the ICA latency spikes, followed by a sharp drop in the number of endpoints reporting in. Further investigation showed that only two of the six Citrix servers remained online during the issue.
Within minutes, the IT team isolated the problem to a specific network segment that had become saturated, severing nearly all communications between the end users and the Citrix Servers. The networking team was then brought in to further diagnose the cause and coordinate with the local ISP to resolve the problem.
With the convergence of virtualization, mobile, and cloud, application environments have never been more complex. In today's world of "five nines" SLA targets, IT must move beyond siloed troubleshooting and finger-pointing.