APM and Cloud Monitoring – Monitoring a Dynamic Application
Running in the cloud offers a lot of power, as well as flexibility with resource allocation. You don’t have to buy your hardware up front; you just pay as you go. You can define servers, networks, availability and fail-over on a large scale with a simple software interface. Software development is also more dynamic: you can develop, test and deploy code at lightning speed, automate where possible, test more efficiently, and skip scheduling or coordinating hardware usage months in advance. You can also run multiple versions in multiple environments and geographies and deploy new code on a daily basis. This flexibility and dynamic behavior is absolutely fantastic, but it does present some challenges for APM (application performance management) and cloud monitoring of a dynamic application. In this first blog of a series we will concentrate on the challenges of monitoring a dynamic application in the cloud. If you are hitting challenges with APM and cloud monitoring of an application, SteelCentral AppInternals can help!
APM and cloud monitoring challenge 1: frequent code changes
With agile development and release cycles running on a regular basis, your APM solution needs to automatically discover new code and code changes. An effective APM solution automatically instruments new code and automatically adjusts instrumentation as existing code changes. With development teams in multiple or remote locations, it is not always easy to know what changed, or even to allocate and coordinate the time for the “what’s new” conversation with the dev team. A common problem with legacy APM solutions is that they instrument only a subset of the code, often just standard classes or libraries plus some custom instrumentation that was added a while back. For many APM solutions, re-instrumentation is a manual process. Changing instrumentation then ends up consuming valuable time from both the APM administrators and the developers as they try to understand what needs to be added or modified. A common alternative is to do nothing, then frantically make these changes after an outage just to understand what went wrong. The problem is that you are then waiting for the system to break again just so you can get additional information about the problem.
SteelCentral AppInternals provides auto-discovery and auto-instrumentation of applications and code. We detect all applications running on a machine and automatically instrument them. SteelCentral starts capturing detailed information about new code or code changes without requiring any change to the application or its configuration. We capture all transactions for all users, all the time (we DON’T sample). The SteelCentral user interface also offers a compare mode, so you can see the information before and after a change and easily find code bottlenecks.
An example of detecting a code change, and of SteelCentral’s ease of use, can be seen in this short video:
APM and cloud monitoring challenge 2: systems growing and shrinking
One of the great advantages of a cloud environment is the ability to grow and shrink on demand. On a normal lazy Sunday morning you may have very little traffic and don’t need a large environment or a lot of resources to support the application load. On a crazy busy Sunday just before Christmas you need a large environment and a lot of resources just to handle all the last-minute shoppers. Since the cloud allows you to grow and shrink your environment and resources, monitoring an application running in the cloud requires automatic discovery and instrumentation of servers, instances and objects, and an understanding of the relationships between them on the fly. For some APM tools this is not easy, or not even possible; in some cases it requires manual configuration or adjustments.
APM and cloud monitoring challenge 3: short lived and moving objects
In a cloud environment, objects can move to another server, cluster or datacenter, from on premises into the cloud, or even to another geography. What do you do in such a case? Lose visibility when your objects move? Have multiple copies of the same object appear many times in your system? And what do you do when objects are short lived? You may have objects that are relevant only for a certain time period, and in some cases only for a specific application, a specific transaction or a specific geography. Do you track only long-lived objects? Track them all in some sort of central repository that has everything? How do you know which objects are relevant? And what happens when those objects are gone? Are they deleted from the monitoring system, or do they remain visible, making it more difficult to understand where a problem is coming from? Do you run a manual or automatic purge process and potentially impact the performance of your APM solution and the accuracy of the data?
SteelCentral AppInternals provides auto-discovery and auto-instrumentation of servers, instances and applications. With SteelCentral, customers are up, running and solving complicated problems FAST! SteelCentral simplifies the process of investigating problems even when objects are short lived or move around. It shows the relevant server, instance, container and code information for the right time frame, application, transaction or geography, and simplifies root cause analysis with big data analytics.
Example: For applications running in Docker containers, which are typically transient and elastic, you can build the image with the agent, or install the agent on the Docker host and reference it from the container, as seen in the following image. In both cases the agent starts reporting container and application information as soon as the container is running.
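As a rough sketch of the host-install approach for a Java application, the container can mount the host’s agent directory and load the agent through the JVM’s standard -javaagent mechanism. The paths, image name and agent file name below are illustrative assumptions, not the actual AppInternals layout:

```shell
# Illustrative sketch only: /opt/apm-agent, agent.jar and my-java-app
# are hypothetical names, not the real AppInternals install layout.
# Mount the agent installed on the Docker host into the container
# (read-only) and point the JVM at it with the standard -javaagent flag.
docker run -d \
  -v /opt/apm-agent:/agent:ro \
  -e JAVA_TOOL_OPTIONS="-javaagent:/agent/agent.jar" \
  my-java-app:latest
```

Building the agent into the image instead would amount to a COPY of the same files in the Dockerfile; either way, instrumentation starts as soon as the container runs.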
In the next blog we will cover more challenges of APM and cloud monitoring and how SteelCentral addresses those challenges.
Ready to start monitoring your application? Please register for a free trial.