Uncover Blind Spots in Your Monitoring

Alert! Friday Fire Drill

It’s the end of a quiet Friday and you’re about to finish up for the week when Slack starts going crazy. Orders aren’t being fulfilled and no one has any idea why. 

 

[Image: Slack feed]

Something bad is happening, but what is it and why weren’t you alerted by your monitoring tool? You check the Cisco AppDynamics custom dashboard to examine business metrics and system load. 

 

[Image: AppDynamics charts]

The system load has been spiking periodically since 4:58 PM, but never for long enough to trigger a health rule violation in AppDynamics. The number of pending orders keeps climbing, though, so there’s clearly an issue somewhere.
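To see why short spikes can slip past a duration-based rule, here is a minimal Python sketch. It is not AppDynamics code or configuration: the function name, the 0.8 threshold, the five-sample window, and the sample values are all made up for illustration. The rule only fires when the metric stays above the threshold for several samples in a row.

```python
from typing import Sequence


def sustained_violation(samples: Sequence[float],
                        threshold: float,
                        min_consecutive: int) -> bool:
    """Return True only if `min_consecutive` samples in a row exceed `threshold`.

    Hypothetical stand-in for a duration-based health rule.
    """
    run = 0
    for value in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up load samples: brief spikes separated by quiet periods.
spiky_load = [0.4, 0.9, 0.95, 0.3, 0.4, 0.92, 0.5, 0.3, 0.97, 0.96, 0.4]
# Made-up load samples: one long stretch above the threshold.
steady_overload = [0.4, 0.9, 0.95, 0.93, 0.97, 0.91, 0.5]

spiky_fires = sustained_violation(spiky_load, threshold=0.8, min_consecutive=5)
steady_fires = sustained_violation(steady_overload, threshold=0.8, min_consecutive=5)
```

With these values the spiky series never breaches the threshold for more than two samples in a row, so the rule stays quiet even though something is clearly wrong, while the steady overload trips it.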

 

You also see a red dot on the System Load chart (above) coinciding with the time when the pending orders issue started. This dot represents a custom event that Zebrium sent to AppDynamics, indicating it found an incident in the log files. By clicking it, you can see the details that Zebrium’s machine learning has uncovered, as shown below:

 

[Image: Zebrium custom alert]

Of course! You remember seeing all those emails about the certificate expiring and should have done something about it weeks ago. A few minutes later: crisis averted, certificate issue fixed. Your weekend has started.

 

How Was This Kind of Detection Possible?

This easy incident resolution was made possible by the recently announced integration between AppDynamics and Zebrium. Here’s a short explanation of how it works:

 

AppDynamics collects, processes, and monitors metrics gathered across the stack. In addition, a copy of all logs is sent to the Zebrium Machine Learning (ML) engine. In simple terms, the logs are processed by an unsupervised ML pipeline that starts by structuring and categorizing the log lines and learning the normal patterns for each type of line. The Zebrium ML engine then looks for clusters of correlated anomalies across the logs. When it finds one, it generates a root-cause report and sends it to AppDynamics as a custom event, which contains a link back to Zebrium for more details.
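The general idea can be illustrated with a toy Python sketch. This is not Zebrium's algorithm, which the post does not describe in detail. It is a deliberately simplified stand-in that assumes three steps: reduce each line to a template by masking tokens that start with a digit, learn which templates appear in a baseline, and group never-seen lines that occur close together in time across more than one source. Every name, log line, and parameter below is hypothetical.

```python
import re
from collections import Counter
from typing import NamedTuple


class LogLine(NamedTuple):
    ts: float     # seconds since an arbitrary start time
    source: str   # which service or file produced the line
    text: str


# Tokens that start with a digit (ids, counts, durations) are treated as variables.
_VARIABLE = re.compile(r"\b\d\w*\b")


def template(text: str) -> str:
    """Structure a line by replacing variable-looking tokens with a placeholder."""
    return _VARIABLE.sub("<*>", text)


def learn(baseline: list[LogLine]) -> Counter:
    """Count how often each template appears during normal operation."""
    return Counter(template(line.text) for line in baseline)


def anomalies(lines: list[LogLine], known: Counter) -> list[LogLine]:
    """Keep lines whose template never appeared in the baseline."""
    return [line for line in lines if known[template(line.text)] == 0]


def clusters(anoms: list[LogLine], window: float = 30.0,
             min_sources: int = 2) -> list[list[LogLine]]:
    """Group anomalies separated by at most `window` seconds and keep only
    groups that span at least `min_sources` different sources."""
    groups: list[list[LogLine]] = []
    current: list[LogLine] = []
    for line in sorted(anoms, key=lambda l: l.ts):
        if current and line.ts - current[-1].ts > window:
            groups.append(current)
            current = []
        current.append(line)
    if current:
        groups.append(current)
    return [g for g in groups if len({l.source for l in g}) >= min_sources]


# Made-up log lines, loosely modeled on the scenario in this post.
baseline = [
    LogLine(0, "orders", "order 1001 queued"),
    LogLine(1, "orders", "order 1002 queued"),
    LogLine(2, "payments", "charge 55 ok"),
]
incoming = [
    LogLine(100, "orders", "order 1003 queued"),
    LogLine(105, "payments", "TLS handshake failed: certificate expired"),
    LogLine(110, "orders", "order 1003 stuck: payment service unreachable"),
    LogLine(900, "orders", "cache warmed in 35 ms"),
]

incident_clusters = clusters(anomalies(incoming, learn(baseline)))
```

In this toy data, the two unfamiliar lines five seconds apart come from different sources and form one cluster, while the isolated unfamiliar line at 900 seconds is dropped. A real pipeline has to cope with far noisier logs than exact template matching can handle.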

 

This ML-derived root-cause report includes a plain-language summary created with the GPT-3 AI language model (more details can be found here) as well as a small set of log lines that help explain the problem. In the example above, the plain-language summary was shown directly in an AppDynamics dashboard; by clicking the “details” link, the user can view the full root-cause report within Zebrium.

 

Here’s a short video showing what this looks like in real life:

 

 

Avoid the Root Cause Bottleneck

Our Friday afternoon scenario involved a completely new problem: no pre-existing health rules or other business-metric triggers were in place in AppDynamics to detect it. In this case, the AppDynamics dashboard and the Zebrium custom event enabled a very quick resolution.

 

However, there is also a more common use case where the AppDynamics and Zebrium integration can speed up the resolution of almost any kind of incident. Let’s say you’ve built a set of health rules in AppDynamics that are able to detect a large range of known problems. Even though this will let you detect most problems, finding their root cause can still require manual investigation, usually hunting through dashboards and AppDynamics snapshots, and then drilling down and searching through logs.

 

In these cases, the AppDynamics and Zebrium integration can significantly speed up identification of the root cause through the following flow: 

 

[Image: Zebrium and AppDynamics process flow]

 

  1. AppDynamics detects a health rule violation and triggers an alert.
  2. At the same time, AppDynamics sends an alert to Zebrium as a signal request.
  3. Zebrium generates a root-cause report in response to that signal request and sends it back to AppDynamics as a custom event. This event contains a description of the problem, along with a link back to Zebrium for more details.
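The three steps above can be sketched as a small in-process simulation in Python. Nothing here is the real AppDynamics or Zebrium API: `Monitor`, `LogAnalyzer`, `CustomEvent`, the method names, and the rule name are hypothetical placeholders, and the real products exchange these messages over the network rather than through direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class CustomEvent:
    """Hypothetical shape of the event sent back to the monitoring tool."""
    summary: str
    details_link: str


class LogAnalyzer:
    """Stands in for the log-analysis side (Zebrium in this post)."""

    def handle_signal(self, rule: str, reply_to: "Monitor") -> None:
        # Step 3: build a report for the signal and send it back as a custom event.
        report = CustomEvent(
            summary=f"Root-cause report for '{rule}'",
            details_link="<link to full report>",  # placeholder, not a real URL
        )
        reply_to.receive_custom_event(report)


@dataclass
class Monitor:
    """Stands in for the monitoring side (AppDynamics in this post)."""
    notifications: list[str] = field(default_factory=list)
    events: list[CustomEvent] = field(default_factory=list)

    def on_health_rule_violation(self, rule: str, analyzer: LogAnalyzer) -> None:
        # Step 1: alert the user about the violation.
        self.notifications.append(f"ALERT: {rule}")
        # Step 2: signal the log analyzer about the same violation.
        analyzer.handle_signal(rule, reply_to=self)

    def receive_custom_event(self, event: CustomEvent) -> None:
        self.events.append(event)


monitor = Monitor()
monitor.on_health_rule_violation("pending orders too high", LogAnalyzer())
```

After the call, `monitor.notifications` holds the user-facing alert and `monitor.events` holds the root-cause event, mirroring how the alert and the explanation end up side by side in one place.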

The net effect of this integration? When AppDynamics detects a problem, it immediately alerts the user, who can then see the root cause (found in the logs by Zebrium) in the custom event sent to AppDynamics.

 

So don’t wait until a nasty problem strikes at the most inconvenient time. Getting started with AppDynamics and Zebrium is easy. Please visit here for information and a free trial of Zebrium and here for a free AppDynamics trial. 

 

 

Limited Time Offer! 50% off the first annual Zebrium subscription when used with AppDynamics.