Zebrium Blog

A Prometheus fork for cloud scale anomaly detection across metrics & logs

At Zebrium, we provide an Autonomous Monitoring service that automatically detects anomalies within logs and metrics. We started by correlating anomalies across log streams to automatically raise incidents that require our user's attention. Now, we have taken the step to augment our incident detection by detecting anomalies within a group of related metrics and correlate those with anomalies found in logs.

Introduction

At Zebrium, we provide an Autonomous Monitoring service that automatically detects anomalies within logs and metrics. We started by correlating anomalies across log streams to automatically raise incidents that require our user's attention. Now, we have taken the step to augment our incident detection by detecting anomalies within a group of related metrics and correlate those with anomalies found in logs.

Read More

What Is an ML Detected Software Incident?

Based on our experience with hundreds of incidents across nearly a hundred unique application stacks, we have developed deep insights into the specific ways modern software breaks. This led us to thoughtfully design a multi-layer machine learning stack that can reliably detect these patterns, and identify the collection of events that describes each incident. In simple terms, here is what we have learned about real-world incidents when software breaks.

Based on our experience with hundreds of incidents across nearly a hundred unique application stacks, we have developed deep insights into the specific ways modern software breaks. This led us to thoughtfully design a multi-layer machine learning stack that can reliably detect these patterns, and identify the collection of events that describes each incident. In simple terms, here is what we have learned about real-world incidents when software breaks. You can also try it yourself by signing up for a free account.

Read More

Using Autonomous Monitoring with Litmus Chaos Engine on Kubernetes

A few months ago, our friends at Maya Data joined our private Beta to give our Autonomous Log Monitoring platform a test run. During the test, they used their newly created Litmus Chaos Engine to generate issues in their Kubernetes cluster, and we managed to detect all of them successfully using our Machine Learning completely unsupervised. Needless to say, they were impressed!

Read More

The Future of Monitoring is Autonomous

Monitoring today is extremely human driven. The only thing we’ve automated with monitoring to date is the ability to watch for metrics and events that send us alerts when something goes wrong. Everything else: deploying collectors, building parsing rules, configuring dashboards and alerts, and troubleshooting and resolving incidents, requires a lot of manual effort from expert operators that intuitively know and understand the system being monitored.

TL;DR

Monitoring today puts far too much burden on DevOps - requiring too much time and skill to build and maintain complex dashboards and fragile alert rules. Fortunately unassisted machine learning can be used to autonomously detect and find the root cause of critical incidents. Please read about it below, or try our free Autonomous Monitoring Platform now and start auto-detecting incidents in less than 5 minutes.

 

The Longer Version

 

Monitoring today is extremely human driven. The only thing we’ve automated with monitoring to date is the ability to watch for metrics and events that send us alerts when something goes wrong. Everything else: deploying collectors, building parsing rules, configuring dashboards and alerts, and troubleshooting and resolving incidents, requires a lot of manual effort from expert operators that intuitively know and understand the system being monitored.

Read More
AUTONOMOUS INCIDENT
DETECTION
 
TRY IT NOW!
 
FREE SIGN-UP