A few months ago, our friends at Maya Data joined our private Beta to give our Autonomous Log Monitoring platform a test run. During the test, they used their newly created Litmus Chaos Engine to generate issues in their Kubernetes cluster, and our completely unsupervised machine learning detected all of them. Needless to say, they were impressed!
Monitoring today is extremely human-driven. The only thing we’ve automated in monitoring to date is the ability to watch for metrics and events that send us alerts when something goes wrong. Everything else (deploying collectors, building parsing rules, configuring dashboards and alerts, and troubleshooting and resolving incidents) requires a lot of manual effort from expert operators who intuitively know and understand the system being monitored.
Monitoring today puts far too much burden on DevOps and developers. These teams spend countless hours staring at dashboards, hunting through logs, and maintaining fragile alert rules. Fortunately, unsupervised machine learning can be applied to logs and metrics to autonomously detect critical incidents and find their root cause. Read more below, or start using our Autonomous Monitoring Platform for free - it takes less than 2 minutes to get started.
Monitoring today is extremely human-driven. The only thing we’ve automated in monitoring to date is the ability to alert on rules that watch for specific metrics and events that occur when something known goes wrong. Everything else - building parsing rules, configuring and maintaining dashboards and alerts, and troubleshooting incidents - requires a lot of manual effort from expert operators who intuitively know and understand the system being monitored.
In our last blog we discussed the need for Autonomous Monitoring solutions covering the three pillars of observability (metrics, traces and logs). At Zebrium we have started with logs (but stay tuned for more), because logs generally represent the most comprehensive source of truth during incidents and are widely used to search for the root cause.
In our last blog we discussed the need for Autonomous Monitoring solutions to help developers and operations users keep increasingly large and complex distributed applications up and running.