Zebrium Blog

Using machine learning to shine a light inside the monitoring black box

October 24, 2019 | Ajay Singh

A widely prevalent application monitoring strategy today is sometimes described as “black box” monitoring. Black box monitoring focuses just on externally visible symptoms, including those that approximate the user experience. Black box monitoring is a good way to know when things are broken.

Read More

The hidden complexity of hiding complexity

October 22, 2019 | Gavin Cohen

Kubernetes and other orchestration tools use abstraction to hide complexity. Deploying, managing and scaling a distributed application are made easy. But what happens when something goes wrong? And, when it does, do you even know?

Kubernetes and other orchestration tools use abstraction to hide complexity. Deploying, managing and scaling a distributed application are made easy. But what happens when something goes wrong? And, when it does, do you even know?

Read More

Using ML and logs to catch problems in a distributed Kubernetes deployment

October 3, 2019 | Ajay Singh

It is especially tricky to identify software problems in the kinds of distributed applications typically deployed in k8s environments. There’s usually a mix of home grown, 3rd party and OSS components – taking more effort to normalize, parse and filter log and metric data into a manageable state. In a more traditional world tailing or grepping logs might have worked to track down problems, but that doesn’t work in a Kubernetes app with a multitude of ephemeral containers.

It is especially tricky to identify software problems in the kinds of distributed applications typically deployed in k8s environments. There’s usually a mix of home grown, 3rd party and OSS components – taking more effort to normalize, parse and filter log and metric data into a manageable state. In a more traditional world tailing or grepping logs might have worked to track down problems, but that doesn’t work in a Kubernetes app with a multitude of ephemeral containers. You need to centralize logs, but that comes with its own problems. The sheer volume can bog down the text indexes of traditional logging tools. Centralization also adds confusion by breaking up connected events (such as multi-line stack traces) in interleaved outputs from multiple sources.

Read More