Using machine learning to shine a light inside the monitoring black box

October 24, 2019 | Ajay Singh

A widely prevalent application monitoring strategy today is sometimes described as “black box” monitoring. Black box monitoring focuses just on externally visible symptoms, including those that approximate the user experience. Black box monitoring is a good way to know when things are broken.

Read More

Using ML and logs to catch problems in a distributed Kubernetes deployment

October 3, 2019 | Ajay Singh

It is especially tricky to identify software problems in the kinds of distributed applications typically deployed in k8s environments. There’s usually a mix of home grown, 3rd party and OSS components – taking more effort to normalize, parse and filter log and metric data into a manageable state. In a more traditional world tailing or grepping logs might have worked to track down problems, but that doesn’t work in a Kubernetes app with a multitude of ephemeral containers.

It is especially tricky to identify software problems in the kinds of distributed applications typically deployed in k8s environments. There’s usually a mix of home grown, 3rd party and OSS components – taking more effort to normalize, parse and filter log and metric data into a manageable state. In a more traditional world tailing or grepping logs might have worked to track down problems, but that doesn’t work in a Kubernetes app with a multitude of ephemeral containers. You need to centralize logs, but that comes with its own problems. The sheer volume can bog down the text indexes of traditional logging tools. Centralization also adds confusion by breaking up connected events (such as multi-line stack traces) in interleaved outputs from multiple sources.

Read More

Catching Faults Missed by APM and Monitoring tools

August 19, 2019 | Ajay Singh

As software gets more complex, it gets harder to test all possible failure modes within a reasonable time. Monitoring can catch known problems – albeit with pre-defined instrumentation. But it’s hard to catch new (unknown) software problems. 

A quick, free and easy way to find anomalies in your logs

 

Read More

Getting anomaly detection right by structuring logs automatically

June 20, 2019 | Ajay Singh

Observability means being able to infer the internal state of your system through knowledge of external outputs. For all but the simplest applications, it’s widely accepted that software observability requires a combination of metrics, traces and events (e.g. logs). As to the last one, a growing chorus of voices strongly advocates for structuring log events upfront.

 

Observability means being able to infer the internal state of your system through knowledge of external outputs. For all but the simplest applications, it’s widely accepted that software observability requires a combination of metrics, traces and events (e.g. logs). As to the last one, a growing chorus of voices strongly advocates for structuring log events upfront. Why? Well, to pick a few reasons - without structure you find yourself dealing with the pain of unwieldy text indexes, fragile and hard to maintain regexes, and reactive searches. You’re also impaired in your ability to understand patterns like multi-line events (e.g. stack traces), or to correlate events with metrics (e.g. by transaction ID).

Read More

Featured Posts