Using ML and logs to catch problems in a distributed Kubernetes deployment

October 3, 2019 | Ajay Singh

It is especially tricky to identify software problems in the kinds of distributed applications typically deployed in k8s environments. There is usually a mix of home-grown, third-party, and open source components, which makes it harder to normalize, parse, and filter log and metric data into a manageable state. In a more traditional world, tailing or grepping logs might have worked to track down problems, but that doesn't work in a Kubernetes app with a multitude of ephemeral containers. You need to centralize logs, but that comes with its own problems. The sheer volume can bog down the text indexes of traditional logging tools. Centralization also adds confusion by interleaving output from multiple sources, which breaks up connected events such as multi-line stack traces.
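To make the per-pod tailing problem concrete, here is a minimal sketch (not from the original post) of what ad-hoc log collection looks like without centralization. It assumes the official Kubernetes Python client and a hypothetical `app=my-service` label; every (pod, container) pair has to be read separately, and pods may vanish before you reach them.

```python
# Minimal sketch: pull recent logs from every pod behind a label selector.
# Assumes the official Kubernetes Python client (pip install kubernetes) and a
# hypothetical label "app=my-service". A centralized log pipeline would make
# this kind of ad-hoc polling unnecessary.
from kubernetes import client, config


def dump_recent_logs(namespace: str = "default",
                     label_selector: str = "app=my-service",
                     tail_lines: int = 50) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    core = client.CoreV1Api()

    pods = core.list_namespaced_pod(namespace, label_selector=label_selector)
    for pod in pods.items:
        for container in pod.spec.containers:
            # Each pod/container pair must be tailed separately, and the pod
            # may already be gone by the time you query it -- hence the need
            # to ship logs to a central store instead.
            log_text = core.read_namespaced_pod_log(
                name=pod.metadata.name,
                namespace=namespace,
                container=container.name,
                tail_lines=tail_lines,
            )
            print(f"=== {pod.metadata.name}/{container.name} ===")
            print(log_text)


if __name__ == "__main__":
    dump_recent_logs()
```

Even this toy version only captures a point-in-time snapshot per container; stitching the interleaved output back into coherent events (such as multi-line stack traces) is exactly the part a centralized pipeline has to solve.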
