Kubernetes apps can have complex failure modes and are hard to troubleshoot. Our machine learning automatically catches these failures and generates root cause reports. No more hunting through logs and dashboards.
See a summary of what happened in plain English!
We use the GPT-3 language model to construct easy-to-understand summaries of the root cause reports our machine learning generates.
Zebrium machine learning works across existing logs and Prometheus metrics. Our open source fork of Prometheus delivers near real-time metrics updates, captures labels for correlating with logs, handles out-of-order samples and reduces bandwidth by more than 500x.
See everything at once or use powerful filters (cluster, deployment, pod, container, logtype and more) to see just what you want. Plus regex search & more.
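To make the idea of combining label filters with regex search concrete, here is a minimal Python sketch. It is not Zebrium's implementation or API: the record layout (plain dicts with a "message" field) and the filter_logs helper are assumptions made up for illustration.

```python
import re

def filter_logs(records, pattern=None, **labels):
    """Keep records whose labels all match and whose message matches the regex.

    `records` is assumed to be a list of dicts such as
    {"cluster": ..., "pod": ..., "message": ...} (hypothetical layout).
    """
    regex = re.compile(pattern) if pattern else None
    return [
        r for r in records
        if all(r.get(key) == value for key, value in labels.items())
        and (regex is None or regex.search(r.get("message", "")))
    ]

# Illustrative sample data, not real product output.
records = [
    {"cluster": "prod", "pod": "api-1", "message": "request timeout after 30s"},
    {"cluster": "prod", "pod": "db-0", "message": "checkpoint complete"},
    {"cluster": "dev", "pod": "api-1", "message": "connection timeout"},
]

prod_timeouts = filter_logs(records, pattern=r"timeout", cluster="prod")
```

With the sample data above, prod_timeouts holds only the api-1 record from the prod cluster, because it is the only one that passes both the label filter and the regex.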
ML categorizes events by type and auto-extracts variables (strings, floats, ints, IP addresses, etc.). Works with any app. No more manual parsing rules!
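As a rough illustration of what separating an event type from its variables means, the Python sketch below replaces typed tokens in a log line with placeholders and groups lines that share the resulting template. It uses fixed regular expressions rather than machine learning, so it is a simplified stand-in and not how Zebrium works; the names parse_event and group_events are hypothetical.

```python
import re
from collections import defaultdict

# Order matters: IP addresses are tried before floats, floats before ints.
TOKEN_PATTERNS = [
    ("ip", r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    ("float", r"\b\d+\.\d+\b"),
    ("int", r"\b\d+\b"),
]
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

def parse_event(line):
    """Return (template, variables) for one log line."""
    variables = []

    def replace(match):
        kind = match.lastgroup  # name of the alternative that matched
        text = match.group()
        if kind == "int":
            value = int(text)
        elif kind == "float":
            value = float(text)
        else:
            value = text
        variables.append((kind, value))
        return f"<{kind}>"

    template = COMBINED.sub(replace, line)
    return template, variables

def group_events(lines):
    """Group the extracted variables of each line under its template."""
    groups = defaultdict(list)
    for line in lines:
        template, variables = parse_event(line)
        groups[template].append(variables)
    return dict(groups)

# Illustrative log lines, not real product data.
lines = [
    "Connection from 10.0.0.12 took 3.5 ms after 2 retries",
    "Connection from 192.168.1.7 took 12.25 ms after 0 retries",
    "Cache cleared",
]
events = group_events(lines)
```

Here the two connection lines collapse into a single event type, "Connection from <ip> took <float> ms after <int> retries", each with its own typed variables, while "Cache cleared" forms a second event type with no variables.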
Easily select and chart any metric with a click and view correlations across different time-series. Metric charts are also included in incident reports where applicable.