Distributed apps on Kubernetes can fail in complex ways. Our machine learning catches these failures without any manually defined rules! It detects problems that cascade across microservices and combines them into a single incident report with root cause details.
Zebrium machine learning works across your existing logs and Prometheus metrics. Our open source fork of Prometheus delivers near real-time metrics updates, captures labels for correlating with logs, handles out-of-order samples and achieves > 500x bandwidth reduction.
The Grafana tab in the Zebrium UI lets you leverage the power of one of the world's most popular visualization and analytics tools.
Easily create dashboards from the rich metrics and event types available in the Zebrium platform. You can even build real-time visualizations using metrics that are embedded in logged events.
See everything at once or use powerful filters (e.g. cluster, deployment, pod, container, logtype) to see just what you want. Plus regex search & more.
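To make the idea of combining label filters with regex search concrete, here is a minimal Python sketch. It is not how the Zebrium UI works internally; the function name `filter_events` and the event fields (`cluster`, `pod`, `message`) are hypothetical, chosen only for illustration.

```python
import re

def filter_events(events, pattern=None, **labels):
    """Keep events whose labels all match exactly and whose message
    matches the optional regex. Field names are illustrative only."""
    regex = re.compile(pattern) if pattern else None
    return [
        event for event in events
        if all(event.get(key) == value for key, value in labels.items())
        and (regex is None or regex.search(event.get("message", "")))
    ]

# Hypothetical sample events, not real Zebrium data.
sample_events = [
    {"cluster": "prod", "pod": "api-1", "message": "timeout after 30s"},
    {"cluster": "prod", "pod": "db-0", "message": "checkpoint complete"},
    {"cluster": "dev", "pod": "api-2", "message": "timeout after 5s"},
]

prod_timeouts = filter_events(sample_events, pattern=r"timeout after \d+s", cluster="prod")
```

Here `prod_timeouts` holds only the `api-1` event, since it is the one event that satisfies both the label filter and the regex.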
ML categorizes events by type and auto-extracts variables (strings, floats, ints, IP addresses, etc.). Works with any app. No more manual parsing rules!
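As a rough illustration of what "event type plus extracted variables" means, the Python sketch below turns a raw log line into a template and a list of typed values. It uses fixed regular expressions, which is a deliberately simplified stand-in and not Zebrium's machine learning; the function name `extract` and the placeholder format are assumptions made for this example.

```python
import re

# Ordered so the most specific pattern wins: an IP address would
# otherwise be split into floats or ints.
_TOKEN = re.compile(
    r"(?P<ip>\b\d{1,3}(?:\.\d{1,3}){3}\b)"
    r"|(?P<float>(?<![\w.])-?\d+\.\d+(?![\w.]))"
    r"|(?P<int>(?<![\w.])-?\d+(?![\w.]))"
)

def extract(line):
    """Return (template, variables) for one log line.

    The template is the line with each variable replaced by a typed
    placeholder; lines sharing a template share an event type.
    """
    variables = []

    def _replace(match):
        kind = match.lastgroup
        text = match.group()
        if kind == "float":
            value = float(text)
        elif kind == "int":
            value = int(text)
        else:
            value = text  # keep IP addresses as strings
        variables.append((kind, value))
        return f"<{kind}>"

    template = _TOKEN.sub(_replace, line)
    return template, variables

template, variables = extract(
    "Connection from 10.0.0.12 closed after 3 retries in 1.25 s"
)
# template  -> "Connection from <ip> closed after <int> retries in <float> s"
# variables -> [("ip", "10.0.0.12"), ("int", 3), ("float", 1.25)]
```

Two lines that differ only in their variable values produce the same template, which is what lets events be grouped by type and their numeric values charted over time.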
Easily select and chart any metric with a click and view correlations across different time-series. Metric charts are also included in incident reports where applicable.