Historically, monitoring tools have left most of the work of detecting and root-causing incidents to the user: continual management of alert rules, visual scanning of dashboards, iterative drill-down, and frequent log searches. This lengthens mean time to resolution (MTTR) and lets new, previously unknown failure modes go undetected.
Fortunately, new approaches that apply machine learning to anomaly detection on logs and metrics have been developed to address this problem. They automatically detect anomalous patterns in logs and metrics that signal an incident, and correlate those patterns with the likely root cause. These approaches are built to work at scale and can finally turn logs and metrics into a more proactive monitoring solution.
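
To make the idea of automatic anomaly detection on a metric concrete, here is a minimal sketch. It is not the machine-learning approach described above; it substitutes a simple rolling z-score as a statistical stand-in, and the function name `find_anomalies`, the `window` and `threshold` parameters, and the sample latency values are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations.

    Hypothetical helper for illustration only; a rolling z-score,
    not a trained model.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up request latencies in milliseconds, with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 450, 101, 100]
print(find_anomalies(latency_ms))  # prints [7]
```

No alert rule names the value 450 in advance; the spike is flagged only because it departs from recent behavior. That is the property that lets this style of detection surface failure modes nobody anticipated, although a production system would need to handle seasonality, noisy baselines, and correlation across many signals.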