Catching Faults Missed by APM and Monitoring Tools

A quick, free and easy way to find anomalies in your logs



As software gets more complex, it gets harder to test every possible failure mode within a reasonable time. Monitoring can catch known problems, but only where instrumentation has been defined in advance. That makes new (unknown) software problems hard to catch: if detection depends on pre-instrumentation, you only find the failure modes you already knew to look for.


What’s needed is a way to learn normal software patterns automatically and reliably detect abnormal ones. There have been earlier attempts to do this, but a good solution is hard: it needs to work close to real time, work without (impractical) training, and not annoy developers with too many false positives.


Our team has worked on these challenges for a long time and has built something that works. We improve the accuracy of pattern learning by first learning the foundational “dictionary” of all unique event types generated by your software. Our ML can do this with surprisingly little data (as little as a couple of MB, although more data helps). It extracts all event structure in near real time, including typed variables and metrics embedded in logs. Building this structured event dictionary lets us accurately learn the normal patterns of every unique event type, which in turn allows very reliable detection of “anomalies”, meaning events that break pattern. Factors considered include the occurrence of a new event type, a change in frequency or periodicity, severity, and correlations between anomalies within one or more files or streams. Because it fully learns (and keeps adapting to) event structure, our software is also a strong building block for capturing known failure patterns.
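To make the idea concrete, here is a deliberately simplified sketch in Python. It is not our ML: it swaps in a few hand-written regular expressions to mask variable fields, where the real system learns event structure automatically. The function names (`event_type`, `learn`, `find_anomalies`), the masking rules, and the `ratio` threshold are all hypothetical and chosen only for illustration. The sketch covers just two of the factors listed above: a new event type, and an increase in frequency.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable-looking tokens with typed
# placeholders. Order matters: more specific patterns run first.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\.\d+\b"), "<FLOAT>"),
    (re.compile(r"\b\d+\b"), "<INT>"),
]


def event_type(line):
    """Reduce a log line to a template by masking its variable fields."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line.strip()


def learn(lines):
    """Build the event 'dictionary': template -> count in the baseline."""
    return Counter(event_type(line) for line in lines)


def find_anomalies(baseline, window_lines, ratio=3.0):
    """Return (template, reason) pairs for events that break pattern.

    An event is flagged if its template was never seen in the baseline,
    or if its share of the window exceeds `ratio` times its baseline share.
    """
    window = Counter(event_type(line) for line in window_lines)
    base_total = sum(baseline.values())
    win_total = sum(window.values())
    anomalies = []
    for template, count in window.items():
        if template not in baseline:
            anomalies.append((template, "new event type"))
            continue
        base_rate = baseline[template] / base_total
        win_rate = count / win_total
        if win_rate > ratio * base_rate:
            anomalies.append((template, "frequency increase"))
    return anomalies


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    baseline = learn(
        ["GET /health 200 in 3 ms"] * 90
        + ["worker 7 connected to 10.0.0.5"] * 10
    )
    window = (
        ["GET /health 200 in 4 ms"] * 5
        + ["worker 2 connected to 10.0.0.9"] * 4
        + ["disk 2 failed with code 0x1F"]
    )
    for template, reason in find_anomalies(baseline, window):
        print(f"{reason}: {template}")
```

Even this toy version shows why the dictionary matters: once lines are reduced to event types, "never seen before" and "much more frequent than usual" become simple lookups. A real system also has to cope with periodicity, severity, and correlation across streams, none of which this sketch attempts.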


It all sounds good, but how do I know it will really work for me?


From a user’s perspective, it can be hard to verify claims about the effectiveness of machine learning and anomaly detection. Positive anecdotes from other users may not apply to your application. The easiest way to find out is to try it, so we provide free access to our technology and you can see how it performs with your own data.


Click here to get started. 

Tags: log files, anomaly detection, devops, observability