As scale and complexity grow, there are diminishing returns from pre-deployment testing. A test writer cannot envision the combinatoric explosion of coincidences that yield calamity. We must accept that deploying into production is the only definitive test.
Software log messages are potential goldmines of information, but their lack of explicit structure makes them difficult to programmatically analyze. Tasks as common as accessing (or creating an alert on) a metric in a log message require carefully crafted regexes that can easily capture the wrong data by accident (or break silently because of changing log formats across software versions). But there’s an even bigger prize buried within logs – the possibility of using event patterns to learn what’s normal and what’s anomalous.
Why understand log structure at all?