“Zebrium is super cool. It tells us all about important things that no-one knows are going on. Incidents that used to take three hours to track down now only take a few minutes.”
Jason Johnson – Senior Vice President of Information Technology
Sweetwater is the world's leading music technology and instrument retailer. It is known for exceptional customer satisfaction, with industry-leading warranties and after-sales service. Sweetwater has more than 2,200 employees, including 58 technology staff, and has been growing at 110% year over year.
Sweetwater runs a leading-edge hybrid cloud environment with approximately 300 Ubuntu virtual machines on VMware and 40 Kubernetes workloads using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP). Many applications, such as the one that matches barcode scans in its distribution center, are mission-critical.
Before Sweetwater implemented Zebrium, any problem or outage tied up key engineers, who first had to locate the relevant logs (no easy feat in a complex environment) and then search them manually to determine the root cause. Identifying the root cause took an average of three hours per incident.
Several months ago, after a difficult incident, the team's frustration came to a head. “We had a problem in an environment where we were creating and destroying 10 Kubernetes deployments each hour. It was really hard to figure out what was going on by tailing and grepping log files, given that containers were constantly being spun up and down,” said Jason Johnson, Senior VP of Information Technology. “That’s when I knew we urgently needed a better solution.”
After extensive online research, Sweetwater decided to evaluate Zebrium because it offered something different: the promise of using AI to do what the team had been doing manually for so long.
“Within about an hour of installing Zebrium, it generated an incident that showed a problem in a customer-facing app. Redis was frequently crashing, but Kubernetes was spinning it back up so quickly that we hadn’t noticed the issue.”
The Zebrium incident contained enough detail for Sweetwater to fix the problem quickly. “We didn’t realize the impact of the problem that had been there for a long time, until we fixed it. The result was a pretty significant performance increase in a customer-facing app that we would otherwise have missed.”
Zebrium has continued to prove its value since that time. According to Jason Johnson, “Just today we saw an incident with a PHP notice saying that we needed to generate a new keyfile in Google. Each time this notice was generated, it was writing to disk and slowing things down.”
Sweetwater takes customer service and experience seriously and expects the same from its vendors. “Throughout our evaluation, we provided several pieces of feedback to Zebrium. Not only did they take the feedback seriously, but they were able to implement most of our suggestions within a week or two. And as we used the product, we watched it continually improve with a bunch of other cool features and enhancements.”
Zebrium has now been implemented for all production logs and metrics. Summing up Sweetwater’s experience with Zebrium, Jason Johnson added, “Zebrium is super cool. It tells us all about important things that no-one knows are going on. Incidents that used to take three hours to track down now only take a few minutes. Plus, if you want to view and search through logs, it also does a really good job of that. But of course, we rarely have to waste time searching logs anymore!”