Slack’s mission is to make your working life simple, pleasant and productive, and by all indications, they’re making serious progress. When we last checked in, an estimated 500,000 people per day were using Slack. With a user base that doubles every three months, Slack’s Ops team has to be smart to keep pace. We caught up with Richard Crowley, director of operations at Slack, to talk about why he chose Librato and what Slack uses us for.
Why did you decide to implement Librato?
I’ve always been a big proponent of metrics software, but I’ve run enough big Ganglia instances to know the pain and the time sink that can come with them. We needed an additional tool, and implementing Librato was very liberating: it expresses application and cluster metrics much better than Ganglia does.
How do you use Librato?
We use StatsD in our PHP app to send business and app-health metrics into Librato. In addition, one of the Slack databases now reports connect, query and fetch latencies broken down by server cluster, and we chart all that data in Librato. We also have backend services written in Go that emit internal metrics about each system’s health. In parallel, we collect metrics on the app side that time the full request-and-response cycle. Having those graphs side by side lets us clearly see and compare the effects of network latency on one of our new services, and that’s very cool.
What are some of your favorite Librato features?
Being able to aggregate a metric across all of its sources is a first-class feature. Librato doesn’t treat the same metric from two hosts as two different things, which is fantastic.
We’ve been using the new Spaces UI heavily, and I love how quickly you can throw together something enlightening. Composing dashboards and charts is a breeze, and it’s super easy for us to spot API-level failures and latency spikes. The “toolkits”, particularly the vertical bar that follows along everywhere, are an excellent touch.
Any comments on how Librato has benefited the business?
Librato has immensely increased our visibility into problems that actually impact users: for example, the underlying cause of a 500 response, or the cause of perceived or actual slowness.
Librato makes it easy for developers to add instrumentation. You showed us that if it’s easy to instrument your app, engineers will do it. Engineers who are really engaged with metrics mean better availability, security and performance for our site.
Librato can help your app run smoothly. Try us out; it’s free.