Code Couture: From Metrics to Fashion with Stitch Fix — Librato Blog


Stitch Fix is an online fashion retailer with a unique concept: rather than just click-and-buy, customers fill out an online style profile and receive a selection of items picked by a personal stylist. Behind the complex platform is a powerful IT infrastructure built by a dynamic group of PhD statisticians and engineers.

We talked to Dave Copeland, Director of Engineering at Stitch Fix, to learn how he and his team keep the online platform healthy and seamlessly integrated with complex warehouse logistics.

Tell us about your experience with monitoring.

Before Librato, we were using a hodgepodge of tools, which was inherently difficult to manage. We did not have a visualization tool either. Since we run on Heroku, I decided to try the Graphite add-on, but was unhappy with it. The integration required too many code changes and was time-consuming, whereas I was looking for a clean, low-maintenance way to improve our operational visibility.

Why Librato?

I first heard about Librato at a Heroku conference in San Francisco. The visualization looked impressive, so we decided to give it a shot. The integration was super simple: instead of spending hours as I had with Graphite, I had alerts set up within minutes. Librato's turnkey integration immediately begins monitoring myriad dyno-level metrics, so just by turning it on, our engineering team began to see live performance data from across our infrastructure. We started off with a handful of business metrics to see how things would work, and once we saw that they worked impeccably, we migrated all our alerts to Librato.

How does Librato fit in with your infrastructure?

Our entire engineering team of 20+ uses Librato. Having all alerts in one system ensures that each alert is routed to the right engineering team, which avoids alert fatigue and wasted effort. Our runbooks for setting up a new application include instructions on how to set up Librato alerts. We also use Librato to monitor database connections and background workers. With all of this in place, Librato helps ensure that our customer experience does not suffer.

In short: Librato is a key piece of our infrastructure.

What is your favorite Librato feature?

The ability to instrument my code. Before Librato, as a developer, I had no visibility into operations. Even with Graphite/StatsD I had to rely on operations engineers to set up my metrics. Now, all our engineers can independently instrument their code and have instant visibility into key metrics that are relevant to their work.

It’s a shift in thinking about the development process: we think about what we want to monitor before the code is even written, which means that we detect problems earlier. Basically, being able to instrument our own code helps us catch problems before they turn into catastrophes.
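As a rough illustration of what instrumenting code can look like, here is a minimal Python sketch. It is not Stitch Fix's code and not Librato's client API. It assumes the log-based convention commonly used on Heroku, where an application prints tokens such as count#name=value or measure#name=value to standard output and the metrics add-on picks them up. The helper names (format_metric, timed, pick_items) and the metric names are hypothetical.

```python
import time
from contextlib import contextmanager


def format_metric(kind, name, value, unit=""):
    """Build one log-style metric token, e.g. 'measure#checkout.time=12.5ms'.

    The 'kind#name=value' shape is an assumed convention, not a confirmed API.
    """
    if kind not in ("count", "measure", "sample"):
        raise ValueError(f"unknown metric kind: {kind}")
    return f"{kind}#{name}={value}{unit}"


@contextmanager
def timed(name, emit=print):
    """Time the wrapped block and emit its duration in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        emit(format_metric("measure", name, round(elapsed_ms, 1), "ms"))


def pick_items(profile):
    """Hypothetical stand-in for application code worth measuring."""
    with timed("styling.pick_items"):
        return sorted(profile)
```

With a helper like this, a developer decides what to measure while writing the feature: wrapping a block in timed(...) is the only change needed, and the emit parameter lets tests capture the output instead of printing it.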

To learn how Librato can help your engineering team, sign up for a free trial today.