Velocity SC: The Most Common Questions — Librato Blog

I have never speed-dated, but from what I've seen on TV, I imagine it’s very similar to running a booth at a conference like Velocity in Santa Clara, where Librato was recently a gold sponsor. We engaged hundreds of people in rapid-fire sessions at our booth, attempting to wrap up who we are, what we represent and what we’re trying to accomplish—in a mere smattering of minutes.

It is a comically impossible task. We Librato booth-dwellers thoroughly enjoyed every conversation, but we sometimes wished we had hours rather than minutes to talk with everyone about metrics and monitoring. So we thought it would be a good idea to follow up on a few of the most frequent questions we heard at Velocity, with fuller answers than the speed-dating vendor-booth setting allows.

So you’re a dashboard?

It’s easy to define us by our dashboard interface because our metrics UI is beautifully wrought, easy to use, and extremely featureful, but it’s really just the face of a well-engineered signal-processing and persistence system.

Behind our API resides a microservices architecture that stream-processes the input signal in parallel. This lets us do things like service-side aggregation, detecting and alerting on problems, and computationally transforming signals based on user-specified criteria. Each measurement is then safely and reliably persisted in our storage tier. This entire infrastructure, including the persistence layer, is horizontally scalable and fast, usually returning from an API-write call in under 100 milliseconds.

What? No agent?

We don’t have a collection agent, which can be a bit confusing for prospective customers to wrap their heads around. This isn’t an oversight on our part. In fact, we haven’t rolled a collection agent because we’re philosophically opposed to the idea. We don’t think you should be forced to install anything on your servers that you don’t want there. Our opinion is that you should be able to measure anything you want using the tools that work best for you. We have built-in integrations with 65 open source data collectors and bindings for 10 different languages (more on the way!), and our REST API makes it very easy to emit metrics to us directly from your code in every popular language.
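To give a feel for what emitting a metric directly from your code can look like, here is a minimal Python sketch that uses only the standard library. The endpoint URL, the `gauges` payload shape, and the metric and source names are assumptions made for illustration, so check the API documentation before relying on them. The sketch builds the request without sending it.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm it against the current API documentation.
API_URL = "https://metrics-api.librato.com/v1/metrics"


def build_gauge_request(email, token, name, value, source):
    """Build (but do not send) a POST request carrying one gauge measurement."""
    # Assumed payload shape: a list of gauges, each with a name, value and source.
    payload = {"gauges": [{"name": name, "value": value, "source": source}]}
    credentials = base64.b64encode(f"{email}:{token}".encode()).decode()
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


# Hypothetical credentials, metric name and source.
request = build_gauge_request(
    "you@example.com", "api-token", "db.fetch.time", 42.0, "web-1"
)
# urllib.request.urlopen(request) would submit the measurement.
```

Nothing here needs to be installed on the server beyond the code that is already running there.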

What does that graph mean?

Many people paused to look at the dashboard we were displaying on the monitor in our Velocity booth, and had technical questions about what they saw there. Here are a few of the instruments we were showing in that dashboard, and our reasoning for showing them off.

The instrument above is showing that 95% of our worker threads on 6 different hosts are successfully fetching the database objects they need in under 200 ms. We’re simultaneously depicting the max (red), min (green), average (yellow) and breakout (blue) of this data across all 6 hosts. If you want to monitor ephemeral sources like worker threads in a process, computing and emitting a percentile like this is a great way to preserve the integrity of your data.
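As a rough illustration of computing a percentile on the host before emitting it, here is a small sketch using the nearest-rank method. The sample fetch times are invented for the example, and nearest-rank is just one common definition of a percentile, not necessarily the one behind the instrument described here.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical fetch times (ms) gathered from short-lived worker threads on one host.
fetch_times_ms = [12, 15, 18, 20, 22, 25, 31, 40, 95, 180]

# One number per host per interval, instead of one stream per thread.
p95 = percentile(fetch_times_ms, 95)
```

Each host emits a single value per interval, so the threads can come and go without leaving behind a trail of short-lived sources.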

In the following instrument we’ve overlaid an annotation stream that documents incidents of various cron jobs running in our back-end. The start time of each cron job is depicted as a vertical line. Some jobs, like the one casting a yellow area, are long-running. These jobs update the annotation stream with both a start and end time, which makes it possible for us to visualize the total runtime of the job. In this graph, therefore, we can see which back-end processing jobs have a meaningful effect on read operations inside our Cassandra ring.
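One way a long-running job might record both a start and an end time is to wrap its work in a context manager, as sketched below. The field names (`title`, `start_time`, `end_time`), the job name, and the in-memory `events` list are all assumptions for illustration. A real job would submit the event to the annotation stream instead of appending it to a list.

```python
import time
from contextlib import contextmanager


@contextmanager
def annotated_job(title, events, clock=time.time):
    """Record a start time when the job begins and an end time when it finishes."""
    event = {"title": title, "start_time": int(clock()), "end_time": None}
    events.append(event)
    try:
        yield event
    finally:
        # Runs even if the job raises, so the annotation is always closed.
        event["end_time"] = int(clock())


events = []
with annotated_job("nightly-rollup", events):
    pass  # the cron job's real work would run here
```

A short job could skip the end time entirely and show up as a single vertical line.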

In the next two instruments, we see two classic use cases for stacking data streams. The first instrument can be thought of as a sort of unraveled pie chart. You get to see not only the total number of HTTP 200s being served, but also the fraction contributed by each server.
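The arithmetic behind a stacked graph like this is a running sum across sources. In the sketch below, the server names and request counts are made up for the example.

```python
def stack(series_by_source):
    """Return cumulative layers: each layer is the running total up to that source."""
    layers = {}
    running = None
    for source, values in series_by_source.items():
        if running is None:
            running = list(values)
        else:
            running = [a + b for a, b in zip(running, values)]
        layers[source] = running
    return layers


# Hypothetical HTTP 200 counts per interval for three servers.
http_200s = {
    "web-1": [10, 12, 9],
    "web-2": [8, 11, 10],
    "web-3": [5, 5, 6],
}
layers = stack(http_200s)
```

The top layer is the total across all servers, and the gap between adjacent layers is the share contributed by a single server.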

In the graph below, we are stacking percentiles from a single data stream. You can think of each layer as a club that individual measurements belong to. The bottom layers are exclusive (only 75% of our measurements belong to them), while the upper layers are less exclusive (all data belongs to the top group). Stacking percentiles lets us visualize the statistical distribution of our measurements, and makes it easy to get a feel for how many individual measurements contributed to a spike.
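One way to draw percentiles as stacked layers is to plot each percentile as the height it adds on top of the one beneath it. The sketch below assumes the percentile values have already been computed for one interval. The labels and numbers are hypothetical.

```python
def percentile_bands(percentile_values):
    """Turn ascending (label, value) percentiles into stackable band heights."""
    bands = []
    previous = 0
    for label, value in percentile_values:
        bands.append((label, value - previous))
        previous = value
    return bands


# Hypothetical percentiles (ms) for one interval of a single data stream.
interval = [("p75", 25), ("p90", 95), ("p100", 180)]
bands = percentile_bands(interval)
```

A thin bottom band under a tall top band means most measurements were small and only a few outliers produced the spike.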

Mind-Bending Conversation

All in all, we talked to a varied and interesting group of attendees at Velocity, from bioinformatics engineers to APM resellers. These diverse interactions help to shape our understanding of what Librato is at a fundamental level as well as what it could grow to be in the future. It’s satisfying to realize that Librato is flexible enough to accommodate so many unique types of needs.

Did you visit our booth at Velocity? Even if you didn't have time to visit us, we'd love to start a conversation. Come visit us today to discuss your monitoring needs.