2013, An Application Monitoring Retrospective — Librato Blog

Trapped in this anti-time (or maybe ante-time) as we are, and facing the inevitability of a brand new year -- having only barely digested all that the last one brought -- it's natural, I think, that this time of year evokes in us a curious sense of nostalgia for shit that really only happened a few weeks ago. It seems to grow in us all -- this sentimental little cookie-monster-like creature, hungry for campy retrospective meta-history. We are never dismayed by its appearance, and happily feed it articles and funny animations, and even more articles.

Let's feed the beast, shall we, and retrospectify the monitoringosphere together; I'll go first.

A Few Trends

IT trends are difficult to talk about in general, because they're sort of emergent in nature. Like the dawning comprehension that you are covered in cat hair, it's hard to know after the fact just exactly when it happened. Luckily for us, there is no 2012 monitoringosphere retrospective from which we can be accused of plagiarizing, so I'll fire away with some trends that I've been noticing recently.

The line between application monitoring and analytics is getting blurry

Business Intelligence people are becoming aware of systems monitoring folks and vice versa. This is probably a healthy side-effect of devops, and it brought some analytics and data scientist types to our meet-ups and conferences in 2013. They seem to be excited by the prospect that systems and application monitoring can, in some cases, provide metrics and real-time business intelligence of the sort that is helpful to them. As the worlds collide, we're noticing a few problems. A couple of them are:

  1. Business Intelligence people often get views from Ops that they don't understand.
  2. Real business intelligence data often requires back-end processes to sync with slower systems like POS terminals before it becomes accurate, so Ops metrics are often predictive rather than factual, and we don't have great models for making use of that predictive data yet.

The job titles are all confusing now

This predates 2013, but I think it's fair to say we witnessed a more rapid and expansive cladogenesis in professional nerd taxonomy over the course of the year than we did previously. Honestly, it's been dizzying. How is a blogger supposed to refer to us now? Sysops? Webops? Developer? Programmer? Do we even know?

One thing is certain: the classical silos of "operations" and "development" no longer make sense to a growing majority of web-operations engineers, who seem to be relying instead on a special-purpose developer genre focused on the combination of uptime, monitoring, release control, and automation.

Some people are pointing at this role and (incorrectly) calling it 'Devops', and others like Google are giving it a name (SRE is the Google moniker). The nexus of those roles is probably the future for people who have been operations focused in the past, and is what most people probably mean when they say things like "no-ops". Teams that call themselves "no-ops" seem to have one or two developers whose interests align with those roles. Meanwhile, if you talk to the long-beards, they'll cross their arms and argue that as Sysadmins that's what they've been doing all along. Perhaps ironically, this seems to be coinciding with the rise of dedicated monitoring and telemetry teams (like the one at Netflix), so that we now have engineers who are very specifically focused on monitoring and metrics collection systems.

Holy interfaces

2013 also brought a more fervid creation of infrastructure abstractions specific to application monitoring, logging and metrics collection. Examples include log-shuttle, l2met and eleventy-billion node scripts.  I think this is fueled by the increasing interest in all the wonderful application instrumentation libraries we have now.  The thrust is to make it trivially easy to throw your stuff in the general direction of standardized, site-wide monitoring and logging systems. Oh, I see you have some metricky stuff there; why don't you shove it in this socket, or call this or that function on it and let me take care of it for you.
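To make that "shove it in this socket" idea concrete, here's a minimal sketch in the spirit of l2met, which parses measurements out of ordinary log lines. The `measure#name=value` line convention is l2met's; the function itself is a hypothetical example, not any library's actual API.

```python
import sys
import time

def measure(name, value, unit="ms"):
    """Write an l2met-style measurement to stdout; a log drain
    can later parse these lines back into metrics."""
    # l2met's convention: a "measure#<name>=<value>" token in a log line.
    sys.stdout.write("measure#%s=%s%s\n" % (name, value, unit))

# Instrumenting a hot path is then one line of code:
start = time.time()
# ... handle a request ...
measure("request.latency", round((time.time() - start) * 1000, 2))
```

The appeal is exactly what the paragraph describes: the application just prints, and the site-wide logging pipeline takes care of the rest.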

One format to rule them all

Systems monitoring is moving away from one-off systems that were designed to monitor a specific thing toward business-wide stream processing engines that generate and operate on common data formats. Right now, companies are mostly building these themselves. Examples include Jeff Weinstein's talk on structured logs and Heroku's shh to logplex to l2met pipeline.

The goal is to source input in a common format from everywhere and output to:

  • Graphs
  • Correlation engines
  • Notification and alerting systems
  • Batch analytics
  • Long term storage
  • Forensics tools
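A toy sketch of the "common format" idea (assuming nothing about any particular company's pipeline): every producer serializes measurements into the same line-oriented record, so each of the consumers above can share one parser.

```python
import json
import time

def emit(kind, **fields):
    """Serialize a measurement into one common, line-oriented format
    that every downstream consumer can parse identically."""
    record = {"at": time.time(), "kind": kind}
    record.update(fields)
    return json.dumps(record, sort_keys=True)

# A grapher, an alerting engine, and long-term storage all read the
# same stream with the same one-line parser:
line = emit("metric", name="api.latency", value_ms=42)
record = json.loads(line)
```

Once everything speaks one format, adding a new consumer (a correlation engine, a forensics tool) means writing a reader, not re-instrumenting every producer.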

Monitoring: not just an afterthought anymore

With the use of Heroku and RDB instances that provide no means of installing monitoring agents, in-application monitoring and instrumentation is simply a critical necessity. If you don't have it, you have no visibility. Monitoring, as a result, is becoming more and more a part of test-driven release-control, and less a red-headed stepchild that we forgot to plan for.
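When you can't install a host agent, the instrumentation has to live inside the application. A minimal sketch (all names here are hypothetical, not any particular library's API) of a timing decorator that pushes measurements to whatever sink your monitoring setup provides:

```python
import functools
import time

def timed(metric_name, sink=print):
    """Time a function's wall-clock duration and push the result to a
    metrics sink, since the platform won't let us run a host agent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                sink("%s=%.2fms" % (metric_name, elapsed_ms))
        return wrapper
    return decorator

@timed("checkout.duration")
def checkout():
    time.sleep(0.01)

checkout()
```

Because the measurement ships with the code, it goes through review and release-control like everything else, which is the shift the paragraph describes.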

A preponderance of metrics

Initially the call was for 'measure everything', but many of us, I think, are beginning to realize that might not have been an optimal rallying cry.

Wither RRD

Here I'm referring to the emergence of database systems designed specifically for time-series data that have no visualization front end. See, for example, InfluxDB.

My short list of highlights

This is already running long, so here's a very short list of posts and talks that I really enjoyed this year, along with some unforgettable moments (good and bad).

@aphyr Jepsen: Call Me Maybe

If you haven't read Kyle Kingsbury's series exploring the, ahem, guarantees made by the various modern distributed data stores, you're missing out on the only URL on the internet that will simultaneously give you the gifts of data science and pop music. His Carly Rae Jepsen metaphor may be tenuous, but the result is hands down the best thing I've read all year (and probably in several years). The series is scientific, hilarious, surreal, and horrifying, and often all at the same time. If our every claim were subjected to an examination this transparent and competent, the world would be a quieter place.

Brendan Gregg: Flame Graphs

If someone asked Brendan Gregg to find a needle in a haystack, I imagine he would grab the haystack by the collar and punch it in the face until it gave up the needle, its wallet and car keys, and then thanked Brendan for giving it the opportunity to comply. The bottom line here is that to the extent bending computers to our will is a yardstick of hacker success, Gregg wins. Period. His blog is legendary, and his talk on flame graphs at LISA made me wish all computers were Illumos. We should be grateful that he didn't go into behavioral psychology, because if he had, he would surely by now have perfected mind-control (or has he?).

Todd Underwood to LISA: you're all probably fired

I have to mention Todd Underwood's plenary session at LISA this year for no other reason than the sheer audacity it took to give a no-ops talk as the closing session at a systems administration conference. I mean, it was a pretty good talk, but WOW, if I had a choice between crossing the Russian Mafia and a room full of 1,000 of the world's best systems administrators... pretty sure I'd go with the former.

Not really monitoring-related, but must be mentioned

Well, that about wraps it up. I'm sure there's a lot I'm missing here, so please feel free to add more in the comments. What'd I miss?