If you’re running a multi-node Elasticsearch cluster checkout Automattic’s fork of the Elasticsearch StatsD Plugin for pushing cluster and node metrics to StatsD.
At Automattic we’ve been using a set of Munin scripts to collect and aggregate Elasticsearch metrics via its native node & stats REST APIs. This method works relatively well giving us enough longitudinal information about the cluster and nodes to diagnose issues or test optimizations. That said, cluster monitoring with Munin at a 5 minute resolution leaves a lot to be desired.
First and foremost, our Elasticsearch cluster is spread across 3 data centers each with it’s own Munin instance. This makes collecting and aggregating even simple metrics like cluster wide load quite difficult. In addition, due to the polling nature of metrics collection with Munin we are limited to a somewhat corse resolution of 5 minutes. While this is good enough for looking at time series data over the course of a day or week it’s quite time-consuming to wait 5 or 10 minutes for graphs to update when deploying changes or testing performance optimizations.
We’ve already had some good experience instrumenting our PHP stack with StatsD and building dashboards with Grafana so reusing that infrastructure for Elasticsearch metrics seems like a good fit. Our fearless leader Barry suggested we tryout the Elasticsearch StatsD Plugin from Swoop Inc. however upon closer inspection we found that it does not deal with clustered Elasticsearch environments well and have yet to be updated to work with ES 1.x. So over the past week we’ve forked and rewritten much of it to suit our needs.
The Automattic Elasticsearch StatsD Plugin is designed to run on all nodes of a cluster and push metrics to StatsD on a configurable interval. Once installed and configured each node will send system metrics (e.g. CPU / JVM / network / etc.) about itself. Data nodes can also be configured to send metrics about the portion of the index stored on itself. Finally, the elected master of the cluster is responsible for sending aggregate cluster metrics about the index (documents / indexing operations / cache sizes / etc.). By default the plugin will send the total cluster aggregate as well as per index metrics for indices however granularity can be configured to report down to the individual shard level if so desired.
Check it out on GitHub: Elasticsearch StatsD Plugin.
It’s been a year since I last made any major changes to my WordPress on Heroku build and in tech years that’s a lifetime. Since then Heroku has released a new PHP buildpack with nginx and HHVM built in. Much progress have also been made both HHVM and WordPress to make both compatible with each other. So it seems like now is as good a time as any to update the stack this site is running on.
So without further ado I like to introduce:
Heroku WP — A template for HHVM powered WordPress served by nginx.
There are numerous other templates out there for running WordPress on Heroku and my main goals for this templates are:
- It should be simple — use the default buildpack provided by Heroku so there’s no other 3rd party dependency to implicitly trust or to maintain.
- It should be fast — use the latest technologies available to squeeze every last ounce of performance out of each Heroku Dyno.
- It should be secure — security is not an add-on, admin pages should be secure by default and database connections needs to be encrypted.
- It should scale — just because we can serve millions of page hits a day off a single Heroku Dyno does not mean we’ll stop there. The template should be made with cloud architecture in mind so that the number of Dynos can scale up and down without breaking.
Standing on the shoulder of giants I was able to use the latest Heroku buildpack and get WordPress running on:
- NGINX — An event driven web server that was engineered for the modern day to replace Apache. This high performance web server is preferred by more top 1,000 sites then any other and it’s what’s used by the largest WordPress install out there, WordPress.com.
- HHVM — HipHop Virtual Machine, a JIT (just in time) compiler developed by Facebook to run PHP scripts which when tested with WordPress showed up to a 2x improvement.
I have yet to run any statical analysis on performance however antidotally it feels a lot faster navigating WP admin and page generation times looks much better. I’m looking forward to running more tests and performance tuning this build in the coming weeks.
While still not a head-to-head test looking at the response times as reported by StatusCake for this site running on Heroku-WP and a mirror of this site that is running on the old Heroku LAMP stack with no load other then StatusCake pings shows a dramatic improvement:
Over the past few months I’ve been working with the Elasticsearch cluster at Automattic. While we monitor longititudinal statics on the cluster through Munin when something is amiss there’s currently not a good place to take a look and drill down to see what the issue is. I use various Elasticsearch plugins however they all have some downsides.
ES Head is fantastic for drilling down into what is happening down to a shard level however its rendering is way to bulky. Once there is over a dozen nodes or indices in a cluster it becomes a scrolling nightmare.
Another tool that I use often SegmentSpy gives lots of info about the underlaying Lucene segments however the use of logarithmically scaled stacked bar charts tends to make it hard to estimate the deleted documents ratio in each shard. In addition it’s hard to drill down to just one shard group to figure out what’s going to happen when nodes restart and shard recovery kicks off on a per segment basis.
I’ve taken all that I wished I could do with both of those plugins and created a new Elasticsearch plugin that I call Whatson. This plugin utilizes the power of D3.js to visualize the nodes, indices, and shards within a cluster. It also allows the drilling down to segment data per index or shard. With the focus on visualizing large clusters and highlighting potential problems within. I hope this plugin helps others find and diagnose issues so give it a try.