WordPress on NGINX + HHVM with Heroku Buildpacks

It’s been a year since I last made any major changes to my WordPress on Heroku build, and in tech years that’s a lifetime. Since then Heroku has released a new PHP buildpack with nginx and HHVM built in, and much progress has been made on both HHVM and WordPress to make the two compatible with each other. So it seems like now is as good a time as any to update the stack this site is running on.

So without further ado, I’d like to introduce:
Heroku WP — A template for HHVM powered WordPress served by nginx.

The Goal

There are numerous other templates out there for running WordPress on Heroku; my main goals for this template are:

  1. It should be simple — use the default buildpack provided by Heroku so there’s no other 3rd party dependency to implicitly trust or to maintain.
  2. It should be fast — use the latest technologies available to squeeze every last ounce of performance out of each Heroku Dyno.
  3. It should be secure — security is not an add-on; admin pages should be secure by default and database connections need to be encrypted.
  4. It should scale — just because we can serve millions of page hits a day off a single Heroku Dyno does not mean we’ll stop there. The template should be made with cloud architecture in mind so that the number of Dynos can scale up and down without breaking.

The Stack

Standing on the shoulders of giants, I was able to use the latest Heroku buildpack and get WordPress running on the following (a minimal boot sketch follows the list):

  • NGINX — An event-driven web server engineered for the modern web as a replacement for Apache. This high-performance web server is preferred by more of the top 1,000 sites than any other, and it’s what’s used by the largest WordPress install out there, WordPress.com.
  • HHVM — HipHop Virtual Machine, a JIT (just-in-time) compiler developed by Facebook to run PHP scripts, which, when tested with WordPress, showed up to a 2x improvement.
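
Under the hood the stock Heroku PHP buildpack boots whichever server pairing you name in your Procfile via scripts that Composer installs into vendor/bin. As a minimal sketch (the boot script name follows the buildpack’s convention; the web/ document root argument is an assumption on my part, not copied from the Heroku-WP repo):

web: vendor/bin/heroku-hhvm-nginx web/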

I have yet to run any statistical analysis on performance, however anecdotally it feels a lot faster navigating WP admin, and page generation times look much better. I’m looking forward to running more tests and performance tuning this build in the coming weeks.


Update:
While still not a head-to-head test, comparing the response times reported by StatusCake for this site running on Heroku-WP against a mirror of this site running on the old Heroku LAMP stack, with no load other than StatusCake pings, shows a dramatic improvement:

Stack       Max (ms)   Min (ms)
LAMP        3,514      1,166
Heroku-WP   1,351      68

30. June 2014 by Xiao
Categories: DevOps

Introducing Whatson, an Elasticsearch Consulting Detective

Over the past few months I’ve been working with the Elasticsearch cluster at Automattic. While we monitor longitudinal statistics on the cluster through Munin, when something is amiss there’s currently no good place to take a look and drill down to see what the issue is. I use various Elasticsearch plugins, however they all have some downsides.

ES Head is fantastic for drilling down into what is happening, down to the shard level, however its rendering is way too bulky. Once there are more than a dozen nodes or indices in a cluster it becomes a scrolling nightmare.

Another tool that I use often, SegmentSpy, gives lots of info about the underlying Lucene segments, however its use of logarithmically scaled stacked bar charts tends to make it hard to estimate the deleted-documents ratio in each shard. In addition it’s hard to drill down to just one shard group to figure out, on a per-segment basis, what’s going to happen when nodes restart and shard recovery kicks off.

I’ve taken all that I wished I could do with both of those plugins and created a new Elasticsearch plugin that I call Whatson. This plugin utilizes the power of D3.js to visualize the nodes, indices, and shards within a cluster, with a focus on visualizing large clusters and highlighting potential problems within them. It also allows drilling down into segment data per index or shard. I hope this plugin helps others find and diagnose issues, so give it a try.

GitHub: elasticsearch-whatson
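
If you want to try it, site plugins of this era install with Elasticsearch’s bundled plugin tool. This is a hedged sketch only; the GitHub user and version tag below are placeholders, not the real coordinates:

# Install with the Elasticsearch 1.x plugin tool (placeholder user/version)
bin/plugin --install <github-user>/elasticsearch-whatson/<version>
# Once installed, any node serves the UI under http://localhost:9200/_plugin/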

10. February 2014 by Xiao
Categories: Uncategorized

Static Asset Caching Using Apache on Heroku

There have been many articles written about how to properly implement static asset caching over the years, and the best practices boil down to three things.

  1. Make sure the server is sending RFC compliant caching headers.
  2. Send long expires headers for static assets.
  3. Use version numbers in asset paths so that we can precisely control expiration.

Implementing these suggestions on Heroku or elsewhere is super simple and will help not only reduce load but also make subsequent page loads faster. I’ve implemented the following on my Heroku WordPress install, which runs on the default Apache/PHP Heroku buildpack, however you can apply these concepts to other tech stacks as well.

Proper Cache Headers

To properly cache a resource we should always send an explicit cache header that tells the client retrieving the content when the resource expires, plus a cache validator header. Cache validators are either ETag or Last-Modified headers. The server should send these headers even if the expires headers have already been explicitly set on the response, so that browsers may issue conditional GET requests with If-None-Match, If-Modified-Since, or If-Range headers. The browser issues these types of requests to validate the content it has already cached locally against the origin server. (This happens when the end-user clicks the refresh button in their browser.)
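
As an illustration (the URL and validator values here are made up), a revalidation round trip looks like this; the 304 tells the browser its local copy is still good, without re-sending the body:

GET /assets/v123/js/my_script.js HTTP/1.1
Host: www.example.com
If-None-Match: "51f2d0-1a2b"

HTTP/1.1 304 Not Modified
ETag: "51f2d0-1a2b"
Cache-Control: max-age=31104000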

For static files Apache will do most of the heavy lifting for us: it will generate both an ETag and a Last-Modified header based on the filesystem stats of the static file. Unfortunately, due to the way Heroku dynos operate, Apache will not set either of these headers properly, which prevents proper caching of the static asset.

By default Apache 2.2 generates ETags using the filesystem i-node, last modified time, and file size of the file. With Heroku, when we push any changes our code gets packaged with Apache and PHP into a compiled slug, which is then deployed to dynos as they are spun up and down dynamically on an ephemeral filesystem. This means that as Heroku allocates dynos to our app (which may happen at any time without action from us) the underlying filesystem i-node for any given static file will fluctuate. Even worse, if we have multiple dynos running our app, each dyno will report a different i-node value for any particular static file and thus will calculate a different ETag value. The net effect is that we end up confusing any downstream browsers / reverse proxies, potentially causing them to not cache our content and to respond incorrectly to If-None-Match requests. To fix this we need to configure Apache to use only the last modified time and file size to calculate ETag values.

In addition to sending differing ETags for the same content, it’s also possible to have the exact opposite problem: sending the same ETag for different entities. The W3C designed ETags to identify unique instances of entities, which includes the encoding used for transport. This is important because what is actually transferred for the same CSS file served compressed versus uncompressed is vastly different, so our servers should mark the entities as different so that intermediate reverse proxies know to treat the transferred content as different.

Because Apache generates the ETag based on file system stats, any transformations on the file are not taken into account. This means that when mod_deflate is enabled, both the deflate-encoded and plain instances of each asset will have the same ETag value as calculated by Apache. This is not compliant and could cause reverse proxies to send improperly encoded content via ranged requests. There is a ticket open for this with Apache but no timeline for a fix. So instead of waiting for a patch it’s better to configure Apache to not calculate ETag values when mod_deflate is turned on, and rely on Last-Modified for content validation.
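
An easy way to spot this problem on a live server is to compare validators with and without compression (hypothetical URL below); if both requests return the identical ETag, the server is sending non-compliant validators:

# Plain request
curl -sI http://www.example.com/css/style.css | grep -i etag

# Compressed request; a compliant server sends a different ETag (or none at all)
curl -sI -H 'Accept-Encoding: gzip' http://www.example.com/css/style.css | grep -i etag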

Putting the two things together I have added the following into my httpd.conf for my Heroku app.

# Only use modified time and file size for ETags
FileETag MTime Size

# Don't use ETags if mod_deflate is on (Apache bug)
# https://issues.apache.org/bugzilla/show_bug.cgi?id=39727
<IfModule mod_deflate.c>
    FileETag None
</IfModule>

As a side note, when new code gets pushed to Heroku all files within the app will have the time of the push assigned as their last modified date. This keeps last modified times consistent across dynos, but it also means our static assets will send the full content back to the browser, instead of a 304 Not Modified response, for If-Modified-Since requests after deploys. This is not ideal, but it’s not super terrible, and I don’t know of a simple way to solve this issue.

Dynamic Version Slugs & Long Expires

The easiest way to speed up repeat page loads is to cache page assets like JavaScript and CSS so that as a visitor clicks around the site their browser will only request the updated content from the server. And if those assets have a long enough expires value, the visitor will get the same quick load time when they come back in a day or week or month. The only problem is that if we cache assets for too long on the user’s browser and then decide to change something on the site, our visitor will get outdated assets.

The simplest fix is to version our assets so that we can specify a long expires value along with a unique version number, and then simply update the asset version number every time we change the content. Ideally we specify the version slug within the path of the URL instead of simply adding it as a query string, because some reverse proxies will not cache resources with a query string even with a valid caching header.

For this site I created a /assets/ subdirectory under the document root and placed the following .htaccess file within it.

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /assets

    # Don't rewrite for actual files
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule . - [L]

    # Remove version slug & rewrite to asset
    RewriteCond %{DOCUMENT_ROOT}/assets/$1 !-d
    RewriteCond %{DOCUMENT_ROOT}/assets/$2 -f
    RewriteRule ^([^/]+)/(.*)$ /assets/$2 [L]

    # Set 360 day expiration on everything
    <IfModule mod_headers.c>
        Header set Cache-Control "max-age=31104000"
    </IfModule>
</IfModule>

This allows me to inject any arbitrary version slug after /assets/ and have that file served up with a long expires time. So if there is a file on my server at the path assets/js/my_script.js, I can refer to it as /assets/v123/js/my_script.js and then simply replace v123 with any other version number I want for caching purposes.
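
In PHP this pairs naturally with a small helper that builds the versioned URL in one place. The sketch below is hypothetical; ASSET_VERSION and versioned_asset() are names I made up for illustration, not part of this setup:

<?php
// Hypothetical helper: prepend the current version slug to an asset path.
// Bumping ASSET_VERSION on deploy forces browsers to fetch fresh copies.
define( 'ASSET_VERSION', 'v123' );

function versioned_asset( $path ) {
    return '/assets/' . ASSET_VERSION . '/' . ltrim( $path, '/' );
}

// Emits: /assets/v123/js/my_script.js
echo versioned_asset( 'js/my_script.js' );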

Please note I have set the expires value of assets to 360 days (360 × 86,400 = 31,104,000 seconds), or slightly less than one year, because according to RFC 2616:

To mark a response as “never expires,” an origin server sends an Expires date
approximately one year from the time the response is sent. HTTP/1.1 servers
SHOULD NOT send Expires dates more than one year in the future.

So if we send something with a max-age of 10 years, it may get interpreted as an invalid expires value and invalidate our caching headers.

05. August 2013 by Xiao
Categories: DevOps
