Running GatsbyJS using Amazon S3 and CloudFront

As I've progressed through learning more about Gatsby, I've been particularly interested in all of the different approaches people are taking to host their statically generated sites.

The initial problem

In my attempt at implementing proper browser caching on GitHub Pages (cache-control headers), I realized it wasn't actually possible (or I couldn't figure it out properly). Even after looking at how Netlify generates caching headers for its files, I couldn't find a way to apply those headers to the responses when the assets were served.

While that might not be the biggest deal breaker out there, I found myself wanting to at least have the ability to tweak as I saw fit.

Where else could I host it?

Doing a Google search of static site hosts provides plenty of options, but Gatsby even lists several out on its own website:

static-hosts

AWS has always been a particular black box to me. I've only dabbled with it briefly at work because we primarily use Heroku for all of our application needs (it's great and hides so much of the complexity).

I figured this would be a great excuse to actually have my own AWS account and start utilizing the services as they're intended!

The architecture

In order to get this current site to be hosted entirely on AWS, I had to use the following four services:

  1. S3 (short for Simple Storage Service) to host the actual files and assets
  2. CloudFront to provide a global CDN to serve said files efficiently and quickly
  3. Route53 to handle routing the domain
  4. Lambda@Edge to handle the nitty-gritty details of running a static website

S3 configuration

We need a place to store our actual files, and S3 makes it incredibly easy (and cost efficient) to do so by letting us create "buckets". I created a bucket called www.bayphillips.com (more to come on this naming convention) and hosted it in my personal region. No need to set this bucket's policy to public.

s3-bucket-name

S3 also provides static site hosting out of the box. While we need more functionality than it provides, we do need to enable it for our bucket. This is done by simply going to the bucket's Properties tab, clicking on the Static website hosting item, selecting Use this bucket to host a website and entering the following:

s3-static-site-config

One last, but important, note about configuration: by default, S3 will not pass along most response headers when being served via CloudFront. In order to perform proper gzip compression when serving the assets, S3 must pass along the Content-Length header.

Fortunately, S3 makes this easy with its CORS configuration editor. By going to the bucket's Permissions tab and selecting the CORS configuration button, you can add the appropriate header to be included:

content-length-header

This will help things out in the next section.

Deploying to S3

Putting your built Gatsby site onto S3 is quite easy. I added a new script to my package.json called deploy that uses the AWS CLI.

"scripts": {
  "deploy": "aws s3 sync public s3://www.bayphillips.com --acl public-read --delete"
},

Breaking down what's happening there:

  • We're syncing everything in our public folder (the default output location of gatsby build) to the S3 bucket we created and configured earlier
  • We're setting all of the uploaded files to public-read, meaning they can be viewed when requested via a browser
  • We're also passing in the --delete parameter, telling S3 to delete any files that are in our bucket but are no longer in our public directory. This should keep things nice and tidy 🤞

Now that our files are being hosted properly and are accessible, we need to throw them behind a CDN.

Cloudfront

To get started, we want to create a new Cloudfront web distribution and set the S3 bucket that we created above as its Origin Domain Name. This means it'll be primarily serving files from that one particular bucket!

The other important settings you'll need to set are:

  1. I chose Redirect HTTP to HTTPS for my Viewer Protocol Policy in order to require HTTPS for my requests while not preventing HTTP requests from working.
  2. For Compress Objects Automatically, select Yes. This will allow our responses to be compressed! It relies on the earlier step where we allowed the Content-Length header to be passed along from the S3 origin.
  3. In the Distribution Settings there's a place to specify Alternate Domain Names (CNAMES). Here I put in both bayphillips.com and www.bayphillips.com.
  4. I also created a custom SSL certificate in the SSL Certificate location in order to provide TLS for my domain via Cloudfront.
  5. Set the Default Root Object to index.html. If you don't specify this, Cloudfront will serve up an XML document listing out the contents of your S3 bucket. Gatsby uses index.html as its root page. To support our other routes, keep going to the Lambda@Edge section below.

Once your distribution is created, go to the Error Pages tab and Create Custom Error Response.

custom-error

By default, a request for a file that doesn't exist returns a 403 response from S3 rather than a 404. We want to ensure we show our proper 404 handling.

Route53

In order to serve up our Cloudfront distribution via our domain, I created what's called a Hosted Zone on Route53. This zone allows me to set custom DNS nameservers (my domain is registered on Google Domains) to point to AWS in order to serve my website.

This hosted zone also allows me to have custom SSL certificates, used for my Cloudfront distribution, to be served on my domain 😎.

Once the zone was created, I had to create two Record Sets. These sets tell Route53 to point my domain(s) to a specific feature that AWS provides, namely, our Cloudfront distribution!

Clicking on Create Record Set shows a new form. Selecting the type as A - IPv4 address, and selecting yes to Alias, I'm able to target my Cloudfront distribution.

cloudfront-distribution-route53

You should also create another Record Set for the www part of your domain and have that similarly alias to your Cloudfront distribution!

Now, you can view your Gatsby-powered website on your secured domain that's being served by a global CDN, great!

... except for when you try to go to any other page. You'll get errors when attempting to go to, say, /about/, because Cloudfront isn't aware that it needs to look for an index.html by default (outside of the root object which we specified earlier).

This is where Lambda@Edge comes in.

Lambda@Edge

This service was entirely new to me. I was very fortunate to find this excellent blog post over at XIMEDES which provides a ton of great information.

My basic understanding is this: AWS provides us the ability to attach Lambda functions (snippets of code we can run independent of having a server/instance running) to different parts of our Cloudfront and S3 request and response lifecycle.

There are two main use cases for us here to make our Gatsby v2 blog work properly:

  1. Detect when a request is coming in for a page, and tell Cloudfront to look for the corresponding index.html of that folder.
  2. Apply proper caching headers to our assets in order to not follow the standard caching policy of 24 hours that Cloudfront provides out of the box.

In order to make sense out of this, Amazon provides this really helpful image:

cloudfront-events-that-trigger-lambda-functions

From the documentation:

You can use Lambda functions to change CloudFront requests and responses at the following points:

  • After CloudFront receives a request from a viewer (viewer request)
  • Before CloudFront forwards the request to the origin (origin request)
  • After CloudFront receives the response from the origin (origin response)
  • Before CloudFront forwards the response to the viewer (viewer response)
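To make those four points more concrete, here's a minimal sketch of how a function can tell which one invoked it. It assumes the event shape CloudFront passes to Lambda@Edge, where the trigger name lives at Records[0].cf.config.eventType. The describeTrigger helper and the mock event are made up for illustration and aren't part of my actual setup.

```javascript
// Sketch: report which of the four CloudFront trigger points invoked the function.
// `describeTrigger` is a hypothetical helper, not part of any AWS SDK.
function describeTrigger(event) {
  const cf = event.Records[0].cf;
  const eventType = cf.config.eventType; // e.g. 'origin-request'
  // Response events carry both the request and the response;
  // request events carry only the request.
  const hasResponse = cf.response !== undefined;
  return { eventType, uri: cf.request.uri, hasResponse };
}

// Hand-written mock event containing only the fields used above.
const mockEvent = {
  Records: [
    {
      cf: {
        config: { eventType: 'origin-request' },
        request: { uri: '/about/' }
      }
    }
  ]
};

console.log(describeTrigger(mockEvent));
```

The two triggers used in this post are origin-request and origin-response, so those are the only values I'd expect to see in practice.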

Appending index.html to page requests

To start, I head over to the Lambda service within AWS, in the US East (N. Virginia) region (Lambda@Edge functions have to be created there, since Cloudfront itself is global), and create a new function named gatsbyAddIndexHtml because, well, that's what it's doing! I also set it to be running the latest Node.js version they offer (8.10 at the time of writing this).

From there, I add in the following code into the editor to ensure we're looking for the right file:

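exports.handler = (event, context, callback) => {
  // The request Cloudfront is about to send to the S3 origin
  const request = event.Records[0].cf.request;
  const uri = request.uri;

  if (uri.endsWith('/')) {
    // Directory-style URL: /about/ -> /about/index.html
    request.uri += 'index.html';
  } else if (!uri.includes('.')) {
    // No trailing slash and no file extension: /about -> /about/index.html
    request.uri += '/index.html';
  }

  // Hand the (possibly rewritten) request back to Cloudfront
  callback(null, request);
};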

Then, I publish the Lambda as a new version and give it a recognizable description, like 1 (crazy, I know).

Once published, select the newly published version from the dropdown, but make sure you don't have $LATEST selected. You cannot add Cloudfront triggers to the $LATEST version.

Once you've selected your version, select Cloudfront from the list of available triggers which will show a Configure triggers form. Select your Cloudfront distribution that you created previously, and ensure you've selected the Origin request Cloudfront event. This code snippet will be called when Cloudfront is about to request your data from the S3 bucket. So instead of requesting the root directory of that folder (not allowed!), it'll append the index.html to its request, unbeknownst to the user.
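Since each deploy of a Lambda@Edge function takes a while to replicate, it can help to check the rewrite rule locally first. The sketch below pulls the same rule into a plain function and runs it against a hand-built event. The rewriteUri and mockEvent names are my own illustrations, and the mock only includes the one field the handler reads.

```javascript
// Same rewrite rule as the handler above, pulled into a pure function for local testing.
function rewriteUri(uri) {
  if (uri.endsWith('/')) return uri + 'index.html';
  if (!uri.includes('.')) return uri + '/index.html';
  return uri;
}

const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  request.uri = rewriteUri(request.uri);
  callback(null, request);
};

// Hypothetical helper that builds a minimal origin-request event.
const mockEvent = (uri) => ({ Records: [{ cf: { request: { uri } } }] });

for (const uri of ['/', '/about/', '/about', '/static/logo.png']) {
  handler(mockEvent(uri), {}, (err, request) => {
    console.log(uri, '->', request.uri);
  });
}
```

Running this with node maps / to /index.html, both /about/ and /about to /about/index.html, and leaves /static/logo.png alone. One thing to be aware of: because the rule treats any URI containing a dot as a file, a page path with a dot in it and no trailing slash won't be rewritten.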

origin-request-lambda

Check the Enable trigger and replicate checkbox and add your trigger! Before leaving Lambda, let's also add the cache header function as well!

Apply additional headers

Similar to the above, let's create a new function called gatsbyCacheHeaders, with similar settings. In the code editor, add the following:

'use strict';
exports.handler = (event, context, callback) => {
  // Origin-response events include both the original request and S3's response
  const request = event.Records[0].cf.request;
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  if (request.uri.startsWith('/static/')) {
    // Files under /static/ can be cached for a year
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=31536000, immutable'
      }
    ];
  } else {
    // Everything else must be revalidated on every request
    headers['cache-control'] = [
      {
        key: 'Cache-Control',
        value: 'public, max-age=0, must-revalidate'
      }
    ];
  }

  // Cache compressed and uncompressed variants separately
  headers['vary'] = [
    {
      key: 'Vary',
      value: 'Accept-Encoding'
    }
  ];

  callback(null, response);
};

This will provide long-term caching for anything in the /static directory, while ensuring all other files aren't stuck behind the Cloudfront's default caching methodologies.

When adding your Cloudfront trigger, ensure it's set on the origin-response event type this time. This means Cloudfront will think S3 itself is providing these headers, and will process and cache the file accordingly.
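As with the first function, the header logic can be checked locally before publishing. This is a condensed sketch of the same rule run against a hand-built origin-response event. The cacheControlFor and mockEvent names are illustrative, and the mock response only carries the fields the handler touches.

```javascript
// Pure version of the rule used by the gatsbyCacheHeaders handler.
function cacheControlFor(uri) {
  return uri.startsWith('/static/')
    ? 'public, max-age=31536000, immutable'
    : 'public, max-age=0, must-revalidate';
}

const handler = (event, context, callback) => {
  const { request, response } = event.Records[0].cf;
  response.headers['cache-control'] = [
    { key: 'Cache-Control', value: cacheControlFor(request.uri) }
  ];
  response.headers['vary'] = [{ key: 'Vary', value: 'Accept-Encoding' }];
  callback(null, response);
};

// Hypothetical helper that builds a minimal origin-response event.
const mockEvent = (uri) => ({
  Records: [
    { cf: { request: { uri }, response: { status: '200', headers: {} } } }
  ]
});

for (const uri of ['/static/app.js', '/about/index.html']) {
  handler(mockEvent(uri), {}, (err, response) => {
    console.log(uri, '->', response.headers['cache-control'][0].value);
  });
}
```

The /static/ file should come back with the year-long immutable value and the HTML page with max-age=0, must-revalidate.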

SUPER BIG CALLOUT

I am running Gatsby v2, which uses Webpack 4, which gives us better cache-control options for our bundled assets. I haven't figured it out quite yet, but we can follow a similar strategy to the one Netlify uses: create a mapping file of generated files to how long they should be cached, and apply it similarly here. Figuring this out completely is definitely another blog post!

One note: I also added the Vary: Accept-Encoding header in the gatsbyCacheHeaders function. This seemed to be a benefit for website performance tests.
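I haven't built this yet, but here's a rough sketch of what a mapping-based approach could look like inside the Lambda: a small table of path patterns, checked in order, with a fallback. Everything here is hypothetical, including the CACHE_RULES table, the resolveCacheControl name, and the patterns themselves. They aren't Netlify's or Gatsby's actual rules, and a real version would likely be generated at build time.

```javascript
// Hypothetical rule table: the first matching pattern wins.
// Patterns and durations are illustrative placeholders only.
const CACHE_RULES = [
  { pattern: /^\/static\//, value: 'public, max-age=31536000, immutable' },
  { pattern: /\.html$/, value: 'public, max-age=0, must-revalidate' }
];

// Conservative fallback for anything the table doesn't cover.
const DEFAULT_CACHE_CONTROL = 'public, max-age=0, must-revalidate';

function resolveCacheControl(uri) {
  const rule = CACHE_RULES.find((r) => r.pattern.test(uri));
  return rule ? rule.value : DEFAULT_CACHE_CONTROL;
}

console.log(resolveCacheControl('/static/logo.png'));
console.log(resolveCacheControl('/about/index.html'));
console.log(resolveCacheControl('/app-abc123.js'));
```

Keeping the fallback at max-age=0 means a file the table doesn't know about is never cached longer than intended.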

The final step for these Lambda@Edge functions

Now that we've created these functions and associated them with the correct Cloudfront events, we need to tell our Cloudfront distribution to call them at the right time.

Before leaving Lambda, copy down the ARN (Amazon Resource Name) for each of the functions you'll be using, for the version that's configured with your triggers. You can find this ARN at the top right of the page when viewing your function:

lambda-arn

Head to Cloudfront and go to your distribution. Click on the Behaviors tab and edit the behavior that should already exist.

At the bottom of the edit page, there's a section called Lambda Function Associations. This is key! I didn't know to do this for a long time and nothing was firing.

Add two new associations, one for Origin Request and the other for Origin Response. Paste in the ARNs of the corresponding functions from earlier. Remember, the function that appends index.html goes on Origin Request, while the headers function goes on Origin Response. It should look something like this!

cloudfront-functions-set

Submit those edits and you're done!


Alright, that should be what it takes to get this all working. Running a Lighthouse audit in Chrome results in the following for my site at this time:

performance-audit

There's still room for optimizations, but that's for another post.