Host multiple websites with one S3 bucket

Say you have a bunch of websites hosted with S3. You decide they don’t deserve a separate bucket each, and you’d like to put them all into one. One bucket to host them all! It can be done; this post explains how.

Technologies used: AWS, S3, CloudFront, Lambda@Edge, DNS.

Overview

The idea is to use Lambda@Edge to map the website domain names to paths in the S3 bucket. If you haven’t worked with Lambda@Edge before, don’t worry: it’s simple, and I’ll explain it below. The whole Lambda function is just a few lines of Python. Say we want to host three websites: example.com, foo.example.com, and bar.example.com. The S3 bucket structure will look like this:

my-s3-bucket/example.com/index.html
my-s3-bucket/foo.example.com/index.html
my-s3-bucket/bar.example.com/index.html

Each domain will have its own folder at the root of the S3 bucket. Each domain folder in turn will have its own index.html and other frontend files.

S3 bucket

There isn’t much you need to do with the S3 bucket itself. The simplest bucket, with no versioning and all public access blocked, will work fine. Basically, you can leave all configuration values at their defaults when creating the bucket in the S3 console. There is also no need to enable static website hosting afterward.

If you use Terraform to provision your infrastructure, see the example code below:

module "s3_one" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "3.2.3"

  bucket = "opsdocks-multi-website-bucket"
}

CloudFront

We will need to create a CloudFront distribution and configure it to use the S3 bucket as its primary origin. We will also make CloudFront include the Host HTTP header in its cache key.

In the CloudFront console, click Create Distribution to get started. Here are the important configuration parameters; you can leave the default values for the rest:

  1. Under Origin, for Origin Domain Name, choose the Amazon S3 bucket that you created earlier.
  2. Leave Origin path empty.
  3. If you haven’t made your bucket public, choose Yes use OAI, create and select one, and then Yes, update the bucket policy.
  4. Under Cache key and origin requests choose Legacy cache settings → Headers → Include the following headers → Add header → Host ✓.
  5. Under Settings, for Alternate domain name add your website domains: example.com, foo.example.com, bar.example.com.
  6. If you need HTTPS, for Custom SSL certificate select a certificate that is valid for all alternate domain names from step 5.
  7. Leave Default root object empty.

Now that the distribution is created, there’s one more thing to configure. Open the distribution, go to Error pages, and click Create custom error response. Choose 403 for HTTP error code, then Customize error response → Yes, Response page path = /index.html, HTTP Response code = 200.

By default, opening your website’s root URL asks S3 for a key that doesn’t exist, and S3 answers with a 403 Access Denied, so CloudFront will return a 403 error. Adding this error override effectively makes index.html the default root object. The file will be looked up in the website’s S3 folder, e.g. example.com → my-s3-bucket/example.com/index.html, etc.

Terraform S3 + CloudFront:

module "s3_one" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "3.2.3"

  bucket = "opsdocks-multi-website-bucket"
}

data "aws_iam_policy_document" "s3_one_policy" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${module.s3_one.s3_bucket_arn}/*"]

    principals {
      type        = "AWS"
      identifiers = module.cloudfront.cloudfront_origin_access_identity_iam_arns
    }
  }
}

resource "aws_s3_bucket_policy" "s3_one" {
  bucket = module.s3_one.s3_bucket_id
  policy = data.aws_iam_policy_document.s3_one_policy.json
}

module "cloudfront" {
  source  = "terraform-aws-modules/cloudfront/aws"
  version = "2.9.3"

  aliases = var.website_domains

  enabled             = true
  price_class         = "PriceClass_100"
  retain_on_delete    = false
  wait_for_deployment = false

  create_origin_access_identity = true
  origin_access_identities = {
    s3_one = "opsdocks-multi-website-oai"
  }

  custom_error_response = {
    403 = {
      error_code         = 403
      response_code      = 200
      response_page_path = "/index.html"
    }
  }

  origin = {
    s3_one = {
      domain_name = module.s3_one.s3_bucket_bucket_regional_domain_name
      s3_origin_config = {
        origin_access_identity = "s3_one" # key in `origin_access_identities`
      }
    }
  }

  default_cache_behavior = {
    target_origin_id       = "s3_one"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    query_string           = false
    headers                = ["Host"]
    cookies_forward        = "none"
  }

  viewer_certificate = {
    acm_certificate_arn      = var.acm_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2019"
  }
}

Lambda@Edge

Lambda@Edge is a Lambda function attached to a CloudFront distribution that runs when requests pass through CloudFront. Our goal is to create such a function and attach it to the distribution’s origin request event.
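
Before we look at the function itself, it helps to see what it receives. Below is a trimmed sketch of an origin-request event for an S3 origin; the values are illustrative (the real event carries more fields), but these are the ones we care about:

# Trimmed sketch of a CloudFront origin-request event for an S3 origin.
# Values are illustrative; the real event contains more fields.
sample_event = {
    "Records": [{
        "cf": {
            "request": {
                "method": "GET",
                "uri": "/index.html",
                "querystring": "",
                "headers": {
                    "host": [{"key": "Host", "value": "foo.example.com"}]
                },
                "origin": {
                    "s3": {
                        "domainName": "my-s3-bucket.s3.us-east-1.amazonaws.com",
                        "path": "",
                        "authMethod": "origin-access-identity",
                        "customHeaders": {}
                    }
                }
            }
        }
    }]
}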

Start creating a function in the Lambda console. Make sure to use the US East (N. Virginia) region, as it’s the only region where Lambda@Edge functions can be created. The important points:

  1. Choose Author from scratch.
  2. For Runtime choose Python 3.8.
  3. Click Change default execution role, make up a name, then Policy templates → Basic Lambda@Edge permissions ✓.

Create the function and open it. Then remove the default source code and paste this into the editor:

def lambda_handler(event, context):
    request = event['Records'][0]['cf']['request']
    # Point the S3 origin at the folder named after the requested domain,
    # e.g. foo.example.com -> /foo.example.com
    request['origin']['s3']['path'] = "/" + request['headers']['host'][0]['value']
    # S3 expects its own domain name in the Host header, so put it back
    request['headers']['host'][0]['value'] = request['origin']['s3']['domainName']
    return request

Click Deploy to save the changes.

That’s all the code you need. It tells CloudFront to look for a website’s static files in the S3 folder that matches the website’s domain name. It is that simple.
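
If you want to sanity-check the logic locally before publishing, a minimal test along these lines will do. It assumes the handler above is saved as lambda_edge.py next to the test, and it reuses the trimmed event shape from the earlier sketch:

# Minimal local smoke test for the handler above.
# Assumes the handler code is saved as lambda_edge.py in the same directory.
from lambda_edge import lambda_handler

event = {"Records": [{"cf": {"request": {
    "uri": "/index.html",
    "headers": {"host": [{"key": "Host", "value": "foo.example.com"}]},
    "origin": {"s3": {"domainName": "my-s3-bucket.s3.us-east-1.amazonaws.com", "path": ""}},
}}}]}

result = lambda_handler(event, None)
assert result["origin"]["s3"]["path"] == "/foo.example.com"
assert result["headers"]["host"][0]["value"] == "my-s3-bucket.s3.us-east-1.amazonaws.com"
print("looks good")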

After you deploy the changes, you’ll need to publish the function. Choose Actions → Publish new version → Publish. The current Lambda configuration will be saved as version 1.

Now the function is ready to be attached. Open your CloudFront distribution in the AWS console, then Behaviors → select the S3 behavior → Edit → Function associations. Paste your function’s ARN next to Origin request. The ARN has to end with :1, the version number.

Terraform full:

module "s3_one" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "3.2.3"

  bucket = "opsdocks-multi-website-bucket"
}

data "aws_iam_policy_document" "s3_one_policy" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${module.s3_one.s3_bucket_arn}/*"]

    principals {
      type        = "AWS"
      identifiers = module.cloudfront.cloudfront_origin_access_identity_iam_arns
    }
  }
}

resource "aws_s3_bucket_policy" "s3_one" {
  bucket = module.s3_one.s3_bucket_id
  policy = data.aws_iam_policy_document.s3_one_policy.json
}

module "cloudfront" {
  source  = "terraform-aws-modules/cloudfront/aws"
  version = "2.9.3"

  aliases = var.website_domains

  enabled             = true
  price_class         = "PriceClass_100"
  retain_on_delete    = false
  wait_for_deployment = false

  create_origin_access_identity = true
  origin_access_identities = {
    s3_one = "opsdocks-multi-website-oai"
  }

  custom_error_response = {
    403 = {
      error_code         = 403
      response_code      = 200
      response_page_path = "/index.html"
    }
  }

  origin = {
    s3_one = {
      domain_name = module.s3_one.s3_bucket_bucket_regional_domain_name
      s3_origin_config = {
        origin_access_identity = "s3_one" # key in `origin_access_identities`
      }
    }
  }

  default_cache_behavior = {
    target_origin_id       = "s3_one"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    query_string           = false
    headers                = ["Host"]
    cookies_forward        = "none"

    lambda_function_association = {
      origin-request = {
        lambda_arn = module.lambda_edge_router.lambda_function_qualified_arn
      }
    }
  }

  viewer_certificate = {
    acm_certificate_arn      = var.acm_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2019"
  }
}

# This module assumes that Lambda source code is in `lambda_edge.py` file in the same folder
# as this Terraform configuration
module "lambda_edge_router" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "3.2.1"

  function_name                     = "opsdocks-multi-website-lambda-edge"
  handler                           = "lambda_edge.lambda_handler"
  runtime                           = "python3.8"
  cloudwatch_logs_retention_in_days = 30

  publish        = true
  lambda_at_edge = true

  source_path = "${path.module}/lambda_edge.py"
}

resource "aws_route53_record" "website_domains" {
  for_each = toset(var.website_domains)

  zone_id = var.hosted_zone_id
  name    = each.value
  type    = "A"

  alias {
    name                   = module.cloudfront.cloudfront_distribution_domain_name
    zone_id                = module.cloudfront.cloudfront_distribution_hosted_zone_id
    evaluate_target_health = false
  }
}

Domain names

As you might have already guessed, your website domains have to point at the CloudFront distribution domain name:

example.com -> aaabbbccc.cloudfront.net
foo.example.com -> aaabbbccc.cloudfront.net
bar.example.com -> aaabbbccc.cloudfront.net

This is the last bit of the configuration. Now let’s test it.

Testing

Try opening one of your websites; you should get a 403 error. That’s because there are no index files yet. Let’s change that.

Gotcha

Do not go to the S3 console and create folders for the domain names. If you do, invisible 0-byte folder objects will be created, and instead of rendering index.html your browser will download an empty 0-byte file when you open the website. Use the AWS CLI to upload your website files to S3: the folders will be created implicitly, with no 0-byte objects in them.
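
If you have already created such folders in the console, a small boto3 sketch like this one can find and remove the zero-byte placeholder objects (the bucket name is the example name used throughout this post):

# Find and delete the zero-byte "folder" placeholder objects created
# by the S3 console. Bucket name is this post's example name.
import boto3

s3 = boto3.client("s3")
bucket = "my-s3-bucket"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        # Console-created folders are 0-byte objects whose key ends with "/"
        if obj["Size"] == 0 and obj["Key"].endswith("/"):
            print("Deleting placeholder:", obj["Key"])
            s3.delete_object(Bucket=bucket, Key=obj["Key"])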

Create a simple index.html file locally.

Upload the index.html to your S3 bucket using the AWS CLI. Here example.com/ is the website’s domain name:

aws s3 cp index.html s3://my-s3-bucket/example.com/

Edit the index.html file so it displays something different, and upload it to another S3 folder (domain):

aws s3 cp index.html s3://my-s3-bucket/foo.example.com/

Your S3 bucket file structure should now look similar to what this post’s Overview describes.

Open the websites in your browser. You should see two different pages, which means two different websites are being served from the same S3 bucket. Hooray!

Tip

If something doesn’t work the way you expected, make sure CloudFront isn’t in the Deploying state and try invalidating the cache before digging any deeper.
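
If you prefer to script the cache invalidation, here’s a minimal boto3 sketch (the distribution ID is a placeholder; replace it with your own):

# Invalidate everything in the distribution's cache.
# The distribution ID below is a placeholder.
import time
import boto3

cloudfront = boto3.client("cloudfront")
cloudfront.create_invalidation(
    DistributionId="E1EXAMPLE12345",
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},
        "CallerReference": str(time.time()),  # any unique string
    },
)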

Conclusion

Using this approach you can reduce the number of S3 buckets you have to manage. It can be especially useful if you need to create temporary versions of a website upon request, e.g. for a git branch or PR.

CloudFront has a limit of 100 alternate domain names per distribution. That is a soft limit, however, and it can be increased.

Bonus: Load Balancer

Half a year after I initially wrote this post, I found myself in a situation where I needed to add a dedicated backend to each of my websites. My backend workloads were exposed via an Application Load Balancer, and I like to use CloudFront on top of an ALB for extra security. So, naturally, I added my ALB as a custom origin to the same CloudFront distribution. To route requests to different targets based on the hostname, I included the Host header in the CloudFront cache key and configured host-based routing on the ALB. You can read more about it in my new post – CloudFront and Load Balancer with Host-Based Routing.