August 13, 2015

Hot Linking in S3

Hot linking (also known as inline linking, leeching, piggy-backing, direct linking, or offsite image grabs) is where one website directly includes a resource hosted on another website. This means the user is looking at website A, while some of the images or other content on the page are actually being served from website B. This is a problem because B is paying for the server space and bandwidth, but is not getting the credit in the user’s mind (or the ad revenue).

This can be a problem on any hosting platform, not just S3. Apache can prevent hot linking with .htaccess rules or mod_rewrite, and tools are available for Microsoft IIS. Information on preventative measures for S3 hosting was a little thin, so I want to show you my example, which is in use on this website.

Let’s dive in

To stop hotlinkers in their tracks, we’re going to use a “policy” on our S3 bucket. What does an AWS policy look like? Policies are specified in JSON format. As an example, the policy used for this website is shown below.

:::json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::www.nathan.digital/*.html"
        },
        {
            "Sid": "Allow get requests originated from www.nathan.digital and nathan.digital",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::www.nathan.digital/*",
            "Condition": {
                "StringLike": {
                    "aws:Referer": [
                        "http://www.nathan.digital/*",
                        "http://nathan.digital/*",
                        "https://www.nathan.digital/*",
                        "https://nathan.digital/*"
                    ]
                }
            }
        }
    ]
}

You can see we start with a “Version” key/value. This tells AWS which version of the policy language to use when interpreting the statements that follow. Stick to the version string used in any examples you’re following; they may not work if you switch to a different version.

Then we have a “Statement” key, with 2 objects. These are not ordered, and all are applied. If no matching statement allows the request, the request is denied. Let’s look at a couple of examples.

  • jpg image file. This file only matches one of our 2 “Resource” lines: the second one, with conditionals. If the request doesn’t meet the conditionals, the file will not be served. Our conditionals ensure the user is coming from our website before allowing them to download the image. More on this in a moment.
  • html file. This file matches both “Resource” lines. The result is the “or” of all matching statements, so it doesn’t matter that the second statement might not be true, because the first one is. html files are allowed regardless of the conditionals in the second statement.
  • Deny takes priority. If any of the matching statements is a deny, the file will not be served, regardless of how many allows there are or how precise they are.
  • Conditionals. You can see there is one conditional in the second statement, “StringLike”. This checks the “aws:Referer” key of the request against the provided strings, where “*” acts as a wildcard. This key comes from one of the AWS documentation examples, but I’ve been unable to find a full listing of available keys. If you know where I can find this, please let me know in the comments below.
  • User policies. In addition to the policies we set on a bucket, logged-in AWS users may have user policies that are also applied. These are also “or”d with any bucket policies you set. One example is the user permission I use to update this bucket, which gives write permission even though no write permission is set in the bucket policy. This only applies to users from the bucket owner’s account, or users configured for cross-account sharing. For web users, you don’t have to worry about this.
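The rules in the list above can be sketched as a small Python model. This is only a simplified illustration of the allow/deny logic described here, not how AWS actually evaluates policies. The function names, the object keys such as `images/photo.jpg`, and the extra deny statement are all made up for the example, and `fnmatchcase` is used as a stand-in for the “*” wildcard matching.

```python
from fnmatch import fnmatchcase

BUCKET_ARN = "arn:aws:s3:::www.nathan.digital/"

# Simplified copy of the two statements from the policy above.
# "Referer" is None when the statement has no condition.
STATEMENTS = [
    {
        "Effect": "Allow",
        "Resource": BUCKET_ARN + "*.html",
        "Referer": None,
    },
    {
        "Effect": "Allow",
        "Resource": BUCKET_ARN + "*",
        "Referer": [
            "http://www.nathan.digital/*",
            "http://nathan.digital/*",
            "https://www.nathan.digital/*",
            "https://nathan.digital/*",
        ],
    },
]


def statement_matches(statement, resource, referer):
    """Return True if the resource and (optional) referer condition both match."""
    if not fnmatchcase(resource, statement["Resource"]):
        return False
    patterns = statement["Referer"]
    if patterns is None:
        return True  # no condition to satisfy
    if referer is None:
        return False  # condition present, but the request has no referer
    return any(fnmatchcase(referer, pattern) for pattern in patterns)


def is_allowed(statements, key, referer=None):
    """Model the rules above: any deny wins, otherwise at least one allow is needed."""
    resource = BUCKET_ARN + key
    effects = [
        s["Effect"] for s in statements if statement_matches(s, resource, referer)
    ]
    if "Deny" in effects:
        return False
    return "Allow" in effects


# Hypothetical requests, for illustration only.
print(is_allowed(STATEMENTS, "images/photo.jpg"))
print(is_allowed(STATEMENTS, "images/photo.jpg", "https://www.nathan.digital/blog/"))
print(is_allowed(STATEMENTS, "index.html"))
```

With this model, the image is only allowed when the referer matches one of our own URLs, while the html file is allowed with or without a referer. Adding a statement with `"Effect": "Deny"` that matches a file blocks it no matter how many allows also match.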

If you copy the above code and change it to use your own bucket name and URLs, you can rest easy, knowing any bandwidth costs you have are actually for your own website.
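If you would rather generate the policy than edit it by hand, here is a minimal sketch in Python that fills in the bucket name and domains for you. The function name `build_hotlink_policy` and the `www.example.com` values are placeholders I made up for the example; the structure simply mirrors the policy shown earlier.

```python
import json


def build_hotlink_policy(bucket, domains):
    """Build a policy like the one above for the given bucket and site domains."""
    # One referer pattern per scheme and domain, e.g. "https://example.com/*".
    referers = [
        f"{scheme}://{domain}/*"
        for scheme in ("http", "https")
        for domain in domains
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # html pages are readable by anyone, with no referer check.
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*.html",
            },
            {
                # Everything else is only served when the referer is our own site.
                "Sid": "Allow get requests originated from " + " and ".join(domains),
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"StringLike": {"aws:Referer": referers}},
            },
        ],
    }


# Placeholder bucket and domains; swap in your own.
policy = build_hotlink_policy("www.example.com", ["www.example.com", "example.com"])
print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the bucket policy editor for your bucket.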

My code is based on one of the official examples. If you want to know more, please head to the official documentation.