These days we use Amazon CloudFront for content delivery. Amazon has made it very easy to deliver files from an Amazon Simple Storage Service (S3) bucket through a CloudFront distribution. If you are using CloudFront as your Content Delivery Network (CDN), your next task will be monitoring its usage. For this, CloudFront can store its access logs in an S3 bucket. My hurdle was processing the log files that CloudFront stores there. For sites hosted with Apache I use AWStats to read the logs, so my vote went to AWStats here as well. Please follow the steps one by one.
1. First, download the log files stored in the S3 bucket. For this I used a Python script by wpstorm.net, with some modifications so that it worked for me. Please follow the blog post if you need any help setting up the required libraries.
get-aws-logs.py
#! /usr/bin/env python
"""Download and delete log files for AWS S3 / CloudFront

Usage: python get-aws-logs.py [options]

Options:
  -b ..., --bucket=...    AWS Bucket
  -p ..., --prefix=...    AWS Key Prefix
  -a ..., --access=...    AWS Access Key ID
  -s ..., --secret=...    AWS Secret Access Key
  -l ..., --local=...     Local Download Path
  -h, --help              Show this help
  -d                      Show debugging information while parsing

Examples:
  get-aws-logs.py -b eqxlogs
  get-aws-logs.py --bucket=eqxlogs
  get-aws-logs.py -p logs/cdn.example.com/
  get-aws-logs.py --prefix=logs/cdn.example.com/

This program requires the boto module for Python to be installed.
"""
__author__ = "Johan Steen (http://www.artstorm.net/)"
__version__ = "0.5.0"
__date__ = "28 Nov 2010"

import boto
import getopt
import sys, os
from boto.s3.key import Key

_debug = 0


class get_logs:
    """Download log files from the specified bucket and path and then delete them from the bucket.
    Uses: http://boto.s3.amazonaws.com/index.html
    """
    # Set default values
    AWS_BUCKET_NAME = '{AWS_BUCKET_NAME}'
    AWS_KEY_PREFIX = ''
    AWS_ACCESS_KEY_ID = '{AWS_ACCESS_KEY_ID}'
    AWS_SECRET_ACCESS_KEY = '{AWS_SECRET_ACCESS_KEY}'
    LOCAL_PATH = '/tmp'
    # Don't change below here
    s3_conn = None
    bucket = None
    bucket_list = None

    def __init__(self):
        # Reset per-instance state (assigning to bare names here would have no effect)
        self.s3_conn = None
        self.bucket_list = None
        self.bucket = None

    def start(self):
        """Connect, get file list, copy and delete the logs"""
        self.s3Connect()
        self.getList()
        self.copyFiles()

    def s3Connect(self):
        """Creates a S3 Connection Object"""
        self.s3_conn = boto.connect_s3(self.AWS_ACCESS_KEY_ID, self.AWS_SECRET_ACCESS_KEY)

    def getList(self):
        """Connects to the bucket and then gets a list of all keys available with the chosen prefix"""
        self.bucket = self.s3_conn.get_bucket(self.AWS_BUCKET_NAME)
        self.bucket_list = self.bucket.list(self.AWS_KEY_PREFIX)

    def copyFiles(self):
        """Creates a local folder if not already exists and then download all keys and deletes them from the bucket"""
        # Using makedirs as it's recursive
        if not os.path.exists(self.LOCAL_PATH):
            os.makedirs(self.LOCAL_PATH)
        for key_list in self.bucket_list:
            key = str(key_list.key)
            # Get the log filename (L[-1] can be used to access the last item in a list).
            filename = key.split('/')[-1]
            # Join with os.path.join so LOCAL_PATH works with or without a trailing slash
            local_file = os.path.join(self.LOCAL_PATH, filename)
            # check if file exists locally, if not: download it
            if not os.path.exists(local_file):
                key_list.get_contents_to_filename(local_file)
                print("Downloaded " + filename)
            # check so file is downloaded, if so: archive it and delete from bucket
            if os.path.exists(local_file):
                # Key.copy() takes the destination bucket name, so pass the name of the same bucket
                key_list.copy(self.bucket.name, 'archive/' + key_list.key)
                print("Moved " + filename)
                key_list.delete()
                print("Deleted " + filename)


def usage():
    print(__doc__)


def main(argv):
    global _debug
    try:
        opts, args = getopt.getopt(argv, "hb:p:l:a:s:d", ["help", "bucket=", "prefix=", "local=", "access=", "secret="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    logs = get_logs()
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt == '-d':
            _debug = 1
        elif opt in ("-b", "--bucket"):
            logs.AWS_BUCKET_NAME = arg
        elif opt in ("-p", "--prefix"):
            logs.AWS_KEY_PREFIX = arg
        elif opt in ("-a", "--access"):
            logs.AWS_ACCESS_KEY_ID = arg
        elif opt in ("-s", "--secret"):
            logs.AWS_SECRET_ACCESS_KEY = arg
        elif opt in ("-l", "--local"):
            logs.LOCAL_PATH = arg
    logs.start()


if __name__ == "__main__":
    main(sys.argv[1:])
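Note: The above script downloads the S3 logs to the specified folder, copies each downloaded file under an archive/ prefix in the same bucket, and then deletes the original key. Please make sure you fill in your Amazon access keys.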
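If you would rather not keep the keys inside the script, they could be read from the environment instead. Here is a minimal sketch, assuming the keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; load_credentials is a hypothetical helper and is not part of the script above.

```python
import os


def load_credentials(env=None):
    """Return (access_key, secret_key) read from the environment.

    Hypothetical helper: the variable names are an assumption of this sketch.
    """
    env = os.environ if env is None else env
    access = env.get("AWS_ACCESS_KEY_ID", "")
    secret = env.get("AWS_SECRET_ACCESS_KEY", "")
    # Fail early with a clear message instead of sending empty keys to S3
    if not access or not secret:
        raise RuntimeError("Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first")
    return access, secret
```

The returned pair could then be assigned to logs.AWS_ACCESS_KEY_ID and logs.AWS_SECRET_ACCESS_KEY in main() before logs.start() is called.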
2. Next is a bash script that uses the above Python script to download the log files, combines them into a single log file, and then runs AWStats to analyze it.
Warning: Please read through the script files and make the necessary changes.
Note: You should have AWStats installed on your system. The script below uses it.
Note: You can download the script files at the end of this blog post, along with an AWStats configuration customized for the CloudFront log format.
get-aws-logs.sh
#!/bin/bash
# Initial, cron script to download and merge AWS logs
# 29/11 - 2010, Johan Steen

# 1. Setup variables
date=$(date +%Y-%m-%d)
static_folder="/tmp/log_static_$date/"
mkdir -pv "$static_folder"

# 2. Download the logs for this prefix into the temporary folder
python /var/www/scripts/get-aws-logs.py --prefix=logs/www.imthi.com --local="$static_folder"

# 3. Decompress the downloaded files
gunzip --quiet "${static_folder}"*

# 4. Merge the logs, turn the tab between date and time into a space, and append to the site log
/usr/local/awstats/tools/logresolvemerge.pl "${static_folder}"* | sed -r -e 's/([0-9]{4}-[0-9]{2}-[0-9]{2})\t([0-9]{2}:[0-9]{2}:[0-9]{2})/\1 \2/g' >> /var/www/logs/www.imthi.com.log

# 5. Remove the temporary folder and update the AWStats statistics
rm -vrf "$static_folder"
/usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=imthi -update
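To make the middle of that pipeline easier to follow, here is a rough Python sketch of the gunzip, merge and sed steps. It is only an illustration and not a replacement for logresolvemerge.pl: merge_logs and normalize_line are hypothetical names, it reads the .gz files directly instead of decompressing them first, and dropping the header lines that start with # is an assumption of this sketch.

```python
import glob
import gzip
import os
import re

# Matches "YYYY-MM-DD<TAB>HH:MM:SS", the same pattern the sed expression targets
DATE_TIME_TAB = re.compile(r"(\d{4}-\d{2}-\d{2})\t(\d{2}:\d{2}:\d{2})")


def normalize_line(line):
    """Replace the tab between the date and time fields with a space."""
    return DATE_TIME_TAB.sub(r"\1 \2", line)


def merge_logs(folder):
    """Read every .gz file in folder and return its entries sorted by timestamp."""
    entries = []
    for path in sorted(glob.glob(os.path.join(folder, "*.gz"))):
        with gzip.open(path, "rt") as handle:
            for line in handle:
                line = line.rstrip("\n")
                # Assumption: skip empty lines and "#..." header lines
                if not line or line.startswith("#"):
                    continue
                entries.append(normalize_line(line))
    # Each entry now starts with "YYYY-MM-DD HH:MM:SS" (19 characters), so a
    # string sort on that prefix puts the entries in chronological order
    entries.sort(key=lambda entry: entry[:19])
    return entries
```

The date and time are joined with a space so that the two tab-separated CloudFront fields read as a single timestamp, which is what the sed expression in the shell script does as well.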
I suggest test-running the above scripts in a staging / testing environment before moving to production. Again, please change the scripts to use your own domain details and Amazon access keys.
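While testing, it can help to spot-check a downloaded and decompressed log file before AWStats sees it. The sketch below assumes the file is in the raw CloudFront format, with a "#Fields:" header line naming the tab-separated columns and an sc-status column for the HTTP status. parse_cloudfront_log and count_status are hypothetical helpers, not part of the scripts above.

```python
from collections import Counter


def parse_cloudfront_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Field names in the header are separated by whitespace
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            # Entry values are separated by tabs
            yield dict(zip(fields, line.split("\t")))


def count_status(entries):
    """Count entries per HTTP status, using the sc-status column."""
    return dict(Counter(entry.get("sc-status", "?") for entry in entries))
```

Running count_status(parse_cloudfront_log(open(path))) on one file gives a quick feel for whether the download worked and the columns line up as expected.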
Download the scripts to download and process Amazon Cloudfront Logs with Awstats.
Have a nice journey exploring the cloud 