Redirects with Amazon S3

After making a few changes to my site and uploading the entire thing again, I decided to check Google Webmaster Tools to see if anything needed my attention. It alerted me that I’m up over 400 404 errors. Yikes!

For some reason, though, Webmaster Tools is still reporting 404 errors for an old URL naming scheme I haven’t used in years. I know, I know, changing the URL scheme of an existing site is bad, but ironically enough it was a move to Google’s Blogger service that necessitated it.

So why is it still bugging me? I decided to add the offending URLs to Disallow directives in my robots.txt file, and start clearing them out. While I was at it, I thought I’d check whether any of them have links from outside my domain that I might want to redirect.
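For what it's worth, here is a rough Python sketch of how those Disallow lines could be generated from a list of dead URLs rather than typed by hand. The function name and the example.com URLs are made up for illustration, and it assumes a single catch-all `User-agent: *` group is what you want.

```python
from urllib.parse import urlsplit


def disallow_lines(urls):
    """Return robots.txt lines that disallow the path of each given URL."""
    # Keep only the path part, drop duplicates, and skip empty or "/" paths
    # so a bare domain can never turn into "Disallow: /" for the whole site.
    paths = sorted(
        {p for p in (urlsplit(u).path for u in urls) if p and p != "/"}
    )
    return ["User-agent: *"] + [f"Disallow: {p}" for p in paths]


if __name__ == "__main__":
    # Hypothetical old-scheme URLs, e.g. copied from a crawl error report.
    old_urls = [
        "http://example.com/blog/2009/01/old-post.html",
        "http://example.com/blog/2009/02/another-post.html",
    ]
    print("\n".join(disallow_lines(old_urls)))
```

The output can be pasted into robots.txt, or appended to it by the makefile.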

But how to do it? Unfortunately there’s no easy way to accomplish this with S3, short of getting my hands dirty with a Python script (which I may wind up doing). I ended up uploading blank files named after the offending URLs to my bucket, then adding redirect metadata to each of them in S3. You can’t control the redirect type this way, but it appears to default to 301 “Moved Permanently”, which suits my needs.
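If I do end up scripting it, a minimal sketch might look something like the following. It assumes the boto3 library and its `put_object` call with the `WebsiteRedirectLocation` parameter (which sets the `x-amz-website-redirect-location` metadata on the object). The bucket name, paths and function names are placeholders, not anything from my actual setup. As far as I know, the redirect only takes effect when the bucket is served through the S3 static website endpoint.

```python
def redirect_key(old_path):
    """Turn an old URL path into an S3 object key (keys have no leading slash)."""
    return old_path.lstrip("/")


def put_redirects(client, bucket, redirects):
    """Upload one empty object per old path, each carrying redirect metadata.

    `redirects` maps an old URL path to its target, which S3 expects to start
    with "/", "http://" or "https://". Returns the keys that were written.
    """
    keys = []
    for old_path, target in redirects.items():
        key = redirect_key(old_path)
        client.put_object(
            Bucket=bucket,
            Key=key,
            Body=b"",  # blank file; only the metadata matters
            WebsiteRedirectLocation=target,
        )
        keys.append(key)
    return keys


if __name__ == "__main__":
    import boto3  # imported here so the helpers above work without it

    # Hypothetical mapping of old URLs to their new homes.
    redirects = {
        "/blog/2009/01/old-post.html": "/archive/old-post/",
    }
    put_redirects(boto3.client("s3"), "example-bucket", redirects)
```

Passing the client in as an argument keeps the function easy to try out against a stand-in object before pointing it at a real bucket.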

I then needed to edit my site’s makefile to tell s3cmd sync to ignore these files, or it’ll keep trying to delete them. That’s trivially done by adding, for example, --exclude 'blog/' to the s3cmd sync line.

Now to work out a quick way to find articles I’ve moved that other people have linked to…