Yesterday I showed how to get a basic static web site hosted on Google AppEngine. But I need a little bit more than a purely static site. My blog used to be hosted at engelke.com/blog and http://www.engelke.com/blog, but now it’s at blog.engelke.com. So any existing links to my blog entries will break. That is, they’ll break unless requests to engelke.com/blog/something get redirected to blog.engelke.com/something.
I did this with a rewrite rule when my site was being served with the Apache httpd server. The specific rule I had was:
RewriteRule ^/blog(.*)$ http://blog.engelke.com$1 [R]
That says that a request that starts with /blog will be redirected to one at blog.engelke.com followed by whatever followed the word blog. Which is actually kind of wrong; for example, this will redirect a request made to engelke.com/blogging to blog.engelke.comging, which is nonsense. I really should have given two rules:
RewriteRule ^/blog/(.*)$ https://blog.engelke.com/$1 [R]
RewriteRule ^/blog$ https://blog.engelke.com/ [R]
The first rule says that anything starting with /blog/ will be redirected to blog.engelke.com/ followed by whatever was after blog/. That handles everything but a request to engelke.com/blog just by itself. The second rule handles that.
AppEngine’s url: mapping rules in app.yaml kind of look like rewrite rules. For example, the mapping:
- url: (.*)/
static_files: static\1/index.html
That says any request that ends with a slash should be served with the file at that relative directory followed by index.html. But that’s not a redirect. The browser still requests the URL ending with the slash; the server just returns the contents of a particular file.
No, if we want redirection similar to the rewrite rules I had, we’ll need to write an AppEngine script. First we add two url: mappings to app.yaml. These should be placed before the existing ones:
- url: /blog/.*
script: redirector.py
- url: /blog
script: redirector.py
These correspond to the first part of the old rewrite rules, but they don’t tell what the response should be. Instead, they just tell AppEngine to run the redirector.py script and let it figure out what to do.
I started writing the redirector.py script with a standard skeleton from the AppEngine documentation:
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
application = webapp.WSGIApplication(
[
# pattern to handler mapping pairs go here
])
def main():
run_wsgi_app(application)
if __name__ == "__main__":
main()
The “pattern to hander mapping pairs” are each a string that’s a regular expression to match and the name of a class to handle those request paths. The following two lines match both patterns and invoke the BlogHandler class for them (the ‘r’ in front a string just tells Python it’s a “raw” string, so Python doesn’t interpret any characters in any special way):
(r'^/blog/(.*)', BlogHandler),
(r'^/blog$', BlogHandler)
The BlogHandler class contains methods for each HTTP method to be handled. We’ll just handle GET and HEAD requests, since those are the only kind our blog will respond to anyway. The pattern matched inside the parentheses will be passed as a parameter to each of these methods (and self is always going to be passed as the first parameter in Python). The second pattern doesn’t have any parentheses, so that matching string parameter will be missing; our handlers will have to accept that.
Here’s the code for the BlogHandler class:
class BlogHandler(webapp.RequestHandler):
def get(self, tail = ''):
self.redirect('https://blog.engelke.com/'+tail, permanent = True)
def head(self, tail = ''):
self.redirect('https://blog.engelke.com/'+tail, permanent = True)
This code is pretty simple and should be self-explanatory. If no second parameter is given, the methods act as if an empty string was passed. The optional “permanent = True” parameter makes the redirect an HTTP status 301 Moved Permanently so that client programs can know to never bother to look at the old address.
Complete code
The new app.yaml file is:
application: engelkeweb
version: 1
runtime: python
api_version: 1
handlers:
- url: /blog/.*
script: redirector.py
- url: /blog
script: redirector.py
- url: (.*)/
static_files: static\1/index.html
upload: static/index.html
- url: /
static_dir: static
The redirector.py script is:
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
class BlogHandler(webapp.RequestHandler):
def get(self, tail = ''):
self.redirect('https://blog.engelke.com/'+tail, permanent = True)
def head(self, tail = ''):
self.redirect('https://blog.engelke.com/'+tail, permanent = True)
application = webapp.WSGIApplication(
[
(r'^/blog/(.*)', BlogHandler),
(r'^/blog$', BlogHandler)
])
def main():
run_wsgi_app(application)
if __name__ == "__main__":
main()
And that’s my AppEngine hosted web site. The one problem left is that requests that should have a trailing slash, but don’t, won’t work (other than /blog, handled above). But there are only two such possible pages on my site, and I’ve never posted links without the trailing slashes. So I don’t need to deal with them.
Though I did. I just added mappings for those two raw names (/xhtmlref and /charles/TPC5) to the redirector.py script, and added a class to redirect those requests to the right URL with the trailing slash. But that doesn’t show any new ideas, so I’m not going to put the additional code here.