AppEngine “Rewrite Rules”

July 31, 2008

Filed under: How To — Charles Engelke @ 8:43 pm
Tags: appengine, google

Yesterday I showed how to get a basic static web site hosted on Google AppEngine. But I need a little bit more than a purely static site. My blog used to be hosted at engelke.com/blog and http://www.engelke.com/blog, but now it’s at blog.engelke.com. So any existing links to my blog entries will break. That is, they’ll break unless requests to engelke.com/blog/something get redirected to blog.engelke.com/something.

I did this with a rewrite rule when my site was being served with the Apache httpd server. The specific rule I had was:

RewriteRule ^/blog(.*)$  http://blog.engelke.com$1  [R]

That says that a request whose path starts with /blog will be redirected to blog.engelke.com followed by whatever came after the word blog. That's actually not quite right: for example, it would redirect a request made to engelke.com/blogging to blog.engelke.comging, which is nonsense. I really should have used two rules:

RewriteRule ^/blog/(.*)$  https://blog.engelke.com/$1 [R]
RewriteRule ^/blog$  https://blog.engelke.com/        [R]

The first rule says that anything starting with /blog/ will be redirected to blog.engelke.com/ followed by whatever was after blog/. That handles everything but a request to engelke.com/blog just by itself. The second rule handles that.
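To make the difference concrete, here is a rough sketch that imitates both versions with Python's re module. It is only an illustration of the pattern matching, not of how Apache actually evaluates rules; the function names old_rule and new_rules are made up for this example, and it assumes Python 3 with nothing beyond the standard library.

```python
import re

def old_rule(path):
    # Imitates: RewriteRule ^/blog(.*)$  http://blog.engelke.com$1  [R]
    m = re.match(r'^/blog(.*)$', path)
    return 'http://blog.engelke.com' + m.group(1) if m else None

def new_rules(path):
    # Imitates the two-rule version; rules are tried in order.
    m = re.match(r'^/blog/(.*)$', path)
    if m:
        return 'https://blog.engelke.com/' + m.group(1)
    if re.match(r'^/blog$', path):
        return 'https://blog.engelke.com/'
    return None  # not a blog URL, so no redirect at all

for path in ['/blog/2008/07/', '/blog', '/blogging']:
    print(path, old_rule(path), new_rules(path))
```

With the single rule, /blogging is sent to the nonsense host blog.engelke.comging; with the pair of rules it simply isn't redirected.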

AppEngine’s url: mapping rules in app.yaml kind of look like rewrite rules. For example, consider this mapping:

- url: (.*)/
  static_files: static\1/index.html

That says any request whose path ends with a slash should be served from the index.html file in the matching directory under static. But that’s not a redirect. The browser still requests the URL ending with the slash; the server just returns the contents of a particular file.
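As a rough sketch of what that mapping does with a request path, the same substitution can be tried out with Python's re module. The function name static_file_for is hypothetical, and anchoring the pattern at both ends reflects my understanding that AppEngine matches the url: pattern against the whole path; this is not AppEngine's own code.

```python
import re

def static_file_for(path):
    # The url: pattern (.*)/ has to match the entire request path,
    # so it is anchored at both ends in this sketch.
    m = re.match(r'^(.*)/$', path)
    if m is None:
        return None  # this mapping doesn't apply
    # \1 in static_files stands for whatever the first group matched.
    return m.expand(r'static\1/index.html')

print(static_file_for('/'))               # static/index.html
print(static_file_for('/charles/TPC5/'))  # static/charles/TPC5/index.html
print(static_file_for('/about.html'))     # None
```

Nothing here changes the URL the browser sees; the path is only translated into a file name on the server side.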

No, if we want redirection similar to the rewrite rules I had, we’ll need to write an AppEngine script. First we add two url: mappings to app.yaml. These should be placed before the existing ones:

- url: /blog/.*
  script: redirector.py

- url: /blog
  script: redirector.py

These correspond to the first part of the old rewrite rules, but they don’t tell what the response should be. Instead, they just tell AppEngine to run the redirector.py script and let it figure out what to do.
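The reason the new mappings go before the existing ones is that AppEngine tries the handlers in order and uses the first one whose pattern matches. A URL like /blog/ also ends with a slash, so it would be caught by the static (.*)/ mapping if that came first. Here is a minimal sketch of that first-match behavior; the list contents and the function name first_match are just for illustration, and the labels on the right are descriptions, not real configuration values.

```python
import re

# (pattern, description of what handles it), in app.yaml order.
handlers = [
    (r'/blog/.*', 'redirector.py'),
    (r'/blog',    'redirector.py'),
    (r'(.*)/',    'static index.html'),
]

def first_match(path, rules):
    # Try each rule in order; the first full match wins.
    for pattern, target in rules:
        if re.match('^(?:' + pattern + ')$', path):
            return target
    return None

print(first_match('/blog/', handlers))                  # redirector.py
print(first_match('/blog/', list(reversed(handlers))))  # static index.html
```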

I started writing the redirector.py script with a standard skeleton from the AppEngine documentation:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

application = webapp.WSGIApplication(
            [
            # pattern to handler mapping pairs go here
            ])

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()

The “pattern to handler mapping pairs” are each a regular expression string to match and the class that handles request paths matching it. The following two lines cover both patterns and route them to the BlogHandler class (the ‘r’ in front of a string just tells Python it’s a “raw” string, so backslashes aren’t treated as escape characters):

     (r'^/blog/(.*)', BlogHandler),
     (r'^/blog$',     BlogHandler)

The BlogHandler class contains methods for each HTTP method to be handled. We’ll just handle GET and HEAD requests, since those are the only kind our blog will respond to anyway. The pattern matched inside the parentheses will be passed as a parameter to each of these methods (and self is always going to be passed as the first parameter in Python). The second pattern doesn’t have any parentheses, so that matching string parameter will be missing; our handlers will have to accept that.
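To see why the handler methods need to cope with a missing parameter, here is a small standard-library sketch of how captured groups turn into arguments. It only mimics the idea; handle and args_for are hypothetical names, and this is not webapp's real dispatch code.

```python
import re

# A raw string keeps backslashes as-is: r'\n' is two characters, not a newline.
assert len(r'\n') == 2

def handle(tail=''):
    # Stand-in for a handler method (minus self); tail defaults to ''.
    return 'tail=%r' % tail

def args_for(pattern, path):
    # The parenthesized groups of the match become the extra arguments.
    return re.match(pattern, path).groups()

for pattern, path in [(r'^/blog/(.*)', '/blog/2008/07/'),
                      (r'^/blog$', '/blog')]:
    groups = args_for(pattern, path)
    print(groups, handle(*groups))
```

The first pattern yields one argument, the second yields none, so a default value for the parameter covers both cases.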

Here’s the code for the BlogHandler class:

class BlogHandler(webapp.RequestHandler):
    def get(self, tail = ''):
        self.redirect('https://blog.engelke.com/'+tail, permanent = True)
    def head(self, tail = ''):
        self.redirect('https://blog.engelke.com/'+tail, permanent = True)

This code is pretty simple and should be self-explanatory. If no tail parameter is given, the methods act as if an empty string were passed. The optional “permanent = True” parameter makes the redirect an HTTP status 301 Moved Permanently, so client programs know not to bother looking at the old address again.
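For anyone curious what a permanent redirect amounts to at the HTTP level, here is a rough sketch written as a plain WSGI function, with no AppEngine libraries involved. The name redirect_app is made up, and this is not how webapp implements redirect internally; it only shows the status line and Location header a client would receive.

```python
def redirect_app(environ, start_response):
    # Work out the part of the path after /blog/, if any.
    path = environ.get('PATH_INFO', '')
    if path == '/blog':
        tail = ''
    elif path.startswith('/blog/'):
        tail = path[len('/blog/'):]
    else:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not found']
    # 301 tells clients the move is permanent; Location says where to go.
    start_response('301 Moved Permanently',
                   [('Location', 'https://blog.engelke.com/' + tail)])
    return [b'']
```

It can be tried locally with the standard library's wsgiref.simple_server, or called directly with a small environ dictionary and a stub start_response.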

Complete code

The new app.yaml file is:

application: engelkeweb
version: 1
runtime: python
api_version: 1

handlers:
- url: /blog/.*
  script: redirector.py

- url: /blog
  script: redirector.py

- url: (.*)/
  static_files: static\1/index.html
  upload: static/index.html

- url: /
  static_dir: static

The redirector.py script is:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class BlogHandler(webapp.RequestHandler):
    def get(self, tail = ''):
        self.redirect('https://blog.engelke.com/'+tail, permanent = True)
    def head(self, tail = ''):
        self.redirect('https://blog.engelke.com/'+tail, permanent = True)

application = webapp.WSGIApplication(
            [
                (r'^/blog/(.*)', BlogHandler),
                (r'^/blog$',     BlogHandler)
            ])

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()

And that’s my AppEngine hosted web site. The one problem left is that requests that should have a trailing slash, but don’t, won’t work (other than /blog, handled above). But there are only two such possible pages on my site, and I’ve never posted links without the trailing slashes. So I don’t need to deal with them.

Though I did. I just added mappings for those two raw names (/xhtmlref and /charles/TPC5) to the redirector.py script, and added a class to redirect those requests to the right URL with the trailing slash. But that doesn’t show any new ideas, so I’m not going to put the additional code here.

4 Comments

thanks for showing us how to handle rewrites!

this is awesome… I thought there was no easy way to handle this… was driving me nuts…

Comment by alfredo — February 17, 2009 @ 11:10 pm
Hi,

I am trying to set up “naked domain handling” for App Engine in Java development.

Can you please help me with how to do it?

Thanks

Comment by Shyam — May 4, 2009 @ 8:50 pm
- I don’t think it has anything to do with Java versus Python development. Google apparently disabled this ability a few months ago, so you just can’t do it. Any request to a “naked domain” that reaches AppEngine will simply not be handled.
  
  Instead you have to get somebody (your domain registrar is the only reasonable choice I can think of) to set up HTTP redirection for you. Then, if somebody tries to open http://example.com/ that request will go to that server (your registrar’s in this case) which will always respond with an HTTP redirect to http://www.example.com/. That redirected request will go to Google AppEngine and be properly handled.
  
  I posted more on this a few weeks ago here.
  
  Hope this solves your problem for you.
  
  Comment by Charles Engelke — May 4, 2009 @ 10:14 pm
By the way, I came up with a nice way to handle this problem, when you have URLs such as “/directory/” and “/something/” and want to provide a correct 301 redirect to them from “/directory” and “/something” respectively:

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class DirectoryHandler(webapp.RequestHandler):
    def get(self, tail):
        # tail is '' for "/directory" and '/' for "/directory/"
        if tail == '':
            self.redirect('/directory/', permanent=True)
            return
        # TODO: respond to "/directory/" request

    def head(self, tail):
        if tail == '':
            self.redirect('/directory/', permanent=True)
            return
        # TODO: respond to "/directory/" request

application = webapp.WSGIApplication(
    [
        (r'^/directory(/?)$', DirectoryHandler),
    ])

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()

Comment by Klichuk Bogdan — December 20, 2009 @ 6:44 am

Charles Engelke's Blog
