BlogMatrix
 

Wednesday Morning Notes

David P. Janes 2006-08-30 11:57 UTC

There are a few more updates this morning:

  • We've added a "Text Gadget", so you can add your own arbitrary text to the sidebar. If you go to your admin page and then to your Profile, you will find instructions on how to use it.
  • You can now control your comments (from your Profile). Your options are:
    • accept all comments, display immediately
    • accept all comments, but they must be approved before posting
    • don't accept comments
  • On a technical note, we've upgraded to mod_python 3.2.10 and we're using FileSession to manage sessions (which should cut a few tenths of a second off page loading time).

The need for speed; and the solution

David P. Janes 2006-08-07 20:44 UTC

I've got page loading time on this site -- for constructed pages [1] -- down to nearly 1 second. Most of that second comes from network and rendering delays, which I'll have to sort out later (locally I can curl the page in 0.065 seconds!). As previously documented, I've already done the following:

After a lot of mulling today, I've made another big improvement. Previously, we loaded information about the user's session from a URI called '/:admin/status/'. This returned three pieces of critical information: the IHOST, the USERID, and the HOME. The IHOST is the installation host (semantic.blogmatrix.com), the USERID is the user you are logged in as (or the empty string), and HOME is set only if you serve your pages from a different URI than the default [2].

This caused rendering to pause for 0.75 to 1.5 seconds, depending on how well the network was responding. Effectively, this made the site feel really sluggish.

We now do the following: IHOST is just built into the templates; USERID and HOME are loaded into cookies when the user is logged in. When these values are needed, instead of taking them out of JavaScript variables, we call functions that pull them out of the cookies.

Instant speed. 
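
As a rough illustration of the cookie approach described above, here is a minimal Python sketch using the standard library's http.cookies. The cookie names USERID and HOME come from the post; the helper names, the example values, and doing the lookup in Python rather than in browser-side JavaScript are assumptions made purely for illustration, not the site's actual code.

```python
from http.cookies import SimpleCookie


def build_login_cookies(userid, home=""):
    """Return Set-Cookie header values for a logged-in user (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie["USERID"] = userid
    cookie["USERID"]["path"] = "/"
    if home:  # HOME is only set when pages are served from a non-default URI
        cookie["HOME"] = home
        cookie["HOME"]["path"] = "/"
    return [morsel.OutputString() for morsel in cookie.values()]


def read_cookie(cookie_header, name, default=""):
    """Pull one value out of a Cookie request header, or return the default."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else default
```

The point of the design is that no extra round trip to the server is needed: the values travel with the page request itself, and an absent USERID simply reads as the empty string.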

1. We don't work under the same model as TypePad or Blogger. We only put a page together when we don't have it in cache. This could take several more seconds. Once a page is constructed, we'll always serve it from cache until the cache is invalidated (say, by a new post or comment being added).

2. For example, I serve my personal blog as http://blog.davidjanes.com even though deep down it's really http://davidjanes.semantic.blogmatrix.com!
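
The build-on-miss caching described in footnote 1 can be sketched roughly as follows. This is a minimal in-memory illustration, not BlogMatrix's actual code; the class name PageCache, its methods, and the build callback are all hypothetical.

```python
class PageCache:
    """Serve constructed pages from cache; rebuild only after invalidation."""

    def __init__(self, build_page):
        self._build_page = build_page  # slow function: uri -> rendered HTML
        self._pages = {}

    def get(self, uri):
        # Construct the page only when it is not already cached.
        if uri not in self._pages:
            self._pages[uri] = self._build_page(uri)
        return self._pages[uri]

    def invalidate(self, uri):
        # Called when, say, a new post or comment changes the page.
        self._pages.pop(uri, None)
```

Only the first request after an invalidation pays the construction cost; every later request for the same URI is a dictionary lookup.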

Using Amazon S3 to serve static files

David P. Janes 2006-08-06 12:11 UTC

The V10 look and feel (i.e. what you're seeing here) uses a substantial number of GIF files to achieve the candy-like "web 2.0" look. Additionally, since we're using a fair number of JavaScript include libraries (MochiKit, TinyMCE, Yahoo UI), what we end up with is a lot of trips to our server the first time a user sees a page.

This means unnecessary slowness and longer page response times. What I'd like to do is speed this up a little (or maybe a lot) by offloading the serving of these mostly static files. Right now I'm experimenting with Amazon S3 and I'll document what I'm doing.

I'm not breaking any ground here: Adrian Holovaty did this first to offload pages from Chicago Crime and has documented his experiences. I'm just going to expand and annotate what he wrote (the blockquoted italics text is his):

It was easy to get this working; took less than an hour total. Here's what I did:

First, I signed up for an Amazon S3 account. Do that by clicking "Sign Up For Web Service" on the main S3 page. When you sign up, you get two codes: an access key ID and secret access key.

You'll need to provide a credit card to pay for your (as-you-use-it) Amazon S3 services. You have to click on a provided link to get the keys. There's an X.509 certificate (rather than secret key) way of accessing your S3 account, but it only works with SOAP and I'd rather stick a fork in my eye and wiggle it around first. Moving right along...

Next, I created an S3 "bucket" for my chicagocrime.org media files. An account can have multiple buckets. As far as I can tell, it's just a way of keeping your S3 stuff in separate containers. I did this by using the free S3 Python bindings. Just download the file, unzip it and put the file S3.py somewhere on your Python path. To create a bucket named 'mybucketname', do this:

import S3
conn = S3.AWSAuthConnection('your access key', 'your secret key')
conn.create_bucket('mybucketname')

I found it easier just to install S3.py into my standard Python library with distutils, using this setup.py:

from distutils.core import setup
setup(
        name='S3',
        version='20060805',
        py_modules=['S3'],
)

I created a bucket called 'semantic.blogmatrix.com'.

Next, I wrote a Python script that uploaded my media files to this bucket and made them publicly readable. S3 has a bunch of complex authentication stuff, but all I wanted to do was use S3, essentially, as a Web hosting service. Here's the script I used, and here's how to use it:

cd /directory/with/media/files/
find | python /path/to/update_s3.py

The script is kind of cool because it uses Python's mimetypes to determine the content type of each file in order to pass that to the S3 API. Otherwise it's pretty straightforward.
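
The content-type lookup mentioned above can be sketched with the standard library's mimetypes module. This is only an illustration of the idea, not Adrian's script; the helper name content_type_for, the fallback value, and the sample file names are assumptions. The guessed value is what one would send as the Content-Type when uploading each file.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Guess a Content-Type from the file extension, with a safe fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


# Made-up file names, just to show the lookup in action.
for name in ("v10/media/header.gif", "v10/media/site.css", "unknown.zzz-not-real"):
    print(name, "->", content_type_for(name))
```

Getting this right matters when S3 is used as a plain web host, because browsers rely on the Content-Type to decide how to treat stylesheets, scripts and images.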

I've written my own little program (attached) to do this, which takes care of all the path searching, etc. I'll probably modify it some more to track what it has already uploaded so we don't upload the same files multiple times. Here's the help:

blogmatrix.v10@s002. python S3Uploader.py --help
usage: S3Uploader.py [options]

options:
  -h, --help            show this help message and exit
  --debug              
  --bucket=BUCKET       Amazon S3 Bucket
  --access-key=ACCESS_KEY
                        Amazon S3 Access Key (required)
  --secret-key=SECRET_KEY
                        Amazon S3 Secret Access Key (command line prompt if
                        missing)
  --root=ROOT           All directories are made relative to this (optional)
                        root
  --directory=DIRECTORIES
                        Upload files from this directory (default: .)
  --extension=EXTENSIONS
                        Upload files matching this extension (default: all
                        files)

For example:

python S3Uploader.py \
--bucket semantic.blogmatrix.com \
--access-key 0ZB0XFMV5NE1KM15DKR2 \
--extension gif,jpg,png,css,js \
--directory v10/media \
--root ~/htdocs
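
The source of S3Uploader.py is not shown here, so the following is only a guess at what the --root, --directory and --extension options imply: walk the directory, keep files with matching extensions, and name each one by its path relative to the root. The function name collect_uploads and its signature are hypothetical.

```python
import os


def collect_uploads(root, directory, extensions=None):
    """Yield (local_path, key) pairs for files under root/directory.

    Keys are paths relative to root, using forward slashes.
    extensions is a collection such as {"gif", "css"}; None means all files.
    """
    root = os.path.abspath(os.path.expanduser(root))
    top = os.path.join(root, directory)
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in sorted(filenames):
            ext = os.path.splitext(filename)[1].lstrip(".").lower()
            if extensions is not None and ext not in extensions:
                continue
            local_path = os.path.join(dirpath, filename)
            key = os.path.relpath(local_path, root).replace(os.sep, "/")
            yield local_path, key
```

Under these assumptions, the example command above would turn ~/htdocs/v10/media/some.gif into the key v10/media/some.gif, which is the same relative path the site already uses in its URLs.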

Finally, it's a matter of plugging in the changed files. Adrian does it like this:

Finally, it was just a matter of changing my chicagocrime.org templates to point to S3's URLs rather than my own URLs. That was a snap, thanks to Django's template inheritance and includes.

We do it with Apache rules:

RewriteRule ^/:root/(silk_icons/.*\.png)$       http://s3.amazonaws.com/semantic.blogmatrix.com/$1  [R,L]
RewriteRule ^/:root/(v10/media/.*)$             http://s3.amazonaws.com/semantic.blogmatrix.com/$1  [R,L]
RewriteRule ^/:root/((MochiKit|tinymce|yui)/.*)$                http://s3.amazonaws.com/semantic.blogmatrix.com/$1  [R,L]
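
To check which request paths these rules send to S3, the same patterns can be exercised in Python. This is a rough sketch for testing the mapping offline, not something Apache runs; the function name s3_redirect and the sample paths are made up for illustration.

```python
import re

S3_BASE = "http://s3.amazonaws.com/semantic.blogmatrix.com/"

# Python versions of the three RewriteRule patterns
# (with the dot before "png" escaped so it matches a literal dot).
PATTERNS = [
    re.compile(r"^/:root/(silk_icons/.*\.png)$"),
    re.compile(r"^/:root/(v10/media/.*)$"),
    re.compile(r"^/:root/((MochiKit|tinymce|yui)/.*)$"),
]


def s3_redirect(path):
    """Return the S3 URL a request path would be redirected to, or None."""
    for pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            return S3_BASE + match.group(1)
    return None


# Hypothetical request paths.
for path in ("/:root/v10/media/bg.gif", "/:root/MochiKit/MochiKit.js", "/index.html"):
    print(path, "->", s3_redirect(path))
```

Because the captured group is the part after /:root/, the keys in the bucket have to match the local relative paths exactly for the redirect to land on a real object.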

 

You're seeing the result here. All the background graphics and external JavaScript libraries are coming from Amazon S3.


Apache's mod_deflate

David P. Janes 2006-07-13 12:21 UTC

We've enabled mod_deflate on our Apache2 installation. This means that we'll only be sending about 10-20% of the data over the wire for our big fat HTML, JS and CSS files as the data will be GZIP compressed.
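
To get a feel for how well markup compresses, here is a small Python sketch that gzips a block of repetitive HTML and reports the compressed size as a fraction of the original. The sample data is synthetic and the function name compressed_fraction is hypothetical, so the number it prints will not match the 10-20% figure for our real pages; it only illustrates the effect.

```python
import gzip


def compressed_fraction(data):
    """Return the gzip-compressed size as a fraction of the original size."""
    return len(gzip.compress(data)) / len(data)


# Repetitive markup standing in for a large HTML page (made-up sample data).
sample = b"".join(
    b'<div class="post" id="post-%d"><p>Entry number %d</p></div>\n' % (i, i)
    for i in range(500)
)
print("compressed to %.0f%% of original" % (100 * compressed_fraction(sample)))
```

HTML, JS and CSS compress well because tags, class names and keywords repeat constantly, which is why those are the types worth filtering.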

If you're building Apache2 yourself, you must explicitly enable mod_deflate at build time, i.e.:

./configure --enable-mods-shared=most --enable-deflate

Right now, this is what I've added to our config file:

LoadModule deflate_module modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/x-javascript text/css

There are probably more changes coming to this configuration yet, though.