Treemap on Rails 8 comments

Posted by rob on July 27, 2006

You may have used the Active Record “acts_as” extensions that ship with Rails, such as acts_as_list, or those added by third-party plugins, such as acts_as_attachment. In this post I’m going to cover how to use a new plugin for Rails by Andrew Bruno, called acts_as_treemap.

What is a Treemap?

A Treemap is a diagram that allows you to easily visualize hierarchical information, or trees. The first treemap was used to visualize the directory structure of a filesystem, making it very easy to identify the disk-hogs (files taking up a disproportionate amount of disk space) on a system with very limited resources. A more recent example of a treemap is one that Tim O’Reilly has posted on O’Reilly’s Radar that shows recent trends in programming language book sales.

The following treemap shows two dimensions of information: square size represents sales volume, and color represents rate of growth.

Book Sales

In order to represent your data with a treemap, your data must be modeled as a tree. The tree data model is much like that of an XML document, where there’s a root or parent node, and zero or more child nodes. Each subsequent node may have its own children, and so on.

In SQL, this structure is simple to set up. You create a field that serves as the primary key (in Rails that’s always “id”), and another field that stores the parent id of each record (e.g. “parent_id”). Note that the root node will have a parent_id of “NULL.” There can be more fields, of course, but these are all that are required to structure records as a tree.
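To make the id/parent_id idea concrete, here is a minimal plain-Ruby sketch (no database involved). The rows, the Row struct, and the helper names are all made up for illustration; the only thing taken from the description above is that each record points at its parent and the root has a nil (NULL) parent_id.

```ruby
# Hypothetical rows, shaped like records from a table with id and parent_id columns.
Row = Struct.new(:id, :parent_id, :name)

ROWS = [
  Row.new(1, nil, "root"),
  Row.new(2, 1,   "languages"),
  Row.new(3, 1,   "databases"),
  Row.new(4, 2,   "ruby"),
  Row.new(5, 2,   "python")
].freeze

# Group rows by parent_id so each node's children can be looked up directly.
def children_index(rows)
  rows.group_by(&:parent_id)
end

# The root is the one row whose parent_id is nil (NULL in SQL).
def root_of(rows)
  rows.find { |row| row.parent_id.nil? }
end

# Render the tree as indented lines, one per node, depth first.
def tree_lines(node, index, depth = 0)
  lines = ["#{'  ' * depth}#{node.name}"]
  (index[node.id] || []).each do |child|
    lines.concat(tree_lines(child, index, depth + 1))
  end
  lines
end

puts tree_lines(root_of(ROWS), children_index(ROWS))
```

Running it prints the root followed by its children, indented by depth, which is the same parent/child walk a treemap renderer has to do.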

The following diagram shows the relationship between data in a table, a tree structure, and a treemap:

Tree Structure

Using acts_as_treemap with Rails

To demonstrate how to use the acts_as_treemap plugin, Andy has supplied an interesting dataset: a database of SourceForge projects. The information the resultant treemap will convey has to do with project activity: number of downloads and rate of change (ROC). With such a graphical representation of this data, it becomes very clear which projects are on the rise or decline, and how popular (by number of downloads) projects are relative to one another.

Consider this example merely a starting point for your own implementation. There are a number of built-in options you can set to alter the properties and appearance of the treemap, but there are endless other possibilities as well. For example, you could have JavaScript pop-ups over each square that display even more details about each segment of data.

Step 1. The first step is to download and install the ruby-treemap gem. This gem is deliberately created separately from the Rails plugin so that it may be extended for any Ruby client that needs to build a treemap, not just Rails. Install the ruby-treemap gem with:

$ sudo gem install ruby-treemap

Step 2. The SourceForge sample data can be downloaded from here. The only table that we’ll be using for this example is sourceforge_nodes.

$ wget http://www.qnot.org/sourceforge_database.sql 

Step 3. You’ll need to create a database to load the SourceForge data into. Call it “sourceforge” and grant your Rails application user (e.g. “rails_user”) access to it. As long as you’re at it you should create a sourceforge_test database as well, although we won’t really be using it much here.

$ mysql sourceforge -u rails_user < sourceforge_database.sql

Step 4. With your database set up and loaded with data, you’re now ready to create your Rails application. Let’s call it “sfmap.” Create it now, with:

$ rails sfmap

Step 5. Now you need to configure the Rails app you just created so that it may communicate with your database. Do so with the following (replace username and password if needed):

./config/database.yml:
development:
  adapter: mysql
  database: sourceforge
  username: rails_user
  password: ******
  host: localhost
test:
  adapter: mysql
  database: sourceforge_test
  username: rails_user
  password: ******
  host: localhost
production:
  development

Step 6. Now you can install the acts_as_treemap plugin into your Rails application’s vendor/plugins directory. This is simple with the ./script/plugin command. Just type the following from the root of your sfmap project.

$ ./script/plugin install http://code.qnot.org/svn/projects/acts_as_treemap/

Step 7. Now you need to set up an Active Record model to represent the sourceforge_nodes table in the database. Generate it now by passing the name of this class (“SourceforgeNode”) to the Rails model generator:

$ ./script/generate model SourceforgeNode 

Step 8. Open up the model class file that was just generated, and replace its contents with the following:

./app/models/sourceforge_node.rb:
<samp>class SourceforgeNode < ActiveRecord::Base
  acts_as_tree :order => "curmo" 
  acts_as_treemap :label => "name", :size => "curmo", :color => "roc" 
end</samp>

You can see a little foreshadowing of our expected outcome with the two method calls: acts_as_tree and acts_as_treemap. They set up the model to be “treemapped” and configure what columns are to be represented by region size, and by color.

The first call, acts_as_tree, is actually a built-in Rails method. It tells Rails to treat the data in this table as a tree structure (each row references a parent row, excluding the root element). The :order => "curmo" option specifies how to order children that are on the same level of the tree and would otherwise be unordered.

The acts_as_treemap call says to label each region of the treemap with the SourceForge project name, to size each region by the number of downloads for the current month (curmo), and to color each region by the rate of change (roc) in the number of downloads for each project.
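If you’re wondering what “size each region by downloads” means in practice, here is a rough sketch of the simplest possible approach: cutting one rectangle into side-by-side strips whose widths are proportional to each item’s size (one level of the classic slice-and-dice layout). I’m not claiming this is what ruby-treemap does internally; Rect, slice, and the download counts are invented for illustration, and color is left out entirely.

```ruby
# A rectangle on the treemap canvas.
Rect = Struct.new(:x, :y, :width, :height)

# Split bounds into side-by-side strips, one per item, with each strip's
# width proportional to that item's share of the total size.
def slice(bounds, sizes)
  total = sizes.values.sum.to_f
  x = bounds.x
  sizes.each_with_object({}) do |(label, size), rects|
    width = bounds.width * size / total
    rects[label] = Rect.new(x, bounds.y, width, bounds.height)
    x += width
  end
end

# Hypothetical download counts for three projects.
downloads = { "alpha" => 500, "beta" => 300, "gamma" => 200 }

slice(Rect.new(0, 0, 100, 50), downloads).each do |label, rect|
  puts "#{label}: x=#{rect.x} width=#{rect.width} height=#{rect.height}"
end
```

A full layout would apply the same split recursively to each node’s children, alternating between horizontal and vertical cuts.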

Step 9. With the model configured, you can move on and generate a controller. Create a controller named “SourceforgeMap” and a corresponding “index” view using the controller generator.

$ ./script/generate controller SourceforgeMap index

Step 10. Now edit the controller that was just created and replace its contents with the following:

./app/controllers/sourceforge_map_controller.rb:

<samp>class SourceforgeMapController < ApplicationController
  def index
    # Retrieve the root node of the treemap
    @root = SourceforgeNode.find(18)
  end
end</samp>

The call to SourceforgeNode.find above loads the root element of the tree structure (the element that all other elements descend from). (Id #18 in the SourceForge data represents “topic.”)

Step 11. In the index.rhtml view that was created with the controller generator above, add the following call to the html_treemap method, passing in the @root instance variable, containing the root level SourceforgeNode object.

./app/views/sourceforge_map/index.rhtml:
<%= html_treemap(@root) %>

Step 12. Finally, start up your server with

$ ./script/server

and view the treemap output in your browser (http://localhost:3000/sourceforge_map). You should see something like the following:

SourceForge Treemap

View an HTML version of the treemap with Javascript tool-tips here.

Once you’ve got this example up and running, you can switch out the SourceForge data for your own, and then modify or even extend the rendered treemap however you like. Oh, and if you end up using this plugin in your own projects, please report back about it in the comments.

Thank you,
Rob


Code Syntax Highlighting in Mephisto with CodeRay 1 comment

Posted by rob on July 20, 2006

The other day I switched this blog over from Typo to Mephisto, a lightweight blog/cms system written by Rick Olson. Overall, I’m glad I took the leap, but one thing I missed was a slick syntax highlighting system. I browsed through the source to make sure that there wasn’t some hidden feature supporting it that I had missed. Based on good reviews I had heard, I looked to CodeRay to get code syntax coloring working within Mephisto.

Googling with “rails+coderay” turned up this snippet from over at Rails Weenie. I tried it out by first installing CodeRay as a gem. After noticing some strange behavior I asked for help in #caboose. Rick pointed out that the Subversion version of CodeRay was miles ahead of the gem and that it would likely clear up the problems I was having. I ended up doing something like this to get the latest CodeRay into my Mephisto project:


$ cd ./vendor
$ svn export svn://rubyforge.org//var/svn/coderay/trunk/coderay/trunk/lib/
$ mv lib/* .
$ rmdir lib

(I chose not to set up externals for CodeRay, but you could go that route if you always want their edge version.)

I ended up getting Rick’s CodeRay snippet working in Mephisto as a Liquid filter, and then turned the whole thing into a plugin called mephisto_code_colorizer. Here are the meaningful files from the plugin, in pretty colors (paths relative to your Rails project root):

First, the init.rb, to load everything up:

./init.rb:
require 'coderay'
require 'snippet_parser'
require 'colorizer'
Liquid::Template.register_filter(Mephisto::Liquid::Colorizer)

Then, Rick’s SnippetParser class definition:

./bin/snippet_parser.rb:
<samp>

class SnippetParser < String
  class << self
    # SnippetParser.parse text do |tag, code, i|
    #   # return processed code
    # end   
    def parse(text, &block) 
      build_snippets text, &block
    end

    private 
      def method_missing(method, *args, &block) 
        new(args.first).send(method, *args[1..-1], &block) 
      end     
  end

  # returns snippets in an array
  def snippets
    build_snippets if @snippets.nil?
    @snippets
  end

  # wraps snippets in <pre><code>
  def pre_format
    build_snippets do |tag, code, i|
      %(<pre><code>#{code}</code></pre>)
    end
  end

  protected
    def build_snippets(&block)
      @snippets = []
      contents  = []
      tag       = nil
      returning [] do |output|
        tokenizer = HTML::Tokenizer.new(self.strip)

        while token = tokenizer.next
          node = HTML::Node.parse(nil, 0, 0, token, false)
          if node.tag? && node.name == 'samp'
            if contents.blank? # open tag
              tag = node.dup
            else # closing tag
              output << close_snippet(tag, contents.join, &block)
              tag = nil
            end
            contents.clear
          else # inside a code tag
            (tag.nil? ? output : contents) << node.to_s
          end
        end

        # get any unfinished code blocks
        output << close_snippet(nil, contents.join, &block) unless contents.empty?
      end.join
    end

    def close_snippet(tag, contents, &block)
      @snippets << contents
      block ? block.call(tag, contents, @snippets.length) : %(&lt;samp>#{contents}&lt;/samp>)
    end
end
</samp>

Finally, I created a Liquid filter named “syntax” and baked it in with the other Liquid filters, with:

./bin/colorizer.rb:
<samp>

module Mephisto
  module Liquid
    module Colorizer
      include ActionView::Helpers::TagHelper
      require 'cgi'

      def syntax(html)
        # tag is the HTML::Node instance
        # code is the text inside the code tags
        # i is the counter for this current snippet.
        SnippetParser.parse CGI.unescapeHTML(html) do |tag, code, i|
          code.gsub! /^\s+\n/, '' # gets rid of an extra linebreak at the top.
          %(<div class='samp'>) + 
            CodeRay.scan(code,
            (tag.attributes['lang'].to_sym rescue nil) || 
            :ruby).div(:line_numbers => :list, :css => :style) +
          "</div>" 
        end
      end     

    end
  end
end
</samp>

The filter colorizes code marked up in samp tags. To filter the body of an article, use something like the following in your liquid templates:

./themes/site-1/templates/home.liquid:

...
{{ article.body | syntax }}
...
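If you just want to see the shape of what the filter does without pulling in Liquid, CodeRay, or the Rails HTML tokenizer, here is a stripped-down sketch. It is not the plugin code: it swaps the tokenizer for a regular expression and swaps CodeRay for a stand-in that merely escapes the code and wraps it in a div. The each_samp and wrap names, and the data-lang attribute, are made up for illustration.

```ruby
require "cgi"

# Replace each <samp>...</samp> block with whatever the given block returns.
# The block receives the optional lang attribute (or nil) and the code text.
def each_samp(html)
  html.gsub(%r{<samp(?:\s+lang=["'](\w+)["'])?\s*>(.*?)</samp>}m) do
    yield(Regexp.last_match(1), Regexp.last_match(2))
  end
end

# Stand-in for the highlighter: escape the code and wrap it in a div,
# falling back to Ruby when no lang attribute is given.
def wrap(lang, code)
  language = lang || "ruby"
  %(<div class="samp" data-lang="#{language}"><pre>#{CGI.escapeHTML(code.strip)}</pre></div>)
end

body = %(<p>Example:</p><samp lang="sql">SELECT 1 < 2;</samp>)
puts each_samp(body) { |lang, code| wrap(lang, code) }
```

A regex like this won’t cope with nested or malformed markup, which is one reason the real SnippetParser walks tokens instead.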

Wait! Almost forgot the css styles. Of course you’ll need them to have full control over all those nice colors. Check out my css file from this site for the colors I’m using.

I tossed this solution together pretty quickly because I have a lot of other things that I should be doing (like writing a book), but I would love to hear some feedback about how I might make this better. Please leave suggestions in the comments.

Thank you,
Rob


Customize Pound's Logging with Cronolog 2 comments

Posted by rob on July 16, 2006

By default Pound (a software load balancer) logs to syslog. This is probably not where you want your web application’s access logs, especially if your site gets a lot of traffic. Using a very cool log rotation utility called Cronolog, and with a little reconfiguration of Pound, you can have Pound log to a directory of your choosing, just as you’re used to with Apache or Lighttpd.

Installing cronolog

To get this working, first download and install cronolog. On Debian based systems, this is as easy as:


$ apt-get update
$ apt-get install cronolog

Alternatively, you can download the source and build it yourself with:


$ wget http://cronolog.org/download/cronolog-1.6.2.tar.gz
$ tar xvzf cronolog-1.6.2.tar.gz
$ cd cronolog-1.6.2
$ ./configure --prefix=/usr/local
$ make
$ sudo make install

Once you’ve got cronolog installed, you can test it by sending it some output from the command-line with echo. For example:


$ echo "Arrg, this be a test..." | /usr/bin/cronolog \
> /var/log/www/%Y/access.%m-%d-%y.log

Running this command demonstrates how cronolog accepts input and creates log files based on a specified template string consisting of the current date and time. In this case, cronolog receives the output of the echo command and creates a directory named “2006” under /var/log/www containing a file called “access.07-17-06.log.”


$ cat /var/log/www/2006/access.07-17-06.log
Arrg, this be a test...

The date template format options are the same as those of the Unix date command (which are in turn the same as your system’s C library implementation of strftime). See the cronolog man page for a full listing of format options.
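Since Ruby’s Time#strftime understands the same %Y, %m, %d, and %y specifiers, you can preview what a template will expand to before handing it to cronolog. This is just a quick sketch; the log_path helper is my own name, and the date is the one from the example above.

```ruby
# The cronolog template from the echo example above.
TEMPLATE = "/var/log/www/%Y/access.%m-%d-%y.log"

# Expand the template for a given moment using strftime.
def log_path(template, time)
  time.strftime(template)
end

puts log_path(TEMPLATE, Time.local(2006, 7, 17))
# prints /var/log/www/2006/access.07-17-06.log
```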

Using cronolog with Pound

The idea behind using Pound with cronolog is basically the same as the echo command above. You want to pipe the output of Pound directly to cronolog. In order to get at Pound’s logs, you have to disable its built-in logging behavior that sends all of its output to syslog. To do this you reconfigure Pound, having it log to stderr instead. Then you can pipe Pound’s output directly to cronolog.

The option to change the default logging behavior of Pound must be set at compile time. You need to pass the --disable-log option to configure when building Pound. For example:


$ tar xvzf Pound-2.0.9.tgz
$ cd Pound-2.0.9
$ ./configure --disable-log
$ make
$ sudo make install

The final step is to pipe Pound’s output to cronolog. On a Debian system you can do this by modifying Pound’s init script a little. Basically, anywhere in this script where pound is started, you add a pipe to the cronolog command. Here’s my Pound init script:

/etc/init.d/pound:

#! /bin/sh

PATH=/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/local/sbin/pound
CRONOLOG='/usr/bin/cronolog /var/log/www/pound/%Y/access.%m-%d-%y.log'
NAME=pound
DESC=pound
PID=/var/run/$NAME.pid

test -f $DAEMON || exit 0

set -e

# check if pound is configured or not
if [ -f "/etc/default/pound" ]
then
  . /etc/default/pound
  if [ "$startup" != "1" ]
  then
    echo "pound won't start unconfigured. configure & set startup=1 in /etc/default/pound" 
    exit 0  
  fi
else
  echo "/etc/default/pound not found" 
  exit 0  
fi

case "$1" in 
  start)  
    echo -n "Starting $DESC: " 
    start-stop-daemon --start --quiet --exec $DAEMON | $CRONOLOG &
    echo "$NAME." 
    ;;
  stop)
    echo -n "Stopping $DESC: " 
    start-stop-daemon --oknodo --pidfile $PID --stop --quiet --exec $DAEMON 
    echo "$NAME." 
    ;;
  restart|force-reload)
    echo -n "Restarting $DESC: " 
    start-stop-daemon --pidfile $PID --stop --quiet --exec $DAEMON 
    sleep 1 
    start-stop-daemon --start --quiet --exec $DAEMON | $CRONOLOG &
    echo "$NAME." 
    ;;
  *)
    N=/etc/init.d/$NAME
    # echo "Usage: $N {start|stop|restart|reload|force-reload}" >&2
    echo "Usage: $N {start|stop|restart|force-reload}" >&2
    exit 1  
    ;;
esac

exit 0

To avoid some repetition, I store the call to cronolog in a shell variable named “CRONOLOG.” Then, in each place where pound is started, I append “| $CRONOLOG &” (a pipe, the value of the CRONOLOG variable, and an ampersand to put the process into the background).

After starting Pound with the init script with


$ sudo /etc/init.d/pound start

Pound logs its Apache style logs (Pound loglevel 3) to the following file:

/var/log/www/pound/2006/access.07-17-06.log:

blog.tupleshop.com 24.60.34.25 - - [11/Jul/2006:10:51:15 -0700] "GET /favicon.ico 
    HTTP/1.1" 200 1406 "" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; 
    rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4" 
blog.tupleshop.com 67.121.136.191 - - [11/Jul/2006:10:55:12 -0700] 
    "GET /images/figures/pound-deploy.png HTTP/1.1" 200 45041 "" "Mozilla/5.0 
    (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) 
    Safari/419.3" 
blog.tupleshop.com 68.142.33.136 - - [11/Jul/2006:10:55:50 -0700] 
    "GET /images/figures/pound-deploy.png HTTP/1.1" 200 45041 
    "http://www.oreillynet.com/ruby/blog/" "Mozilla/5.0 (Macintosh; U; PPC 
    Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) NetNewsWire/2.1" 
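Once the logs are in a file of their own, they’re easy to pick apart. Here’s a small sketch that parses one line of this format with a regular expression. It assumes each entry sits on a single line (the entries above are wrapped for display), and the LOG_LINE pattern and parse_line helper are my own names, written against the sample above rather than against any formal spec.

```ruby
# Pattern for one entry: vhost, client IP, two unused fields, timestamp,
# request line, status, byte count, referer, and user agent.
LOG_LINE = /\A(?<vhost>\S+) (?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\d+|-) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\s*\z/

# Return a hash of the named fields, or nil if the line doesn't match.
def parse_line(line)
  match = LOG_LINE.match(line)
  match && match.named_captures
end

# First entry from the sample above, joined onto one line.
SAMPLE = 'blog.tupleshop.com 24.60.34.25 - - [11/Jul/2006:10:51:15 -0700] ' \
         '"GET /favicon.ico HTTP/1.1" 200 1406 "" ' \
         '"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; ' \
         'rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4"'

entry = parse_line(SAMPLE)
puts "#{entry['vhost']} #{entry['ip']} #{entry['status']} #{entry['request']}"
```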

So, that’s it. If you think of anything I missed or got wrong, please leave it in the comments.

Thank you,
Rob


Deploying Rails with Pound in Front of Mongrel, Lighttpd, and Apache 10 comments

Posted by rob on July 08, 2006

It seems that the Rails deployment dilemma is finally getting the care that it desperately needed to make the whole situation less of a pain in the neck. For a while there, everyone was hanging on the edge of their seats, hoping that Apache developers would fix Apache’s FastCGI interface that had fallen out of maintenance. While waiting for that, many people flocked to Lighttpd as a promising faster/lighter alternative to Apache that seemed to have its FastCGI interface under control.

Meanwhile, development of an alternative to WEBrick was under way, by a guy named Zed Shaw, called Mongrel. It seems Zed just got fed up and decided to change the Rails deployment world with his own bare hands. This is good news for all of us and the best thing about Zed is how much he cares about getting a situation together that works for everyone. (Also, if you ever need help with Mongrel, Zed is always right there with the answer.) So, this seemingly simple little pure HTTP web server has turned out to be much more useful than anticipated. With the introduction of the mongrel_cluster gem, serving Rails applications with a small pack of Mongrel processes and a load balancer is a snap.

Software load balancers that people are using include Pen, Pound, and Apache2’s mod_proxy_balancer. Recently, on the main Rails blog, there was a post about setting up Lighttpd with a single proxy to Pound, which in turn served up a cluster of Mongrel processes. In reading the post and its comments I realized there seems to be some confusion about where Pound can exist within a typical deployment setup. A few people commented that with Lighttpd in front of Pound, the value of request.remote_ip was 127.0.0.1 (localhost) or something other than the IP of each external request.

There is no reason that Pound can’t sit out in front of Lighttpd, a pack of Mongrels, or any other web servers waiting to process and respond to requests. Because of the way Pound handles headers, the correct value of request.remote_ip is preserved by the time the request is received by Rails. In any case, the Pound docs send the vibe that the intention is to have Pound in front of other servers. Here’s a bit from the latest Pound README that talks about what Pound is and how it can be used:

Pound-2.0.9/README
  1. a reverse-proxy: it passes requests from client browsers to one or more back-end servers.
  2. a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information.
  3. an SSL wrapper: Pound will decrypt HTTPS requests from client browsers and pass them as plain HTTP to the back-end servers.
  4. an HTTP/HTTPS sanitizer: Pound will verify requests for correctness and accept only well-formed ones.
  5. a fail over-server: should a back-end server fail, Pound will take note of the fact and stop passing requests to it until it recovers.
  6. a request redirector: requests may be distributed among servers according to the requested URL.

It’s number six above that gives Pound its flexibility in terms of serving different requests to different back-end web servers. So, on with a simple demo of a Pound setup that passes requests back to a cluster of Mongrels, an Apache server, and a Lighttpd server.

Step 1. Get/Install Pound

Start by downloading the latest version of Pound and unpacking it somewhere nice (like /usr/local/src).

Wait a second… Pound, like many tools that make liberal use of regular expressions, prefers that you have PCRE (Perl Compatible Regular Expressions) installed. If you don’t, download and install it with:


$ wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-5.0.tar.gz 
$ tar xvzf pcre-5.0.tar.gz 
$ cd pcre-5.0
$ ./configure
$ make 
$ sudo make install

Next, move into /usr/local/src (or wherever you downloaded Pound) and configure and build with:


$ tar xvzf Pound-2.0.9.tgz
$ cd Pound-2.0.9
$ ./configure \
>   --with-ssl=ssl_dir # SSL support, if needed.
$ make
$ sudo make install

Debian users can “apt-get install” Pound but likely won’t get the latest version without some sources.list hackery. If you have installed Pound in Debian, you’ll need to edit the following file and flip the startup bit:

/etc/default/pound

startup=1

On my current system (where this blog lives), I installed Pound (v2.0) with apt-get install pound and then realized that I wanted the latest version of Pound (v2.0.9), so I built it from source. But the nice thing about the Debian package is that it gives you a start-up script (/etc/init.d/pound), which is very handy, especially for a service that should always be up.

So, after installing Pound from source, I ended up with apt’s Pound in /usr/sbin/pound, and my Pound in /usr/local/sbin/pound. To get the start-up script to use the newer Pound, I made this change:

/etc/init.d/pound

#DAEMON=/usr/sbin/pound
DAEMON=/usr/local/sbin/pound

While apt’s Pound stores its configuration file in /etc/pound, the new Pound looks for its config info in /usr/local/etc/pound.cfg. To make things work I created a symlink with:


$ sudo ln -s /etc/pound/pound.cfg /usr/local/etc/pound.cfg

With Pound installed and acting as the ring leader for requests to the various listening web servers, the next step is to configure it. But wait! We need a nice figure that illustrates a deployment plan.

Step 2. Plan Your Deployment Setup

What we want is for Pound to do some request routing for us as well as some load balancing. All incoming requests to blog.tupleshop.com should be sent to a small cluster of two Mongrel processes. Requests for www.tupleshop.com should be sent to Apache running PHP. Finally, any requests for “mov” files should be handled by Lighttpd.

Let’s start by configuring Pound.

Step 3. Configure Pound

The pound configuration file contains three types of directives: global, listener, and service. The global directives in this configuration specify the user and group that the pound service is to run as. The log level states how much logging we want pound to send to syslog, if any. Loglevel takes the following values:

  • 0 – for no logging
  • 1 – (default) for regular logging
  • 2 – for extended logging (show chosen backend server as well)
  • 3 – for Apache-like format (Common Log Format with Virtual Host)
  • 4 – (same as 3 but without the virtual host information)

The listener directive, ListenHTTP, specifies the IP address and port on which Pound listens for requests (you’ll want a real address here).

The remainder of the configuration file contains service directives that define what back end servers are to handle various types of requests. The first Service directive states that anything with a Host header containing www.tupleshop.com should be routed to port 8080 of the localhost address (127.0.0.1). In this case Apache, running PHP (among other things), is listening on port 8080, waiting to handle whatever requests Pound passes to it. (Note: There’s no reason this IP couldn’t be on another physical server, but in this case all three web servers are on the same box.)

The next Service directive uses URL ".*.mov" to match requests for QuickTime movie files. For performance reasons, we want Lighty to handle these requests exclusively. So while a request for http://blog.tupleshop.com would be handled by the Mongrel cluster, a request for http://blog.tupleshop.com/zefrank.mov would never make it to Mongrel and would instead be served by Lighty. The location of .mov files on the server is pretty much irrelevant here; they can be anywhere as long as Lighty knows where to find them.

The final Service directive effectively serves as a catch-all because it’s the last one in the file, and because there is no URL or Header matching criteria defined. This is the one doing actual load balancing to the Mongrel processes. In this case there are two Mongrel processes listening on ports 9000 and 9001, on the local IP address.
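To make the routing rules above a bit more tangible, here is a toy Ruby model of them. It assumes services are tried in file order and the first one whose criteria all match wins, which is how I read the catch-all behavior; it does not model load balancing between back ends, sessions, or the finer points of Pound’s pattern matching. The Service struct and pick_service are invented names, while the patterns and addresses are copied from my config.

```ruby
# One Service block: optional Host-header and URL patterns plus its back ends.
Service = Struct.new(:name, :head_require, :url, :backends)

# The three services, in the same order as in pound.cfg.
SERVICES = [
  Service.new("apache",   /Host:.*www.tupleshop.com.*/, nil, ["127.0.0.1:8080"]),
  Service.new("lighttpd", nil, /.*.mov/, ["69.12.146.109:8081"]),
  Service.new("mongrel",  nil, nil, ["127.0.0.1:9000", "127.0.0.1:9001"])
].freeze

# Return the first service whose criteria all match; a missing criterion
# always matches, which is what makes the last service a catch-all.
def pick_service(services, host, path)
  services.find do |service|
    host_ok = service.head_require.nil? || service.head_require.match?("Host: #{host}")
    url_ok  = service.url.nil? || service.url.match?(path)
    host_ok && url_ok
  end
end

[
  ["www.tupleshop.com",  "/index.php"],
  ["blog.tupleshop.com", "/zefrank.mov"],
  ["blog.tupleshop.com", "/"]
].each do |host, path|
  puts "#{host}#{path} -> #{pick_service(SERVICES, host, path).name}"
end
```

Note that under this first-match reading, a .mov request for www.tupleshop.com would go to Apache, since that service comes first in the file.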

/etc/pound/pound.cfg

User        "www-data" 
Group       "www-data" 
LogLevel    2
Alive       30

ListenHTTP
    Address 123.123.123.123
    Port    80
End

Service
    HeadRequire "Host:.*www.tupleshop.com.*" 
    BackEnd 
        Address 127.0.0.1
        Port    8080    
    End
    Session 
        Type    BASIC   
        TTL     300     
    End
End

Service
    URL ".*.mov" 
    BackEnd 
        Address 69.12.146.109
        Port    8081    
    End
    Session 
        Type    BASIC   
        TTL     300     
    End
End

Service
    # Catch All
    BackEnd 
        Address 127.0.0.1
        Port    9000    
    End     
    BackEnd
        Address 127.0.0.1
        Port    9001
    End
    Session
        Type    BASIC
        TTL     300
    End
End

Okay, with Pound all configured, we can start the service with:

$ sudo /etc/init.d/pound start

If there’s a problem with your configuration file, Pound won’t say much about it to STDERR, so it’s a good idea to be watching /var/log/syslog as you start Pound until you’re confident that your configuration is solid.

None of the services that Pound directs requests to have to be running when you start Pound. But if they aren’t, you’ll get HTTP 503 errors from requests bound for servers that aren’t running or are improperly configured. One way to look at the Pound configuration file is as a specification for how the rest of your services should be set up. If you forget what port a server should listen on, always refer back to Pound’s config file.
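A quick way to find out which back end is behind a 503 is to try connecting to each port yourself. The sketch below does that with Ruby’s standard socket library. The listening? helper and the BACKENDS table are my own; the ports come from the config above, and I’m assuming everything is reachable on 127.0.0.1, so adjust the address for any back end that listens elsewhere.

```ruby
require "socket"

# True if something accepts a TCP connection on host:port within the timeout.
def listening?(host, port, timeout = 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

# Ports the Pound config expects each back end to be listening on.
BACKENDS = {
  "Apache"    => 8080,
  "Lighttpd"  => 8081,
  "Mongrel 1" => 9000,
  "Mongrel 2" => 9001
}.freeze

BACKENDS.each do |name, port|
  state = listening?("127.0.0.1", port) ? "up" : "down"
  puts "#{name} (port #{port}): #{state}"
end
```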

Tracking down problems with so many web servers running can get a little hairy, but if you stay organized and are methodical about your setup (like knowing where each server logs events), it shouldn’t be too bad at all.

This post is already too long so I’m not going to get into configuring a Mongrel cluster, Lighttpd, or Apache. Instead, I’ll just include my config files for reference.

Step 4. Configure the Rest of Your Servers

First, my mongrel_cluster config file.

/var/www/robblelog/config/mongrel_cluster.yml

--- 
user: mongrel 
cwd: /var/www/robblelog
port: "9000" 
environment: production
group: www-data
pid_file: log/mongrel.pid
servers: 2

which I start with:

$ sudo mongrel_rails cluster::start
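The two settings that have to line up with Pound are port and servers: with port 9000 and two servers, the Mongrels end up on 9000 and 9001, matching the two BackEnd entries in the catch-all service. Here is a small sketch that reads the same YAML and lists the ports, assuming each additional server takes the next port up. The cluster_ports helper and the CLUSTER_YAML constant are my own names, not part of mongrel_cluster.

```ruby
require "yaml"

# The mongrel_cluster config from above, inlined for the example.
CLUSTER_YAML = <<~YAML
  ---
  user: mongrel
  cwd: /var/www/robblelog
  port: "9000"
  environment: production
  group: www-data
  pid_file: log/mongrel.pid
  servers: 2
YAML

# List the ports the cluster occupies: one per server, counting up from port.
def cluster_ports(config)
  first = Integer(config.fetch("port"))
  count = Integer(config.fetch("servers"))
  (first...(first + count)).to_a
end

puts cluster_ports(YAML.safe_load(CLUSTER_YAML)).inspect
# prints [9000, 9001]
```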

A slicker way to handle this is to copy the included mongrel_cluster start-up file to your system’s initialization scripts directory so your Mongrels will survive a server reboot.

Next is my Lighty config file. It’s pretty simple, with the document root pointing to the public directory of the Rails project, robblelog.

/etc/lighttpd/lighttpd.conf

server.modules = ( 
            "mod_access",
            "mod_alias",
            "mod_accesslog",
)

server.port = 8081
server.bind = "127.0.0.1" 
server.document-root = "/var/www/robblelog/public/" 

server.username  = "www-data" 
server.groupname = "www-data" 
server.pid-file  = "/var/run/lighttpd.pid" 
server.errorlog  = "/var/log/lighttpd/error.log" 
index-file.names = ( "index.php", "index.html", 
                     "index.htm", "default.htm" )
accesslog.filename = "/var/log/lighttpd/access.log" 

## mimetype mapping 
include_shell "/usr/share/lighttpd/create-mime.assign.pl" 

Finally, a small chunk of my Apache2 configuration:


Listen 8080

NameVirtualHost *:8080

<VirtualHost *:8080> 
    ServerAdmin admin@tupleshop.com

    ServerName   www.orsini.us
    ServerAlias  www.orsini.us
    ServerAlias  orsini.us
    DocumentRoot /var/www/tupleshop.com

    # ...
</VirtualHost>

Appendix A: Debugging

If you’re used to running only a single web server on your system, it may be a little daunting to have more servers, all listening on different ports. How can you know what is up and running and which ports are available? Install and run nmap. Use the following command to display what services are listening on different ports:


$ sudo nmap -sT -O localhost

To see what Internet services are currently tied up, use the following lsof command.


$ lsof -i -P

Finally, Pound logs to /var/log/syslog, and Mongrel, Apache, and Lighttpd all have their own logging configurations. Between network inspection and watching your logs, you should be able to nail down most configuration issues.

One of the most obvious tweaks you can make to the Mongrel cluster is to specify more or fewer Mongrel processes to run. You have to play with this number based on your anticipated traffic load and your available system resources (mostly RAM). The standard tool for measuring how well any of your servers are performing is httperf. Here’s an example that blasts port 8080 with 100 requests:


$ httperf --port 8080 --server 127.0.0.1 \
>                  --num-conn 100 --timeout 5

The number you want to dig out of the output of httperf is req/s (requests per second). Of course, a higher number is better.

Appendix B: Logging Remote IP Addresses

One problem that is a show stopper for many people who might otherwise put their web servers behind Pound is that access logs don’t preserve the IP address of the original request. Instead it shows up as 127.0.0.1.

Luckily, of the very few modifications Pound makes to requests, it adds an X-Forwarded-For header containing the IP address of the original request. The general format is:


X-Forwarded-for: client-IP-address

Note that other proxies may already have added an X-Forwarded-for header (there can be more than one, as allowed by the HTTP RFCs). In this case, Pound adds its own X-Forwarded-for header last, after the others.
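If you ever need to dig the address out yourself (in a log-processing script, say), the rule above boils down to “take the last one.” Here’s a minimal sketch of that. The pound_client_ip helper is a made-up name, and the comma handling is my own assumption to cover the case where several headers have been joined into one comma-separated value somewhere along the way.

```ruby
# Given every X-Forwarded-For value on a request, in the order received,
# return the address Pound recorded (the last one), or nil if there are none.
def pound_client_ip(forwarded_for_values)
  last = forwarded_for_values.last
  return nil if last.nil?

  # Assumption: a single value may itself hold a comma-separated list,
  # in which case the final entry is the one added last.
  last.split(",").map(&:strip).last
end

puts pound_client_ip(["10.0.0.5", "24.60.34.25"])
# prints 24.60.34.25
```

Only trust this value when the request really did come through Pound, since a client can send any X-Forwarded-For header it likes to a server it reaches directly.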

To capture the IP address from this header in your Apache log format, replace “%h” (the remote host format directive) with \"%{X-Forwarded-for}i\". The whole format definition:

LogFormat "\"%{X-Forwarded-for}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" 
\"%{User-Agent}i\"" combined

Appendix C: More Logging (Extra Credit)

Another “interesting” solution (more to demonstrate an advanced customization option) to this is to have Pound add an additional header to each request, called something like “REAL_REMOTE_ADDR”. This can be done easily by recompiling Pound with a small addition to the source. Don’t worry, you don’t have to be a C Guru for this. It’s very simple. The following excerpt from Pound’s http.c shows where you want to add the one line that adds the “REAL_REMOTE_ADDR” header.

/usr/local/src/Pound-2.0.9/http.c (~line 850)

/* put additional client IP header */
BIO_printf(be, "X-Forwarded-For: %s\r\n", inet_ntoa(from_host));
BIO_printf(be, "REAL_REMOTE_ADDR: %s\r\n", inet_ntoa(from_host));

Save the file with the change you made and recompile Pound, with:

$ cd /usr/local/src/Pound-2.0.9
$ ./configure
$ make
$ sudo make install

Now that the new header is being added to each request, you have to alter the log file format for each web server to acknowledge the additional field. This is also pretty simple to do. In the case of Apache, you just make a small change to the definition of the log file format you’re using. I use the “combined” format, and here is a definition that replaces “%h” with our custom header string.

LogFormat "\"%{REAL_REMOTE_ADDR}i\" %l %u %t \"%r\" %>s %b \"%{Referer}i\" 
 \"%{User-Agent}i\"" combined

The result is that your access logs should appear just as they would if Apache were receiving external requests directly.

So, that’s it. Good luck and let me know if I missed anything, in the comments.