Blog
Posted Sunday, November 01, 2009 06:59:17 PM by dfe

In part 1 we explored traditional URL request handling in the webserver and began to touch on URL request handling by application code. In part 2 we detailed how the WebObjects adaptor is able to send all URLs beginning with a particular string to the correct application instance and how the application instance can dispatch the request in any fashion it wishes.

This final installment of the series, The Duke of URL, Part 3, explores an entirely new mechanism for handling URLs within WebObjects.

The Goal: Simple URLs

When I set out to write the new blogging software I had one simple goal: create a dynamic content platform that doesn't look dynamic.

Borrowing a scheme found on some WordPress sites, I decided that entries should be identified by a date and a name. So /blog/2009/11/01/the-duke-of-url-part-1/ should retrieve the article published on November 1, 2009, named "the-duke-of-url-part-1".
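To make the scheme concrete, here is a minimal sketch of building such a permalink from a publication date and an entry name. The Permalinks class and its method are hypothetical, not part of the blog's code; the zero-padded year/month/day layout is taken from the example URL above.

```java
import java.time.LocalDate;

// Hypothetical helper (not from the original code): builds a permalink path of
// the form /blog/yyyy/mm/dd/entry-name/ from a publication date and entry name.
final class Permalinks {
    private Permalinks() {}

    static String permalink(LocalDate published, String name) {
        // %02d zero-pads month and day, matching the /2009/11/01/ style.
        return String.format("/blog/%04d/%02d/%02d/%s/",
                published.getYear(), published.getMonthValue(),
                published.getDayOfMonth(), name);
    }
}
```

For example, `Permalinks.permalink(LocalDate.of(2009, 11, 1), "the-duke-of-url-part-1")` returns `/blog/2009/11/01/the-duke-of-url-part-1/`.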

As we learned in part 2, WebObjects can only handle URLs within its special /cgi-bin/WebObjects/ namespace. But as we learned in part 1, neither PHP nor ASP.NET handles URLs like this either. For PHP, one must use mod_rewrite in Apache. For ASP.NET, one may use an equivalent module for IIS or, if using IIS 7, the site designer can opt to send requests to the application and do the rewriting in ASP.NET code.

So when it comes to truly user-friendly URLs all environments are on basically equal footing.

Rewrite to what?

In the case of WordPress it's actually very simple. Every URL that Apache cannot resolve to a file or a directory is rewritten to /index.php. That's it. Without any special flags, a mod_rewrite RewriteRule matches against the URL but rewrites only the file path, so the request URL seen by index.php is the original, unadulterated request URL.

IIS with ASP.NET is similar. An IIS module (before IIS 7's integrated pipeline) or ASP.NET code (with the IIS 7 integrated pipeline) uses regular expressions to rewrite URLs to the paths of .aspx pages. You can even rewrite to a page without changing the query string and use Request.RawUrl to determine what the user was really trying to see.

Often, though, the rewrite pulls certain pieces out of the URL and passes them to the .aspx page in the query string. So I might rewrite /blog/2009/11/01/the-duke-of-url-part1 to /DisplayBlogEntry.aspx?postDate=2009/11/01&name=the-duke-of-url-part1.
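A rewrite of that kind can be sketched in Java with java.util.regex. The page name and parameter names come from the illustrative ASP.NET example above; the QueryStringRewrite class is hypothetical and only models what an IIS or ASP.NET rewriter would do.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a rewrite that pulls pieces out of a friendly URL and
// hands them to a page as query-string parameters.
final class QueryStringRewrite {
    // matches() anchors the whole path; group 1 is the date, group 2 the name.
    private static final Pattern ENTRY =
            Pattern.compile("/blog/(\\d{4}/\\d{2}/\\d{2})/([^/]+)/?");

    private QueryStringRewrite() {}

    /** Returns the rewritten URL, or null if the path is not a blog entry. */
    static String rewrite(String path) {
        Matcher m = ENTRY.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return "/DisplayBlogEntry.aspx?postDate=" + encode(m.group(1))
                + "&name=" + encode(m.group(2));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Note that URLEncoder percent-encodes the slashes in the date (postDate=2009%2F11%2F01); the page receives the decoded value 2009/11/01.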

In WebObjects I could have done something similar: I could rewrite to /cgi-bin/WebObjects/WOBlog.woa/wa/displayPost?postDate=2009/11/01&name=the-duke-of-url-part1. Unlike with PHP and ASP.NET, though, I am required to rewrite the URL itself, not just the file path it maps to. The reason is that there are no files on disk to represent the WO pages; instead, the request has to reach mod_WebObjects or the WebObjects CGI adaptor. For that to happen I have to use the "passthrough" (PT) flag on the rewrite. A side effect is that the original URL is completely destroyed as far as the application is concerned.

My initial hack actually rewrote /blog to /cgi-bin/WebObjects/WOBlog.woa/wa/content?path=/blog. I found this a bit dirty, though, as it takes something that was in the URL path and moves it into the query string. But as we know from part 2, I could instead register a custom WORequestHandler and let it handle all path components from there.

The Subgoal: Straightforward Rewriting

Because I must rewrite the URL, and because I will never be able to see the original URL, it is desirable that the URL I do see be as similar as possible to the original. So the scheme I came up with was to rewrite /blog to /cgi-bin/WebObjects/WOBlog.woa/content/blog. Later on I might have /darwin go to /cgi-bin/WebObjects/WOBlog.woa/content/darwin. Or I might eventually just rewrite everything under / to /cgi-bin/WebObjects/WOBlog.woa/content/. In fact, I am in the process of doing this right now.

The advantage of this scheme is that request().requestHandlerPath() is, in and of itself, the exact original URL minus the leading / and the query string, if any. Now, it _could_ be the case that a user actually followed a link with one of these dastardly long URLs, but the theory is that if I never link to them on my site then no one will ever find them. If someone does, they get the correct content, so it's not a huge deal. Yes, I'm sort of breaking this rule by posting the URL format. Try it if you want.
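Assuming the rule rewrites /blog... by simply prefixing the content handler's URL, recovering the original path is just string slicing. This plain-Java model (ContentRewriteModel is hypothetical and does not call WebObjects) mirrors what request().requestHandlerPath() is described as returning; it ignores instance numbers, which come into play later.

```java
// Hypothetical model of the rewrite scheme; not WebObjects code.
final class ContentRewriteModel {
    static final String PREFIX = "/cgi-bin/WebObjects/WOBlog.woa/content/";

    private ContentRewriteModel() {}

    /** What the RewriteRule does: drop the leading "/" and prepend PREFIX. */
    static String rewrite(String originalPath) {
        return PREFIX + originalPath.substring(1);
    }

    /** Roughly what the handler path is for a rewritten URL, or null if foreign. */
    static String handlerPath(String rewrittenUrl) {
        String path = rewrittenUrl;
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q); // the query string is not part of the path
        }
        return path.startsWith(PREFIX) ? path.substring(PREFIX.length()) : null;
    }
}
```

Because the rewrite only adds a prefix, the handler path is the original path minus its leading slash, which is exactly the property the scheme was designed for.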

The second advantage of this scheme is that the URL rewrite rules are dead simple:

RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/content/$1 [PT]

The outer grouping (which is also the first, so it becomes $1) contains the literal text "blog" followed by an inner grouping that allows either a / followed by zero or more characters, or the empty string. The $ at the end anchors the match to the end of the string. This prevents /blogfoo, /blog2, and the like from matching, but allows /blog by itself, /blog/, and everything under it.
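The pattern can be checked directly with java.util.regex, whose syntax agrees with Apache's PCRE for an expression this simple. This sketch (the class name is hypothetical) reports what $1 would hold for a given path, or null where the rule does not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Exercises the RewriteRule pattern to show which paths match and what $1 captures.
final class BlogRulePattern {
    static final Pattern RULE = Pattern.compile("^/(blog(/.*|))$");

    private BlogRulePattern() {}

    /** Returns what $1 would hold, or null if the rule does not match. */
    static String captureFor(String path) {
        Matcher m = RULE.matcher(path);
        return m.matches() ? m.group(1) : null;
    }
}
```

Trying it on a few inputs gives "blog" for /blog, "blog/" for /blog/, and null for both /blogfoo and /blog2.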

The URL is rewritten into the request space of the content request handler, and the PT flag tells Apache to treat the result as a URL (not a file path), stop rewriting, and pass it on to the next phase of request handling, where it will be picked up by mod_WebObjects.

Advanced rewriting

Before we discuss the WebObjects Java code needed to implement the request handler, let's first consider what happens here. The URL is being rewritten to something inside /cgi-bin/WebObjects/WOBlog.woa/content/. But something is missing: there can optionally be an application instance number, as in /cgi-bin/WebObjects/WOBlog.woa/1/content/. Without it, the request will be handled by any available instance. If the user is not logged in, this is not a problem. If the user already has a session, though, that session is specific to one application instance. So our simplistic rewriting can send the user to a different WOBlog instance, where they wind up not being signed in.
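A simplified model of that URL layout may help: after the .woa segment comes an optional integer instance number, then the request handler key, then the handler path. WOUrlParts is a hypothetical class, not the adaptor's actual parser, but it also shows the failure mode discussed below: a non-integer in the instance position is taken for the handler key.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the URL segments after "WOBlog.woa/".
final class WOUrlParts {
    private static final Pattern INSTANCE = Pattern.compile("-?\\d+");

    final String instance;    // null when no instance number is present
    final String handlerKey;  // e.g. "content"
    final String handlerPath; // e.g. "blog/2009/11/01/the-duke-of-url-part-1/"

    private WOUrlParts(String instance, String handlerKey, String handlerPath) {
        this.instance = instance;
        this.handlerKey = handlerKey;
        this.handlerPath = handlerPath;
    }

    static WOUrlParts parse(String afterWoa) {
        String[] segs = afterWoa.split("/", -1); // -1 keeps a trailing empty segment
        int i = 0;
        String instance = null;
        if (INSTANCE.matcher(segs[0]).matches()) {
            instance = segs[0]; // only an integer counts as an instance number
            i = 1;
        }
        String key = i < segs.length ? segs[i] : "";
        String path = i + 1 < segs.length
                ? String.join("/", Arrays.copyOfRange(segs, i + 1, segs.length))
                : "";
        return new WOUrlParts(instance, key, path);
    }
}
```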

WebObjects can store the instance ID (known as "woinst") in a cookie. By default it issues the client a cookie named "woinst" whose path is the application's URL prefix. This allows WOBlog to have a woinst for /cgi-bin/WebObjects/WOBlog.woa and SomeOtherApp to have a woinst for /cgi-bin/WebObjects/SomeOtherApp.woa. Clearly we need to change this, since the browser only sends a cookie for URLs under its path and our public URLs now live outside that prefix. The scheme I decided on was to append a dash and the application name. So if you're signed in, you have a cookie named "woinst-WOBlog" for the "/" path. More on this later.

For now let's see how we use RewriteCond to accomplish what we want:

RewriteCond %{HTTP_COOKIE} woinst-WOBlog=(-?[0-9]+)
RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/%1/content/$1 [PT]

Very simply, we match on a cookie named woinst-WOBlog with a numeric value, and %1 in the RewriteRule refers to the group captured by the RewriteCond. It is important that we only put integer (possibly negative) values into the URL. If the value is anything other than an integer, the WO adaptor will ignore it, and the application code will think the instance number is missing and treat the value as the request handler key.

Apache runs mod_rewrite rules in order, so you want this rule above the non-cookie one. You need both, because this rule only matches when the cookie exists. It may seem like a bit of a pain to require two rules, but it will prove useful in the future.
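The ordering can be modeled in a few lines of Java. This sketch (the RewriteChain class is hypothetical) tries the cookie-conditioned rule first and falls back to the plain rule. It uses `-?[0-9]+` for the instance number so that a lone `-` cannot match, and, like RewriteCond, it searches the Cookie header rather than requiring the whole header to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the two rewrite rules applied in order.
final class RewriteChain {
    private static final Pattern RULE = Pattern.compile("^/(blog(/.*|))$");
    private static final Pattern WOINST = Pattern.compile("woinst-WOBlog=(-?[0-9]+)");
    private static final String APP = "/cgi-bin/WebObjects/WOBlog.woa/";

    private RewriteChain() {}

    /** Returns the rewritten URL, or null if neither rule applies. */
    static String rewrite(String path, String cookieHeader) {
        Matcher rule = RULE.matcher(path);
        if (!rule.matches()) {
            return null;
        }
        // Rule 1 (listed first): applies only when the condition finds the
        // cookie; group 1 plays the role of %1, the instance number.
        if (cookieHeader != null) {
            Matcher cond = WOINST.matcher(cookieHeader);
            if (cond.find()) {
                return APP + cond.group(1) + "/content/" + rule.group(1);
            }
        }
        // Rule 2: no usable cookie, so any instance may take the request.
        return APP + "content/" + rule.group(1);
    }
}
```

If the two rules were swapped, the plain rule would always match first and the instance number would never reach the URL.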

Caching content

Suppose this blog entry winds up on Slashdot or Digg (unlikely, I know). Apache, WebObjects, and PostgreSQL are all pretty efficient, but you know what? Nothing is as efficient as having the webserver serve a completely static file directly off the disk (or, more likely, directly out of the disk cache in RAM).

To accomplish this, I could put my woinst RewriteCond/RewriteRule pair at the top. Anyone who is logged in will get dynamically generated content (perhaps indicating that they are logged in, with a welcome message or whatever). Anyone who is not logged in will fall through. From there, I can have a directory on disk like /blog/2009/11/01/the-duke-of-url-part-1/ with an index.html file inside it. Then I can add some conditions to my main RewriteRule so that it is skipped when a file can be found.
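In mod_rewrite terms such a condition would usually be a file test on the path under the document root (RewriteCond's `-f` test, negated as `!-f` so the rule runs only when no file exists). The decision itself can be sketched in Java as follows; the StaticCache class, the document root passed to it, and the index.html convention are assumptions for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: serve a pre-generated file if one exists, otherwise
// return null so the request falls through to WebObjects.
final class StaticCache {
    private final Path docRoot;

    StaticCache(Path docRoot) {
        this.docRoot = docRoot.toAbsolutePath().normalize();
    }

    /** Returns the cached file to serve, or null to hand off to WebObjects. */
    Path lookup(String urlPath) {
        Path p = docRoot.resolve(urlPath.substring(1)).normalize();
        if (!p.startsWith(docRoot)) {
            return null; // refuse paths that escape the root via "../"
        }
        if (Files.isDirectory(p)) {
            p = p.resolve("index.html"); // directory URLs map to their index page
        }
        return Files.isRegularFile(p) ? p : null;
    }
}
```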

If a file cannot be found, the request goes into WebObjects. And one of the things I could do as part of handling the request is fire off a thread to generate new static content. Of course, if hundreds of thousands of people then decide to sign up for an account or log in I'm still screwed, but if that happens then trying to optimize requests on one server is completely the wrong approach.
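A minimal sketch of that regeneration step, assuming pages are rendered by some caller-supplied function: the work happens on a background executor, duplicate requests for the same file are coalesced, and the page is written to a temporary file and then moved into place so Apache never serves a half-written index.html. CacheRegenerator is hypothetical, and ATOMIC_MOVE assumes the temporary file and the target are on the same filesystem (here they share a directory).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical background writer for the static cache.
final class CacheRegenerator {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Set<Path> inFlight = ConcurrentHashMap.newKeySet();

    /** Schedules regeneration; returns null if one is already pending for target. */
    Future<?> regenerate(Path target, Supplier<String> html) {
        if (!inFlight.add(target)) {
            return null; // another request already triggered it
        }
        return pool.submit(() -> {
            try {
                Files.createDirectories(target.getParent());
                Path tmp = Files.createTempFile(target.getParent(), ".regen", ".tmp");
                Files.writeString(tmp, html.get());
                // Swap the finished file in so readers see old or new, never partial.
                Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                inFlight.remove(target);
            }
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```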

Handling the request

Alright already. Time for the interesting stuff.

In part 2 we touched on the registerRequestHandler method of WOApplication. Now let's put it to some use. To register our request handler we're going to put this in our Application class:

    protected ContentRequestHandler _contentRequestHandler;

    public Application() {
        NSLog.out.appendln("Welcome to " + name() + " !");
        /* ** put your initialization code in here ** */
        _contentRequestHandler = new ContentRequestHandler();
        registerRequestHandler(_contentRequestHandler, "content");
    }

For this to work we're going to need a new class named ContentRequestHandler. I'm sure I will get more clever with this but for now this is exactly what I'm using.

public class ContentRequestHandler extends WORequestHandler
{
    public WOResponse generateRequestRefusal(WORequest aRequest)
    {
        WODynamicURL aURIString = aRequest._uriDecomposed();
        String contentString = (new StringBuilder())
            .append("Sorry, your request could not immediately be processed. Please try this URL: <a href=\"")
            .append(aURIString).append("\">").append(aURIString).append("</a>").toString();
        aURIString.setApplicationNumber("-1");
        WOResponse aResponse = WOApplication.application().createResponseInContext(null);
        WOResponse._redirectResponse(aResponse, aURIString.toString(), contentString);
        return aResponse;
    }
    
    private WOResponse nullResponse;
    public WOResponse nullResponse()
    {
        if(nullResponse == null)
        {
            nullResponse = WOApplication.application().createResponseInContext(null);
            nullResponse.setStatus(500);
            nullResponse.appendContentString("<html><head><title>Error</title></head><body>Your request produced an error.</body></html>");
        }
        return nullResponse;
    }
    
    public WOResponse handleRequest(WORequest request)
    {
        WOResponse aResponse = null;
        WOApplication anApplication = WOApplication.application();
        if(anApplication.isRefusingNewSessions() && !request.isSessionIDInRequest() && request.isUsingWebServer())
        {
            aResponse = generateRequestRefusal(request);
        } else
        {
            Object lock = anApplication.requestHandlingLock();
            if(lock != null)
            {
                synchronized(lock)
                {
                    aResponse = _handleRequest(request);
                }
            }
            else
            {
                aResponse = _handleRequest(request);
            }
        }
        if(aResponse == null)
            aResponse = nullResponse();
        return aResponse;
    }
    
    public WOResponse _handleRequest(WORequest request)
    {
        // Retrieve the application object.  We need to inform it of awake/sleep
        // and use some of its helper methods.
        WOApplication application = WOApplication.application();

        WOResponse response;
        WOContext context;

        application.awake();
        try {
            // Instantiate the action object for this request.
            // The WOAction sets up the context and restores the session and so on.
            WOAction action = new ContentAction(request);

            // Retrieve the context object from the action.
            context = action.context();
            
            // Retrieve the content path.  e.g. blog or blog/2009/10/10/foobar or whatever.
            String contentPath = request.requestHandlerPath();

            
            // TODO: We probably could use some exception handling here.
            // 1. performActionNamed throws generating the WOActionResults
            // 2. performActionNamed returns null
            // 3. generateResponse throws
            // 4. generateResponse returns null (although we do kind of handle this already).


            // Ask the action object to handle the request.  Unlike normal action objects the
            // ContentAction object takes a path instead of the first part of a method name.
            WOActionResults actionResults = action.performActionNamed(contentPath);

            // Generate the response object.
            if(actionResults != null)
                response = actionResults.generateResponse();
            else
                response = null;

            // FIXME: When we do add error handling, do we or don't we save the session in the
            // event of an error?
            if(context != null)
            {
                // Check the session in to the session store.  Particularly important if the
                // session store is out of process.
                application.saveSessionForContext(context);
            }
        }
        finally {
            // End of request.
            application.sleep();
        }

        // Ah, the joys of calling private APIs.  For some reason both WOActionRequestHandler
        // and WOComponentRequestHandler know about and call this method as virtually the
        // last thing before returning the response.  I am somewhat unclear as to why this
        // method is private and why it isn't called by our caller instead of within the
        // request handler.
        // It is imperative that this method be called because it generates HTTP Set-Cookie
        // headers from the NSArray<WOCookie>.  Without this no cookies will ever function.
        if(response != null)
            response._finalizeInContext(context);

        return response;
    }
}

In reality, most of this is not actually my code but Apple's. The handleRequest() method comes from WOActionRequestHandler. The nullResponse() and generateRequestRefusal() methods come from WODirectActionRequestHandler. Yes, I do feel a bit dirty decompiling Apple's code and pasting it verbatim into my own, but I'll get over it. I suppose another option is to derive from WOActionRequestHandler and override only _handleRequest(), but that is frankly just asking for trouble with an upgrade. By instead duplicating the code and using only the publicly available method to register the request handler, we are less likely to break with a future release of WebObjects.

The _handleRequest() method is where the meat of the implementation lies. Unfortunately, due to copious amounts of exception handling in WOActionRequestHandler._handleRequest(), it isn't easy to see what's going on. One of the failings of jad is that exception handlers nested within exception handlers decompile into completely invalid Java code. Still, we can get the gist of it.

The important thing is to wake up WOApplication and ensure we put it back to sleep in all cases (hence try/finally). The other important thing is to get a WOContext object, because it's not possible to generate a WOComponent without one. We could create our own WOContext, but it's easier to construct a WOAction subclass, because the WOAction constructor takes care of this.
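Stripped of the WebObjects types, the control flow of handleRequest() and _handleRequest() reduces to an optional lock plus an awake/sleep pair guarded by try/finally. In this sketch the Lifecycle interface is a hypothetical stand-in for WOApplication, not a WebObjects API.

```java
import java.util.function.Supplier;

// Plain-Java model of the request handler's control flow.
final class RequestFlow {
    interface Lifecycle {
        void awake();
        void sleep();
    }

    private RequestFlow() {}

    /** Runs the handler, serialized on the lock when one is supplied. */
    static <T> T handle(Lifecycle app, Object lockOrNull, Supplier<T> handler) {
        if (lockOrNull != null) {
            synchronized (lockOrNull) {
                return run(app, handler);
            }
        }
        return run(app, handler);
    }

    private static <T> T run(Lifecycle app, Supplier<T> handler) {
        app.awake();
        try {
            return handler.get();
        } finally {
            app.sleep(); // runs even if the handler throws
        }
    }
}
```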

Like any WOAction class, our ContentAction class implements performActionNamed(). Unlike most action classes, though, we're going to pass it a path instead of an action name. What we get back is a WOActionResults (which might be a WOResponse, a WOComponent, or potentially something else). Honestly, we don't care: as long as we can get a WOResponse from it by calling generateResponse(), we have all we need.
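The reason the handler does not care what comes back is ordinary polymorphism. The types below are hypothetical stand-ins, not the WebObjects classes: in this sketch a response-like object returns itself from generateResponse(), while a page-like object renders itself into a new response.

```java
// Hypothetical stand-ins illustrating the "anything that can generate a
// response" contract; these are not the WebObjects types.
interface ActionResults {
    Response generateResponse();
}

final class Response implements ActionResults {
    final int status;
    final String body;

    Response(int status, String body) {
        this.status = status;
        this.body = body;
    }

    @Override
    public Response generateResponse() {
        return this; // already a response
    }
}

final class Page implements ActionResults {
    private final String title;

    Page(String title) {
        this.title = title;
    }

    @Override
    public Response generateResponse() {
        // A component-like object renders itself into a fresh response.
        return new Response(200, "<html><head><title>" + title + "</title></head></html>");
    }
}
```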

Once we have the WOResponse, it is important that we tell the application to save the session for the action's context. Actually, it's not so important in the default case, where Session objects stay in application RAM and that call is a no-op. But if we later decide to use serialized sessions, it becomes important.

The last thing we do, after putting the application to sleep, is positively evil. The WOResponse class provides the ability to manage a set of WOCookie objects (actually this lives in WOMessage, because both the request and the response have cookies). The problem is that cookies ultimately consist of HTTP headers. Rather than modifying the headers every time a cookie changes, there is instead a _finalizeCookies() method. The _finalizeInContext() method calls _finalizeCookies() in addition to doing a few other tasks, like setting the content length. If you don't call it, any cookies set on the response never make it to the client. And of course this method is private and undocumented. That seems like a bit of an oversight, but at least the method is technically public to Java. This convention of marking methods private by prefixing their names with an underscore hails from Objective-C, and it's truly a godsend for cases like this where the designer overlooked something important. Yes, it's not officially public, so you call it at your own risk, but realistically it's reasonably safe to do so until a later version comes up with a real solution.
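The deferred-header design can be illustrated with a small model. CookieFinalizer is hypothetical and much simpler than WOMessage (no domain, expiry, or Secure attributes), but it shows the consequence described above: cookies added to a response produce no Set-Cookie header until the finalize step runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of cookies kept as objects and turned into headers late.
final class CookieFinalizer {
    static final class Cookie {
        final String name;
        final String value;
        final String path;

        Cookie(String name, String value, String path) {
            this.name = name;
            this.value = value;
            this.path = path;
        }
    }

    private final List<Cookie> cookies = new ArrayList<>();
    private final Map<String, List<String>> headers = new LinkedHashMap<>();

    void addCookie(Cookie cookie) {
        cookies.add(cookie); // recorded only; no header is written yet
    }

    Map<String, List<String>> headers() {
        return headers;
    }

    /** Turns the pending cookie objects into Set-Cookie header values. */
    void finalizeCookies() {
        List<String> values = new ArrayList<>();
        for (Cookie c : cookies) {
            values.add(c.name + "=" + c.value + "; Path=" + c.path);
        }
        if (!values.isEmpty()) {
            headers.put("Set-Cookie", values);
        }
    }
}
```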

The ContentAction class

The last piece of the puzzle is the ContentAction class. This time I am not revealing the full code as performActionNamed() in particular is wildly specific to this site.

class ContentAction extends WOAction
{
    ContentAction(WORequest request)
    {
        super(request);
    }

    private static String _getSessionIDFromValuesOrCookie(WORequest request, boolean lookInCookiesFirst)
    {
        boolean isStream = WOApplication.application().streamActionRequestHandlerKey().equals(request.requestHandlerKey());
        String aSessionID = null;
        if(lookInCookiesFirst)
        {
            aSessionID = request.cookieValueForKey(WOApplication.application().sessionIdKey());
            if(aSessionID == null && !isStream)
                aSessionID = request.stringFormValueForKey(WOApplication.application().sessionIdKey());
        } else
        {
            if(!isStream)
                aSessionID = request.stringFormValueForKey(WOApplication.application().sessionIdKey());
            if(aSessionID == null)
                aSessionID = request.cookieValueForKey(WOApplication.application().sessionIdKey());
        }
        return aSessionID;
    }

    public String getSessionIDForRequest(WORequest request)
    {
        String aSessionID = null;
        if(request != null)
            aSessionID = _getSessionIDFromValuesOrCookie(request, false);
        return aSessionID;
    }
    
    public WOActionResults performActionNamed(String anActionName)
    {
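        // Site-specific and not shown: map the content path (e.g.
        // "blog/2009/11/01/the-duke-of-url-part-1/") to a WOActionResults such as a page.
        return SomethingUseful;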
    }
}

Again we have run into a situation where we need to provide a method, getSessionIDForRequest(). One easy way to avoid this is to derive our class from WODirectAction instead. What I don't like about that is that any class derived from WODirectAction is subject to reflection by WODirectActionRequestHandler. So someone could request .../wa/ContentAction/foo and cause ContentAction to be instantiated and a fooAction() method to be called on it. I find this undesirable, so I'd just as soon implement the one method that isn't obvious.

As it turns out, the WODirectAction code uses the private _getSessionIDFromValuesOrCookie() method of WORequest, and this time it is truly private. No matter: we simply copy the implementation, make it a static method in our class, and add a WORequest parameter, which was of course implicit in the original version. This is another method that seems like it should have been public to begin with. We could probably cut this implementation down, because we can be certain that the request handler key is not the streamActionRequestHandlerKey. But whatever: the code is there and easy to copy and paste from the decompiled source.
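For instance, since the content handler is never the stream action handler, the lookup reduces to checking the form value first and the cookie second (the lookInCookiesFirst == false branch). This sketch uses plain maps as hypothetical stand-ins for WORequest's form values and cookies; the key name wosid-WOBlog assumes WebObjects' default wosid key with the application-name suffix described in the next section.

```java
import java.util.Map;

// Hypothetical trimmed-down session ID lookup: form value first, then cookie.
final class SessionIdLookup {
    private SessionIdLookup() {}

    static String sessionId(Map<String, String> formValues,
                            Map<String, String> cookies,
                            String sessionIdKey) {
        String id = formValues.get(sessionIdKey);
        return id != null ? id : cookies.get(sessionIdKey);
    }
}
```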

Getting the right cookies

One of the great things about WebObjects is that a fair number of its methods can be overridden, and cookie generation is no exception. To begin with, we want our cookies generated with the "/" path so the browser will send them for all URLs on the domain. This is accomplished by overriding one method in your Session class; despite its name, domainForIDCookies() supplies the path used for the session ID and instance ID cookies.

    public String domainForIDCookies()
    {
        return "/";
    }

Because the cookies are now domain-wide, it's a good idea to make their names unique to the application. Recall that the original path was /cgi-bin/WebObjects/WOBlog.woa. The only unique part of that URL is WOBlog, the application's name, so that is all we need when deciding on cookie names. Add these two methods to your Application class.

    private String _sessionIdKey;
    /*!
     @abstract   Overrides sessionIdKey to return one including the app name
     */
    public String sessionIdKey()
    {
        if(_sessionIdKey == null)
            _sessionIdKey = String.format("%s-%s", super.sessionIdKey(), name());
        return _sessionIdKey;
    }
    
    private String _instanceIdKey;
    /*!
     @abstract   Overrides instanceIdKey to return one including the app name
     */
    public String instanceIdKey()
    {
        if(_instanceIdKey == null)
            _instanceIdKey = String.format("%s-%s", super.instanceIdKey(), name());
        return _instanceIdKey;
    }

That's it. You now have WebObjects generating unique cookies that won't conflict with any other WebObjects application on the same domain. There is one small nitpick, though: other WebObjects applications (and in fact any dynamic content on the site, including PHP scripts and the like) will be sent these cookies by the browser. If you are going to all the trouble of rewriting URLs, you likely control the entire domain, so this is not a problem.
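To see why the suffix matters, consider the Cookie header a browser sends once every cookie uses the / path: each application receives all of them and must pick out its own. This sketch (PerAppCookies is hypothetical, and its Cookie-header parsing is deliberately naive, with no quoting or duplicate handling) shows two applications' instance cookies coexisting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: each application reads only its own suffixed keys.
final class PerAppCookies {
    private PerAppCookies() {}

    static Map<String, String> parse(String cookieHeader) {
        Map<String, String> out = new HashMap<>();
        for (String part : cookieHeader.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                out.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return out;
    }

    /** Returns this application's instance number, or null if absent. */
    static String instanceFor(String cookieHeader, String appName) {
        return parse(cookieHeader).get("woinst-" + appName);
    }
}
```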

Conclusion

Basically, this is it. All you need to do is provide an implementation of performActionNamed(), and you'll be generating your own dynamic pages in no time flat. For this blog there is a fairly involved scheme in which the code pulls a date and name out of the URL and turns them into an EOQualifier that can be used for a database query (although I could have used a simpler matching dictionary).
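The first half of such a scheme, pulling the date and name out of the handler path, might look like the following sketch. EntryPath is hypothetical: it assumes entry names are lowercase letters, digits, and hyphens, it stops short of building an EOQualifier, and it rejects impossible dates such as 2009/02/30.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for handler paths like "blog/2009/11/01/the-duke-of-url-part-1/".
final class EntryPath {
    private static final Pattern ENTRY =
            Pattern.compile("blog/(\\d{4})/(\\d{2})/(\\d{2})/([a-z0-9-]+)/?");

    final LocalDate date;
    final String name;

    private EntryPath(LocalDate date, String name) {
        this.date = date;
        this.name = name;
    }

    static Optional<EntryPath> parse(String handlerPath) {
        Matcher m = ENTRY.matcher(handlerPath);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            LocalDate d = LocalDate.of(Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
            return Optional.of(new EntryPath(d, m.group(4)));
        } catch (DateTimeException e) {
            return Optional.empty(); // well-formed but impossible date
        }
    }
}
```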

Perhaps more interesting, though, is a more recent feature. All of the existing .html pages that have been on the site for a few years are now wrapped with the standard navigation. Since I can now process URLs with any number of path components, I can, just like a webserver, turn them into file paths and load my existing .html files from disk.
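A rough sketch of that mapping, assuming the legacy pages live under a single content directory and are UTF-8 encoded: resolve the handler path against the directory, refuse anything that escapes it with ../, and keep only the body so the page can be wrapped in the standard navigation. LegacyPages is hypothetical, and a real implementation would be better served by an HTML parser than by a regular expression.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical loader for legacy .html files served through the application.
final class LegacyPages {
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private final Path root;

    LegacyPages(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Returns the page's body markup, or empty if there is no such file. */
    Optional<String> bodyOf(String handlerPath) throws IOException {
        Path file = root.resolve(handlerPath).normalize();
        if (!file.startsWith(root) || !Files.isRegularFile(file)) {
            return Optional.empty(); // outside the content root, or missing
        }
        String html = Files.readString(file); // assumes UTF-8 content
        Matcher m = BODY.matcher(html);
        return Optional.of(m.find() ? m.group(1) : html);
    }
}
```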

In the future I can create virtual content stored in a database. Or perhaps I could store content in an SVN repository where I can keep revisions of it all managed from directly within WebObjects. The possibilities are endless.