Posted Sunday, November 01, 2009 06:59:17 PM by dfe

In part 1 we explored traditional URL request handling in the webserver and began to touch on URL request handling by application code. In part 2 we detailed how the WebObjects adaptor is able to send all URLs beginning with a particular string to the correct application instance and how the application instance can dispatch the request in any fashion it wishes.

The final installment of this series, The Duke of URL, Part 3, will explore an entirely new mechanism for handling URLs within WebObjects.

The Goal: Simple URLs

When I set out to write the new blogging software I had one simple goal: create a dynamic content platform that doesn't look dynamic.

Borrowing a scheme found on some WordPress sites I decided that the entries should be identified by a date and name. So /blog/2009/11/01/the-duke-of-url-part-1/ should retrieve an article published on November 1, 2009 named "the-duke-of-url-part-1".

As we learned in part 2, WebObjects is only capable of handling URLs within its special /cgi-bin/WebObjects/ namespace. But as we learned in part 1, PHP and ASP.NET can't handle URLs like this either. For PHP one must use mod_rewrite in Apache. For ASP.NET one may use an equivalent for IIS or, if using IIS 7, the site designer can opt to send requests to the application and do the rewriting in ASP.NET code.

So when it comes to truly user-friendly URLs all environments are on basically equal footing.

Rewrite to what?

In the case of WordPress it's actually very simple. Every URL that cannot be resolved by Apache to a file or a directory is rewritten to /index.php. That's it. Without any special flags a mod_rewrite RewriteRule matches the URL but rewrites the file path. The request URL seen by index.php will be the original unadulterated request URL.
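
For reference, the stock WordPress permalink rules look roughly like this (the exact .htaccess WordPress generates varies slightly by version, so treat this as a sketch):

```apache
# Anything that is not a real file or directory is handed to /index.php.
# The request URL itself is left untouched for index.php to inspect.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```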

IIS with ASP.NET is similar. An IIS module (pre-IIS7 integrated pipeline) or ASP.NET code (IIS7 integrated pipeline) rewrites URLs based on regex to paths to .aspx pages. You can actually rewrite to a page without changing the query-string and get at the original URL by using Request.RawUrl to determine what the user was really trying to see.

Often, though, the rewrite will pull certain pieces out of the URL and pass them to the .aspx as part of the query string. So I might rewrite /blog/2009/11/01/the-duke-of-url-part1 to /DisplayBlogEntry.aspx?postDate=2009/11/01&name=the-duke-of-url-part1.

In WebObjects I could have done something similar. That is, I could rewrite to /cgi-bin/WebObjects/WOBlog.woa/wa/displayPost?postDate=2009/11/01&name=the-duke-of-url-part1. Unlike with PHP and ASP.NET I am required to rewrite the URL. The reason for this is that there are no files on disk to represent the WO pages. Instead the request has to make it to mod_WebObjects or the WebObjects CGI adaptor. For this to happen I have to use the "passthrough" (PT) flag on the rewrite. A side effect of this is that the original URL is completely destroyed.

My initial hack actually rewrote /blog to /cgi-bin/WebObjects/WOBlog.woa/wa/content?path=/blog. The thing is, I found this a bit dirty as it winds up taking something that was in the URL and placing it into the query string. But as we know from part 2 I could instead register a custom WORequestHandler and let it handle all path components from there.

The Subgoal: Straightforward Rewriting

Because I must rewrite the URL and because I will never be able to see the original URL it is desirable that the URL I do see be as similar as possible to the original URL. So the scheme I came up with was to rewrite /blog to /cgi-bin/WebObjects/WOBlog.woa/content/blog. Then later on I might have /darwin go to /cgi-bin/WebObjects/WOBlog.woa/content/darwin. Or I might eventually just rewrite everything under / to /cgi-bin/WebObjects/WOBlog.woa/content/. In fact, I am in the process of doing this right now.

The advantage of this scheme is that request().requestHandlerPath() is, in and of itself, the exact original URL minus the leading / and the query string, if any. Now it _could_ be the case that the user actually followed a link with one of these dastardly long URLs but the theory is that if I don't ever link to them on my site then no one will ever find them. If they do they show the correct content so it's not a huge deal. Yes, I'm sort of breaking this rule by posting the URL format. Try it if you want.

The second advantage of this scheme is that the URL rewrite rules are dead simple:

RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/content/$1 [PT]

The outer grouping (which is also the first) contains the literal text "blog" followed by an inner grouping which allows either / followed by zero or more characters or the empty string. The $ at the end signifies that the end of the string must be matched. This prevents /blogfoo or /blog2 or whatever else from matching but allows both /blog by itself and /blog/ and everything under it to be matched.
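
To sanity-check the pattern, here is the same regex exercised from plain Java (java.util.regex rather than mod_rewrite's PCRE, but this particular pattern behaves identically in both engines):

```java
import java.util.regex.Pattern;

// The RewriteRule pattern from above, checked against the URLs discussed.
class BlogUrlPattern {
    static final Pattern BLOG = Pattern.compile("^/(blog(/.*|))$");

    static boolean matches(String url) {
        return BLOG.matcher(url).matches();
    }
}
```

/blog, /blog/, and anything under /blog/ match; /blogfoo and /blog2 do not, because the inner group requires either a / separator or the end of the string.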

The URL is rewritten into the request space of the content request handler and the PT flag indicates that Apache must rewrite the URL (not the file path), stop rewriting, and pass the URL on to the next phase of request handling where it will be picked up by mod_WebObjects.

Advanced rewriting

Before we discuss the WebObjects Java code needed to implement the request handler let's first consider what happens here. The URL is being rewritten to something inside of /cgi-bin/WebObjects/WOBlog.woa/content/. But there's something missing. There can optionally be an application instance number like /cgi-bin/WebObjects/WOBlog.woa/1/content/. Without it the request will be handled by any available instance. If the user is not logged in this will not be a problem. If the user already has a session then that session is specific to the application instance. So our simplistic rewriting is going to cause the user to potentially hit a different WOBlog instance and wind up not being signed in.

WebObjects has the capability of storing the instance ID (known as "woinst") in a cookie. By default it issues the client a cookie named "woinst" with a cookie path of the application's URL prefix. This allows WOBlog to have a woinst for /cgi-bin/WebObjects/WOBlog.woa and SomeOtherApp to have a woinst for /cgi-bin/WebObjects/SomeOtherApp.woa. Clearly we will need to change this and the scheme I decided on was to append a dash and the application name. So if you're signed in you have a cookie named "woinst-WOBlog" for the "/" path. More on this later.

For now let's see how we use RewriteCond to accomplish what we want:

RewriteCond %{HTTP_COOKIE} woinst-WOBlog=([-0-9][0-9]*)
RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/%1/content/$1 [PT]

Very simply, we match on a cookie named woinst-WOBlog with a numeric value. It is important that we only put integer (possibly negative) values into the URL. If it's anything other than an integer the WO adaptor will ignore it and the application code will think the instance number was missing and use it as the request handler key.

Apache mod_rewrite rules run in order so you want to have this rule above the non-cookie one. You need both because this rule will only match when the cookie exists. It may seem like a bit of a pain to require two rules but it will prove useful in the future.
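
Putting the pieces together, the rule pair sits in this order (PT implies that rewriting stops on a match, so the cookie-aware rule wins whenever the cookie is present):

```apache
# Logged-in clients carry a woinst-WOBlog cookie: route them to their instance.
RewriteCond %{HTTP_COOKIE} woinst-WOBlog=([-0-9][0-9]*)
RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/%1/content/$1 [PT]

# Everyone else: any available instance will do.
RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/content/$1 [PT]
```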

Caching content

Suppose this blog entry winds up on Slashdot or digg (unlikely, I know). Apache, WebObjects, and PostgreSQL are all pretty efficient but you know what, there's nothing as efficient as having the webserver serve a completely static file directly off the disk (and more likely than that, directly out of the disk cache in RAM).

To accomplish this I could put my woinst RewriteCond/RewriteRule pair at the top. Anyone who is logged in will get dynamically generated content (perhaps even indicating to them that they are logged in and having a welcome message or whatever). Anyone who is not logged in will fall through. From there I can actually have a directory on disk like /blog/2009/11/01/the-duke-of-url-part-1/ with an index.html file inside of it. Then I can add some conditions to my main RewriteRule to skip it if a file can be found.
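
A sketch of those extra conditions (paths illustrative; the woinst rule pair would sit above this so logged-in users never hit the static copy):

```apache
# Only rewrite to WebObjects when no pre-generated static copy exists on disk.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^/(blog(/.*|))$ /cgi-bin/WebObjects/WOBlog.woa/content/$1 [PT]
```

When the directory exists, the rule is skipped and Apache's normal DirectoryIndex handling serves the index.html straight off the disk.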

If a file cannot be found then it will go in to WebObjects. And one of the things I could do as part of the request handling is actually go ahead and fire off a thread to generate new content. Of course if hundreds of thousands of people then decide to sign up for an account or log in I'm still screwed but if that happens then trying to optimize requests on one server is completely the wrong approach.

Handling the request

Alright already. Time for the interesting stuff.

In part 2 we touched on the registerRequestHandler method of WOApplication. Now let's put it to some use. To register our request handler we're going to put this in our Application class:

    protected ContentRequestHandler _contentRequestHandler;

    public Application() {
        NSLog.out.appendln("Welcome to " + name() + " !");
        /* ** put your initialization code in here ** */
        _contentRequestHandler = new ContentRequestHandler();
        registerRequestHandler(_contentRequestHandler, "content");
    }

For this to work we're going to need a new class named ContentRequestHandler. I'm sure I will get more clever with this but for now this is exactly what I'm using.

public class ContentRequestHandler extends WORequestHandler
{
    public WOResponse generateRequestRefusal(WORequest aRequest)
    {
        WODynamicURL aURIString = aRequest._uriDecomposed();
        String contentString = (new StringBuilder())
            .append("Sorry, your request could not immediately be processed. Please try this URL: <a href=\"")
            .append(aURIString).append("\">").append(aURIString).append("</a>").toString();
        aURIString.setApplicationNumber("-1");
        WOResponse aResponse = WOApplication.application().createResponseInContext(null);
        WOResponse._redirectResponse(aResponse, aURIString.toString(), contentString);
        return aResponse;
    }
    
    private WOResponse nullResponse;
    public WOResponse nullResponse()
    {
        if(nullResponse == null)
        {
            nullResponse = WOApplication.application().createResponseInContext(null);
            nullResponse.setStatus(500);
            nullResponse.appendContentString("<html><head><title>Error</title></head><body>Your request produced an error.</body></html>");
        }
        return nullResponse;
    }
    
    public WOResponse handleRequest(WORequest request)
    {
        WOResponse aResponse = null;
        WOApplication anApplication = WOApplication.application();
        if(anApplication.isRefusingNewSessions() && !request.isSessionIDInRequest() && request.isUsingWebServer())
        {
            aResponse = generateRequestRefusal(request);
        }
        else
        {
            Object lock = anApplication.requestHandlingLock();
            if(lock != null)
            {
                synchronized(lock)
                {
                    aResponse = _handleRequest(request);
                }
            }
            else
            {
                aResponse = _handleRequest(request);
            }
        }
        if(aResponse == null)
            aResponse = nullResponse();
        return aResponse;
    }
    
    public WOResponse _handleRequest(WORequest request)
    {
        // Retrieve the application object.  We need to inform it of awake/sleep
        // and use some of its helper methods.
        WOApplication application = WOApplication.application();

        WOResponse response;
        WOContext context;

        application.awake();
        try {
            // Instantiate the action object for this request.
            // The WOAction sets up the context and restores the session and so on.
            WOAction action = new ContentAction(request);

            // Retrieve the context object from the action.
            context = action.context();
            
            // Retrieve the content path.  e.g. blog or blog/2009/10/10/foobar or whatever.
            String contentPath = request.requestHandlerPath();

            
            // TODO: We probably could use some exception handling here.
            // 1. performActionNamed throws generating the WOActionResults
            // 2. performActionNamed returns null
            // 3. generateResponse throws
            // 4. generateResponse returns null (although we do kind of handle this already).


            // Ask the action object to handle the request.  Unlike normal action objects the
            // ContentAction object takes a path instead of the first part of a method name.
            WOActionResults actionResults = action.performActionNamed(contentPath);

            // Generate the response object.
            if(actionResults != null)
                response = actionResults.generateResponse();
            else
                response = null;

            // FIXME: When we do add error handling, do we or don't we save the session in the
            // event of an error?
            if(context != null)
            {
                // Check the session in to the session store.  Particularly important if the
                // session store is out of process.
                application.saveSessionForContext(context);
            }
        }
        finally {
            // End of request.
            application.sleep();
        }

        // Ah, the joys of calling private APIs.  For some reason both WOActionRequestHandler
        // and WOComponentRequestHandler know about and call this method as virtually the
        // last thing before returning the response.  I am somewhat unclear as to why this
        // method is private and why it isn't called by our caller instead of within the
        // request handler.
        // It is imperative that this method be called because it generates HTTP Set-Cookie
        // headers from the NSArray<WOCookie>.  Without this no cookies will ever function.
        if(response != null)
            response._finalizeInContext(context);

        return response;
    }
}

In reality, most of this is not actually my code but instead Apple's code. The handleRequest() method comes from WOActionRequestHandler. The nullResponse() and generateRequestRefusal() methods come from WODirectActionRequestHandler. Yes, I do feel a bit dirty decompiling Apple's code and pasting it verbatim into my own code but I'll get over it. I suppose another option is to derive from WOActionRequestHandler and override only _handleRequest. But that is frankly just asking for trouble with an upgrade. By instead duplicating the code and using the publicly available methods to register the request handler we are less likely to break with a future release of WebObjects.

The _handleRequest() method is where the meat of the implementation lies. Unfortunately due to copious amounts of exception handling in the WOActionRequestHandler._handleRequest() method it isn't entirely easy to see what's going on. One of the failings of jad is that exception handlers within exception handlers wind up generating completely invalid Java code. Still, we can get the gist of it.

The important thing is to wake up WOApplication and ensure we put it back to sleep in all cases (hence try/finally). The other important thing is to get a WOContext object because it's not possible to generate a WOComponent without one. We could create our own WOContext but it's actually easier to construct a WOAction subclass because the WOAction constructor takes care of this.

Like any WOAction class our ContentAction class implements performActionNamed(). Unlike most action classes, we're going to pass it in a path instead of an action name. What we get back will be a WOActionResults (which might be a WOResponse, a WOComponent, or potentially something else). Honestly, we don't care. As long as we can get a WOResponse from it by calling generateResponse we have all we need.

Once we have the WOResponse it is important that we tell the application to save the session for the action's context. Actually it's really not so important in the default case where Session objects stay in application RAM and that function is a no-op. But later on if we decide to use serialized sessions this would be important.

The last thing we do, after putting the application to sleep, is positively evil. The WOResponse class provides the ability to manage a set of WOCookie objects (actually this is in WOMessage because both the request and the response have cookies). The problem is that cookies ultimately consist of HTTP headers. Rather than modifying the headers every time a cookie is changed there is instead a _finalizeCookies() method. The _finalizeInContext() method calls _finalizeCookies() in addition to doing a few other tasks like setting content-length. If you don't call it any cookies that are set on to the response never make it to the client. And of course this method is private and not documented. Seems like a little bit of an oversight but at least the method is technically public to Java. This little scheme of marking methods private by prefacing their name with an underscore hails from Objective-C. It's truly a godsend for cases like this where the designer overlooked something important. Yes, it's not technically public so you call it at your own risk, but realistically it's reasonably safe to do so until a later version comes up with a real solution.

The ContentAction class

The last piece of the puzzle is the ContentAction class. This time I am not revealing the full code as performActionNamed() in particular is wildly specific to this site.

class ContentAction extends WOAction
{
    ContentAction(WORequest request)
    {
        super(request);
    }

    private static String _getSessionIDFromValuesOrCookie(WORequest request, boolean lookInCookiesFirst)
    {
        boolean isStream = WOApplication.application().streamActionRequestHandlerKey().equals(request.requestHandlerKey());
        String aSessionID = null;
        if(lookInCookiesFirst)
        {
            aSessionID = request.cookieValueForKey(WOApplication.application().sessionIdKey());
            if(aSessionID == null && !isStream)
                aSessionID = request.stringFormValueForKey(WOApplication.application().sessionIdKey());
        } else
        {
            if(!isStream)
                aSessionID = request.stringFormValueForKey(WOApplication.application().sessionIdKey());
            if(aSessionID == null)
                aSessionID = request.cookieValueForKey(WOApplication.application().sessionIdKey());
        }
        return aSessionID;
    }

    public String getSessionIDForRequest(WORequest request)
    {
        String aSessionID = null;
        if(request != null)
            aSessionID = _getSessionIDFromValuesOrCookie(request, false);
        return aSessionID;
    }
    
    public WOActionResults performActionNamed(String anActionName)
    {
        return SomethingUseful;
    }
}

Again we have run into a situation where we need to provide a method getSessionIDForRequest. One easy way to avoid this is actually to just derive our class from WODirectAction. What I don't like about this is that any class derived from WODirectAction is subject to reflection by WODirectActionRequestHandler. So now someone could do .../wa/content/foo and cause ContentAction to be instantiated and a fooAction() method to be called on it. I find this undesirable so I'd just as soon implement the one method that isn't obvious.

As it turns out, the WODirectAction code uses the private _getSessionIDFromValuesOrCookie() method of WORequest. And this time it is truly private. No matter, we simply copy the implementation, make it a static method in our class and add a WORequest parameter which was of course implicit in the original version. This is another one of those methods that seems like it probably should have been public to begin with. We could probably cut this implementation down some because we can be certain that the request handler key is not the streamActionRequestHandlerKey. But whatever. The code is there, easy to copy/paste from the decompiled stuff.

Getting the right cookies

One of the great things about WebObjects is that there are a fair number of methods that can be overridden. Cookie generation is no exception. To begin with, we want our cookies generated with the "/" path so the client browser will send them for all URLs on the domain. This is accomplished by overriding one method in your Session class.

    public String domainForIDCookies()
    {
        return "/";
    }

Because the cookies are now domain wide it's a good idea to make them unique to the application. Recall that the original path was /cgi-bin/WebObjects/WOBlog.woa. The only unique part of that URL is WOBlog, the application's name. So when deciding on cookie names this is all we need to know. Add these two methods to your Application class.

    private String _sessionIdKey;
    /*!
     @abstract   Overrides sessionIdKey to return one including the app name
     */
    public String sessionIdKey()
    {
        if(_sessionIdKey == null)
            _sessionIdKey = String.format("%s-%s", super.sessionIdKey(), name());
        return _sessionIdKey;
    }
    
    private String _instanceIdKey;
    /*!
     @abstract   Overrides instanceIdKey to return one including the app name
     */
    public String instanceIdKey()
    {
        if(_instanceIdKey == null)
            _instanceIdKey = String.format("%s-%s", super.instanceIdKey(), name());
        return _instanceIdKey;
    }

That's it. You now have WebObjects generating unique cookies that won't conflict with any other WebObjects application on the same domain. There is one little nitpick though: other WebObjects applications (and in fact any dynamic content on the site including PHP scripts and the like) will be sent these cookies by the browser. If you are going to all the trouble of rewriting URLs it's likely you control the entire domain so this is not a problem.

Conclusion

Basically, this is it. All you need to do is provide an implementation for performActionNamed() and you'll be generating your own dynamic pages in no time flat. For this blog there is actually a fairly involved scheme where the code pulls a date and name out of the URL and turns it into an EOQualifier that can be used for a database query (although I could potentially have used a simpler matching dictionary).
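
The date-and-name extraction itself is straightforward. Here is a hedged sketch of the parsing half (class and field names hypothetical, not the code this site runs) for a content path like blog/2009/11/01/the-duke-of-url-part-1:

```java
import java.time.LocalDate;

// Hypothetical holder for the pieces pulled out of a blog entry URL.
class BlogEntryPath {
    final LocalDate postDate;
    final String name;

    BlogEntryPath(LocalDate postDate, String name) {
        this.postDate = postDate;
        this.name = name;
    }

    /** Returns null unless the path has the blog/yyyy/mm/dd/name shape. */
    static BlogEntryPath parse(String contentPath) {
        // String.split drops trailing empty strings, so a trailing slash is harmless
        String[] parts = contentPath.split("/");
        if (parts.length != 5 || !parts[0].equals("blog"))
            return null;
        try {
            LocalDate date = LocalDate.of(
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]),
                Integer.parseInt(parts[3]));
            return new BlogEntryPath(date, parts[4]);
        } catch (RuntimeException e) {
            return null; // non-numeric or out-of-range date components
        }
    }
}
```

The resulting date and name are what would then feed the EOQualifier for the fetch.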

Perhaps more interesting though is a more recent feature. All of the existing .html pages that have been on the site for a few years are now wrapped with the standard navigation. Since I can now process URLs with any number of path components I can, just like a webserver, turn these into file paths and load my existing .html files from disk.

In the future I can create virtual content stored in a database. Or perhaps I could store content in an SVN repository where I can keep revisions of it all managed from directly within WebObjects. The possibilities are endless.

Posted Sunday, November 01, 2009 06:59:12 PM by dfe

The Duke of URL, Part 1 explains how webservers traditionally handle URLs. Now that we know this, what exactly does WebObjects do so differently?

The WebObjects Adaptor

You may have noticed that almost all WebObjects URLs start with /cgi-bin/WebObjects/. The reason for this is that traditionally there would be an executable named WebObjects in the cgi-bin directory. This executable would then look to further parts of the URL to decide how to dispatch it. It's important to note that when Apache decides to run the WebObjects program it is unknown which WebObjects application supports the request. The WebObjects program itself (known as the adaptor) is compiled C code which loads quickly, having only the CGI fork/exec overhead and not the interpreter overhead. Because it is impossible to maintain state within a CGI program WebObjects instead consults a long-lived wotaskd process for all it needs to know.

Having said that, most WebObjects installations do not actually use the CGI WebObjects adaptor. Instead an Apache module mod_WebObjects is loaded into the server. Ultimately, the same exact code is executed and you can verify this because the source for the WebObjects CGI adaptor and the various web server adaptors (Apache, IIS, and others) is distributed with WebObjects. All of them share the same underlying code that consults wotaskd, the main difference being how that code is reached. For mod_WebObjects an Apache configuration directive is used to tell mod_WebObjects to trap all URLs beginning with /cgi-bin/WebObjects or actually /<anything>/WebObjects. For the CGI WebObjects adaptor the URL is trapped by virtue of the CGI adaptor being a file named WebObjects in the cgi-bin (or other) directory.

The common adaptor dispatch code starts by examining the path component after /cgi-bin/WebObjects which should look like SomeApplication.woa such that the beginning of the URL is /cgi-bin/WebObjects/SomeApplication.woa. At this point the WebObjects program knows that it needs to look for a running application named SomeApplication. Before it does this it checks to see if the next component of the URL is an integer. If it is an integer then it uses it as an application instance number. If it is anything other than an integer then the WebObjects request dispatcher ignores it.
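
As a toy illustration (plain Java, not the adaptor's C source), the dispatch rule reduces to: find the .woa component, then treat the following component as an instance number only if it parses as an integer:

```java
// Toy model of how the adaptor reads the application name and optional
// instance number out of a URL like /cgi-bin/WebObjects/WOBlog.woa/1/...
class AdaptorUrl {
    final String applicationName;
    final Integer instanceNumber; // null when the URL carries no instance number

    AdaptorUrl(String applicationName, Integer instanceNumber) {
        this.applicationName = applicationName;
        this.instanceNumber = instanceNumber;
    }

    static AdaptorUrl parse(String url) {
        String[] parts = url.split("/");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].endsWith(".woa")) {
                String app = parts[i].substring(0, parts[i].length() - 4);
                Integer instance = null;
                if (i + 1 < parts.length) {
                    try {
                        instance = Integer.valueOf(parts[i + 1]);
                    } catch (NumberFormatException e) {
                        // next component is not an integer: the dispatcher ignores it
                    }
                }
                return new AdaptorUrl(app, instance);
            }
        }
        return null; // no application component found
    }
}
```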

An application instance is one running copy of your Java code uniquely identified by its application name and instance number. The application instance is a separate process from the webserver and may be running on either the same machine or a different machine. Before any requests can be handled for a given application name at least one application instance process must have been started and have checked in with the wotaskd service. To ensure requests don't get stuck too long waiting for an instance the wotaskd service periodically checks all attached applications by sending them a "heartbeat" request. This way if a particular instance has hung the wotaskd can simply use another available instance.

As mentioned, the WebObjects adaptor also speaks to wotaskd on port 1085 and uses it for its configuration state. To determine which application process must be contacted to service the request it sends the application name and instance number to wotaskd. The wotaskd responds with the hostname and port number of the application. Typically the first instance of the first application configured on a machine will be running on TCP port 2001.

Notice that the WebObjects adaptor only looks as far as the application name and instance number, if any. Everything thereafter is ignored so the decision as to what to do with it rests entirely in the application instance that is contacted to service the request.

What that means is that unlike PHP or ASP.NET which only see requests for their .php or .aspx files, a WebObjects application sees all requests beginning with /cgi-bin/WebObjects/SomeApplication.woa.

WebObject Application Request Dispatch

As we discussed in Part 1 regarding IIS 7's integrated pipeline, the ability to handle arbitrary URLs is only as good as what you can do with them. In the case of WebObjects the answer is basically everything; in particular any arbitrary component may be instantiated and used to generate the response. The reason for this is that the underlying pages (embodied by .wo bundles and their associated Java classes) are completely decoupled from the request URLs. It is entirely up to the application's dispatch code to decide which page (i.e. .wo component) is to be used to generate output for the user.

As with ASP.NET the first code providing an opportunity to override the request handling is in your Application singleton. The main interesting method is WOApplication.dispatchRequest(). This function takes a WORequest and returns a WOResponse. Although it is possible to override this method I wouldn't recommend it because WOApplication provides an alternate method for installing custom URL handlers.

The registerRequestHandler() method takes an instance of a WORequestHandler and a String and registers the request handler by name with the application dispatcher. The WebObjects framework by default registers a few built-in request handlers and only a fairly advanced WebObjects coder would ever register additional request handlers.

The two most common user-visible request handlers are an instance of WODirectActionRequestHandler registered as "wa" and an instance of WOComponentRequestHandler registered as "wo". So any URL beginning with /cgi-bin/WebObjects/SomeApplication.woa/1/wa/ is sent to the WODirectActionRequestHandler instance by calling its handleRequest() method. Its contract is exactly the same as WOApplication.dispatchRequest(): accept a WORequest, return a WOResponse.

Direct Actions

If you are familiar with ASP.NET MVC then you will see a lot of parallels between MVC and DA. The primary difference is that MVC requires you to register several pieces of information with the MVC dispatcher: a regex pattern to match the URL, a controller class, a method name to call on that controller class, and a .aspx page to use as its view.

By contrast, direct actions are not registered. A direct action class derived from WODirectAction is looked up via reflection and instantiated with a WORequest. Then its performActionNamed() method is called with the action name and WODirectAction implements this to find and call a method of the form <actionname>Action taking no arguments.

The direct action request handler takes URLs of two forms: .../wa/actionname and .../wa/class/actionname. The first form looks for a class named DirectAction which is your class (not a framework class). A new WebObjects application is created by the development environment with a skeleton DirectAction class implementing defaultAction() to return pageWithName("Main"). The second URL form looks for the named class. Because most people like to see all lowercase URLs but capitalized class names the request handler capitalizes the first letter such that wa/foo/bar looks for a class named Foo not a class named foo. Method names in Java are conventionally non-capitalized camelCase (as opposed to class names which are usually capitalized CamelCase), so no recapitalization is necessary to find barAction() given "bar".
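
The naming rules in that paragraph boil down to two tiny transformations, sketched here in plain Java (illustrative only; the real handler does this lookup via reflection):

```java
// wa/foo/bar → class "Foo", method "barAction"
class DirectActionNames {
    // Only the first letter of the class component is capitalized.
    static String className(String pathComponent) {
        return Character.toUpperCase(pathComponent.charAt(0)) + pathComponent.substring(1);
    }

    // The action name is used as-is, with "Action" appended.
    static String methodName(String actionName) {
        return actionName + "Action";
    }
}
```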

The *Action() method returns any object implementing WOActionResults. The WOActionResults interface defines one single method: generateResponse() returning a WOResponse. The interesting thing is that WOResponse itself implements WOActionResults.generateResponse() to simply return itself. And WOComponent implements it to create a new WOResponse object and call appendToResponse() on itself. What this means is that in a WOComponent the response object is not actually available until it is time to generate the response.
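
A toy model of that contract (these are stand-ins, not the WebObjects classes) makes the "response returns itself" trick concrete:

```java
// Stand-in for WOActionResults: a single method producing a response.
interface ToyActionResults {
    ToyResponse generateResponse();
}

// Stand-in for WOResponse: a response is its own result.
class ToyResponse implements ToyActionResults {
    final StringBuilder content = new StringBuilder();

    public ToyResponse generateResponse() {
        return this;
    }
}

// Stand-in for WOComponent: the response object only exists once generation starts.
class ToyComponent implements ToyActionResults {
    public ToyResponse generateResponse() {
        ToyResponse response = new ToyResponse();
        appendToResponse(response);
        return response;
    }

    void appendToResponse(ToyResponse response) {
        response.content.append("<html><body>Hello</body></html>");
    }
}
```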

In the case of actions (which we're talking about here) the request flow on the page is actually a bit short circuited. Recall that the request came in to the WODirectAction-derived class and not directly to a page. This means it is up to the action handling code to deal with any GET or POST values and other aspects of the request.

If the action method is going to be using a WOComponent to generate the response it will call its own pageWithName() method (defined in the WOAction class which is a base class of WODirectAction). This method will ask the application to create a new instance of the WOComponent-derived Java class with the given name. The code might look like this:

WOComponent nextPage = pageWithName("DisplayBlogEntriesPage");

From your action code you might cast the returned value to the appropriate Java class (i.e. DisplayBlogPostsPage is what you are looking at now) and call methods like setBlogEntriesDataSource() or setBlogEntriesQualifier() on it. Alternatively you might take advantage of key-value coding, leave it as a WOComponent and instead call takeValueForKey(..., "blogEntriesDataSource") on it. Finally you return the page which will cause it to be generated and returned to the user. In code that would look something like this:

DisplayBlogEntriesPage nextPage = (DisplayBlogEntriesPage)pageWithName("DisplayBlogEntriesPage");
String requestedYear = context().request().stringFormValueForKey("year");
nextPage.setBlogEntriesQualifier(BlogEntry.qualifierToMatchYear(Integer.valueOf(requestedYear)));
return nextPage;

Instead of returning the page and letting the base handler call generateResponse() on it to get the WOResponse object, you might instead return nextPage.generateResponse(). In this case you will be returning the actual WOResponse object, which can have advantages. The main advantage is that if your page throws while generating its content you have an opportunity to catch the exception at this point and return a completely custom response. That said, if you simply return the component itself as the action result, the underlying code will call generateResponse() on it, taking care to handle any exception and return a generic exception page.

Recall that I said page flow is short-circuited from full page flow. Ordinary page flow is takeValuesFromRequest(), invokeAction(), appendToResponse(). But since this is a direct action there is no existing page postback data to be restored in takeValuesFromRequest() nor any decision to be made as to which page is next in invokeAction(). So processing begins and ends with appendToResponse().

So now you are probably wondering how one would actually implement a web form if the page doesn't get a chance to see the request. The answer is that you don't. Accepting the form values from the user is something that occurs in takeValuesFromRequest(), which is only valid in the context (that's the WOContext) that created the form. To handle form postbacks, WebObjects uses the component request handler under the "wo" request handler key.

The Component Request Handler

The component request handler serves mainly to handle postback data. Regardless of which WOForm on which WOComponent is causing the postback, the form submission URL will be for the WOComponentRequestHandler. Now some people find this offensive because you wind up with URLs that look like /cgi-bin/WebObjects/WOBlog.woa/1/wo/1mtIbBwW99kJt9QyXGccqg/6.PageWrapper.3.3.2.3.2.4.0.3.1.1. Wow, that's user friendly... not!

In actual practice once a user begins to fill out and submit a form it very rarely matters which URL appears after the form is submitted. That said, it's certainly possible, after having processed the postback data, to send the client a redirect to another page.

If you were implementing editing of a customer record from a direct action, you might begin by taking the user to .../wa/customers/edit?customerid=123. As we know from the direct actions section, this causes a Customers object derived from WODirectAction to be instantiated with the request and its editAction() to be invoked. The editAction() method might then do this:

WOComponent nextPage = pageWithName("EditCustomerPage");
Customer customer = Customer.fetchCustomerByID(context().request().stringFormValueForKey("customerid"));
nextPage.takeValueForKey(customer, "customer");
return nextPage;

Because the request URL was a direct action the user will see the more friendly direct action URL but the EditCustomerPage form. When he clicks a button (say a Save button) the form is submitted to one of the long component request handler URLs. From there a method of EditCustomerPage named perhaps saveAction() will save the changes to the database and return a redirect response to redirect the user back to .../wa/customers/list. This way the user never sees the component URL. If that's what you want of course.

The saveAction() does not have to return a redirect response. If saving failed it will likely return null, which tells WebObjects to redisplay the page. Generally one would set an errorString instance variable to some error text and have a WOString on the page that outputs this text to the user if it is present (probably inside a WOConditional so nothing is output when errorString is null). In this case the user will see the long component URL because he is not being redirected back to the edit direct action; instead the new page content (including the error string) is directly returned.
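The return-null-to-redisplay convention can be sketched without the framework. This is an illustrative stand-in, not the real WOComponent API; the validation logic and the redirect string are invented for the example:

```java
// Sketch of the action-return convention described above, using plain Java
// stand-ins for the WebObjects types. In real code saveAction() would live
// on a WOComponent subclass and return WOActionResults.
class EditCustomerPage {
    String errorString;   // displayed by a conditional WOString on the page
    String name;          // a form-bound value

    // Hypothetical save action: null means "redisplay this page",
    // anything else becomes the next response.
    Object saveAction() {
        if (name == null || name.isEmpty()) {
            errorString = "Name is required.";
            return null;  // WebObjects redisplays the current page
        }
        // ...save the changes to the database here...
        return "redirect:/cgi-bin/WebObjects/WOBlog.woa/wa/customers/list";
    }
}
```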

If these URLs are undesirable to you there is actually a simple way around this for modern browsers. Instead of generating a full postback you can use an AJAX update panel. In this case the page is updated by the browser in situ so the URL doesn't change. The underlying AJAX URL will be a gnarly component-style URL (though as of WebObjects 5.4 it goes through a separate AJAX request handler).

From a coding perspective the code is the same. The AJAX handler invokes takeValuesFromRequest(), then invokeAction(), and the *Action() method returns null to indicate the page should be redisplayed. But instead of regenerating the entire page, the code calls appendToResponse() on only the portion of the element tree under the panel being updated. If something other than null is returned, the client is sent instructions to retrieve an entirely new page instead.

The interesting thing about this is that if the browser lacks AJAX it still works. The only difference is that the button click returns an entirely new page with the special component URL. All page state is fully maintained because the component handler is able to restore the exact context using the URL to tell it which context to restore.

If you are used to ASP.NET you will notice how we've been talking about the potential for "returning" a redirect as opposed to "performing" a redirect. This is because at invokeAction() time the response object has not been instantiated. If we don't mind the user seeing a component action URL we can actually instantiate and return an entirely different page! For example, we might do something like this:

ListCustomersPage listPage = (ListCustomersPage)pageWithName("ListCustomersPage");
listPage.ensureCustomerIsVisible(this.customer);
return listPage;

In this fashion if the list page has a pager and the customer we just finished editing is on say page 250 of 300 we can pass the actual customer object to the list page and it can do the work of figuring out which page needs to be displayed for that customer to appear in the list. It might also highlight the row for that particular customer so the user can clearly see which customer he just finished editing.

At this point you begin to realize the inherent freedom that the component request handler provides you. At the expense of some pretty gnarly URLs you are afforded the opportunity to drag the user anywhere you wish. The URL becomes simply a vehicle for maintaining state across each request.

Best of all, a certain number of URLs (typically 30) are remembered internally by the request handler. This means that the client's back and forward buttons actually work, and even though the client may repost the same data, the server is intelligent enough to return the response from its cache. So if the user decides to backtrack after placing his order, the order will not be submitted again. Ditto for the user who likes to double-click submit buttons: the first click posts all the form data and begins generating the response, but before the response can be sent to the client, the client posts the form a second time. No problem! The second request realizes that a response has already been generated or is waiting to finish generating. That response is cached, so it is simply sent to the user without reinvoking any of your action code.
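The double-submit behavior can be sketched as a small cache keyed by context ID. This is an illustration of the idea only; the capacity of 30 echoes the typical figure mentioned above, but the class is nothing like the actual WebObjects implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the per-context response cache behavior described above.
class ResponseCache {
    static final int CAPACITY = 30;   // illustrative; matches the typical figure

    private final Map<String, String> cache =
        new LinkedHashMap<String, String>(16, 0.75f, false) {
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                return size() > CAPACITY;   // forget the oldest context
            }
        };

    // Returns the cached response for a repeated context ID (a double
    // submit or a back-button repost) instead of re-running the action.
    String responseFor(String contextId, java.util.function.Supplier<String> action) {
        String cached = cache.get(contextId);
        if (cached != null) {
            return cached;                  // action code is not re-invoked
        }
        String fresh = action.get();
        cache.put(contextId, fresh);
        return fresh;
    }
}
```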

For an application based on user actions, particularly the typical CRUD (Create, Read, Update, Delete) the user's navigation expectations can be easily met because it is possible to directly instantiate new pages and even inform them of exactly where to return to when finished. And I don't mean passing a value via a query string or a session variable. I mean actually passing the current page instance to the next page instance so the user can be returned to exactly the page he came from instead of to whatever page the developer thought would probably be best to take the user to when he's finished editing.

A very simple example is an admin tool for a web-based store. The clerk may list customers then pick a customer to edit. When she is finished she would expect to go back to the list of customers and be looking at the same page. But she might instead be viewing a particular order and need to change some customer information. She clicks the edit link on the customer, edits the data, saves the customer and goes back to which page? The sensible thing is to return her to the order page. Most other environments cannot easily record this information so in a good majority of web apps she's taken back to a list of customers because that's where the developer assumed it would probably be best to take her when she's done editing a customer.

Component actions are so powerful that it's even possible for a CRUD site to be generated at runtime from a database model. I don't mean going in and running some wizard to generate editCustomer.aspx, listCustomer.aspx, editOrder.aspx, listOrder.aspx. I mean having one EditPage.wo that consults configuration data to determine which attributes to display and which components (e.g. text box, check box, radio buttons, pop-up lists) to display for each attribute.
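The idea of one runtime-configured edit page can be sketched in a few lines. The component names and HTML rendering here are invented placeholders, nothing like the real D2W machinery:

```java
import java.util.Map;

// Sketch of a single generic edit page that consults configuration at
// runtime to decide which input component to render per attribute.
class GenericEditPage {
    // attributeComponents maps attribute name -> component type name.
    static String render(Map<String, String> attributeComponents) {
        StringBuilder html = new StringBuilder("<form>");
        for (Map.Entry<String, String> e : attributeComponents.entrySet()) {
            html.append("<label>").append(e.getKey()).append("</label>")
                .append("<").append(e.getValue()).append("/>");
        }
        return html.append("</form>").toString();
    }
}
```

Change the configuration and the same page renders a different entity; no per-entity .wo files are generated.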

The ability to do this is already a part of WebObjects and is known as Direct To Web (D2W). Let's face it, at some point every app needs a CRUD section, if only to be used as an admin interface. Would you rather write a bunch of list and edit pages to enter your blog entries or would you rather model your data once and let D2W generate your admin interface for you?

What About Content?

Component actions are all well and good for web-based user interfaces. In my opinion they absolutely exceed the abilities of every other web development environment out there. But for serving dynamic content component actions are simply inappropriate.

We did cover direct actions and these can be used to serve dynamic content. For instance you might make a direct action for every user-visible page on your site. Perhaps you'd have .../wa/sections/products, .../wa/sections/aboutus, .../wa/sections/services or whatever.

It's workable and because they are direct actions they are regular URLs (albeit rather long ones). With URL rewriting in the webserver you could perhaps shorten these to /products, /services, /aboutus. But in the end you wind up having to create a productsAction, servicesAction, and aboutUsAction in a Sections WODirectAction class.

In Part 3 we'll explore an enhancement to WO that will make you completely rethink the way you serve a content-managed site.

Posted Sunday, November 01, 2009 06:59:05 PM by dfe

The other week I decided to write some blogging software and came up with WOBlog. WOBlog is rather different from most WebObjects apps as it actually has "friendly" URLs. And I don't mean friendly in the sense that it is using the .../wa/foo/bar/ direct actions but friendly in the sense that if I didn't tell you and you didn't look at the HTTP headers you'd probably never know that I'm using WebObjects.

WebObjects is vastly different from standard web server application software. In a typical IIS/ASP.NET, Apache/PHP, or Apache/PERL configuration the webserver software is responsible for resolving the URLs to filesystem paths. Before we detail how WebObjects handles URLs we should first cover how an ordinary webserver does it.

Static Content in a Document Root

The simplest web server program is one that resolves an incoming URL to a file path on disk then sends the client the exact contents of that file. So if the client requests http://example.com/foobar.html the server looks for /var/www/html/foobar.html and returns its contents. There are still specialty webservers that do only this but the vast majority of them have several facilities for causing other code on the server to be executed in response to an HTTP request.

CGI in a Document Root

The easiest to understand example of dynamic content is a CGI script. The webserver sees http://example.com/foobar.cgi and as before resolves this to a file on disk /var/www/html/foobar.cgi. Instead of serving the file as-is the webserver software is configured to execute any files with the extension .cgi by simply running them as programs. On a UNIX system the extensions don't matter to the OS so the foobar.cgi program can be anything. It might be compiled C or C++ or even FORTRAN (I'm sure that someone has done this, if only as a joke). On many early websites, executing CGI was nearly synonymous with executing a PERL script.

On many webservers you could include additional components in the URL such as http://example.com/foobar.cgi/baz. Because a file on the filesystem cannot contain subfiles there is no ambiguity; the "baz" portion of that URL cannot refer to anything on the filesystem. With this scheme one could have a whole bunch of purely dynamic content all served behind the foobar.cgi program. That said, having periods in directory names usually looks odd to most people. Also having the webserver configured to directly execute any content with the extension .cgi and the executable flag set is seen by some administrators as a security risk because anyone with write access to the web site's document root could put any arbitrary program in the directory and cause it to be executed in the context of the server.

CGI in its own directory

In response to these problems one alternate solution is to treat the web's document root (located at say /var/www/html/ on the filesystem) as containing only pure content. Then an additional directory such as /var/www/cgi-bin/ would be added to the webserver's configuration such that any URLs beginning with /cgi-bin/ would go to that directory instead. This then has the effect of hiding /var/www/html/cgi-bin/ since everything is redirected to the other folder. In this scheme nothing in /var/www/html/ will ever be executed by the webserver. Thus the administrator can safely give any number of users access to change the static content portions of the site but only give trusted administrators rights to put new cgi-bin programs on the site.

As an added bonus, every file in cgi-bin that is executable by the system is considered executable by the webserver. So we can now have /cgi-bin/foobar instead of /foobar.cgi. Also, as above with http://example.com/foobar.cgi/baz we can have http://example.com/cgi-bin/foobar/baz and the /cgi-bin/foobar file will still be executed. This scheme offers a clue as to why WebObjects uses /cgi-bin/WebObjects/ at the start of its URLs.
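The script-plus-extra-path resolution can be sketched like this. A fixed set of paths stands in for the filesystem check a real server would perform with stat() on each candidate:

```java
import java.util.Set;

// Sketch of how a webserver might split a URL like /cgi-bin/foobar/baz
// into the script to execute and the extra path (CGI's PATH_INFO):
// walk the path left to right until a component names an existing program.
class CgiResolver {
    private final Set<String> executables;   // stand-in for the filesystem

    CgiResolver(Set<String> executables) {
        this.executables = executables;
    }

    // Returns {scriptPath, pathInfo}, or null if nothing matches.
    String[] resolve(String url) {
        String[] parts = url.split("/");
        StringBuilder candidate = new StringBuilder();
        for (int i = 1; i < parts.length; i++) {
            candidate.append('/').append(parts[i]);
            if (executables.contains(candidate.toString())) {
                String script = candidate.toString();
                String pathInfo = url.substring(script.length());
                return new String[] { script, pathInfo };
            }
        }
        return null;
    }
}
```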

Web Server Modules

At some point the fork/exec of a new process needed to run the script becomes a fair amount of overhead for the server so CGI has been mostly replaced by in-process modules. The most common of these for Apache is mod_php although there are many more. In this case the Apache module contains the interpreter code in an initialized state. The Apache server software is instructed to send any requests for a .php file to the built-in mod_php handler. The Apache software is still responsible for resolving the request URL to an on-disk file and passing the request URL, other server information, and the filename to the (now built in) interpreter.

In addition, because the mod_php (or other) interpreter stays loaded into the webserver process it becomes possible to maintain application state in process memory. With traditional CGI one would have to write the state to disk, database, or some other form of persistent storage. Depending on the module, the persistence of state can be confined per site or even to particular directories within a site. So http://example.com/*.php and http://example.com/foo/*.php might share in-process data but http://example.com/bar/*.php might share a different set of state.

Windows with IIS and ASP.NET has a similar setup. The web server software is configured to use an ASP.NET application to handle any files (note: files, not URLs) with an aspx extension. The administrator then configures IIS such that all .aspx files for the http://example.com/ site are to be run by the ASP.NET application contained in that site's document root. One can also configure a subapplication (sometimes known as subweb) such that http://example.com/bar/ goes to a different folder with a different ASP.NET application.

In any case, URLs which aren't one of the registered extensions (.aspx, .ashx, and others) do not get sent to ASP.NET. So all other content on the site like .jpg, .png, .js, and .css files is served directly by IIS. Also, IIS takes care to deny requests for any .aspx.vb or .aspx.cs files, the web.config file, and a few other files. Often the web.config file and/or the code will contain passwords for database servers, so this is generally a good thing.

It should be noted that Global.asax in the root directory of the configured application can contain an override method to alter the processing of the request. But for the request to make it there in the first place IIS must have already known to send it to the ASP.NET application which generally means there must be an .aspx file on disk in the appropriate place.

Decoupling the URL from the implementation file

In recent years it is considered somewhat uncouth to expose the implementation details of your site to the user in the form of direct file paths to the .aspx or .php files. It is much nicer for the user if he can see http://example.com/people/johndoe instead of http://example.com/LookupPerson.aspx?name=johndoe.

Enter URL rewriting. Rewriting is the process of turning something like http://example.com/tags/foobar into http://example.com/showtag.aspx?tagname=foobar. URL Rewriting generally occurs as a function of the webserver code. For Apache this is mod_rewrite which uses rules in .htaccess files or in certain places in the httpd.conf file. There are also modules for IIS which, like mod_rewrite, look for regular expressions in web.config and rewrite the URL to something that can be resolved to a file on disk.
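A single rewrite rule of this kind can be sketched with an ordinary regular expression. The pattern and target below mirror the example in the text and are purely illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a mod_rewrite-style rule in Java: match the friendly URL
// and rewrite it to the on-disk page plus a query string.
class TagRewriteRule {
    private static final Pattern RULE = Pattern.compile("^/tags/([^/]+)$");

    static String rewrite(String url) {
        Matcher m = RULE.matcher(url);
        if (m.matches()) {
            return "/showtag.aspx?tagname=" + m.group(1);
        }
        return url;   // not our rule; pass the URL through unchanged
    }
}
```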

The main drawback, in this author's opinion, is that you still must have some .aspx file or .php file on disk. The rewriting occurs inside of IIS or Apache and ultimately results in it doing the same old dance of loading and executing a file or telling the in-process handler to process the file. There is no opportunity to execute other code before the request reaches the page because for the request to reach code it must first reach a page.

URL handlers within the application

Things are different with IIS 7 and its "integrated pipeline" where the ASP.NET code gets a shot at the URL early. But the integrated pipeline is only a first step towards decoupling URLs from the filesystem. The MVC feature actually begins to allow ASP.NET to load an arbitrary page based on an arbitrary URL. But like URL rewriting you have to register which particular URLs (by regex) will show which particular pages. Unlike URL rewriting you do get an opportunity to run some code and decide to return something else besides the page. But you cannot decide, at request time, to return some arbitrarily different page.

In part 2 we'll explore how WebObjects handles URLs.

Posted Saturday, October 17, 2009 09:15:19 PM by dfe

Those of you looking over at the bootloader page have probably been rather disappointed to see no new postings for over a year. In reality I've made several nice changes and I need to do several posts so people are aware of them. For now we're going to start with PXE.

For those unaware, PXE is a specification for booting PCs over the network. The proper name is Preboot eXecution Environment but it's generally referred to by the pronunciation of its acronym: pixie.

In its most basic form, PXE operates as follows:

  1. Computer is turned on
  2. PXE boot is selected, usually by pressing F12. On some BIOSes, F12 starts a PXE boot directly; on most newer ones it brings up a boot menu and one of the choices will be something like "PXE" or "Network Boot Agent" or whatever.
  3. PXE BIOS code stored in ROM initializes the network hardware and brings up link
  4. A DHCP request is issued
  5. A DHCP response is received including the address from which to download a boot file and the TFTP path to that file
  6. PXE ROM code downloads the file into RAM at 0:7C00h.
  7. PXE ROM code jumps to 0:7C00h.

A few notes: 0:7C00h is linear 0x7C00, which is the 31k mark in the first segment and the standard boot address. According to the spec, the ROM should be able to load linearly all the way up to the top of "real mode" RAM (the 640k mark). In practice many ROMs balk at anything that would exceed the end of the first 64k segment. That leaves 33k, and the spec recommends limiting boot file size to 32k for other reasons. Also, because entry is a jump to 0:7C00 you cannot execute a return. What you can do is invoke software INT 18h, which serves somewhat like an abort() call.

The Darwin implementation follows the specification pretty well. The boot1pxe binary (an NBP, Network Boot Program) is downloaded and loaded by the PXE ROM. Similar to the boot1u program for UFS filesystem booting, the boot1pxe program is a 32-bit program using portions of the same support libraries that boot2, the real booter, uses. The NBP uses the boot server discover reply packet to retrieve its own filename. From there it knocks off the last path component (i.e. the thing after the last "/") and replaces it with "boot". That is the name of the real boot program and is the same exact "boot" binary used as the HFS+ startup file and in other places. The boot1pxe NBP loads it to 2000:0200h, cleans itself up, and jumps to the booter.

Once the real booter has started it is able to detect that it was loaded via PXE. The boot process is virtually identical to the process on a hard drive from a UI standpoint. That is, there will be an entry in the device list representing the PXE server and a prompt where you can add all the usual options. In addition, all of the ordinary UI applies so if you want to you can at this point opt to instead directly boot an OS X volume on the hard drive or even chain to another OS partition. This is particularly useful if you've hosed the boot code on your hard disk and simply want to boot to it.

That said there are some special considerations for PXE boot. Because TFTP does not provide any facility for enumerating directory contents, the booter must know in advance which files to load. Those files are com.apple.lre.Boot.plist or com.apple.Boot.plist, mach.macosx, and mach.macosx.mkext. The booter uses the same trick that boot1pxe used to find it. That is, it uses the pathname from the boot server's discover reply packet as the base for all filenames.
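The filename trick is simple enough to sketch: drop the last path component of the NBP's own TFTP path and append the desired sibling filename. This is a hypothetical helper for illustration, not actual booter code (which is C, not Java):

```java
// Sketch of the path trick described above: boot1pxe takes its own TFTP
// path from the boot server reply, knocks off the last component, and
// appends the name of the file it needs ("boot", the plist, the kernel,
// or the mkext).
class BootPath {
    static String sibling(String nbpPath, String filename) {
        int slash = nbpPath.lastIndexOf('/');
        return nbpPath.substring(0, slash + 1) + filename;
    }
}
```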

On the TFTP server you will want to create a directory containing all of the boot files. One very simple option is to simply use Mac OS X Server and the network boot tools from the Server Administration Tools package to set up the boot. In general it likes to set things up in a directory like {unique-name}.nbi/i386/. So for example, 10a432.nbi/i386/mach.macosx will be the kernel (normally named mach_kernel).

The list of files you need is as follows:

  • boot1pxe: The NBP that will be loaded by the PXE ROM
  • boot: The normal booter program loaded by boot1pxe
  • com.apple.lre.Boot.plist: Booter configuration file
  • mach.macosx: The xnu kernel.
  • mach.macosx.mkext: An Extensions.mkext archive for this kernel
  • booter: Not needed, but if you use the "System Image Utility" you will find that this is just boot.efi

On the DHCP server you simply need to set next-server to the correct TFTP server (it is strongly recommended that this be the same machine as the DHCP server) and set the boot filename to /path/to/whatever.nbi/i386/boot1pxe. You may also want to set the root-path option in DHCP, although you can instead specify it using rp= on the command-line (including in the boot plist of course).

As an alternative to setting these DHCP options on your real DHCP server you can instead set up a separate boot server. One such boot server is Microsoft's Windows Deployment Services (WDS) or the older Remote Installation Services (RIS). I have not figured out a way to add non-Windows choices to the WDS native menu but you can add non-Windows choices in WDS Legacy which is just Microsoft's new name for RIS. Setting up a choice in RIS will have to be the subject of another blog entry.

When the booter boots from PXE it sets rd=en in the kernel boot args. This tells the kernel that the root device is ethernet. The kernel then looks for the root path in either the DHCP options or from the rp= option. Assuming you have the IOHDIXController.kext you can use an HTTP server so you would have e.g. rp=http://192.0.2.1/path/to/whatever.dmg on the command line in the boot plist or the http URL in the root-path DHCP option. If you don't have IOHDIXController then it may be possible to specify an NFS path to a disk image. A glance at the kernel source seems to indicate that it has some facility for mounting flat disk images (using BSD vndevice) as the root.

There is one little gotcha about the root-path option, and it actually applies to booting a Macintosh with Apple's special Boot Service Discovery Protocol as well: the firmware does not request the root-path option. So it is incumbent upon the DHCP server to send the value even though the client did not request it. ISC DHCP has a way of telling it to send additional unrequested options. It is, however, probably easier to just set rp= in the boot plist.

The last piece of the puzzle is building mach.macosx.mkext. You cannot just use one copied from your hard drive as it is unlikely to contain the network drivers. Interestingly though, the copy on the Snow Leopard DVD does. So if you don't need any additional extensions then you can use it outright. In practice the only machines that don't need extra extensions are genuine Macs. So if you were using something like a gPXE CD to boot a real Mac over the network you could do this. But of course you're then booting OS X via the BootCamp legacy BIOS layer which breaks a few things.

To build the mach.macosx.mkext you just use the ordinary kextcache tool. The general form is as follows: kextcache -m mach.macosx.mkext [-n] [-l] /path/to/System/Library/Extensions ... /path/to/SomeAdditional.kext .... You do not need to and should not run kextcache as root because we're using it in a mode where it outputs an mkext file instead of updating the running system's mkext. The options break down as follows:

  • -m {output-file}: Specifies that an mkext is to be output to the specified output file
  • -n: Specifies that only extensions required to mount the root filesystem over the network are to be included
  • -l: Specifies that only extensions required to mount the root filesystem from a local disk are to be included.
  • /path/to/Extensions: One or more directories containing .kext bundles can be specified. If provided, the -n and -l filters apply to the kexts found.
  • /path/to/SomeAdditional.kext: One or more additional extensions can be included in the archive. The -n and -l options do not apply and these extensions will always be included

With neither -n nor -l, no filtering takes place, so all extensions are included, resulting in a rather hefty mkext. With -n and -l you will get almost all the kexts you need. If -n and -l fail to include some kext you know you need, you can specify it directly on the command-line to override this. These options look at the OSBundleRequired key in the Info.plist of each kernel extension. If its value is Root then either one will include it. If it's Local-Root then -l will include it, and if it's Network-Root then -n will include it.
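The filtering rules can be sketched as a predicate over the OSBundleRequired value. This is an illustration of the logic as just described, not kextcache's actual code:

```java
// Sketch of the -n / -l filtering described above: "Root" always
// qualifies under either filter; "Network-Root" needs -n and
// "Local-Root" needs -l. With no filter flags, everything is included.
// (Kexts listed explicitly on the command-line bypass this entirely.)
class KextFilter {
    static boolean include(String osBundleRequired, boolean networkRoot, boolean localRoot) {
        if (!networkRoot && !localRoot) {
            return true;   // no filtering: include every extension
        }
        if ("Root".equals(osBundleRequired)) {
            return true;   // required for any root filesystem
        }
        if (networkRoot && "Network-Root".equals(osBundleRequired)) {
            return true;
        }
        if (localRoot && "Local-Root".equals(osBundleRequired)) {
            return true;
        }
        return false;
    }
}
```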

Once the booter has loaded the kernel and mkext archive it jumps to the kernel as per usual. At this point it's all about having enough driver support to bring up the network and mount the root disk image. The disk image itself is simply an ordinary OS X image. It is quite possible to use the normal DVD image as the root image. If you have a paid ADC account, the dmg files provided for download by Apple work out of the box.

Of course you aren't limited to installer images. The System Image Utility can be used to create a directly bootable image or you can even build one by hand. You are, however, limited to a read-only root FS similar to booting from a CD or DVD. Of course you can always mount some ramdisks for temporary files and NFS or AFP or SMB shares for permanent files.

The great thing about network booting is that it is extremely fast. With a basic fileserver it's on par with hard drive boot. If you have a fast server with fast disks, RAID, and a decent disk cache it can actually be quite a bit faster. In all cases, installing OS X off the network is certainly faster than off of a DVD. Generally you should see the GUI in about a minute as opposed to say five minutes when booting off a DVD.

There are of course a myriad of uses for network boot support. The most obvious one is to quickly install OS X. Beyond that, the network boot support has made developing the booter far easier. Prior to this, testing the code involved building a vmdk or iso image, shutting down the VM, and rebooting with the new disk or CD image. With the network boot support I can set the VM to boot off the network, copy new binaries to the boot server, and simply reboot the VM. A similar situation applies for kernel or kernel extension development. With a quick little shell script I can have the code built and deployed to the boot server and I can quickly reset the target machine and start the new code running immediately.

There are other useful tricks too. Say I want to test El Torito support on some machine. The only way to have the El Torito stack correctly installed is to have booted from a CD. So I go to ROM-o-matic.net, download a gPXE ISO, and burn it to a CD. Now I can change the bootloader code simply by dropping a new binary on the boot server, but because boot occurred from CD one of the BIOS devices will be the CD.

In short, if you can take a minute or two to set up a boot server you can make your life a lot easier when bringing up Darwin or OS X on a new machine.

Posted Monday, October 12, 2009 10:46:28 PM by dfe

Someone mailed me the other day about the Darwin bootloader for x86 bootsectors. When trying to boot the HFS+ partition bootsector (boot1h) from another bootmanager he was getting an "HFS+ partition error".

One thing the Darwin bootsectors do is use the SI x86 processor register as a pointer to the MBR entry that was booted. This is something that I inherited from Apple. Both boot1u0.s and boot1.s (a.k.a. boot1h) depend on SI and the boot0 (MBR) code makes sure to set it. As it turns out, so does the standard MBR from Microsoft, all the way back to the very original one included with DOS 2. There are some good write-ups of the DOS 2.00 MBR and the DOS 3.30 through 7.0 MBR at "The Starman's Realm."

Without having a pointer to the specific MBR entry that was booted it is impossible to know which sector the MBR code loaded and thus impossible to know the start of the partition. Sure, you can work around it. For instance you could, from the partition bootsector, reread the MBR looking for an entry matching some particular criteria. Or you could stash the partition offset in a data section of the code. The first approach has the problem that there is unlikely to be anywhere near enough space (only 512 bytes!) to reinterpret the MBR and still manage to load something off the partition.

The second method has the problem that it causes you to have duplicate data. It is a given that the MBR must contain the LBA offset of the start of the partition so it knows which sector to load the partition boot code from. Storing it in the partition bootsector means you now have a second copy of this information. If you try to dd the partition from one disk to another or use a partitioning tool to slide the partition to a different area of the disk the boot code will have to be updated.

Yet it seems that Microsoft's own bootsectors do exactly this! Inside the boot code there is a data area known as the BIOS Parameter Block (BPB). One of the fields contains the number of "hidden sectors". This is Microsoft's term for the LBA address of the bootsector relative to the disk. So on a floppy this field is 0 but on a hard disk it is exactly what is stored in the MBR.

This of course leads to a few questions. The first one is, why doesn't the partition boot code use SI? A simple answer to that question is that when booting from a floppy SI is unlikely to contain a useful value since control was transferred directly from the BIOS. If it were blindly assumed to point to an MBR entry then the boot code would not be able to boot from a floppy. That would then require two different boot sectors: one for floppies, and one for hard disk partitions. Alternatively a runtime check could be performed to determine if the disk is a floppy or a hard disk. One possible way of doing this is to check the high bit of the BIOS drive number. If set, it's a hard drive, so presumably SI is reasonably trustworthy.
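The check itself is trivial; sketched in Python for illustration (`is_hard_disk` is my own name):

```python
def is_hard_disk(drive_number):
    """BIOS drive numbers: 0x00-0x7F are floppies, 0x80 and up are hard disks.
    The BIOS passes the boot drive number in DL at bootsector entry."""
    return bool(drive_number & 0x80)
```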

The second question is much more interesting: Knowing that the DOS bootsector doesn't use the SI register, why bother ensuring it is set in the MBR code? The code actually goes to fairly great lengths (for a bootsector) to stash SI in BP before it trashes it doing a disk read call, and then to restore SI from BP just prior to jumping to the newly loaded bootsector.

As mentioned, I researched this because someone e-mailed me wondering why GAG (a replacement boot manager) is unable to load Darwin. As it turns out, GAG stores the CHS and LBA of partitions internally in its own table. It does not attempt to fake an MBR entry in memory.

For reference, when boot0 boots from an extended partition (one of the tricks it can do that Microsoft's MBR can't) it takes care to correct the LBA field of the partition entry and leave SI pointing to it upon entry to the newly loaded boot code. In addition, the GPT bootsector I wrote (gpt0) takes care to fake an MBR entry from the GPT information.

Does anyone know of any bootsectors aside from the Darwin ones that do make use of SI? I tried doing some searching and came up empty-handed. GRUB and LILO have bootsectors that ignore partitioning entirely and just load a certain number of sectors starting from a certain LBA offset. Microsoft's bootsectors all stash the offset of the partition in the BPB. FreeBSD's bootsectors scour the disk looking for slices.

In short, the Darwin bootsectors (boot1h, boot1u, and my own boot1fat32) appear to be the only ones in fairly widespread use that care about SI, yet the fact that you can use SI has certainly entered the folklore.

That of course leads to a third question. If SI isn't used by most bootsectors then why does any bootloader bother to set it? GRUB, in particular, does set it when chaining to anything. Except it unfortunately gets it wrong for extended partitions and leaves it pointing at the raw extended partition table entry instead of adding in the LBA offset of the first extended partition table. It was actually this very bug in GRUB that inspired me to write the multiboot support into the booter proper (boot2) so that I could load it directly instead of via the boot1 bootsector or chain0 boot program to boot1 bootsector.

For those of you writing your own boot manager code there is a little gotcha to be aware of. The Darwin bootsector code zeroes DS and ES very early in startup so that when it accesses SI (which is implicitly in DS) it assumes SI is relative to segment 0. So your best bet is to ensure that you set DS to 0 before jumping to the partition bootsector and also ensure that SI is pointing to the entry in segment 0, not some other segment.
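To see why this matters, remember that in real mode a physical address is segment * 16 + offset, so the same SI value lands on completely different memory depending on DS. A Python sketch of the arithmetic (`linear_address` is my own name):

```python
def linear_address(segment, offset):
    """Real-mode x86 address translation: physical = segment * 16 + offset."""
    return (segment << 4) + offset
```

So if a boot manager running with, say, DS=0x07C0 hands off SI=0x01BE, the entry it meant is at physical 0x7DBE; but once the Darwin code zeroes DS, that same SI is interpreted as physical 0x01BE, pointing at the wrong memory entirely.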

Perhaps someone who was at Microsoft or IBM at the time remembers. Of course it might be as simple as this: the MBR code was written by David Litton over at IBM, and the partition boot code already existed because it was used to boot from floppies. The MBR author may very well have added the setting of SI thinking somebody might need it, and then no one at Microsoft ever decided to make any use of it. It's sort of a shame, too, because to this day it means you must update the hidden sectors field of the BPB (even on NTFS!) or your partition will fail to boot after you move it around.

If anyone has some information about how this scheme came about I'd sure like to know. Send me an e-mail.

Posted Monday, October 12, 2009 01:42:43 AM by dfe

Well, it's been a long time coming but I finally got a blog up. For whatever reason I decided to write a blogging system from scratch.

You may not notice from the "friendly" URLs but this blog is powered by WebObjects. Eventually I hope to throw most of the rest of the site content in here and move it into the 21st century.

Why WebObjects? Because I needed to eat some of my own dogfood. Of course, the development environment isn't perfect, and in the interest of saving time I did use Eclipse+WOLips to set up the build system. But after trying to get used to Eclipse's Java editor I finally gave up and created an Xcode project that runs ant to do the build. Once I added the Java source files and .wo components into the Xcode project I even got WebObjects Builder working.

In Mike's defense, the WOLips Component Editor (used to edit .wo) is arguably the best part of WOLips. Where I had things that were broken in WebObjects Builder (e.g. display groups cannot be edited), it worked well. But for general editing of the components and setting bindings WebObjects Builder was the easiest.

I did use Entity Modeler to do the model editing, but that's going to change soon. Like the rest of Eclipse, it has a habit of warning you about things that are not problems. A quick example: one trick I have used on several occasions is to add multiple attributes to an entity with the same DB column name. Now, why would I do this? Well, so I could put a read format on one of them and get the DB to do some special queries for me.

For instance, this very entry has a publishedTimestamp attribute which is stored as a timestamptz in PostgreSQL. But to look it up by date the SQL I want to use is WHERE published_timestamp::date = ?. That is, I want the DB to cast the timestamptz to a date for me, and then compare that to the date that is passed in. In PostgreSQL this causes the time portion to be ignored. So to accomplish this I added a publishedDate attribute which is not a class property (thus, doesn't get fetched) and not used for locking (thus, doesn't get used in update statement where clauses) and has a read format of %P::date. In short, the attribute does absolutely nothing at all except for the case when you use "publishedDate" in an EOQualifier. In that case it causes EOF to generate exactly the SQL I'm looking for with minimum fuss. I can even take this to the next level and add yet another attribute to get the month (i.e. year-month) and another one to get just the year. In the end, it's all stored in the DB as one single timestamptz column.
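The mechanics of the read format are simple: when the SELECT/WHERE SQL is generated, %P stands in for the attribute's column reference. Here is an illustrative Python sketch of that substitution (`apply_read_format` is my own name; this mimics the idea, it is not EOF itself):

```python
def apply_read_format(read_format, column_name):
    """Expand a read format: '%P' is replaced by the attribute's column
    reference wherever the attribute appears in generated SQL."""
    return read_format.replace("%P", column_name)
```

So an attribute on column published_timestamp with read format %P::date qualifies as published_timestamp::date, which is exactly the cast-then-compare SQL described above.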

The trick to this is that if the qualifier ever gets used to filter in-memory EOs it bombs, since the publishedDate attribute isn't available. But... come on, people, this is EOF. So you just add a publishedDate() getter to do the right thing in Java and voilà; you now have the ability to filter in memory.
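The idea, sketched in Python rather than Java (`BlogEntry` and `entries_on` are hypothetical names of mine): the getter derives the value from the stored timestamp, so qualifying on the derived attribute works in memory the same way the read format makes it work in SQL.

```python
from datetime import datetime, date

class BlogEntry:
    """Hypothetical stand-in for the enterprise object."""
    def __init__(self, published_timestamp):
        self.published_timestamp = published_timestamp

    @property
    def published_date(self):
        # In-memory analog of the %P::date read format: derive the date
        # from the stored timestamp instead of asking the DB to cast it.
        return self.published_timestamp.date()

def entries_on(entries, day):
    """Filter in memory on the derived attribute."""
    return [e for e in entries if e.published_date == day]
```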

Granted, it would be theoretically better if EOF had some facility for doing arbitrary data conversions: a way to cause a conversion to execute a method Java-side and to generate the text of a function call SQL-side. For example, something like publishedTimestamp.@convertToDate. But for most purposes the built-in EOF facilities can be abused like this to do exactly what you need.

I should point out one caveat to this. In past programs I have always left the column name blank and then put %Pcolumn_name in the read format. The reason for this is that just in case EOF does for some strange reason decide to generate an UPDATE statement, it won't have a valid column name (a read format cannot apply to a column name for an update) and the update will fail instead of attempting to set the column value to something completely bogus. But in this case Entity Modeler still warns you that there is no column name. I suppose that's good. I mean, how else would I know there was no column name other than... wait for it... looking at the list of attributes. Since the DB column name is one of the things you see in the attribute list, an empty one already stands out like a sore thumb.

So, like this blog, getting a comfortable WO development environment is a work in progress. My hope is that by having this blog written in WO it will force me to occasionally get off my duff and start getting something together that works well for me. With any luck some of you other stranded WO developers who keep mailing me can benefit as well.