Posted Monday, October 12, 2009 10:46:28 PM by dfe

Someone e-mailed me the other day about the Darwin bootloader for x86 bootsectors. When trying to boot the HFS+ partition bootsector (boot1h) from another boot manager, he was getting an "HFS+ partition error".

One thing the Darwin bootsectors do is use the SI x86 processor register as a pointer to the MBR entry that was booted. This is something that I inherited from Apple. Both boot1u0.s and boot1.s (a.k.a. boot1h) depend on SI and the boot0 (MBR) code makes sure to set it. As it turns out, so does the standard MBR from Microsoft, all the way back to the very original one included with DOS 2. There are some good write-ups of the DOS 2.00 MBR and the DOS 3.30 through 7.0 MBR at "The Starman's Realm."
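To make "a pointer to the MBR entry" concrete, here is a minimal sketch in Python (not the assembly a real bootsector uses) of the 16-byte entry that SI would point at. It assumes the classic MBR layout: four entries starting at byte 0x1BE of the sector. The names MbrEntry and parse_mbr_entry are made up for illustration and do not come from the Darwin sources.

```python
import struct
from typing import NamedTuple

MBR_TABLE_OFFSET = 0x1BE  # first of the four partition entries
MBR_ENTRY_SIZE = 16       # bytes per entry


class MbrEntry(NamedTuple):
    status: int        # 0x80 = active, 0x00 = inactive
    chs_first: bytes   # packed CHS address of the first sector
    part_type: int     # partition type byte
    chs_last: bytes    # packed CHS address of the last sector
    lba_start: int     # LBA of the first sector, relative to the disk
    sector_count: int  # partition length in sectors


def parse_mbr_entry(mbr: bytes, index: int) -> MbrEntry:
    """Decode partition entry `index` (0..3) from a 512-byte MBR sector."""
    if len(mbr) < 512:
        raise ValueError("an MBR sector is 512 bytes")
    if not 0 <= index < 4:
        raise ValueError("the MBR holds exactly four entries")
    offset = MBR_TABLE_OFFSET + index * MBR_ENTRY_SIZE
    # Little-endian: status, 3-byte CHS, type, 3-byte CHS, start LBA, count.
    return MbrEntry(*struct.unpack_from("<B3sB3sII", mbr, offset))
```

The lba_start field is what the partition bootsector needs: it is the sector the MBR code just loaded, and therefore the start of the partition.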

Without a pointer to the specific MBR entry that was booted, it is impossible to know which sector the MBR code loaded, and thus impossible to know the start of the partition. Sure, you can work around it. For instance, you could, from the partition bootsector, reread the MBR looking for an entry matching some particular criteria. Or you could stash the partition offset in a data section of the code. The first approach has the problem that there is unlikely to be anywhere near enough space (only 512 bytes!) to reinterpret the MBR and still manage to load something off the partition.

The second method has the problem that it causes you to have duplicate data. It is a given that the MBR must contain the LBA offset of the start of the partition so it knows which sector to load the partition boot code from. Storing it in the partition bootsector means you now have a second copy of this information. If you try to dd the partition from one disk to another or use a partitioning tool to slide the partition to a different area of the disk the boot code will have to be updated.

Yet it seems that Microsoft's own bootsectors do exactly this! Inside the boot code there is a data area known as the BIOS Parameter Block (BPB). One of the fields contains the number of "hidden sectors". This is Microsoft's term for the LBA address of the bootsector relative to the disk. So on a floppy this field is 0 but on a hard disk it is exactly what is stored in the MBR.
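The duplication is easy to see in code. The sketch below reads the hidden sectors field from a boot sector image and compares it to the start LBA from the MBR entry. It assumes the DOS 3.31 and later BPB layout, where hidden sectors is a 32-bit little-endian value at offset 0x1C of the boot sector. The function names are hypothetical.

```python
import struct

# Assumption: DOS 3.31+ BPB layout, 32-bit "hidden sectors" at offset 0x1C.
BPB_HIDDEN_SECTORS_OFFSET = 0x1C


def bpb_hidden_sectors(boot_sector: bytes) -> int:
    """Return the hidden sectors value stored in the boot sector's BPB."""
    if len(boot_sector) < BPB_HIDDEN_SECTORS_OFFSET + 4:
        raise ValueError("boot sector image is too short to hold a BPB")
    (hidden,) = struct.unpack_from("<I", boot_sector, BPB_HIDDEN_SECTORS_OFFSET)
    return hidden


def hidden_sectors_consistent(boot_sector: bytes, mbr_lba_start: int) -> bool:
    """True if the BPB copy agrees with the start LBA from the MBR entry."""
    return bpb_hidden_sectors(boot_sector) == mbr_lba_start
```

If a partition is moved and only the MBR is updated, hidden_sectors_consistent returns False, which is exactly the situation where the copied boot code looks in the wrong place.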

This of course leads to a few questions. The first one is: why doesn't the partition boot code use SI? A simple answer is that when booting from a floppy, SI is unlikely to contain a useful value because control was transferred directly from the BIOS. If SI were blindly assumed to point to an MBR entry, the boot code would not be able to boot from a floppy. That would require two different boot sectors: one for floppies and one for hard disk partitions. Alternatively, a runtime check could determine whether the disk is a floppy or a hard disk. One possible way of doing this is to check the high bit of the BIOS drive number. If it is set, it's a hard drive, so presumably SI is reasonably trustworthy.
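The drive number check can be sketched as follows. This is only an illustration of the logic in Python, assuming the usual BIOS convention that floppies are numbered from 0x00 and hard disks from 0x80. The helper partition_start_lba is hypothetical and not taken from any real bootsector.

```python
def is_hard_disk(bios_drive: int) -> bool:
    """True if the BIOS drive number (as passed in DL) has its high bit set."""
    return bool(bios_drive & 0x80)


def partition_start_lba(bios_drive: int, si_entry_lba_start: int) -> int:
    """Pick the volume's start LBA.

    On a hard disk, trust the start LBA from the MBR entry SI points at.
    On a floppy there is no partition table, so the volume starts at sector 0.
    """
    if is_hard_disk(bios_drive):
        return si_entry_lba_start
    return 0
```

With a check like this, one bootsector could serve both cases: it would ignore SI on drive 0x00 and use it on drive 0x80.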

The second question is much more interesting: knowing that the DOS bootsector doesn't use the SI register, why bother ensuring it is set in the MBR code? The code actually goes to fairly great lengths (for a bootsector) to stash SI in BP before it trashes SI doing a disk read call, and then to restore SI from BP just prior to jumping to the newly loaded bootsector.

As mentioned, I researched this because someone e-mailed me wondering why GAG (a replacement boot manager) is unable to load Darwin. As it turns out, GAG stores the CHS and LBA of partitions internally in its own table. It does not attempt to fake an MBR entry in memory.

For reference, when boot0 boots from an extended partition (one of the tricks it can do that Microsoft's MBR can't) it takes care to correct the LBA field of the partition entry and leave SI pointing to it upon entry to the newly loaded boot code. In addition, the GPT bootsector I wrote (gpt0) takes care to fake an MBR entry from the GPT information.
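The LBA correction amounts to rewriting one field of the 16-byte entry before handing it over. Here is a minimal sketch of the idea in Python, not the actual boot0 code. It assumes the start LBA sits at offset 8 of the entry, and that the caller knows the disk-relative LBA that the entry's value is relative to (called table_lba here, a made-up parameter name).

```python
import struct

LBA_START_OFFSET = 8  # offset of the 32-bit start LBA within a 16-byte entry


def to_absolute_entry(entry: bytes, table_lba: int) -> bytes:
    """Return a copy of `entry` whose start LBA is relative to the whole disk.

    `table_lba` is the disk-relative LBA that the entry's stored value is
    measured from (an assumption supplied by the caller).
    """
    if len(entry) != 16:
        raise ValueError("a partition entry is exactly 16 bytes")
    (relative,) = struct.unpack_from("<I", entry, LBA_START_OFFSET)
    absolute = relative + table_lba
    if absolute > 0xFFFFFFFF:
        raise ValueError("corrected LBA does not fit in the 32-bit field")
    fixed = bytearray(entry)
    struct.pack_into("<I", fixed, LBA_START_OFFSET, absolute)
    return bytes(fixed)
```

Only the start LBA changes. The status, type, CHS, and length fields are copied through untouched, so the boot code sees what looks like an ordinary primary partition entry.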

Does anyone know of any bootsectors aside from the Darwin ones that do make use of SI? I tried doing some searching and came up empty-handed. GRUB and LILO have bootsectors that ignore partitioning entirely and just load a certain number of sectors starting from a certain LBA offset. Microsoft's bootsectors all stash the offset of the partition in the BPB. FreeBSD's bootsectors scour the disk looking for slices.

In short, the Darwin bootsectors (boot1h, boot1u, and my own boot1fat32) appear to be the only ones in fairly widespread use that care about SI, yet the fact that you can use SI has certainly entered the folklore.

That of course leads to a third question. If SI isn't used by most bootsectors, then why does any bootloader bother to set it? GRUB, in particular, does set it when chaining to anything. Unfortunately, it gets it wrong for extended partitions: it leaves SI pointing at the raw extended partition table entry instead of adding in the LBA offset of the first extended partition table. It was actually this very bug in GRUB that inspired me to write multiboot support into the booter proper (boot2) so that I could load it directly, instead of going through the boot1 bootsector or through the chain0 boot program to the boot1 bootsector.

For those of you writing your own boot manager code, there is a little gotcha to be aware of. The Darwin bootsector code zeroes DS and ES very early in startup, so when it dereferences SI (which is implicitly relative to DS) it assumes SI is an offset within segment 0. So your best bet is to set DS to 0 before jumping to the partition bootsector, and to make sure SI points to the entry as an offset within segment 0, not some other segment.
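The arithmetic behind this gotcha is the ordinary real-mode rule that a segment:offset pair refers to linear address segment * 16 + offset. The sketch below (hypothetical helper names, Python standing in for what a boot manager would do in assembly) converts a DS:SI pair into the equivalent offset in segment 0, and refuses if the entry lies above the first 64 KB, where no such offset exists.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode linear address of segment:offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def offset_in_segment_zero(segment: int, offset: int) -> int:
    """Return the SI value that addresses the same byte with DS = 0."""
    linear = linear_address(segment, offset)
    if linear > 0xFFFF:
        raise ValueError("entry is not reachable from segment 0")
    return linear
```

For example, a boot manager that runs with DS = 0x07C0 and has its partition table at offset 0x01BE would need to pass SI = 0x7DBE after setting DS to 0.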

Perhaps someone who was at Microsoft or IBM at the time remembers. Of course, it might be as simple as this: the MBR code was written by David Litton over at IBM, and the partition boot code already existed because it was used to boot from floppies. The MBR author may very well have added the setting of SI thinking somebody might need it, and then no one at Microsoft ever decided to make any use of it. It's sort of a shame, too, because to this day it means you must update the hidden sectors field of the BPB (even on NTFS!) after moving a partition, or it will fail to boot.

If anyone has some information about how this scheme came about I'd sure like to know. Send me an e-mail.