I rammed my head against the wall, and all was clear (part 4)

This is part of the hard disk recovery documentation.

Part 4.


I rammed my head against the wall (of absurdity that is ext2), and all was clear

The obvious need to fork off another hard disk recovery project from a hard disk recovery project was just too pathetic to think about, so I kicked this aside for a day. Today I started back from the beginning, this time researching the ext2 filesystem. The documentation for this filesystem (at least what is within reach of Google) is just poor. Whoever is responsible for documenting this part of Linux should be flogged, or at least made to go home and do it over. Yeah, I can go look at the source code (which I did), but I’m sorry, that does not constitute proper documentation.



What I gathered:

The ext2 physical structure consists of a basic unit called a block, whose size in bytes can be set at formatting time. Contiguous blocks are organized into groups, and contiguous groups make up the partition. The groups are of equal size (the last one may be shorter), each containing by default 32768 blocks when blocks are 4KB, as they are on my partition, so a group is 128MB.
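The arithmetic behind those numbers is easy to check. This is a minimal sketch, assuming 4KB blocks and 32768 blocks per group as on this partition. The function names are made up for illustration, and first_data_block=0 is an assumption that fits 4KB blocks:

```python
BLOCK_SIZE = 4096         # bytes per block (the value on the partition in this post)
BLOCKS_PER_GROUP = 32768  # blocks per group (the default for 4KB blocks)


def group_size_bytes(block_size=BLOCK_SIZE, blocks_per_group=BLOCKS_PER_GROUP):
    """Size of one block group in bytes."""
    return block_size * blocks_per_group


def group_start_byte(group, block_size=BLOCK_SIZE,
                     blocks_per_group=BLOCKS_PER_GROUP, first_data_block=0):
    """Byte offset of a group's first block, counted from the partition start.

    first_data_block=0 is an assumption that fits 4KB blocks.
    """
    return (first_data_block + group * blocks_per_group) * block_size


print(group_size_bytes() // (1024 * 1024), "MB per group")   # 128 MB per group
print("Group 1 starts at byte", group_start_byte(1))         # 134217728
```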

The next part is where I depart from the random wrong stuff strewn around the internet. I know it is wrong because I actually looked at the disk, but it caused me a lot of headaches. The first group (Group 0) contains a 1KB filler, which some people like to call the “boot block” for unknown reasons (so far as I can tell, it doesn’t boot anything and it isn’t a block in size), followed by the 1KB Superblock (also not a block in size), which is like a filesystem header. It carries a bunch of filesystem parameters, and an identifying two-byte sequence called the “magic number.” At 4KB (for my block size) into Group 0 begins the Group Descriptor Table, which is also a filesystem header. The Group Descriptor Table contains 32-byte entries, one entry for each group on the disk, with each entry giving the location of the block bitmap (i.e. allocation table for blocks), the inode bitmap (i.e. allocation table for inodes), and the inode table belonging to that group.
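To make the layout concrete, here is a hedged sketch of reading those two headers out of a raw dump of the partition. It assumes the standard ext2 on-disk layout (magic number 0xEF53 at byte 56 of the Superblock; each 32-byte descriptor starting with three 32-bit block numbers for the block bitmap, inode bitmap and inode table, followed by three 16-bit counters) and 4KB blocks. The function names are mine, not from any ext2 tool:

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the Superblock starts 1KB into the partition
MAGIC_OFFSET = 56         # position of the 2-byte magic number inside the Superblock
EXT2_MAGIC = 0xEF53       # stored little-endian, so the bytes on disk read 53 EF
GD_ENTRY_SIZE = 32        # one Group Descriptor Table entry


def read_magic(image):
    """Return the 16-bit magic number from the Group 0 Superblock."""
    (magic,) = struct.unpack_from("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET)
    return magic


def read_group_descriptor(image, group, gdt_offset=4096):
    """Decode one 32-byte entry of the Group Descriptor Table.

    gdt_offset=4096 assumes 4KB blocks, as on the partition in this post.
    """
    fields = struct.unpack_from("<IIIHHH14x", image,
                                gdt_offset + group * GD_ENTRY_SIZE)
    names = ("block_bitmap", "inode_bitmap", "inode_table",
             "free_blocks", "free_inodes", "used_dirs")
    return dict(zip(names, fields))
```

Here image is the raw bytes from the start of the partition, for example the first few KB read from a disk image file. A Superblock is plausible only if read_magic(image) == EXT2_MAGIC.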

The subsequent groups all have the same Group Descriptor Table at the same position in the group as in Group 0, and some of the groups also have a copy of the 1KB Superblock at their front (not at the 1KB position). The last part is important: since the Group 0 Superblock sits 1KB in and the copies sit at the very start of their groups, the first two Superblocks are not 32768 blocks apart.
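For a sense of where those copies should sit, here is a minimal sketch. It assumes the sparse_super feature is on, which is an assumption and not something checked on this disk. With it, copies live only in groups 0 and 1 and in groups numbered a power of 3, 5 or 7. It also assumes 4KB blocks and 32768 blocks per group. Function names are illustrative:

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768


def _is_power_of(n, base):
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_copy(group):
    """sparse_super rule (assumed): groups 0, 1 and powers of 3, 5, 7."""
    return group == 0 or any(_is_power_of(group, b) for b in (3, 5, 7))


def superblock_byte_offset(group):
    """Byte offset of the Superblock (or its copy) from the partition start."""
    if group == 0:
        return 1024  # primary: right after the 1KB filler
    return group * BLOCKS_PER_GROUP * BLOCK_SIZE  # copies: at the front of the group


groups = [g for g in range(30) if has_superblock_copy(g)]
gap = superblock_byte_offset(1) - superblock_byte_offset(0)
print(groups)  # [0, 1, 3, 5, 7, 9, 25, 27]
print(gap)     # 134216704, i.e. 1KB short of 32768 blocks
```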

That is the gist of it. I didn’t bother with the block and inode allocation strategies because those aren’t important to me.

So when Knoppix said the Group Descriptor Table was corrupt, it meant one or more of the entries were pointing to the wrong places. And when it said it could not find the Superblock due to a “bad magic number,” it most likely meant the Superblock copies in the other groups were all overwritten, or were not where they were expected to be found, so neither they nor their Group Descriptor Tables could be used to correct the Group 0 ones.

To find out what happened exactly, a disk editor was needed. I used to know a pretty good one for DOS, but I lost it. However, Windows Support Tools for Advanced Users has a Windows tool (used to be in the NT resource kit) called DiskProbe (dskprobe.exe) that does the trick. It’s a bit annoying that disk positions must be counted in units of sectors (512 B), but oh well. I also used the Linux tool fsstat, which dumps whatever it reads from the Superblock and Group Descriptor Table (corrupt or not), one group at a time.
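Counting in sectors is just a division by 512. A small sketch with an illustrative helper name; the offsets are relative to the start of the partition, so if the editor counts from the start of the physical drive, the partition's starting sector has to be added on top:

```python
SECTOR_SIZE = 512  # bytes per sector


def to_sector(byte_offset, sector_size=SECTOR_SIZE):
    """Split a byte offset into (sector number, offset within that sector)."""
    return divmod(byte_offset, sector_size)


print(to_sector(1024))          # Group 0 Superblock            -> (2, 0)
print(to_sector(1024 + 56))     # its magic number              -> (2, 56)
print(to_sector(4096))          # Group Descriptor Table (4KB)  -> (8, 0)
print(to_sector(32768 * 4096))  # start of Group 1              -> (262144, 0)
```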

What I found, from fsstat and from comparing with another disk formatted similarly with ext2, is that all the Superblock copies in the groups that should have them are there, but offset by 1KB. How this happened I do not know. Seems like a bug in mkfs to me (it has to be mkfs, because the spot where the Superblock is supposed to be still holds older data). As for the source of the problem, the Group Descriptor Table stored in Group 0 has one entry in which four bytes were overwritten with 0x7FFFFFFF. How that happened I also do not know for sure, but it is almost certainly the fault of ext2ifs, since that is the only thing that touched the partition between it working and not working. But as four bytes appears to be the extent of the corruption, and the backup Group Descriptor Tables are perfectly good (they just can’t be found by silly Knoppix), I just copied four bytes by hand in DiskProbe, and bam, the ext2 filesystem was mountable again by Knoppix.
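Spotting a damaged entry like that by eye is tedious. A hedged sketch of the comparison, assuming each 32-byte entry starts with three 32-bit block numbers (block bitmap, inode bitmap, inode table). Only those three fields are compared, on the assumption that the free-space counters in a backup copy may be stale. The function name is mine, and nothing here writes to disk:

```python
import struct

GD_ENTRY_SIZE = 32
POINTER_FIELDS = ("block_bitmap", "inode_bitmap", "inode_table")


def diff_gdt_pointers(primary, backup, n_groups):
    """Compare the three location fields of each entry in two GDT copies.

    Returns a list of (group, field, primary_value, backup_value) mismatches.
    """
    mismatches = []
    for group in range(n_groups):
        offset = group * GD_ENTRY_SIZE
        p_vals = struct.unpack_from("<III", primary, offset)
        b_vals = struct.unpack_from("<III", backup, offset)
        for name, pv, bv in zip(POINTER_FIELDS, p_vals, b_vals):
            if pv != bv:
                mismatches.append((group, name, pv, bv))
    return mismatches
```

Fed the Group 0 table and one of the backup tables as raw bytes, it lists the entries whose pointers disagree. Which copy to believe is still a judgment call.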

The absurdity is that not only did a presumably illegal ext2 file system get written onto the disk, but also none of the ext2 fs check tools were able to find where the correct Superblocks were, even though they were just 1KB away, and furthermore, gpart, the Linux program that is supposed to guess what partitions are on the disk, could not even identify this partition as ext2, despite the Group 0 Superblock being intact and all the other structures being obviously ext2. Whoever wrote these parts of Linux should be flogged, or at least made to go home and write documentation instead.

Lessons today:

  • The backup littering “strategy” of ext2 is only better than nothing… it is way too simple-minded and needs way too much hand-holding.
  • Not one piece of software can be trusted to do the right thing.

On to Part 5.
