Saturday, February 2, 2019

How to fix (fsck) a root file system that you have to boot into on Linux

Two days ago I corrupted my file system during a failed resume from standby on Fedora 19. This feature has never quite worked correctly and randomly makes the kernel panic. Usually, I hard reboot my laptop and everything is fine, but that time something went wrong, and when it came back up I got:
systemd-fsck[605]: /dev/sda2: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
systemd-fsck[605]: (i.e., without -a or -p options)
[ 13.652068] systemd-fsck[605]: fsck failed with error code 4.
Welcome to emergency mode. Use "systemctl default" or ^D to activate default
mode.
Give root password for maintenance
(or type Control-D to continue):
In this case /dev/sda2 is my root partition and since it was mounted even in maintenance mode, attempting to run fsck on it would output:
fsck.ext4 /dev/sda2
e2fsck 1.42.7 (21-Jan-2013)
/dev/sda2 is mounted.
e2fsck: Cannot continue, aborting.
Which makes sense, as common knowledge tells us that running fsck on a mounted file system will most likely do more damage to it.
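If you script any of this, it may be worth repeating the check e2fsck makes before going anywhere near fsck. Here is a minimal sketch, assuming a Linux system where /proc/mounts lists mounted devices in its first column. The helper name is_mounted is made up for illustration, and on some systems the root device may be listed under another name (for example /dev/root), so treat a "not mounted" answer with caution.

```shell
#!/bin/sh
# Sketch: tell whether a device appears as a mount source.
# is_mounted is a hypothetical helper; the optional second argument
# points it at a different mounts table (default: /proc/mounts).
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # The first field of each line is the mount source (device).
    # awk exits 0 if the device was seen, 1 otherwise.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Usage (commented out so nothing runs by accident):
# if is_mounted /dev/sda2; then
#     echo "/dev/sda2 is mounted, not running fsck" >&2
# else
#     fsck.ext4 /dev/sda2
# fi
```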

The best option

Your best option is simply to boot into another Linux, be it on a different partition, a USB drive or a CD, and run fsck manually on the faulty partition, which can easily be unmounted if necessary because no OS is using it. Easy. Normally yes, but my stupid 2008 MacBook Pro cannot boot through USB into anything other than Mac OS X, my CD drive has been dead for years and I recently got rid of my OS X partition. To make things more complicated, I’m in Thailand at the moment and obviously not able to take apart my computer to grab the hard drive and stick it into a working system.

The other option (if you cannot boot into another Linux)

In order to assess the damage, I ran fsck in dry-run mode and piped the output to more to make reading more practical:
fsck.ext4 -n /dev/sda2 | more
From there, I could ensure that no critical files had been damaged and, while keeping in mind that it’s always a gamble to use a corrupted file system, I proceeded to boot into the system to make some backups. That out of the way, I did some research on the web on how to fix a root file system that I had to boot into and sadly, not many things turned up, for it’s not an ideal solution. Forcing the system to do it at boot time by creating a file named forcefsck at the root and writing y in it (echo y > /forcefsck) no longer works, and adding fsck.mode=force to the kernel command line did not fix the problem, as fsck will not fix errors on its own without authorization, i.e. someone entering yes on the keyboard. I tried a few other tricks but none worked. I had no choice but to keep my fingers crossed and use the system as is.
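As an aside, the "error code 4" in the boot log is not arbitrary: the fsck(8) man page documents the exit status as a sum of bit flags, and 4 means errors were left uncorrected. The sketch below decodes that status; describe_fsck_status is a hypothetical helper name, and the messages are paraphrased from the man page.

```shell
#!/bin/sh
# Sketch: decode the fsck exit status bitmask documented in fsck(8).
# describe_fsck_status is a hypothetical helper, not part of any package.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Each entry is "bit:meaning"; print every meaning whose bit is set.
    for pair in \
        "1:file system errors corrected" \
        "2:system should be rebooted" \
        "4:file system errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}
        msg=${pair#*:}
        if [ $(( $status & $bit )) -ne 0 ]; then
            echo "$msg"
        fi
    done
    return 0
}

# Usage (commented out; the dry run itself needs root):
# fsck.ext4 -n /dev/sda2
# describe_fsck_status $?
```

Since a dry run never repairs anything, a damaged file system should come back with the 4 bit set, which makes this a cheap way to tell "clean" from "still broken" in a script.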
A few days later, I decided to get back to the issue and, while researching alternative solutions, I read that it is possible to fix errors on a read-only file system, which, it turns out, you can also boot into. And it worked, so for posterity here is the technique:
  1. Put your root partition into read-only mode by modifying the faulty partition’s line in /etc/fstab (but make a note of your old settings):
    UUID=fd1d0fad-3a4c-457f-9b5e-eed021cce3d1 /                       ext4    remount,ro        1 1
    Note: If you’re already in maintenance mode at this point, you may be able to remount your file system in read-only mode by running “mount -o remount,ro /” and skipping the reboot (thanks Jay).
  2. Reboot
  3. Switch to runlevel 1 just to minimize the number of interfering processes (skip this step if you are running the session over SSH [thanks Josh]):
    init 1
  4. Fix your file system (replace /dev/sda2 with your partition’s device), which should now work because the root partition is mounted read-only:
    fsck /dev/sda2
  5. Reboot
  6. Make your root file system writable again:
    mount -o remount,rw /dev/sda2
  7. Restore your /etc/fstab to its original state.
  8. Reboot
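The fstab editing in steps 1 and 7 is the easiest part to get wrong by hand, so here is a rough sketch of how it could be scripted. All three function names are hypothetical, the option string defaults to the remount,ro used in step 1, and the read-only check assumes /proc/mounts lists the mount options in its fourth column. Try it on a copy of your fstab first.

```shell
#!/bin/sh
# Sketch of steps 1 and 7: back up fstab, set the options of the root (/)
# entry, and restore the backup later. All helper names are hypothetical.

# Step 1: save a copy, then rewrite the options field of the / entry.
set_root_readonly() {
    fstab=${1:-/etc/fstab}
    opts=${2:-remount,ro}    # same option string as in step 1
    cp -p "$fstab" "$fstab.orig" || return 1
    # Field 2 is the mount point, field 4 the options. Comment lines are
    # left alone. awk rejoins the edited line with single spaces.
    awk -v o="$opts" '$1 !~ /^#/ && $2 == "/" { $4 = o } { print }' \
        "$fstab.orig" > "$fstab"
}

# Step 7: put the saved copy back.
restore_fstab() {
    fstab=${1:-/etc/fstab}
    cp -p "$fstab.orig" "$fstab"
}

# Before step 4: succeed only if / is currently mounted read-only.
# The last entry for / wins, since later mounts cover earlier ones.
root_is_readonly() {
    mounts=${1:-/proc/mounts}
    awk '$2 == "/" { opts = "," $4 "," }
         END { if (opts ~ /,ro,/) exit 0; exit 1 }' "$mounts"
}

# Usage (commented out; needs root and changes /etc/fstab):
# set_root_readonly /etc/fstab
# ...reboot, init 1...
# root_is_readonly && fsck /dev/sda2
# ...reboot, then restore_fstab /etc/fstab
```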
Voilà, your system is safe to use again. Hopefully this will have gotten you out of a sticky situation like it did for me. If errors keep coming up, it’s probably a sign that your hard drive is failing, and before you lose it completely you should mirror your data to a new one.
