LinuxDevCenter.com

System Failure and Recovery Practice

No shell

This time, we're going to get rid of bash, a failure that can't be fixed by booting into single-user mode.

While writing this article, I discovered a bug in the UML block driver which causes COW files not to work properly when they aren't mounted as the root filesystem. So, we are going to dispense with them for the time being.



Copy root_fs to no_bash, boot it up, log in, and get rid of bash:

% cp root_fs no_bash
% linux ubd0=no_bash

usermode:~# rm /bin/bash
usermode:~# halt

If the halt hangs, halt UML with the mconsole.

Let's boot it up again and see how it does without a shell:

% linux ubd0=no_bash

It boots very quickly and it's impossible to log in:

INIT: cannot execute "/etc/init.d/rcS"
INIT: Entering runlevel: 2
INIT: cannot execute "/etc/init.d/rc"

Debian GNU/Linux 2.2 (none) ttys/0

(none) login: root
Unable to determine your tty name.

So, we need to shut it down with the mconsole and figure out how to fix it.
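
The "cannot execute" messages are what you would expect when a script names an interpreter that no longer exists. As background that the article does not spell out, /bin/sh on Debian systems of this era was normally a link to bash, so removing bash takes the init scripts' interpreter with it. The following is a minimal sketch of that effect, runnable on the host; the path /nonexistent/shell and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: a script whose interpreter is missing cannot be executed,
# even though the script file itself is intact and executable.
# /nonexistent/shell is a made-up path standing in for the deleted /bin/bash.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Write a tiny stand-in for an init script and make it executable.
printf '%s\n' '#!/nonexistent/shell' 'echo "booting"' > "$work/rcS"
chmod +x "$work/rcS"

# Try to run it and record the exit status instead of aborting.
if "$work/rcS" 2>/dev/null; then
    status=0
else
    status=$?
fi
echo "exit status: $status"
```

The exact status and error text depend on the shell doing the launching; the point is only that the launch fails before any line of the script runs.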

We're going to simulate booting from a rescue disk by using root_fs as the rescue disk, assigning it to disk 0, and attaching the damaged filesystem as disk 1:

% linux ubd0=root_fs ubd1=no_bash

So, log in, mount the damaged filesystem on /mnt and make sure that bash is missing:

usermode:~# mount /dev/ubd/1 /mnt
usermode:~# ls /mnt/bin/bash
ls: /mnt/bin/bash: No such file or directory
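
When you don't know in advance which file is gone, the same check can be generalized by comparing the rescue system's directory with the damaged one. This is only a sketch: the two temporary directories stand in for /bin and /mnt/bin, and the file names are illustrative.

```shell
#!/bin/sh
# Sketch: list files present in the rescue directory but absent from the
# damaged one. The temporary directories are stand-ins for /bin and /mnt/bin.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/rescue/bin" "$work/damaged/bin"
touch "$work/rescue/bin/bash" "$work/rescue/bin/cp" "$work/rescue/bin/ls"
touch "$work/damaged/bin/cp" "$work/damaged/bin/ls"

missing=""
for f in "$work"/rescue/bin/*; do
    name=${f##*/}                      # strip the directory part
    if [ ! -e "$work/damaged/bin/$name" ]; then
        missing="$missing $name"
    fi
done
echo "missing:$missing"
```

Inside UML the loop would walk /bin and test against /mnt/bin instead.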

OK, this is now easy to fix. We can just copy the shell from the rescue disk:


usermode:~# cp -p /bin/bash /mnt/bin/bash
usermode:~# ls -l /bin/bash /mnt/bin/bash
-rwxr-xr-x  1 root   root    461400 Feb 20  2000 /bin/bash
-rwxr-xr-x  1 root   root    461400 Feb 20  2000 /mnt/bin/bash
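
Matching sizes and dates in ls are a good sign, but a byte-for-byte comparison is a stronger check. A minimal sketch using cmp follows; it copies whatever sh the host provides into a temporary directory that stands in for /mnt/bin, so the paths are assumptions rather than the article's.

```shell
#!/bin/sh
# Sketch: copy a binary with cp -p and confirm the copy is byte-identical.
# "$work/mnt/bin" is a stand-in for /mnt/bin on the damaged filesystem.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/mnt/bin"

src=$(command -v sh)                   # any binary that exists on the host
cp -p "$src" "$work/mnt/bin/sh"

# cmp -s is silent and reports the result through its exit status.
if cmp -s "$src" "$work/mnt/bin/sh"; then
    result=identical
else
    result=different
fi
echo "$result"
```

In the UML session the equivalent check would be cmp /bin/bash /mnt/bin/bash.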

Now you can halt UML and boot it from no_bash to confirm that it boots properly again.

Backups, backups, backups

For our finale, we are going to make a backup of the filesystem and destroy enough of it that fixing it requires restoring the backup. The backup device will be an empty file that's large enough to hold our filesystem:

% dd if=/dev/zero of=backup seek=600 bs=$((1024*1024)) count=1

My filesystem is just over 500MB, so I created a backup file of about 600MB to allow for any overhead of the backup format. The seek=600 makes dd skip 600 one-megabyte blocks before writing a single block at the end. Replace the 600 with whatever size is appropriate for you. Now copy root_fs to trashed and boot it up with backup as disk 1:

% cp root_fs trashed
% linux ubd0=trashed ubd1=backup
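
To see what the dd command produces, here is a scaled-down sketch that can be run on the host. It uses a temporary directory and much smaller seek values than the article's 600, both chosen purely for illustration. It also shows that re-running dd with a larger seek grows the same file.

```shell
#!/bin/sh
# Sketch: a scaled-down version of the article's dd command, run in a
# temporary directory (an assumption) instead of next to root_fs.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mib=$((1024*1024))

# seek=6 skips 6 output blocks, then count=1 writes one block of zeros,
# so the file is 7 MiB long with only the final block actually written.
dd if=/dev/zero of="$work/backup" seek=6 bs=$mib count=1 2>/dev/null
size=$(( $(wc -c < "$work/backup") ))
echo "initial size:  $size bytes"

# Running dd again with a larger seek extends the same file to 9 MiB.
dd if=/dev/zero of="$work/backup" seek=8 bs=$mib count=1 2>/dev/null
bigger=$(( $(wc -c < "$work/backup") ))
echo "extended size: $bigger bytes"
```

By the same arithmetic, seek=600 with count=1 gives a file of 601 one-megabyte blocks.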

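Log in, and make the backup on /dev/ubd/1. I'm using tar here. If you favor a different backup tool, feel free to use it. Notice that we're not creating a filesystem on this device. It's being used as a raw data device in exactly the same way as a tape. The l option tells the tar of this era to stay on the local filesystem. Current GNU tar releases spell that --one-file-system and use -l for a different purpose, so check your tar's documentation before reusing the command.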

If the backup fails with an I/O error, the backup file you created was too small. You can extend it by running dd on the file again with a larger seek argument, and then retry the backup.

usermode:~# tar clf /dev/ubd/1 /
tar: Removing leading '/' from member names
tar: Removing leading '/' from link names
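
Before destroying anything, it is worth confirming that the archive can be read back. The sketch below does that on the host with tar's t (list) option. The small directory tree and the plain file named backup are stand-ins for / and /dev/ubd/1, and the file names are invented for the example.

```shell
#!/bin/sh
# Sketch: write a tar archive to a plain file and list it to confirm that
# it is readable. "$work/root" stands in for /, "$work/backup" for /dev/ubd/1.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/root/bin" "$work/root/etc"
echo 'fake shell' > "$work/root/bin/bash"
echo 'usermode'   > "$work/root/etc/hostname"

# Create the archive; -C changes directory so member names are relative.
tar cf "$work/backup" -C "$work/root" .

# List the archive and check that a file we care about is in it.
listing=$(tar tf "$work/backup")
if printf '%s\n' "$listing" | grep -q 'bin/bash$'; then
    found=yes
else
    found=no
fi
echo "bin/bash in archive: $found"
```

Inside UML the corresponding check would be tar tf /dev/ubd/1, run before the rm in the next step.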

When it's done, we will make "trashed" live up to its name:

usermode:~# rm -rf /bin /lib /usr/lib

Remove anything you like. Feel free to corrupt things, too. When you're done having fun, shut it down, using the mconsole if necessary.

Now, it's time to fix it back up. Boot UML with root_fs as the rescue, backup as disk 1 again, and trashed as disk 2:

% linux ubd0=root_fs ubd1=backup ubd2=trashed

Now, log in, mount the damaged filesystem on /mnt, cd to it, and restore the backup:

usermode:~# mount /dev/ubd/2 /mnt
usermode:~# cd /mnt
usermode:/mnt# tar xpf /dev/ubd/1  
tar: : Cannot mkdir: No such file or directory
tar: Error exit delayed from previous errors

It succeeded, despite the error:

usermode:/mnt# ls bin
arch   dd        fgrep     ls       pidof     run-parts  touch
...
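
The whole backup, destroy, and restore cycle can also be rehearsed on the host in a few lines. This is a sketch under stated assumptions: a temporary directory plays the role of the trashed filesystem, a plain file plays the backup device, and the file name tool is invented for the example.

```shell
#!/bin/sh
# Sketch: back up a small tree, destroy part of it, and restore it in place.
# "$work/fs" stands in for the trashed filesystem mounted on /mnt.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/fs/bin"
echo 'hello' > "$work/fs/bin/tool"
chmod 750 "$work/fs/bin/tool"

tar cf "$work/backup" -C "$work/fs" .   # make the backup
rm -rf "$work/fs/bin"                   # the "disaster"

# Restore from inside the damaged tree; p keeps the recorded permissions.
(cd "$work/fs" && tar xpf "$work/backup")

content=$(cat "$work/fs/bin/tool")
mode=$(ls -l "$work/fs/bin/tool" | cut -c1-10)
echo "content: $content"
echo "mode:    $mode"
```

Checking the mode as well as the content confirms that the p option did its job, which matters for files such as setuid binaries on a real system.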

Now you can check that it is fixed by halting UML, booting it from "trashed" again, and confirming that it comes up normally:

% linux ubd0=trashed

In conclusion

Hopefully this article has convinced you that UML can be a valuable system administration tool. I've demonstrated how to create, and then recover from, several different kinds of sysadmin catastrophes.

Obviously, this is only a tiny sample of the possible disasters that can happen. You can ensure that you are prepared for them by making them happen and figuring out how to fix them. It is possible to make them happen on a physical machine, but it should be apparent that simulating them with UML is far more convenient, and almost completely authentic. The devices may have different names, but the procedures are exactly the same as on a physical machine.

With the publication of this article, I am inaugurating the Sysadmin Disaster of the Month on the UML web site at http://user-mode-linux.sourceforge.net/sdotm.html. I will present a disaster and take submissions of solutions. I will arbitrarily choose a winner each month based on criteria such as originality, subtlety, brevity, and parsimony. I will also take submissions of proposed disasters. If you have a disaster that you'd like featured, submit it, along with a proposed solution, if you have one.

Jeff Dike



What kind of disasters do you want to cause on your UML system?
  • Good article
    2001-12-06 12:03:28  srogers

    I found your article preface on linuxtoday and wanted to say thank you for writing it. Teaching someone how to do something is hard enough, but writing out the explanation takes even more skill. I liked your examples on recovery, and I can see UML being useful for things like upgrade testing as well. Besides a system boot failure, another risk is upgrading to a new version of the OS, which can cause a boot failure or, worse, total corruption of system and user data. I can see where this could help cut the cost of testing an upgrade scenario. It would require more disk space but not a complete system. For a small company with a production system this could save a lot.

    Thanks again,

    Sam Rogers.


©2009, O'Reilly Media, Inc.