Google Compute Engine VM instance: VFS: Unable to

Posted 2020-07-23 03:59

Question:

My instance on Google Compute Engine is not booting up due to having some boot order issues.

So I created a new instance and reconfigured my machine.

My questions:

  1. How can I handle these issues when I host some websites?
  2. How can I recover my data from the old disk?

Logs:

  

    [    0.348577] Key type trusted registered
    [    0.349232] Key type encrypted registered
    [    0.349769] AppArmor: AppArmor sha1 policy hashing enabled
    [    0.350351] ima: No TPM chip found, activating TPM-bypass!
    [    0.351070] evm: HMAC attrs: 0x1
    [    0.351549]   Magic number: 11:333:138
    [    0.352077] block ram3: hash matches
    [    0.352550] rtc_cmos 00:00: setting system clock to 2015-12-19 17:06:53 UTC (1450544813)
    [    0.353492] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
    [    0.354108] EDD information not available.
    [    0.536267] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
    [    0.537862] md: Waiting for all devices to be available before autodetect
    [    0.538979] md: If you don't use raid, use raid=noautodetect
    [    0.539969] md: Autodetecting RAID arrays.
    [    0.540699] md: Scanned 0 and added 0 devices.
    [    0.541565] md: autorun ...
    [    0.542093] md: ... autorun DONE.
    [    0.542723] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
    [    0.543731] Please append a correct "root=" boot option; here are the available partitions:
    [    0.545011] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
    [    0.546199] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-39-generic #44~14.04.1-Ubuntu
    [    0.547579] Hardware name: Google Google, BIOS Google 01/01/2011
    [    0.548728]  ffffea00008ae140 ffff880024ee7db8 ffffffff817af92b 000000000000111e
    [    0.549004]  ffffffff81a7c7c8 ffff880024ee7e38 ffffffff817a976b ffff880024ee7dd8
    [    0.549004]  ffffffff00000010 ffff880024ee7e48 ffff880024ee7de8 ffff880024ee7e38
    [    0.549004] Call Trace:
    [    0.549004]  [] dump_stack+0x45/0x57
    [    0.549004]  [] panic+0xc1/0x1f5
    [    0.549004]  [] mount_block_root+0x210/0x2a9
    [    0.549004]  [] mount_root+0x54/0x58
    [    0.549004]  [] prepare_namespace+0x16d/0x1a6
    [    0.549004]  [] kernel_init_freeable+0x1f6/0x20b
    [    0.549004]  [] ? initcall_blacklist+0xc0/0xc0
    [    0.549004]  [] ? rest_init+0x80/0x80
    [    0.549004]  [] kernel_init+0xe/0xf0
    [    0.549004]  [] ret_from_fork+0x58/0x90
    [    0.549004]  [] ? rest_init+0x80/0x80
    [    0.549004] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
    [    0.549004] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)


Answer 1:

  1. How to handle these issues when I host some websites?

I'm not sure how you got into this situation. Additional information (see my comment above) would help in understanding what triggered the issue.

  2. How to recover my data from old disk?

Attach and mount the disk

Assuming you did not delete the original disk when you deleted the instance, you can simply mount this disk from another VM to read the data from it. To do this:

  1. attach the disk to another VM instance, e.g.,

    gcloud compute instances attach-disk $INSTANCE --disk $DISK

  2. mount the disk:

    sudo mkdir -p /mnt/disks/[MNT_DIR]

    sudo mount [OPTIONS] /dev/disk/by-id/google-[DISK_NAME] /mnt/disks/[MNT_DIR]

    Note: you'll need to substitute appropriate values for:

    • MNT_DIR: the directory to mount the disk on
    • OPTIONS: mount options appropriate for your disk and filesystem
    • DISK_NAME: the device name of the disk as it appears under /dev/disk/by-id/ after you attach it to the VM

Unmounting and detaching the disk

When you are done using the disk, reverse the steps:

Note: Always unmount a non-root disk before you detach it. Detaching a mounted disk can leave I/O operations incomplete and corrupt data.

  1. unmount the disk

    sudo umount /dev/disk/by-id/google-[DISK_NAME]

  2. detach the disk from the VM:

    gcloud compute instances detach-disk $INSTANCE --disk $DISK
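
The whole round trip can be sketched as one small script. This is only an illustration under some assumptions: `run`, `recover_disk` and the `DRY_RUN` switch are names made up for this sketch, the disk is attached with a device name equal to its disk name, and the data is assumed to live on the first partition (hence the `-part1` suffix). By default it only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch: attach -> mount read-only -> copy -> unmount -> detach.
# run(), recover_disk() and DRY_RUN are hypothetical helpers for this example.

run() {
    # Print the command when DRY_RUN=1 (the default), otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_disk() {
    instance=$1 disk=$2 mnt=$3 dest=$4
    # Setting --device-name makes the by-id path below predictable.
    run gcloud compute instances attach-disk "$instance" --disk "$disk" --device-name "$disk"
    run sudo mkdir -p "$mnt"
    # Read-only mount so the rescue VM cannot modify the old filesystem.
    # Assumption: the filesystem is on partition 1 of the attached disk.
    run sudo mount -o ro "/dev/disk/by-id/google-${disk}-part1" "$mnt"
    run sudo cp -a "$mnt/." "$dest"
    run sudo umount "$mnt"
    run gcloud compute instances detach-disk "$instance" --disk "$disk"
}
```

For example, `recover_disk rescue-vm old-disk /mnt/disks/old /home/me/recovered` prints the six commands; run it with `DRY_RUN=0` once they look right for your setup.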



Answer 2:

What Causes This?

That is the million dollar question. After inspecting my GCE VM, I found 14 different kernels installed, taking up several hundred MB of space. Most of the kernels didn't have a corresponding initrd.img file, and were therefore not bootable (including 3.19.0-39-generic).

I certainly never went around trying to install random kernels, and once removed, they no longer appear as available upgrades, so I'm not sure what happened. Seriously, what happened?
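
A quick way to spot kernels without a matching initrd.img is to compare the file names in the boot directory. This is a rough sketch that assumes Ubuntu's usual naming (`vmlinuz-<version>` and `initrd.img-<version>`); the function name `kernels_without_initrd` is made up for this example.

```shell
# Print the version of every kernel image that has no matching initrd.img.
# Pass the boot directory as the first argument (defaults to /boot).
kernels_without_initrd() {
    boot_dir=${1:-/boot}
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue          # the glob matched nothing
        version=${kernel##*/vmlinuz-}         # strip directory and "vmlinuz-" prefix
        [ -e "$boot_dir/initrd.img-$version" ] || echo "$version"
    done
}
```

On a rescue VM with the broken disk mounted, you could point it at the mounted copy instead, e.g. `kernels_without_initrd /mnt/sdb/boot`.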

Edit: New response from Google Cloud Support.

I received another disconcerting response. This may explain the additional, errant kernels.

"On rare occasions, a VM needs to be migrated from one physical host to another. In such case, a kernel upgrade and security patches might be applied by Google."

1. "How can I handle these issues when I host some websites?"

My first instinct is to recommend using AWS instead of GCE. However, GCE is less expensive. Before doing any upgrades, make sure you take a snapshot, and try rebooting the server to see if the upgrades broke anything.
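
As a sketch of the "snapshot first" habit, the helper below builds a `gcloud compute disks snapshot` command with a timestamped name. The function name `snapshot_cmd` and the `<disk>-pre-upgrade-<timestamp>` naming scheme are assumptions of this example, not anything GCE requires.

```shell
# Print a snapshot command for the given disk and zone.
# An optional third argument overrides the timestamp (handy for testing).
snapshot_cmd() {
    disk=$1 zone=$2
    stamp=${3:-$(date -u +%Y%m%d-%H%M%S)}
    echo "gcloud compute disks snapshot $disk --zone $zone --snapshot-names ${disk}-pre-upgrade-${stamp}"
}
```

Run the printed command, apply the upgrades, then reboot and confirm the VM comes back before you rely on it.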

2. How can I recover my data from old disk?

Even Better - How to recover your instance...

After several back-and-forth emails, I finally received a response from support that allowed me to resolve the issue. Be mindful, you will have to change things to match your unique VM.

  1. Take a snapshot of the disk first in case we need to roll back any of the changes below.

  2. Edit the properties of the broken instance to disable this option: "Delete boot disk when instance is deleted"

  3. Delete the broken instance.

    IMPORTANT: make sure you do not select the option to delete the boot disk. Otherwise, the disk will be removed permanently!

  4. Start up a new temporary instance.

  5. Attach the broken disk to the temporary instance (its partition will appear as /dev/sdb1).

  6. When the temporary instance is booted up, do the following:

In the temporary instance:

    # Run fsck to fix any disk corruption issues
    $ sudo fsck.ext4 -a /dev/sdb1

    # Mount the disk from the broken vm
    $ sudo mkdir /mnt/sdb
    $ sudo mount /dev/sdb1 /mnt/sdb/ -t ext4

    # Find out the UUID of the broken disk. In this case, the uuid of sdb1 is d9cae47b-328f-482a-a202-d0ba41926661
    $ ls -alt /dev/disk/by-uuid/
    lrwxrwxrwx. 1 root root 10 Jan 6 07:43 d9cae47b-328f-482a-a202-d0ba41926661 -> ../../sdb1
    lrwxrwxrwx. 1 root root 10 Jan 6 05:39 a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 -> ../../sda1

    # Update the UUID in grub.cfg (if necessary)
    $ sudo vim /mnt/sdb/boot/grub/grub.cfg

Note: This ^^^ is where I deviated from the support instructions.

Instead of modifying all the boot entries to set root=UUID=[uuid character string], I looked for all the entries that set root=/dev/sda1 and deleted them. I also deleted every entry that didn't set an initrd.img file. The top boot entry with correct parameters in my case ended up being 3.19.0-31-generic. But yours may be different.
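
Before editing, it can help to list what each boot entry actually sets. The sketch below is only a rough reader for grub.cfg: it assumes each entry has one `linux` line and an optional `initrd` line, and the function name `grub_entries_report` is made up for this example.

```shell
# For each menuentry in a grub.cfg, print the kernel, its root= value,
# and whether the entry has an initrd line.
grub_entries_report() {
    awk '
        # Print the entry collected so far, then reset the state.
        function report() {
            if (kernel != "")
                printf "%s root=%s initrd=%s\n", kernel, root, (has_initrd ? "yes" : "MISSING")
            kernel = ""; root = "?"; has_initrd = 0
        }
        BEGIN { root = "?" }
        $1 == "menuentry" { report() }
        $1 == "linux" {
            kernel = $2
            for (i = 3; i <= NF; i++)
                if ($i ~ /^root=/) root = substr($i, 6)   # text after "root="
        }
        $1 == "initrd" { has_initrd = 1 }
        END { report() }
    ' "$1"
}
```

With the broken disk mounted as above, `grub_entries_report /mnt/sdb/boot/grub/grub.cfg` prints one line per entry, so entries with `root=/dev/sda1` or `initrd=MISSING` stand out.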

    # Flush all changes to disk
    $ sudo sync

    # Shut down the temporary instance
    $ sudo shutdown -h now

Finally, detach the disk from the temporary instance, and create a new instance based on the fixed disk. It will hopefully boot.
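
The console steps above map roughly onto the following gcloud commands. Treat this as a checklist sketch rather than the exact commands support gave me: the function name `recovery_plan`, the `-fixed` suffix for the new instance, and all argument names are made up for this example, and it only prints the commands.

```shell
# Print the gcloud side of the recovery procedure for review.
# Arguments: broken instance, its boot disk, a temporary instance name, zone.
recovery_plan() {
    broken=$1 disk=$2 temp=$3 zone=$4
    cat <<EOF
gcloud compute disks snapshot $disk --zone $zone
gcloud compute instances set-disk-auto-delete $broken --disk $disk --no-auto-delete --zone $zone
gcloud compute instances delete $broken --keep-disks boot --zone $zone
gcloud compute instances create $temp --zone $zone
gcloud compute instances attach-disk $temp --disk $disk --zone $zone
# ... repair the disk from inside $temp, then shut it down ...
gcloud compute instances detach-disk $temp --disk $disk --zone $zone
gcloud compute instances create ${broken}-fixed --disk name=$disk,boot=yes --zone $zone
EOF
}
```

For example, `recovery_plan web-1 web-1 rescue-vm us-central1-a` prints the sequence with your names filled in.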

Assuming it does boot, you have a lot of work to do. If you have half as many unused kernels as me, then you might want to purge the unused ones (especially since some are likely missing a corresponding initrd.img file).

I used the second answer (the terminal-based one) in this askubuntu question to purge the other kernels.

Note: Make sure you don't purge the kernel you booted in with!
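
To guard against that mistake, here is a small filter that drops the running kernel from a list of package names. It is a sketch, not the script from the linked answer: `purge_candidates` is a made-up name, and it only looks at `linux-image-*` and `linux-image-extra-*` packages with a version number in the name.

```shell
# Read package names on stdin and print the versioned linux-image packages
# that do NOT belong to the kernel release given as the first argument.
purge_candidates() {
    awk -v running="$1" '
        /^linux-image-(extra-)?[0-9]/ {
            n = length($0) - length(running)
            # Skip packages whose name ends in the running release.
            if (n > 0 && substr($0, n + 1) == running) next
            print
        }'
}
```

A possible use is `dpkg-query -W -f='${Package}\n' 'linux-image-*' | purge_candidates "$(uname -r)"`. Review the list by hand before passing anything to `apt-get purge`.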



Answer 3:

In my case, the first menuentry in GRUB's config (/boot/grub/grub.cfg), 3.19.0-51-generic, was missing an initrd entry, so it was unable to boot.

On further investigation, dpkg shows the packages for that specific kernel as failed and unconfigured:

    dpkg -l | grep 3.19.0-51-generic
    iF  linux-image-3.19.0-51-generic       3.19.0-51.58~14.04.1
    iU  linux-image-extra-3.19.0-51-generic 3.19.0-51.58~14.04.1
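
To check the whole system rather than one kernel, you can filter `dpkg -l` for any package that is not fully installed. A small sketch (the function name `broken_packages` is made up here; it ignores `rc` rows, which are removed packages with leftover config files):

```shell
# Read `dpkg -l` output on stdin and print the status and name of every
# package that is neither fully installed ("ii") nor removed ("rc").
broken_packages() {
    awk '$1 ~ /^[a-z][A-Za-z][A-Z]?$/ && $1 != "ii" && $1 != "rc" { print $1, $2 }'
}
```

Usage: `dpkg -l | broken_packages`. For the kernel above this would print the `iF` and `iU` rows.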

This all stemmed from the Ubuntu image supplied by Google having unattended-upgrades enabled. For some reason the initrd build was killed partway through, and something else then came along and ran update-grub2 anyway.

    unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: Generating /boot/initrd.img-3.19.0-51-generic
    Killed
    E: mkinitramfs failure cpio 141 xz -8 --check=crc32 137
    unattended-upgrades-dpkg_2016-03-10_06:49:42.550403.log:update-initramfs: failed for /boot/initrd.img-3.19.0-51-generic with 1.

To work around the immediate problem, run (as root):

    dpkg --force-confold --configure -a

Although unattended-upgrades in theory is a great idea, having it enabled by default can have unattended consequences.
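
If you want to see whether unattended upgrades are switched on for your image, something like the sketch below can help. It assumes the setting lives in /etc/apt/apt.conf.d/20auto-upgrades, which is the usual place on Ubuntu but may differ on your image; the function name `unattended_upgrades_enabled` is made up for this example.

```shell
# Succeed (exit 0) if the given apt config file enables unattended upgrades.
# Defaults to the usual Ubuntu location; other files may also set this option.
unattended_upgrades_enabled() {
    conf=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
    # Pull the quoted value of APT::Periodic::Unattended-Upgrade; last one wins.
    value=$(sed -n 's/^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    [ -n "$value" ] && [ "$value" != "0" ]
}
```

For example: `if unattended_upgrades_enabled; then echo "unattended-upgrades is on"; fi`.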