Rescuing Failed Instances in OpenStack
When an OpenStack instance fails to boot or becomes inaccessible, rescue mode provides a way to recover your data and fix configuration problems without losing everything. This guide walks through the complete process of rescuing failed instances, from diagnosis to recovery.
What is Rescue Mode?
Rescue mode boots your instance from a temporary image while attaching your original root disk as a secondary device. This gives you access to your data and file system without relying on the broken operating system.
Think of it like booting from a live USB drive on a physical server. You can mount the original disk, recover files, fix configuration errors, or repair a broken bootloader.
When to Use Rescue Mode
Rescue mode is appropriate when:
- The instance won't boot because of a kernel panic, filesystem corruption, or bootloader problems
- SSH access fails after a bad configuration change
- A password reset is needed and cloud-init isn't working
- Critical files must be recovered before a rebuild
- The filesystem needs repair with fsck or similar tools
- Configuration files that prevent a normal boot need fixing
Rescue mode is not appropriate for:
- Hardware-level failures (these require migration or rebuild)
- Severe disk corruption (backup recovery is safer)
- Performance troubleshooting (use normal boot with monitoring)
Prerequisites
Before entering rescue mode, ensure you have:
- Access to Horizon dashboard or OpenStack CLI tools installed
- Valid credentials for your OpenStack environment
- Instance ID or name of the failed instance
- SSH key configured for the rescue image
- Knowledge of the instance's original OS (helps with file paths and commands)
You can use either the Horizon dashboard for a visual interface or the OpenStack CLI for automation and scripting. Both methods provide the same rescue functionality.
Step 1: Identify the Failed Instance
First, locate the instance and verify its current state. You can use either Horizon or the CLI.
Using Horizon Dashboard
- Navigate to Project > Compute > Instances
- Review the instance list and locate your failed instance
- Check the Status column for the instance state
- Click the instance name to view detailed information
- Note the instance ID and current status
Common failure states include:
- ERROR indicates a serious problem
- SHUTOFF means the instance stopped
- ACTIVE but unreachable suggests OS-level issues
Using OpenStack CLI
List all instances:
    openstack server list
Find your failed instance and note its ID. Check its detailed status:
    openstack server show <instance-id>
Look at the status field for the same failure states listed above.
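In a project with many instances, filtering by status can be quicker than scanning the full list. The following is a minimal sketch, assuming the openstack client is installed and authenticated. The function name list_failed_servers is illustrative and not part of the CLI.

```shell
# List servers in a given status (default: ERROR) as "ID Name" pairs.
# Hypothetical helper built on `openstack server list --status`.
list_failed_servers() {
    wanted="${1:-ERROR}"
    openstack server list --status "$wanted" -f value -c ID -c Name
}
```

Calling list_failed_servers with no argument lists instances in ERROR. Passing SHUTOFF lists stopped instances instead.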
Step 2: Create a Snapshot (Optional but Recommended)
Before making changes, create a snapshot for rollback capability. You can create snapshots using either Horizon or the CLI.
Using Horizon Dashboard
- Navigate to Project > Compute > Instances
- Locate the failed instance in the list
- Click the dropdown arrow on the right side of the instance row
- Select Create Snapshot from the actions menu
- Enter a snapshot name like backup-before-rescue
- Click Create Snapshot
- Navigate to Project > Compute > Images to monitor progress
- Wait for the status to change to Active
When status shows Active, the snapshot is ready.
Using OpenStack CLI
Create a snapshot:
    openstack server image create --name backup-before-rescue <instance-id>
This snapshot captures the instance's current state. If rescue operations go wrong, you can rebuild from this point.
Wait for the snapshot to complete:
    openstack image list | grep backup-before-rescue
When status shows active, the snapshot is ready.
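For scripted use, the create and wait steps can be combined. This sketch assumes the client's --wait option on server image create, which blocks until the snapshot finishes. The helper name snapshot_before_rescue and the timestamp suffix are illustrative choices, not something the original procedure requires.

```shell
# Snapshot a server under a timestamped name and block until the image is ready.
# Hypothetical helper; prints the snapshot name on success.
snapshot_before_rescue() {
    server="$1"
    name="backup-before-rescue-$(date -u +%Y%m%dT%H%M%SZ)"
    # --wait makes the client poll until the snapshot completes or fails.
    openstack server image create --wait --name "$name" "$server" >/dev/null || return 1
    echo "$name"
}
```

A timestamped name keeps repeated rescue attempts from producing several snapshots with the same name.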
Step 3: Enter Rescue Mode
You can enter rescue mode using either the Horizon dashboard or the OpenStack CLI. Both methods achieve the same result.
How to Rescue an Instance in Horizon Dashboard
The Horizon dashboard provides a visual interface for entering rescue mode.
- Navigate to Project > Compute > Instances
- Locate the instance you want to rescue in the instance list
- Click the dropdown arrow on the right side of the instance row
- Select Rescue Instance from the actions menu
- In the dialog that appears, select a rescue image from the dropdown (or leave default to use a copy of the original image)
- Optionally set a password for the rescue environment
- Click Confirm to start the rescue process
The instance shuts down and boots from the rescue image. This typically takes 30 to 60 seconds, depending on the image size and host load.
Watch the Status column in the instance list. When it changes to RESCUE, the instance is ready.
To access the rescued instance, click the instance name to open its details page, then click the Console tab to access the rescue environment.
How to Rescue an Instance Using OpenStack CLI
For automation and scripting, use the OpenStack CLI to enter rescue mode.
Put the instance into rescue mode:
    openstack server rescue <instance-id>
You can specify a custom rescue image if needed:
    openstack server rescue --image <rescue-image-id> <instance-id>
If no image is specified, OpenStack typically boots the rescue environment from the image the instance was originally launched from, unless your provider has configured a dedicated default rescue image.
The rescue process typically takes 30 to 60 seconds. Monitor progress:
    openstack server show <instance-id> | grep status
When status changes to RESCUE, the instance is ready.
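Rather than re-running the status check by hand, a script can poll until the instance reaches the expected state. This is a minimal sketch under stated assumptions: the helper name wait_for_server_status, the 300-second default timeout, and the 5-second interval are all illustrative.

```shell
# Poll a server until it reaches the wanted status or a timeout expires.
# Hypothetical helper: wait_for_server_status <server> <status> [timeout_s] [interval_s]
# Returns 0 on success, 1 on timeout or ERROR, 2 if the status query itself fails.
wait_for_server_status() {
    server="$1"; wanted="$2"; timeout="${3:-300}"; interval="${4:-5}"
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        current=$(openstack server show "$server" -f value -c status) || return 2
        [ "$current" = "$wanted" ] && return 0
        # Treat ERROR as terminal unless it is the status being waited for.
        [ "$current" = "ERROR" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

For example, wait_for_server_status <instance-id> RESCUE waits for rescue mode to become ready. The same helper can wait for ACTIVE after leaving rescue mode.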
Step 4: Connect to the Rescue Environment
Once in rescue mode, connect via SSH using your configured key:
    ssh -i /path/to/key.pem rescue@<instance-floating-ip>
The login user depends on the rescue image. The rescue user shown here is only an example, and many images use ubuntu, centos, or a similar default. Check your cloud provider's documentation if the login is rejected.
After connecting, you're running in the temporary rescue environment. Your original root disk is attached but not mounted.
Step 5: Locate and Mount the Original Disk
List attached block devices to find your original root disk:
    lsblk
You'll see output like:
    NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    vda      8:0    0  10G  0 disk
    └─vda1   8:1    0  10G  0 part /
    vdb      8:16   0  80G  0 disk
    └─vdb1   8:17   0  80G  0 part
In this example:
- vda is the rescue image (currently booted)
- vdb is your original root disk (unmounted)
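When several disks are attached, picking out the unmounted partitions by eye is error-prone. The sketch below filters lsblk output for partitions with no mountpoint. The helper name unmounted_partitions is hypothetical, and the function reads its input from stdin so it can be checked against saved output.

```shell
# Print partitions that have no mountpoint.
# Expects the output of `lsblk -rno NAME,TYPE,MOUNTPOINT` on stdin.
# Hypothetical helper; in rescue mode these are candidates for the original root disk.
unmounted_partitions() {
    awk '$2 == "part" && $3 == "" { print "/dev/" $1 }'
}
```

Run it as lsblk -rno NAME,TYPE,MOUNTPOINT | unmounted_partitions. Partitions that back LVM physical volumes also appear in the result because they have no mountpoint of their own.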
Create a mount point and mount the original disk:
    sudo mkdir /mnt/original
    sudo mount /dev/vdb1 /mnt/original
If you have LVM volumes or multiple partitions, you may need to mount them differently:
    # For LVM volumes
    sudo vgscan
    sudo vgchange -ay
    sudo mount /dev/mapper/vg-root /mnt/original

    # For systems with separate /boot
    sudo mount /dev/vdb1 /mnt/original
    sudo mount /dev/vdb2 /mnt/original/boot
Verify the mount succeeded:
    ls /mnt/original
You should see the familiar directory structure from your instance (bin, etc, home, var, and so on).
Step 6: Recover Data or Fix Configuration
Now that your original filesystem is mounted, you can perform recovery operations.
Recover Important Files
Copy critical data to a safe location:
    # Create a tarball of important directories
    sudo tar -czf /tmp/backup-data.tar.gz -C /mnt/original/home .

    # Copy to object storage or another instance
    # Example with Swift:
    swift upload backups /tmp/backup-data.tar.gz
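Because the archive may be the only copy of the data once the instance is rebuilt, it is worth recording a checksum before uploading it. This is a sketch, assuming GNU coreutils (sha256sum) is available in the rescue image. The helper name archive_with_checksum is illustrative.

```shell
# Archive a directory and write a SHA-256 checksum file next to the tarball.
# Hypothetical helper: archive_with_checksum <source_dir> <tarball_path>
archive_with_checksum() {
    src="$1"; out="$2"
    tar -czf "$out" -C "$src" . || return 1
    # Record the checksum relative to the tarball's directory so that
    # `sha256sum -c` still works after both files are copied elsewhere.
    ( cd "$(dirname "$out")" && \
      sha256sum "$(basename "$out")" > "$(basename "$out").sha256" )
}
```

Run it from a root shell so that every file under /mnt/original is readable, then verify the copy at its destination with sha256sum -c backup-data.tar.gz.sha256.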
Fix Configuration Files
Edit configuration files that prevented boot:
    # Fix SSH config
    sudo nano /mnt/original/etc/ssh/sshd_config

    # Reset network configuration
    sudo nano /mnt/original/etc/network/interfaces

    # Fix fstab if mount issues exist
    sudo nano /mnt/original/etc/fstab
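Editing a broken configuration file in place risks making it worse, so keeping a copy of the current version is a cheap safeguard. The helper below is a sketch. The name backup_before_edit and the .bak-<timestamp> suffix are assumptions for illustration.

```shell
# Copy a file to a timestamped sibling before editing it.
# Hypothetical helper; `cp -a` preserves mode, ownership, and timestamps.
# Prints the path of the backup copy.
backup_before_edit() {
    file="$1"
    backup="${file}.bak-$(date -u +%Y%m%dT%H%M%SZ)"
    cp -a "$file" "$backup" || return 1
    echo "$backup"
}
```

For example, run backup_before_edit /mnt/original/etc/fstab from a root shell before opening the file in an editor. If the edit goes wrong, copy the backup over the edited file.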
Reset Root Password
If you need to reset the root password:
    # Chroot into the original system
    sudo chroot /mnt/original

    # Change the password
    passwd root

    # Exit chroot
    exit
Check and Repair Filesystem
Run filesystem checks if corruption is suspected:
    # Unmount first
    sudo umount /mnt/original

    # Run fsck
    sudo fsck -y /dev/vdb1

    # Remount
    sudo mount /dev/vdb1 /mnt/original
The -y flag automatically answers yes to repair prompts. Remove it if you want manual control.
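The exit status of fsck is a bitmask, as documented in fsck(8), so one run can report several conditions at once. The sketch below decodes it into readable messages. The helper name describe_fsck_status is hypothetical.

```shell
# Translate an fsck exit status (a bitmask, per fsck(8)) into readable messages.
# Hypothetical helper: describe_fsck_status <exit_code>
describe_fsck_status() {
    code="$1"
    [ "$code" -eq 0 ] && { echo "no errors"; return 0; }
    [ $((code & 1)) -ne 0 ] && echo "errors corrected"
    [ $((code & 2)) -ne 0 ] && echo "reboot recommended"
    [ $((code & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    [ $((code & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ] && echo "canceled by user"
    [ $((code & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

Call it right after the check, for example sudo fsck -y /dev/vdb1; describe_fsck_status $?. A result that includes "errors left uncorrected" suggests restoring from backup rather than continuing repairs.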
Fix Bootloader Issues
If GRUB is broken:
    # Mount the original root (skip if already mounted), bind the virtual filesystems, then chroot
    sudo mount /dev/vdb1 /mnt/original
    sudo mount --bind /dev /mnt/original/dev
    sudo mount --bind /proc /mnt/original/proc
    sudo mount --bind /sys /mnt/original/sys
    sudo chroot /mnt/original

    # Reinstall GRUB (Debian/Ubuntu commands; RHEL-family systems use grub2-install and grub2-mkconfig)
    grub-install /dev/vdb
    update-grub

    # Exit and clean up
    exit
    sudo umount /mnt/original/sys
    sudo umount /mnt/original/proc
    sudo umount /mnt/original/dev
    sudo umount /mnt/original
Step 7: Exit Rescue Mode
Once you've completed recovery operations, exit rescue mode. You can unrescue using either Horizon or the CLI.
How to Unrescue an Instance in Horizon Dashboard
- First, unmount the original disk if you're still connected to the rescue environment:
    sudo umount /mnt/original
- Exit the SSH or console session
- Navigate to Project > Compute > Instances
- Locate the rescued instance (status shows RESCUE)
- Click the dropdown arrow on the right side of the instance row
- Select Unrescue Instance from the actions menu
- Click Confirm in the dialog that appears
The instance reboots using its original root disk. This typically takes 1 to 2 minutes. The status returns to ACTIVE when complete.
How to Unrescue an Instance Using OpenStack CLI
Inside the rescue environment, unmount the original disk and close the session:
    # First, unmount the original disk if still connected
    sudo umount /mnt/original

    # Exit SSH session
    exit
From your local machine, unrescue the instance:
    openstack server unrescue <instance-id>
The instance reboots using its original root disk. This typically takes 1 to 2 minutes.
Step 8: Verify Recovery
After unrescuing, verify the instance boots normally:
    openstack server show <instance-id> | grep status
Status should return to ACTIVE.
Test connectivity:
    ssh -i /path/to/key.pem user@<instance-floating-ip>
Check that services are running:
    systemctl status
Verify your configuration changes took effect.
Troubleshooting Common Issues
Cannot Mount Original Disk
If mounting fails with errors like "wrong fs type" or "bad superblock":
    # Check filesystem type
    sudo file -s /dev/vdb1

    # Try forcing the filesystem type
    sudo mount -t ext4 /dev/vdb1 /mnt/original
For severe corruption, try mounting read-only:
    sudo mount -o ro /dev/vdb1 /mnt/original
Rescue Image Doesn't Match Original OS
If you need a specific rescue image:
    # List available images
    openstack image list

    # Enter rescue with specific image
    openstack server rescue --image <image-id> <instance-id>
Cannot Exit Rescue Mode
If unrescue fails, try:
    # Check current state
    openstack server show <instance-id>

    # Force stop and restart
    openstack server stop <instance-id>
    openstack server start <instance-id>
If the instance remains stuck, contact support.
Lost Changes After Unrescue
If your changes didn't persist:
- Verify you edited files in /mnt/original, not the rescue environment
- Check that you unmounted properly before unrescuing
- Ensure filesystem writes completed (run sync before unmounting)
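The last two checks can be folded into a single step that runs before leaving the rescue environment. This is a sketch, assuming the mountpoint utility from util-linux is present in the rescue image. The helper name flush_and_unmount is illustrative.

```shell
# Flush pending writes, unmount, and confirm the mount point is really released.
# Hypothetical helper: flush_and_unmount <mount_point>
flush_and_unmount() {
    target="$1"
    sync                                  # push buffered writes to disk
    sudo umount "$target" || return 1     # fails if a shell or process still uses it
    if mountpoint -q "$target"; then
        echo "still mounted: $target" >&2
        return 1
    fi
    echo "unmounted: $target"
}
```

If the unmount fails, a common cause is a shell whose working directory is still inside the mount point. Change to another directory and retry.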
Best Practices
Document what broke. Before fixing anything, note what failed and why. This helps prevent repeat issues.
Work on a copy. When possible, snapshot before rescue and work on a test instance first.
Use rescue mode sparingly. It's for emergency recovery, not routine maintenance. Fix the root cause to avoid needing rescue again.
Keep rescue images updated. Use recent rescue images that match your instance's OS version.
Automate backups. Regular snapshots and backups reduce the need for rescue operations.
Test your recovery process. Practice rescuing a test instance before you need it in production.
Alternative: Rebuild from Backup
If rescue mode doesn't resolve the issue, rebuilding from backup is often faster and safer:
    # Create new instance from snapshot
    openstack server create --image <snapshot-id> --flavor <flavor> recovered-instance

    # Or rebuild existing instance
    openstack server rebuild --image <snapshot-id> <instance-id>
Rebuilding replaces the root disk entirely, which solves corruption problems that rescue mode cannot fix.
When to Contact Support
Contact your cloud provider if:
- Disk is completely inaccessible even in rescue mode
- Hardware issues suspected (repeated failures, IO errors)
- Rescue mode fails to boot
- Data recovery requires specialized tools
Provide your instance ID, error messages, and steps you've already tried.
Summary
Rescue mode is a powerful tool for recovering failed OpenStack instances. By booting from a clean image and mounting the original disk, you can fix configuration errors, recover data, and repair broken systems without losing everything.
The key steps are:
- Snapshot the failed instance for safety
- Enter rescue mode using Horizon or the OpenStack CLI
- Mount the original root disk
- Fix configuration or recover data
- Unmount and exit rescue mode
- Verify the instance boots normally
With practice, rescue operations become routine. The more familiar you are with the process, the faster you can recover from failures and minimize downtime.
