OpenStack Instance Troubleshooting Guide

Introduction

OpenStack instances (virtual machines) can encounter various issues during their lifecycle. When an instance fails to boot, loses network connectivity, or experiences performance degradation, rapid troubleshooting is essential to minimize downtime and maintain service reliability. This guide walks through the most common OpenStack instance problems and provides systematic solutions to resolve them quickly.

Understanding OpenStack Instance States

Before troubleshooting, it's important to understand instance states. An OpenStack instance can exist in several states:

  • BUILD: Instance is being created
  • ACTIVE: Instance is running normally
  • ERROR: Instance failed during creation or operation
  • SHUTOFF: Instance is powered off
  • SUSPENDED: Instance has been suspended (RAM saved to disk)
  • PAUSED: Instance is paused (RAM kept in memory)
  • REBOOT: Instance is rebooting

Check your instance state using the OpenStack CLI:

openstack server show <instance-id>

Or via the Horizon dashboard under Compute → Instances.
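As a rough illustration of how the reported state points to a first troubleshooting step, the sketch below maps each status listed above to a suggested starting point. The function name `first_step_for_status` and the wording of each suggestion are illustrative assumptions, not part of the OpenStack CLI.

```shell
# Hypothetical helper: map a server status to a first troubleshooting step.
first_step_for_status() {
  case "$1" in
    BUILD)     echo "wait, then check scheduler and compute logs if it stays in BUILD" ;;
    ACTIVE)    echo "check networking and the guest OS" ;;
    ERROR)     echo "read the fault field and the Nova logs" ;;
    SHUTOFF)   echo "start the instance" ;;
    SUSPENDED) echo "resume the instance" ;;
    PAUSED)    echo "unpause the instance" ;;
    REBOOT)    echo "wait for the reboot to finish" ;;
    *)         echo "unrecognized status: $1" ;;
  esac
}

# Usage, assuming a configured openstack CLI:
#   first_step_for_status "$(openstack server show <instance-id> -f value -c status)"
```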

Common Issue 1: Instance Won't Boot

Symptoms

  • Instance stuck in BUILD state
  • Instance enters ERROR state immediately after creation
  • Instance shows ACTIVE but is unreachable

Diagnostic Steps

Check instance status:

openstack server show <instance-id> -f json

Look for the fault field, which contains error details when present.

Review compute logs:

# On the compute node
sudo tail -f /var/log/nova/nova-compute.log
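Following a busy compute log live can bury the lines that matter. A minimal sketch of narrowing the log to one instance is shown here; the helper name `instance_log_problems` is hypothetical, and it assumes the log level appears as a space-delimited word (the usual default layout), which may differ in your deployment.

```shell
# Hypothetical helper: print ERROR and WARNING lines that mention one instance.
# Reads log text on stdin; $1 is the instance UUID.
instance_log_problems() {
  grep -F -- "$1" | grep -E ' (ERROR|WARNING) '
}

# Usage on a compute node (same log path as above):
#   sudo cat /var/log/nova/nova-compute.log | instance_log_problems <instance-id>
```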

Common causes:

  1. Insufficient resources: The compute node lacks CPU, RAM, or disk space
  2. Image issues: The image is corrupted or incompatible
  3. Flavor mismatch: The flavor specifies more resources than available
  4. Volume attachment failure: Boot volume cannot be attached

Solutions

Insufficient resources:

  • Check available resources on compute nodes
  • Migrate existing instances to free resources
  • Add more compute capacity
  • Choose a smaller flavor

Image problems:

  • Verify image status and checksum: openstack image show <image-id> (the status should be active)
  • Re-upload the image if corrupted
  • Use a known-working image for testing

Flavor issues:

  • List available flavors: openstack flavor list
  • Select a flavor that matches available resources
  • Create a custom flavor if needed

Volume attachment failures:

  • Check Cinder volume status: openstack volume list
  • Verify storage backend connectivity
  • Review Cinder logs: /var/log/cinder/cinder-volume.log

Common Issue 2: Network Connectivity Problems

Symptoms

  • Cannot SSH into instance
  • Instance cannot reach external networks
  • Instance cannot communicate with other instances

Diagnostic Steps

Check security group rules:

openstack security group list
openstack security group rule list <security-group-name>

Verify network configuration:

openstack server show <instance-id> | grep addresses
openstack port list --server <instance-id>

Test connectivity from the instance console:

  • Access instance via VNC console in Horizon
  • Run ping 8.8.8.8 to test external connectivity
  • Run ip addr to verify IP assignment
  • Run ip route to check routing table
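The console checks above can be partly scripted. The sketch below pulls the IPv4 addresses and the default gateway out of the `ip` output so the gateway can be pinged before testing external hosts. The helper names are hypothetical, and the field positions assume the usual iproute2 output layout.

```shell
# Hypothetical helpers for use inside the instance.
# Print the IPv4 addresses found in `ip -o -4 addr show` output (stdin).
# Loopback is included, since it also carries an inet address.
ipv4_addresses() {
  awk '$3 == "inet" { print $4 }'
}

# Print the default gateway found in `ip route` output (stdin), if any.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage:
#   ip -o -4 addr show | ipv4_addresses
#   gw=$(ip route | default_gateway)
#   [ -n "$gw" ] && ping -c 3 "$gw"
```

An empty result from `default_gateway` suggests the instance never received a route, which points at DHCP or the subnet configuration rather than at security groups.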

Solutions

Security group blocking traffic:

Add rules to allow SSH and other required services:

openstack security group rule create --protocol tcp --dst-port 22 <security-group-name>
openstack security group rule create --protocol icmp <security-group-name>
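When reviewing existing TCP or UDP rules, it helps to check whether a port is already covered by a wider range before adding another rule. The sketch below assumes the `min:max` form in which the CLI typically prints a rule's port range, and treats an empty range as all ports. The helper name `port_in_range` is hypothetical.

```shell
# Hypothetical helper: succeed if PORT falls inside RANGE ("min:max").
# An empty RANGE is treated as "all ports" (an assumption for TCP/UDP rules).
port_in_range() {
  port=$1 range=$2
  [ -z "$range" ] && return 0
  min=${range%%:*}
  max=${range##*:}
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

# Usage:
#   port_in_range 22 "22:22" && echo "SSH is covered by this rule"
```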

No floating IP assigned:

Allocate and associate a floating IP:

openstack floating ip create <external-network>
openstack server add floating ip <instance-id> <floating-ip-address>
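Before allocating a new floating IP, a project may already hold one that is not attached to anything. A minimal sketch of picking such an address from the list output follows; the helper name is hypothetical, and the column names in the usage comment are an assumption that should be checked against your client version.

```shell
# Hypothetical helper: print the first floating IP with no fixed IP attached.
# Expects lines of "<floating-ip> <fixed-ip-or-None>" on stdin.
first_free_floating_ip() {
  awk '$2 == "None" || $2 == "" { print $1; exit }'
}

# Usage, assuming these column names match your client version:
#   openstack floating ip list -f value \
#     -c "Floating IP Address" -c "Fixed IP Address" | first_free_floating_ip
```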

Network configuration issues:

  • Verify router is attached to subnet: openstack router show <router-id>
  • Check router gateway: openstack router show <router-id> | grep external_gateway
  • Restart network agent if needed (requires admin access)

DHCP not working:

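Access the instance via the console and configure a static IP as a temporary workaround:

# Inside instance (substitute the interface name shown by ip addr for eth0,
# and the subnet's prefix length for /24)
sudo ip addr add <ip-address>/24 dev eth0
sudo ip route add default via <gateway-ip>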

Then troubleshoot DHCP agent: openstack network agent list --agent-type dhcp

Common Issue 3: Performance Degradation

Symptoms

  • Instance responding slowly
  • High CPU wait time
  • Network throughput lower than expected
  • Disk I/O bottlenecks

Diagnostic Steps

Check instance metrics:

Access the instance and run:

# CPU usage
top
htop

# Disk I/O
iostat -x 1
iotop

# Network usage
iftop
nethogs

# Memory usage
free -h
vmstat 1
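For a quick numeric read on memory pressure, the share of memory still available can be computed from the `free` output. This is a sketch only: the helper name is hypothetical, and it assumes the procps layout in which the seventh field of the `Mem:` line is the "available" column.

```shell
# Hypothetical helper: print available memory as a whole-number percentage.
# Reads `free -m` output on stdin; assumes field 7 of "Mem:" is "available".
mem_available_pct() {
  awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", $7 * 100 / $2 }'
}

# Usage:
#   free -m | mem_available_pct
```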

Check compute node load:

# On compute node (requires admin access)
uptime
top
virsh list --all
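One rule of thumb for reading the `uptime` figures is to compare the 1-minute load average with the number of CPUs on the host. The sketch below encodes that comparison; the helper name is hypothetical, and the threshold is a rough heuristic rather than a definitive sign of contention.

```shell
# Hypothetical helper: succeed if the load average exceeds the CPU count.
# Args: LOAD CPUS
load_exceeds_cpus() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load + 0 > cpus + 0) }'
}

# Usage on a Linux host:
#   load_exceeds_cpus "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)" \
#     && echo "load is above the CPU count; check for noisy neighbors"
```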

Review instance resource allocation:

openstack server show <instance-id>
openstack flavor show <flavor-id>

Solutions

CPU bottleneck:

  • Resize instance to larger flavor: openstack server resize --flavor <new-flavor> <instance-id>, then confirm with openstack server resize confirm <instance-id>
  • Identify CPU-intensive processes and optimize
  • Spread load across multiple instances
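A resize is a two-step operation: the instance normally pauses in the VERIFY_RESIZE status until the change is confirmed or reverted. The sketch below shows one way to wait for that status before confirming. The helper `wait_for_status` is hypothetical, and the retry count and delay in the usage comment are arbitrary.

```shell
# Hypothetical helper: poll a command until it prints the wanted status.
# Args: WANT TRIES DELAY_SECONDS COMMAND...
# Returns 0 on match, 1 on ERROR status or timeout, 2 if the command fails.
wait_for_status() {
  want=$1 tries=$2 delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$("$@") || return 2
    [ "$status" = "$want" ] && return 0
    [ "$status" = "ERROR" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage, assuming a configured openstack CLI:
#   openstack server resize --flavor <new-flavor> <instance-id>
#   wait_for_status VERIFY_RESIZE 60 5 \
#     openstack server show <instance-id> -f value -c status \
#     && openstack server resize confirm <instance-id>
```

Stopping early on ERROR avoids waiting out the full timeout when the resize has already failed.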

Disk I/O bottleneck:

  • Move to SSD-backed storage if using HDD volumes
  • Increase volume IOPS allocation (if supported)
  • Optimize application disk usage
  • Consider using Cinder volume instead of ephemeral disk

Network bottleneck:

  • Check for network congestion on compute node
  • Verify QoS policies are not limiting bandwidth
  • Use SR-IOV or DPDK for high-performance networking (if available)

Memory pressure:

  • Resize to flavor with more RAM
  • Identify memory leaks in applications
  • Enable swap (not recommended for production)

Common Issue 4: Instance in ERROR State

Symptoms

  • Instance shows ERROR state
  • Cannot perform operations on instance
  • Previous operations failed

Diagnostic Steps

Check fault message:

openstack server show <instance-id> -c fault -f json

Review Nova logs:

sudo grep <instance-id> /var/log/nova/nova-compute.log
sudo grep <instance-id> /var/log/nova/nova-api.log

Solutions

Reset instance state (admin only):

openstack server set --state active <instance-id>

Delete and recreate:

If state reset doesn't work, rebuild from scratch:

openstack server delete <instance-id>
openstack server create --image <image> --flavor <flavor> --network <network> <new-name>

Fix underlying issue:

The ERROR state usually indicates a deeper problem. Address the root cause before resetting state:

  • Storage backend failure
  • Network configuration error
  • Hypervisor issue
  • Image corruption

Common Issue 5: Console Access Not Working

Symptoms

  • Cannot access VNC console
  • Console shows blank screen
  • Console connection times out

Diagnostic Steps

Verify console URL:

openstack console url show <instance-id>

Check Nova console service:

# On controller node
sudo systemctl status nova-novncproxy

Review console logs:

sudo tail -f /var/log/nova/nova-novncproxy.log

Solutions

Restart console proxy:

sudo systemctl restart nova-novncproxy

Verify console port forwarding:

Ensure firewall allows console proxy port (typically 6080):

sudo firewall-cmd --list-ports
sudo firewall-cmd --add-port=6080/tcp --permanent
sudo firewall-cmd --reload
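To avoid adding the port when it is already open, the output of the list command can be checked first. This is a minimal sketch with a hypothetical helper name; it matches exact entries only, so a port opened as part of a range (for example 6000-6100/tcp) would not be detected.

```shell
# Hypothetical helper: succeed if PORT/PROTO appears in the space-separated
# list printed by `firewall-cmd --list-ports` (read from stdin).
port_listed() {
  tr ' ' '\n' | grep -qxF -- "$1"
}

# Usage on the controller node:
#   sudo firewall-cmd --list-ports | port_listed 6080/tcp \
#     || echo "6080/tcp is not open in the active zone"
```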

Use serial console instead:

openstack console url show --serial <instance-id>

Advanced Troubleshooting Techniques

Using Instance Console Logs

Retrieve system console output without VNC access:

openstack console log show <instance-id>

This shows boot messages and can reveal:

  • Kernel panics
  • Filesystem errors
  • Network configuration issues
  • Cloud-init failures
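Console logs can run to thousands of lines, so a first pass with a pattern filter is often useful. The sketch below flags lines that resemble the failure types listed above. The helper name and the pattern list are illustrative assumptions and will not catch every variant of these messages.

```shell
# Hypothetical helper: print console-log lines that match common failure signs.
# The pattern list is an illustrative starting point, not an exhaustive set.
scan_console_log() {
  grep -n -i -E 'kernel panic|EXT4-fs error|XFS .*(error|corrupt)|I/O error|cloud-init.*(fail|error)|network is unreachable'
}

# Usage:
#   openstack console log show <instance-id> | scan_console_log
```

Matching lines are printed with their line numbers, which makes it easier to find the surrounding context in the full log.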

Checking Cloud-Init Status

If an instance boots but doesn't configure properly:

# Inside instance
sudo cloud-init status
sudo cloud-init analyze show
sudo cat /var/log/cloud-init.log
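In scripts, the state reported by the status command can be extracted and compared directly. A minimal sketch, assuming the command prints a line of the form `status: <state>`; the helper name `cloud_init_state` is hypothetical.

```shell
# Hypothetical helper: extract the state from `cloud-init status` output (stdin).
cloud_init_state() {
  awk -F ': *' '$1 == "status" { print $2; exit }'
}

# Usage inside the instance:
#   state=$(sudo cloud-init status | cloud_init_state)
#   [ "$state" = "done" ] || echo "cloud-init state is: $state"
```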

Verifying Volume Attachments

For instances with persistent volumes:

openstack volume list
openstack volume show <volume-id>
openstack server volume list <instance-id>

Detach and reattach if needed, waiting for the volume to return to the available status before reattaching:

openstack server remove volume <instance-id> <volume-id>
openstack server add volume <instance-id> <volume-id>

Network Namespace Debugging

For deep network troubleshooting (requires admin access on network node):

# List network namespaces
sudo ip netns list

# Execute command in namespace
sudo ip netns exec qrouter-<router-id> ip addr
sudo ip netns exec qdhcp-<network-id> tcpdump -i any

Error Message Reference

"No valid host was found"

  • Meaning: The scheduler could not find a compute node that satisfies the request, usually because of insufficient resources or scheduler filter constraints
  • Solution: Check compute node resources, verify placement service, check scheduler logs

"Build of instance failed: Block Device Mapping is Invalid"

  • Meaning: Volume or image specification is incorrect
  • Solution: Verify volume exists and is available, check image format compatibility

"Failed to allocate the network(s), not rescheduling"

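  • Meaning: Nova could not allocate a network port for the instance, often due to IP exhaustion or a port that Neutron failed to bind in time
  • Solution: Check available IPs in subnet, verify network quotas, check Neutron agent status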

"Instance failed to spawn"

  • Meaning: Nova compute failed to create the VM
  • Solution: Check compute node logs, verify libvirt/KVM status, check disk space

"Exceeded maximum number of retries"

  • Meaning: The build was retried on other hosts and failed on every attempt
  • Solution: Check compute logs on the hosts that were tried, check service health and network connectivity
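The reference above can be turned into a small lookup for use in scripts. The sketch below matches a fault message against the five messages listed and prints a shortened form of the suggested checks. The helper name `explain_fault` is hypothetical, and the matching is by substring only.

```shell
# Hypothetical helper: map a fault message to the checks listed above.
explain_fault() {
  case "$1" in
    *"No valid host was found"*)
      echo "check compute node resources, placement service, scheduler logs" ;;
    *"Block Device Mapping is Invalid"*)
      echo "check that the volume exists and is available, and the image format" ;;
    *"Failed to allocate the network"*)
      echo "check free IPs in the subnet and network quotas" ;;
    *"failed to spawn"*)
      echo "check compute node logs, libvirt/KVM status, disk space" ;;
    *"Exceeded maximum number of retries"*)
      echo "check service health and network connectivity" ;;
    *)
      echo "no reference entry; review the Nova logs" ;;
  esac
}

# Usage:
#   explain_fault "Build of instance failed: Block Device Mapping is Invalid"
```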

Prevention and Best Practices

Before creating instances:

  1. Verify sufficient quota: openstack quota show
  2. Check compute node resources
  3. Validate image compatibility
  4. Test network configuration
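The quota check in step 1 comes down to simple arithmetic: current usage plus the new request must not exceed the limit. A minimal sketch of that comparison follows. The helper name is hypothetical, and treating a limit of -1 as unlimited reflects the common OpenStack convention.

```shell
# Hypothetical helper: succeed if NEEDED more units fit under LIMIT given USED.
# A LIMIT of -1 is treated as unlimited.
quota_allows() {
  limit=$1 used=$2 needed=$3
  [ "$limit" -eq -1 ] && return 0
  [ $((used + needed)) -le "$limit" ]
}

# Usage with values read from your quota and usage output:
#   quota_allows 20 18 4 || echo "not enough vCPU quota for this flavor"
```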

During instance lifecycle:

  1. Monitor resource usage regularly
  2. Keep security groups updated
  3. Back up critical instances
  4. Document custom configurations

Logging and monitoring:

  1. Enable instance monitoring via Horizon or CLI
  2. Set up alerts for instance state changes
  3. Regularly review OpenStack service logs
  4. Use external monitoring tools (Prometheus, Nagios, etc.)

When to Contact Support

Contact InMotion Cloud support if you encounter:

  • Persistent ERROR states that cannot be resolved
  • Suspected hardware failures on compute nodes
  • Network-wide connectivity issues
  • Storage backend problems affecting multiple instances
  • OpenStack service failures (Nova, Neutron, Cinder)
  • Performance issues across multiple instances

Provide the following information when contacting support:

  • Instance ID
  • Error messages and fault details
  • Output of openstack server show <instance-id>
  • Recent actions performed on the instance
  • Network topology (if network-related)
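Most of this information can be gathered in one pass. The sketch below prints the server details, recent events, ports, and the tail of the console log. The function name and the `OPENSTACK` override variable are hypothetical conveniences, and the choice of 50 console lines is arbitrary.

```shell
# Hypothetical helper: print the details support usually asks for.
# Set OPENSTACK to override the client command (for example, for a dry run).
collect_support_info() {
  id=$1
  os=${OPENSTACK:-openstack}
  echo "== server show =="
  $os server show "$id"
  echo "== recent events =="
  $os server event list "$id"
  echo "== ports =="
  $os port list --server "$id"
  echo "== console log (last 50 lines) =="
  $os console log show --lines 50 "$id"
}

# Usage:
#   collect_support_info <instance-id> > support-info.txt 2>&1
```

Review the output before sending it, since console logs can contain hostnames or other details you may not want to share.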

Summary

OpenStack instance troubleshooting requires a systematic approach. Start by identifying the instance state and reviewing error messages. Check resource availability, network configuration, and security groups. Use OpenStack CLI commands and log files to diagnose issues. Most problems fall into categories of insufficient resources, network misconfiguration, or storage issues. When in doubt, recreate the instance with known-working configurations and escalate persistent issues to support.

Regular monitoring, proper documentation, and preventative measures significantly reduce troubleshooting time and improve instance reliability.