Load Balancer
A load balancer is a network device or service that distributes incoming traffic across multiple servers or instances, providing a single entry point for your application while preventing any single server from becoming overwhelmed.
What is a Load Balancer in cloud hosting?
A load balancer is a network device or service that distributes incoming traffic across multiple servers or instances (virtual machines). It sits between your users and your application servers, receiving all incoming requests and routing them to available backend resources based on distribution rules you configure.
When a user connects to your application, they connect to the load balancer's IP address. The load balancer then selects one of your backend instances to handle that request. This distribution prevents any single server from becoming overwhelmed while other servers sit idle.
Why Load Balancers Exist
Without load balancers, cloud applications face several practical problems. A single server can only handle a limited number of concurrent connections before performance degrades or it crashes. When that server fails, your entire application becomes unavailable. Manual traffic distribution requires complex DNS configurations that lack real-time awareness of server health.
Load balancers solve these problems by providing automated traffic distribution with health checking. They detect when a backend server fails and stop sending traffic to it. They spread load across multiple servers so no single instance becomes a bottleneck. They enable you to add or remove backend capacity without changing how users connect to your application.
What Do Load Balancers Actually Do?
- Receives incoming network connections on a configured port and protocol
- Selects a backend instance from your pool using the configured distribution algorithm (round-robin, least connections, or source IP hash)
- Forwards the connection or request to the selected instance
- Monitors backend instances by sending health check requests at regular intervals
- Removes unhealthy instances from the rotation when health checks fail
- Returns instances to the rotation when health checks succeed again
- Maintains session persistence (sticky sessions) when configured, ensuring repeat requests from the same client reach the same backend instance
- Terminates SSL/TLS encryption at the load balancer layer when configured, reducing CPU load on backend instances
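The selection and rotation behavior above can be sketched in a few lines. This is a minimal illustration, not a real load balancer implementation: the `Backend` and `Balancer` names are hypothetical, and health status is a plain flag rather than the result of live health checks.

```python
# Sketch of backend selection: round-robin, least connections, and
# source IP hash, all restricted to the currently healthy pool.
import hashlib
from dataclasses import dataclass
from itertools import count

@dataclass
class Backend:
    address: str
    healthy: bool = True          # flipped by health checks in a real LB
    active_connections: int = 0   # tracked per connection in a real LB

class Balancer:
    def __init__(self, backends):
        self.backends = backends
        self._rr = count()  # monotonically increasing round-robin counter

    def _healthy_pool(self):
        # Unhealthy instances are removed from rotation automatically.
        pool = [b for b in self.backends if b.healthy]
        if not pool:
            raise RuntimeError("no healthy backends available")
        return pool

    def round_robin(self):
        pool = self._healthy_pool()
        return pool[next(self._rr) % len(pool)]

    def least_connections(self):
        return min(self._healthy_pool(), key=lambda b: b.active_connections)

    def source_ip_hash(self, client_ip):
        # Hashing the client IP keeps repeat requests from the same
        # client on the same backend while the pool is unchanged.
        pool = self._healthy_pool()
        digest = hashlib.sha256(client_ip.encode()).digest()
        return pool[int.from_bytes(digest[:4], "big") % len(pool)]
```

Marking an instance unhealthy (`lb.backends[1].healthy = False`) immediately drops it from all three selection strategies, which is the "removes unhealthy instances from the rotation" behavior described above.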
When would I use a Load Balancer?
You use a load balancer when running multiple instances of the same application and you need a single entry point for incoming traffic. This is common for web applications that need to scale beyond a single server's capacity.
You use a load balancer when high availability matters. If one of your instances crashes or needs maintenance, the load balancer detects the failure through health checks and redirects traffic to healthy instances. Aside from any requests already in flight to the failed instance, your users experience no downtime.
You use a load balancer when adding horizontal scaling to your infrastructure. Instead of upgrading to larger instances (vertical scaling), you add more instances of the same size. The load balancer automatically distributes new traffic across the expanded pool.
You use a load balancer for SSL/TLS termination. The load balancer handles encryption and decryption, presenting plain HTTP to your backend instances. This centralizes certificate management and reduces processing overhead on your application servers.
When would I NOT use a Load Balancer?
You do not need a load balancer for applications running on a single instance. The overhead and cost of a load balancer provide no benefit when there is only one backend to balance across.
You do not use a load balancer for internal database connections. Databases typically use replication and clustering mechanisms specific to the database software. Application connection pooling handles distribution within the application layer.
You do not use a load balancer when your application requires direct client-to-server connections with complex state that cannot be distributed. Some real-time applications, game servers, or stateful protocols require persistent connections to specific servers. While session persistence can help, these scenarios often need different architectural approaches.
You might not need a load balancer for development or staging environments where high availability and scaling are not required. A single instance with a floating IP (reserved IP address that can move between instances) may be sufficient.
Real-world example
Company A runs an e-commerce platform on InMotion Cloud. They deploy their web application across three instances in the same region. They create a load balancer configured to listen on ports 80 and 443, pointing to all three instances as backend members.
The load balancer receives incoming HTTPS requests from customers. It terminates the SSL connection and forwards the requests as HTTP to the backend instances using round-robin distribution. Every 5 seconds, the load balancer sends health check requests to each instance on port 80 at the path /health. If an instance fails to respond with HTTP 200 within 3 seconds, the load balancer marks it unhealthy and stops routing traffic to it.
During a product launch, traffic spikes to 5,000 requests per minute. The load balancer distributes this load evenly across the three instances, with each handling approximately 1,667 requests per minute. One instance experiences a software crash. The load balancer detects the failed health checks within 10 seconds and redistributes all traffic to the two remaining healthy instances. Company A's customers experience no interruption. The operations team brings the failed instance back online, and the load balancer automatically reintegrates it into the pool when health checks pass.
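The traffic numbers in this scenario can be checked with a small round-robin simulation. The backend names below are illustrative, and the model ignores timing: it only counts where each of the 5,000 requests lands before and after one instance is removed from the pool.

```python
# Simulate round-robin distribution of 5,000 requests, first across
# three healthy instances, then across the two that remain after one
# instance fails its health checks.
from collections import Counter

def distribute(num_requests, backends):
    """Round-robin: request i goes to backends[i % len(backends)]."""
    counts = Counter()
    for i in range(num_requests):
        counts[backends[i % len(backends)]] += 1
    return counts

before = distribute(5000, ["web-1", "web-2", "web-3"])  # ~1,667 each
after = distribute(5000, ["web-1", "web-3"])            # web-2 removed
```

With three instances each share is within one request of 5,000/3 ≈ 1,667; with two instances the survivors absorb 2,500 requests each, which is why headroom on the remaining instances matters during a failure.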
Frequently Asked Questions
Do I need more than one load balancer?
For most applications, a single load balancer provides sufficient availability because cloud providers implement load balancers as highly available services backed by redundant infrastructure. If your load balancer service itself becomes unavailable (rare but possible), your application will be unreachable regardless of how many healthy backend instances you have. Some cloud platforms offer load balancer redundancy through multiple availability zones, where the load balancer service automatically fails over between zones. Check with your provider about the high availability architecture of their load balancing service.
Does creating a load balancer affect existing resources?
Creating a load balancer does not affect existing instances until you explicitly add them as backend members. The load balancer exists as a separate resource with its own IP address. When you add an instance to the load balancer's backend pool, traffic starts flowing to that instance based on the load balancer's algorithm and health checks. You control when instances join or leave the pool.
What happens if I delete a load balancer?
Deleting a load balancer removes the traffic distribution mechanism, but your backend instances continue running unchanged. Any traffic directed at the load balancer's IP address will fail because that IP address no longer exists. Before deleting a load balancer, update your DNS records to point directly to an instance IP address, create a new load balancer, or prepare for downtime.
Can I add instances in different regions to the same load balancer?
Most cloud load balancers only distribute traffic to instances within the same region. Cross-region load balancing typically requires DNS-based load balancing or a global load balancer service (if your provider offers one). Regional load balancers optimize for low latency between the load balancer and backend instances. For multi-region deployments, you typically deploy a separate load balancer in each region and use DNS to distribute traffic between regions.
How do health checks determine if an instance is healthy?
Health checks send a request to each backend instance at a configured interval (such as every 5 seconds) to a specific port and path (such as HTTP GET to /health on port 80). If the instance responds with a success code (typically HTTP 200) within the timeout period (such as 3 seconds), it passes the health check. After a configured number of consecutive failures (such as 3), the load balancer marks the instance unhealthy and stops routing traffic to it. After a configured number of consecutive successes (such as 2), the load balancer marks it healthy again and resumes traffic. You configure all these parameters when setting up the load balancer.
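The consecutive-failure and consecutive-success logic can be written as a small state machine. This is a sketch of the thresholds described above, not any provider's implementation; the `HealthChecker` name and defaults (3 failures to mark unhealthy, 2 successes to recover) are illustrative, and a real load balancer would run `record` per instance on a timer.

```python
# State machine for health-check transitions: an instance leaves the
# rotation after N consecutive failures and rejoins after M consecutive
# successes.
class HealthChecker:
    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._failures = 0
        self._successes = 0

    def record(self, check_passed):
        """Feed one health-check result; return the current health state."""
        if check_passed:
            self._failures = 0  # any success resets the failure streak
            self._successes += 1
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True   # rejoin the rotation
        else:
            self._successes = 0  # any failure resets the success streak
            self._failures += 1
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False  # stop routing traffic here
        return self.healthy
```

Requiring a streak in both directions prevents flapping: a single timeout does not pull an instance out of rotation, and a single lucky response does not put a crashing instance back in.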
Summary
- A load balancer distributes incoming traffic across multiple backend instances, providing a single entry point for your application while preventing any single server from becoming overwhelmed
- Load balancers enable high availability by detecting instance failures through health checks and automatically redirecting traffic to healthy instances
- You use load balancers when running multiple instances of the same application, when requiring high availability, or when implementing horizontal scaling
- Load balancers can terminate SSL/TLS connections, centralizing certificate management and reducing CPU load on backend instances
- Most cloud load balancers operate within a single region and provide high availability through the cloud provider's redundant infrastructure
Related Terms
- Instance - A load balancer distributes traffic across multiple instances (virtual machines) running the same application, such as web servers, API servers, or application containers
- Floating IP - A reserved IP address that can move between instances, providing a simpler alternative to load balancers for single-instance applications that need a stable IP address, such as a primary database server or a development environment
- Health Check - The automated test a load balancer performs to verify backend instances are responsive, such as sending HTTP requests to a /health endpoint every few seconds to detect failures
- Security Group - Firewall rules that control network traffic to your load balancer and backend instances, such as allowing inbound HTTPS on port 443 to the load balancer and allowing HTTP on port 80 from the load balancer to backend instances
