Load Balancing Strategies: Study Guide

Load balancing distributes workloads across multiple computing resources to prevent bottlenecks and ensure system reliability. It's essential for cloud computing, DevOps, and systems engineering roles. Understanding load balancing strategies prepares you for distributed systems challenges and technical interviews.

This topic combines theory with practical implementation. Flashcards help you master algorithms, architectural patterns, and real-world scenarios through active recall and spaced repetition. You'll learn when to use each strategy and how to apply it to modern applications.

Fundamentals of Load Balancing

Load balancing distributes network traffic and computing tasks across multiple servers to achieve optimal performance. A load balancer acts as a reverse proxy between clients and backend servers, routing each request intelligently.

Core Goals of Load Balancing

Load balancing serves four main purposes:

  • Improve response time by spreading requests across servers
  • Maximize throughput and resource utilization
  • Prevent any single server from becoming overloaded
  • Increase system availability and reliability

How Load Balancers Work

When a client sends a request, the load balancer receives it first. It then forwards the request to a backend server based on a chosen algorithm. The load balancer continuously monitors server health, removing failed servers from the pool and restoring them when they recover.

Without load balancing, a popular service would direct all requests to one server. That server becomes overwhelmed while others remain idle. This creates bottlenecks and poor user experience.

Layer 4 vs. Layer 7 Load Balancing

Layer 4 load balancers handle TCP and UDP traffic, making decisions based on network information alone. Layer 7 load balancers understand HTTP and can route based on URL paths, hostnames, or other application-specific data. Layer 7 enables sophisticated routing strategies that consider application context rather than just network-level information.

Common Load Balancing Algorithms

Each algorithm determines how requests distribute to backend servers. Choosing the right algorithm depends on your application's session needs, server capacity, and connection types.

Simple Algorithms

Round Robin sends requests to servers in circular sequence, treating each equally. It works well when all servers have similar capacity but ignores current server load.
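That rotation can be sketched in a few lines of Python (the server names here are placeholders, not any particular setup):

```python
from itertools import cycle

# Hypothetical backend pool; Round Robin simply cycles through it.
servers = ["app-1", "app-2", "app-3"]
next_server = cycle(servers)

def route() -> str:
    """Return the next server in rotation, ignoring current load."""
    return next(next_server)

# Six requests land on each server twice, in strict rotation.
assignments = [route() for _ in range(6)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```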

Weighted Round Robin assigns weights to servers based on capacity. More powerful servers receive proportionally more requests. For example, servers weighted 1, 2, and 3 would handle one-sixth, one-third, and one-half of traffic respectively.
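One simple way to realize those proportions is to expand the rotation by weight. This sketch uses the same hypothetical 1, 2, 3 weights:

```python
from collections import Counter

# Hypothetical weights: app-3 has three times app-1's capacity.
weights = {"app-1": 1, "app-2": 2, "app-3": 3}

# Each server appears in the rotation once per unit of weight.
rotation = [server for server, w in weights.items() for _ in range(w)]

def route(i: int) -> str:
    return rotation[i % len(rotation)]

# Over 600 requests the split is exactly 1/6, 1/3, and 1/2.
counts = Counter(route(i) for i in range(600))
print(counts)  # Counter({'app-3': 300, 'app-2': 200, 'app-1': 100})
```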

Random selection distributes requests randomly and is surprisingly effective in some scenarios while remaining simple to implement.

Connection-Based Algorithms

Least Connections directs requests to the server handling the fewest active connections. This works best for long-lived connections or sessions.

Weighted Least Connections refines this by considering server capacity in addition to connection count.

Least Response Time combines connection count with average response time. It routes to whichever server will likely respond fastest.
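A minimal sketch of least-connections selection, with made-up connection counts:

```python
# Active connection counts per hypothetical server.
active = {"app-1": 12, "app-2": 3, "app-3": 7}

def route() -> str:
    """Pick the server with the fewest in-flight connections."""
    server = min(active, key=active.get)
    active[server] += 1          # the new request is now in flight
    return server

def finish(server: str) -> None:
    active[server] -= 1          # release when the response completes

first = route()
print(first)  # 'app-2' — it had the fewest active connections
```

Weighted Least Connections would divide each count by the server's weight before taking the minimum; Least Response Time would fold a response-time average into the same comparison.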

Session and Resource-Based Algorithms

IP Hash uses the client's IP address to consistently route requests from the same client to the same server. This provides session persistence without server-side session storage. However, it can cause uneven distribution if many clients share the same IP.
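IP Hash can be sketched as a modulo over a hash digest (the IP address and server names are illustrative):

```python
import hashlib

servers = ["app-1", "app-2", "app-3"]

def route(client_ip: str) -> str:
    """Hash the client IP so the same client always maps to the same server."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same IP is routed identically on every call — session persistence
# without any server-side session store.
a = route("203.0.113.42")
b = route("203.0.113.42")
print(a == b)  # True
```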

Resource-Based Load Balancing considers actual server resources like CPU and memory usage. It routes to servers with available capacity.

Session Persistence (Sticky Sessions) ensures a user's requests consistently go to the same server. This matters for applications that maintain session state in server memory.

Load Balancing Architecture and Implementation

Load balancing can be implemented at different levels depending on your application needs and infrastructure constraints.

Types of Load Balancers

Hardware load balancers are physical devices in the network. They offer high performance and reliability but cost significantly more.

Software load balancers run on standard servers. Tools like HAProxy, Nginx, and Apache provide flexibility and cost-effectiveness for many scenarios.

Cloud-based load balancers come from AWS (Elastic Load Balancing), Azure (Load Balancer), and Google Cloud. They provide managed solutions that automatically scale and integrate with cloud infrastructure.

Deployment Patterns

Active-Passive maintains a primary load balancer handling all traffic while a secondary remains on standby. It takes over if the primary fails.

Active-Active runs multiple load balancers simultaneously, each handling traffic independently. This provides better resource utilization and redundancy.

Advanced Load Balancer Features

Modern load balancers often provide:

  • SSL/TLS termination to offload encryption overhead from backend servers
  • Compression to reduce bandwidth usage
  • Request rate limiting to prevent abuse
  • Health checks to verify server availability
  • Geographic load balancing to distribute traffic across data centers in different locations

Handling Stateful Sessions

Load balancers must handle sticky sessions carefully to maintain user experience. This might involve consistent hashing or session replication across servers. Modern architectures often store session data externally in databases or cache systems like Redis instead.

Advanced Concepts and Modern Approaches

Modern distributed systems have introduced sophisticated load balancing requirements beyond traditional approaches.

Service Mesh and Intelligent Routing

Service mesh technologies like Istio and Linkerd provide intelligent load balancing at the application layer. They offer advanced features like circuit breaking, retries, and traffic splitting.

A circuit breaker pattern prevents cascading failures by stopping requests to failing services. This is a form of intelligent load balancing that protects system stability.
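A toy circuit breaker that opens after a fixed number of consecutive failures; the threshold and cooldown here are illustrative, not any particular library's defaults:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens after a cooldown."""
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through after the cooldown.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

cb = CircuitBreaker(threshold=3)
for _ in range(3):
    cb.record(success=False)
print(cb.allow())  # False — the circuit is open, requests are rejected fast
```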

Deployment Strategies

Canary deployments use load balancers to gradually direct a small percentage of traffic to new service versions. This reduces deployment risk by catching issues early.
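A canary split is essentially a weighted coin flip per request. This sketch assumes a 5% canary share and hypothetical version labels:

```python
import random

def pick_version(canary_percent: float = 5.0) -> str:
    """Send a small slice of traffic to the new version."""
    return "v2-canary" if random.random() * 100 < canary_percent else "v1-stable"

random.seed(0)  # seeded only to make this sketch repeatable
sample = [pick_version() for _ in range(10_000)]
share = 100 * sample.count("v2-canary") / len(sample)
print(round(share, 1))  # close to 5% of requests hit the canary
```

Operationally, the percentage would be ramped up in stages (5% → 25% → 100%) as the new version proves healthy, or dropped back to 0% to roll back.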

Blue-green deployments maintain two identical production environments. Load balancing switches traffic between them for safe deployments.

Advanced Algorithms and Systems

Least Outstanding Requests (LOR) routes each request to the server with the fewest requests in flight, counting queued work as well as open connections. This tracks real backend pressure better than basic least connections.

Maglev, Google's software network load balancer, uses a form of consistent hashing to achieve even distribution and minimal disruption when backends change, even at very large scale.

Cloud-Native Environments

Load balancing in serverless environments faces unique challenges. Backend resources are ephemeral and auto-scaling is handled by the platform.

Kubernetes provides its own load balancing through Services, which expose a stable endpoint for pods that may be created and destroyed continuously. Across all these environments the goal is the same: load balancing delivers reliability, performance, and resilience in systems where failures are inevitable.

Why Flashcards are Ideal for Learning Load Balancing

Load balancing involves numerous algorithms, architectural patterns, and technical concepts. Spaced repetition and active recall strengthen retention of these distinctions.

Active Recall Strengthens Memory

Flashcards force you to retrieve information from memory. This strengthens neural connections far better than passive reading. When studying load balancing, you need to recall which algorithm works best for specific scenarios and explain each algorithm's strengths and weaknesses.

Traditional study methods like textbooks may not lock these distinctions into memory. Flashcards make the difference concrete and memorable.

Creating Cards Deepens Understanding

Creating flashcards encourages you to extract the most important information from your notes. This promotes deeper understanding.

For example, you might create a card asking "What is Round Robin load balancing?" with the answer describing sequential distribution. Another card might ask "When should you use Sticky Sessions?" with an answer explaining stateful applications.

Pattern Recognition and Terminology

The visual organization of flashcards helps you see patterns. You notice how different algorithms make different tradeoffs between simplicity and optimization.

Flashcards also work well for learning terminology and jargon: sticky sessions, health checks, failover, and session persistence.

Consistent Review and Mobile Learning

Regular review using spaced repetition ensures you maintain knowledge over time. This is crucial for interview preparation or exam success.

Flashcards support mobile learning, allowing you to study anywhere. This makes consistent review easier to maintain. By using flashcards systematically, you transform load balancing from abstract concepts into concrete, recalled knowledge ready for technical interviews or real engineering work.

Start Studying Load Balancing Strategies

Master algorithms, architectures, and implementation patterns with interactive flashcards designed for computer science students and engineering professionals.

Create Free Flashcards

Frequently Asked Questions

What is the difference between Round Robin and Weighted Round Robin?

Round Robin sends requests to each backend server in sequence, treating all servers equally. Every server gets the same number of requests in fair rotation.

Weighted Round Robin extends this by assigning weights to each server based on its capacity or processing power. Servers with higher weights receive proportionally more requests.

For example, three servers weighted 1, 2, and 3 would handle one-sixth, one-third, and one-half of traffic respectively. This makes Weighted Round Robin ideal when you have servers of different capabilities.

However, neither algorithm accounts for current server load or health status. Least Connections is more appropriate for scenarios where server conditions change dynamically.

How does sticky session (session persistence) work in load balancing?

Sticky sessions ensure that requests from the same client consistently route to the same backend server. This is crucial for stateful applications that store user session information in server memory.

The load balancer uses the client's IP address, a session cookie, or HTTP header information to identify returning clients. On the first request, the load balancer sends the client to a server and records this mapping. Subsequent requests go to the same server.
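That first-request pinning can be sketched with a simple cookie-to-server map (cookie values and server names are hypothetical):

```python
from itertools import cycle

servers = cycle(["app-1", "app-2", "app-3"])
session_map = {}  # session cookie -> pinned server

def route(session_cookie: str) -> str:
    """First request picks a server; later requests reuse the mapping."""
    if session_cookie not in session_map:
        session_map[session_cookie] = next(servers)
    return session_map[session_cookie]

first = route("sess-abc")
again = route("sess-abc")
print(first == again)  # True — the client stays pinned to one server
```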

Without sticky sessions, a user's second request might go to a different server lacking their session data. They would appear logged out or lose their shopping cart.

However, sticky sessions reduce load balancing effectiveness. If one server fails, all clients with sessions on that server are disconnected. They also complicate horizontal scaling. Modern architectures often store session data externally in databases or cache systems like Redis instead.

What are health checks and why are they critical in load balancing?

Health checks are automated tests that load balancers periodically run against backend servers. They verify servers are functioning properly and ready to receive traffic.

A load balancer might send HTTP requests to an endpoint like /health and expect a successful response code. If a server fails to respond or returns an error code, the load balancer marks it as unhealthy and stops sending traffic to it.

Without health checks, a load balancer might continue routing requests to a failed server. This causes user requests to hang or fail. Health checks enable automatic failover. When a server goes down, it's immediately removed from the pool and clients are redirected to healthy servers.

When the failed server recovers, health checks detect this and restore it to the pool. Different health check strategies exist: simple TCP connection checks, HTTP checks, or application-specific checks that verify database connectivity.
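The mark-unhealthy/restore cycle can be sketched as follows; the probe here is simulated, where a real one would issue an HTTP GET to /health and check for a 200 response:

```python
# Simulated server states; True means the server would pass its probe.
pool = {"app-1": True, "app-2": True}
healthy = set(pool)

def probe(server: str) -> bool:
    return pool[server]  # stand-in for a real HTTP or TCP health check

def run_health_checks() -> None:
    for server in pool:
        if probe(server):
            healthy.add(server)      # restore recovered servers to the pool
        else:
            healthy.discard(server)  # stop routing to failed servers

pool["app-2"] = False  # simulate a crash
run_health_checks()
print(sorted(healthy))  # ['app-1'] — traffic now avoids the failed server
```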

How does load balancing handle SSL/TLS encryption?

SSL/TLS termination refers to the load balancer handling encryption and decryption of HTTPS traffic, acting as the encryption endpoint between clients and backend servers.

Clients establish an SSL/TLS connection with the load balancer, not directly with backend servers. The load balancer decrypts incoming encrypted traffic, reads the request content, makes load balancing decisions, and forwards the request to a backend server either via HTTP or a fresh HTTPS connection.

This approach offers several advantages: it offloads expensive encryption and decryption work from backend servers, allowing them to focus on application logic. It also centralizes SSL certificate management in one place rather than requiring certificates on every server.

SSL passthrough is an alternative where the load balancer forwards encrypted traffic without decrypting it. This is useful when backend servers need to handle their own encryption or when you want end-to-end encryption. However, it prevents layer 7 load balancing decisions based on request content.

What is the difference between Layer 4 and Layer 7 load balancing?

Layer 4 load balancing operates at the transport layer of the OSI model. It makes decisions based on TCP/UDP information like IP addresses and ports. A Layer 4 load balancer examines the source IP, destination IP, source port, and destination port to route traffic.

It does not inspect actual request content. This makes it very fast and efficient since it doesn't need to parse application-level data. It works well for simple distribution and is commonly used for non-HTTP protocols.

Layer 7 load balancing operates at the application layer. It understands HTTP/HTTPS and makes decisions based on request content like URLs, hostnames, HTTP headers, and cookies.

A Layer 7 load balancer can route all requests for /api/v1 to one set of servers and requests for /images to another set. Or it can route based on the Host header to different backend services. This intelligence enables sophisticated routing patterns but adds latency since the load balancer must fully parse each request.
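Path-prefix routing of this kind reduces to a lookup table; the prefixes and pool names below are illustrative:

```python
# Longest-known-prefix routing table mapping path prefixes to server pools.
routes = {"/api/v1": ["api-1", "api-2"], "/images": ["img-1"]}
default_pool = ["web-1"]

def pick_pool(path: str) -> list:
    for prefix, pool in routes.items():
        if path.startswith(prefix):
            return pool
    return default_pool

print(pick_pool("/api/v1/users"))  # ['api-1', 'api-2']
print(pick_pool("/checkout"))      # ['web-1'] — falls through to the default
```

A Layer 4 balancer could not make this decision at all: the path only exists after the HTTP request has been parsed.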

Layer 7 is ideal for microservices architectures where different services handle different functions. Choose Layer 4 for simple, high-throughput scenarios and Layer 7 when you need intelligent, content-aware routing decisions.