doc/load-balancing.md - platform/external/grpc-grpc - Git at Google

 Load Balancing in gRPC
 ======================

 # Scope

 This document explains the design for load balancing within gRPC.

 # Background

 ## Per-Call Load Balancing

 It is worth noting that load-balancing within gRPC happens on a per-call
 basis, not a per-connection basis.  In other words, even if all requests
 come from a single client, we still want them to be load-balanced across
 all servers.

 ## Approaches to Load Balancing

 Prior to any gRPC specifics, we explore some usual ways to approach load
 balancing.

 ### Proxy Model

 Using a proxy provides a solid trustable client that can report load to the load
 balancing system. Proxies typically require more resources to operate since they
 have temporary copies of the RPC request and response. This model also increases
 latency to the RPCs.

 The proxy model was deemed inefficient when considering request heavy services
 like storage.

 ### Balancing-aware Client

 This thicker client places more of the load balancing logic in the client. For
 example, the client could contain many load balancing policies (Round Robin,
 Random, etc) used to select servers from a list. In this model, a list of
 servers would be either statically configured in the client, provided by the
 name resolution system, an external load balancer, etc. In any case, the client
 is responsible for choosing the preferred server from the list.

 One of the drawbacks of this approach is writing and maintaining the load
 balancing policies in multiple languages and/or versions of the clients. These
 policies can be fairly complicated. Some of the algorithms also require client
 to server communication so the client would need to get thicker to support
 additional RPCs to get health or load information in addition to sending RPCs
 for user requests.

 It would also significantly complicate the client's code: the new design hides
 the load balancing complexity of multiple layers and presents it as a simple
 list of servers to the client.

 ### External Load Balancing Service

 The client load balancing code is kept simple and portable, implementing
 well-known algorithms (e.g., Round Robin) for server selection.
 Complex load balancing algorithms are instead provided by the load
 balancer. The client relies on the load balancer to provide _load
 balancing configuration_ and _the list of servers_ to which the client
 should send requests. The balancer updates the server list as needed
 to balance the load as well as handle server unavailability or health
 issues. The load balancer will make any necessary complex decisions and
 inform the client. The load balancer may communicate with the backend
 servers to collect load and health information.

 # Requirements

 ## Simple API and client

 The gRPC client load balancing code must be simple and portable. The
 client should only contain simple algorithms (e.g., Round Robin) for
 server selection.  For complex algorithms, the client should rely on
 a load balancer to provide load balancing configuration and the list of
 servers to which the client should send requests. The balancer will update
 the server list as needed to balance the load as well as handle server
 unavailability or health issues. The load balancer will make any necessary
 complex decisions and inform the client. The load balancer may communicate
 with the backend servers to collect load and health information.

 ## Security

 The load balancer may be separate from the actual server backends and a
 compromise of the load balancer should only lead to a compromise of the
 loadbalancing functionality. In other words, a compromised load balancer should
 not be able to cause a client to trust a (potentially malicious) backend server
 any more than in a comparable situation without loadbalancing.

 # Architecture

 ## Overview

 The primary mechanism for load-balancing in gRPC is external
 load-balancing, where an external load balancer provides simple clients
 with an up-to-date list of servers.

 The gRPC client does support an API for built-in load balancing policies.
 However, there are only a small number of these (one of which is the
 `grpclb` policy, which implements external load balancing), and users
 are discouraged from trying to extend gRPC by adding more.  Instead, new
 load balancing policies should be implemented in external load balancers.

 ## Workflow

 Load-balancing policies fit into the gRPC client workflow in between
 name resolution and the connection to the server.  Here's how it all
 works:

 ![image](images/load-balancing.png)

 1. On startup, the gRPC client issues a [name resolution](naming.md) request
    for the server name.  The name will resolve to one or more IP addresses,
    each of which will indicate whether it is a server address or
    a load balancer address, and a [service config](service_config.md)
    that indicates which client-side load-balancing policy to use (e.g.,
    `round_robin` or `grpclb`).
 2. The client instantiates the load balancing policy.
    - Note: If any one of the addresses returned by the resolver is a balancer
      address, then the client will use the `grpclb` policy, regardless
      of what load-balancing policy was requested by the service config.
      Otherwise, the client will use the load-balancing policy requested
      by the service config.  If no load-balancing policy is requested
      by the service config, then the client will default to a policy
      that picks the first available server address.
 3. The load balancing policy creates a subchannel to each server address.
    - For all policies *except* `grpclb`, this means one subchannel for each
      address returned by the resolver. Note that these policies
      ignore any balancer addresses returned by the resolver.
    - In the case of the `grpclb` policy, the workflow is as follows:
      1. The policy opens a stream to one of the balancer addresses returned
         by the resolver. It asks the balancer for the server addresses to
         use for the server name originally requested by the client (i.e.,
         the same one originally passed to the name resolver).
         - Note: In the `grpclb` policy, the non-balancer addresses returned
           by the resolver are used as a fallback in case no balancers can be
           contacted when the LB policy is started.
      2. The gRPC servers to which the load balancer is directing the client
         may report load to the load balancers, if that information is needed
         by the load balancer's configuration.
      3. The load balancer returns a server list to the gRPC client's `grpclb`
         policy. The `grpclb` policy will then create a subchannel to each of
         server in the list.
 4. For each RPC sent, the load balancing policy decides which
    subchannel (i.e., which server) the RPC should be sent to.
    - In the case of the `grpclb` policy, the client will send requests
      to the servers in the order in which they were returned by the load
      balancer.  If the server list is empty, the call will block until a
      non-empty one is received.
	Load Balancing in gRPC
	======================

	# Scope

	This document explains the design for load balancing within gRPC.

	# Background

	## Per-Call Load Balancing

	It is worth noting that load-balancing within gRPC happens on a per-call
	basis, not a per-connection basis. In other words, even if all requests
	come from a single client, we still want them to be load-balanced across
	all servers.

	## Approaches to Load Balancing

	Prior to any gRPC specifics, we explore some usual ways to approach load
	balancing.

	### Proxy Model

	Using a proxy provides a solid trustable client that can report load to the load
	balancing system. Proxies typically require more resources to operate since they
	have temporary copies of the RPC request and response. This model also increases
	latency to the RPCs.

	The proxy model was deemed inefficient when considering request heavy services
	like storage.

	### Balancing-aware Client

	This thicker client places more of the load balancing logic in the client. For
	example, the client could contain many load balancing policies (Round Robin,
	Random, etc) used to select servers from a list. In this model, a list of
	servers would be either statically configured in the client, provided by the
	name resolution system, an external load balancer, etc. In any case, the client
	is responsible for choosing the preferred server from the list.

	One of the drawbacks of this approach is writing and maintaining the load
	balancing policies in multiple languages and/or versions of the clients. These
	policies can be fairly complicated. Some of the algorithms also require client
	to server communication so the client would need to get thicker to support
	additional RPCs to get health or load information in addition to sending RPCs
	for user requests.

	It would also significantly complicate the client's code: the new design hides
	the load balancing complexity of multiple layers and presents it as a simple
	list of servers to the client.

	### External Load Balancing Service

	The client load balancing code is kept simple and portable, implementing
	well-known algorithms (e.g., Round Robin) for server selection.
	Complex load balancing algorithms are instead provided by the load
	balancer. The client relies on the load balancer to provide _load
	balancing configuration_ and _the list of servers_ to which the client
	should send requests. The balancer updates the server list as needed
	to balance the load as well as handle server unavailability or health
	issues. The load balancer will make any necessary complex decisions and
	inform the client. The load balancer may communicate with the backend
	servers to collect load and health information.

	# Requirements

	## Simple API and client

	The gRPC client load balancing code must be simple and portable. The
	client should only contain simple algorithms (e.g., Round Robin) for
	server selection. For complex algorithms, the client should rely on
	a load balancer to provide load balancing configuration and the list of
	servers to which the client should send requests. The balancer will update
	the server list as needed to balance the load as well as handle server
	unavailability or health issues. The load balancer will make any necessary
	complex decisions and inform the client. The load balancer may communicate
	with the backend servers to collect load and health information.

	## Security

	The load balancer may be separate from the actual server backends and a
	compromise of the load balancer should only lead to a compromise of the
	loadbalancing functionality. In other words, a compromised load balancer should
	not be able to cause a client to trust a (potentially malicious) backend server
	any more than in a comparable situation without loadbalancing.

	# Architecture

	## Overview

	The primary mechanism for load-balancing in gRPC is external
	load-balancing, where an external load balancer provides simple clients
	with an up-to-date list of servers.

	The gRPC client does support an API for built-in load balancing policies.
	However, there are only a small number of these (one of which is the
	`grpclb` policy, which implements external load balancing), and users
	are discouraged from trying to extend gRPC by adding more. Instead, new
	load balancing policies should be implemented in external load balancers.

	## Workflow

	Load-balancing policies fit into the gRPC client workflow in between
	name resolution and the connection to the server. Here's how it all
	works:

	![image](images/load-balancing.png)

	1. On startup, the gRPC client issues a [name resolution](naming.md) request
	for the server name. The name will resolve to one or more IP addresses,
	each of which will indicate whether it is a server address or
	a load balancer address, and a [service config](service_config.md)
	that indicates which client-side load-balancing policy to use (e.g.,
	`round_robin` or `grpclb`).
	2. The client instantiates the load balancing policy.
	- Note: If any one of the addresses returned by the resolver is a balancer
	address, then the client will use the `grpclb` policy, regardless
	of what load-balancing policy was requested by the service config.
	Otherwise, the client will use the load-balancing policy requested
	by the service config. If no load-balancing policy is requested
	by the service config, then the client will default to a policy
	that picks the first available server address.
	3. The load balancing policy creates a subchannel to each server address.
	- For all policies except `grpclb`, this means one subchannel for each
	address returned by the resolver. Note that these policies
	ignore any balancer addresses returned by the resolver.
	- In the case of the `grpclb` policy, the workflow is as follows:
	1. The policy opens a stream to one of the balancer addresses returned
	by the resolver. It asks the balancer for the server addresses to
	use for the server name originally requested by the client (i.e.,
	the same one originally passed to the name resolver).
	- Note: In the `grpclb` policy, the non-balancer addresses returned
	by the resolver are used as a fallback in case no balancers can be
	contacted when the LB policy is started.
	2. The gRPC servers to which the load balancer is directing the client
	may report load to the load balancers, if that information is needed
	by the load balancer's configuration.
	3. The load balancer returns a server list to the gRPC client's `grpclb`
	policy. The `grpclb` policy will then create a subchannel to each of
	server in the list.
	4. For each RPC sent, the load balancing policy decides which
	subchannel (i.e., which server) the RPC should be sent to.
	- In the case of the `grpclb` policy, the client will send requests
	to the servers in the order in which they were returned by the load
	balancer. If the server list is empty, the call will block until a
	non-empty one is received.