| gRPC Connection Backoff Protocol |
| ================================ |
| |
| When a connection attempt to a backend fails, it is typically desirable not to |
| retry immediately (to avoid flooding the network or the server with requests) |
| and instead to apply some form of exponential backoff. |
| |
| We have several parameters: |
| 1. INITIAL_BACKOFF (how long to wait after the first failure before retrying) |
| 1. MULTIPLIER (factor by which to multiply the backoff after a failed retry) |
| 1. JITTER (by how much to randomize backoffs) |
| 1. MAX_BACKOFF (upper bound on backoff) |
| 1. MIN_CONNECT_TIMEOUT (minimum time we're willing to give a connection to |
| complete) |
| |
| ## Proposed Backoff Algorithm |
| |
| Exponentially back off the start time of connection attempts up to a limit of |
| MAX_BACKOFF, with jitter. |
| |
| ``` |
| ConnectWithBackoff() |
|   current_backoff = INITIAL_BACKOFF |
|   current_deadline = now() + INITIAL_BACKOFF |
|   while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) |
|          != SUCCESS) |
|     SleepUntil(current_deadline) |
|     current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF) |
|     current_deadline = now() + current_backoff + |
|       UniformRandom(-JITTER * current_backoff, JITTER * current_backoff) |
| ``` |
| |
| The specific parameter values are: |
| 1. MIN_CONNECT_TIMEOUT = 20 seconds |
| 1. INITIAL_BACKOFF = 1 second |
| 1. MULTIPLIER = 1.6 |
| 1. MAX_BACKOFF = 120 seconds |
| 1. JITTER = 0.2 |
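
The pseudocode and parameter values above could be sketched in Python roughly as follows. This is only an illustration, not the code of any gRPC implementation. `try_connect` is a hypothetical callable that makes one connection attempt, gives up at the absolute deadline it is passed, and returns `True` on success. The `now`, `sleep`, and `rng` arguments are likewise assumptions, added so that the clock and the randomness can be replaced in tests.

```python
import random
import time

MIN_CONNECT_TIMEOUT = 20.0  # seconds
INITIAL_BACKOFF = 1.0       # seconds
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0         # seconds
JITTER = 0.2


def connect_with_backoff(try_connect, now=time.monotonic, sleep=time.sleep,
                         rng=random.uniform):
    """Call try_connect(deadline) until it returns True.

    try_connect is a hypothetical callable, not a gRPC API. The deadline it
    receives is an absolute time on the same clock as now().
    """
    current_backoff = INITIAL_BACKOFF
    current_deadline = now() + INITIAL_BACKOFF
    # Each attempt gets at least MIN_CONNECT_TIMEOUT to complete.
    while not try_connect(max(current_deadline, now() + MIN_CONNECT_TIMEOUT)):
        # SleepUntil(current_deadline): wait out what is left of the backoff.
        remaining = current_deadline - now()
        if remaining > 0:
            sleep(remaining)
        # Grow the backoff, cap it, then add jitter to the next start time.
        current_backoff = min(current_backoff * MULTIPLIER, MAX_BACKOFF)
        current_deadline = now() + current_backoff + rng(
            -JITTER * current_backoff, JITTER * current_backoff)
```

Because jitter is added after the cap is applied, the gap between two attempt start times can exceed MAX_BACKOFF by up to JITTER * MAX_BACKOFF.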
| |
| Implementations with pressing concerns (such as minimizing the number of wakeups |
| on a mobile phone) may wish to use a different algorithm, and in particular |
| different jitter logic. |
| |
| Alternate implementations must ensure that connection backoffs started at the |
| same time disperse, and must not attempt connections substantially more often |
| than the above algorithm. |
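
One possible reference point for judging whether an alternate implementation retries "substantially more often" is the schedule of backoffs before jitter, together with the range jitter can move each one within. The sketch below tabulates both. The helper names `nominal_backoffs` and `jitter_bounds` are made up for this example, and the constants are the values given in this document.

```python
INITIAL_BACKOFF = 1.0  # seconds
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0    # seconds
JITTER = 0.2


def nominal_backoffs(count):
    """Return the first `count` backoff values, before jitter is applied."""
    backoffs = []
    current = INITIAL_BACKOFF
    for _ in range(count):
        backoffs.append(current)
        current = min(current * MULTIPLIER, MAX_BACKOFF)
    return backoffs


def jitter_bounds(backoff):
    """Return the (low, high) range that jitter can move a backoff into."""
    return ((1 - JITTER) * backoff, (1 + JITTER) * backoff)


if __name__ == "__main__":
    for i, backoff in enumerate(nominal_backoffs(12)):
        low, high = jitter_bounds(backoff)
        print(f"backoff {i}: {backoff:8.3f}s  (jittered: {low:.3f}s to {high:.3f}s)")
```

With these values the un-jittered backoff first reaches MAX_BACKOFF at the twelfth entry of the schedule. In the algorithm above the very first wait (INITIAL_BACKOFF) has no jitter applied, so the bounds apply from the second entry onward.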
| |
| ## Reset Backoff |
| |
| The backoff should be reset to INITIAL_BACKOFF at some point, so that |
| reconnection behavior is the same whether the connection is newly started or |
| was previously disconnected. |
| |
| We choose to reset the backoff when the SETTINGS frame is received, because at |
| that point we know for certain that the server has accepted the connection. |
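
The reset rule might look like the following minimal sketch. `BackoffState` and its method names are hypothetical and are not taken from any gRPC implementation. Jitter is left out to keep the focus on when the reset happens.

```python
INITIAL_BACKOFF = 1.0  # seconds
MULTIPLIER = 1.6
MAX_BACKOFF = 120.0    # seconds


class BackoffState:
    """Tracks the current backoff for one connection (hypothetical helper)."""

    def __init__(self):
        self.current_backoff = INITIAL_BACKOFF

    def on_connect_failure(self):
        """Grow the backoff after a failed attempt, capped at MAX_BACKOFF."""
        self.current_backoff = min(self.current_backoff * MULTIPLIER,
                                   MAX_BACKOFF)

    def on_settings_frame_received(self):
        """The server accepted the connection, so start over from the minimum."""
        self.current_backoff = INITIAL_BACKOFF
```

Resetting on the SETTINGS frame rather than on transport-level connect means a connection that is established but never accepted by the server does not clear the accumulated backoff.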