Flow Control
============

Introduction to QUIC Flow Control
---------------------------------

QUIC flow control acts at both connection and stream levels. At any time,
transmission of stream data could be prevented by connection-level flow control,
by stream-level flow control, or both. Flow control uses a credit-based model in
which the relevant flow control limit is expressed as the maximum number of
bytes allowed to be sent on a stream, or across all streams, since the beginning
of the stream or connection. This limit may be periodically bumped.

It is important to note that both connection and stream-level flow control
relate only to the transmission of QUIC stream data. QUIC flow control at stream
level counts the total number of logical bytes sent on a given stream. Note that
this does not count retransmissions; thus, if a byte is sent, lost, and sent
again, this still only counts as one byte for the purposes of flow control. Note
that the total number of logical bytes sent on a given stream is equivalent to
the current “length” of the stream. In essence, the relevant quantity is
`max(offset + len)` for all STREAM frames `(offset, len)` we have ever sent for
the stream.

(It is essential that this be determined correctly, as deadlock may occur if we
believe we have exhausted our flow control credit whereas the peer believes we
have not, as the peer may wait indefinitely for us to send more data before
advancing us more flow control credit.)
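As a minimal illustration of this accounting, the sketch below maintains the
flow control “length” of a stream on the TX side. The structure and function
names are hypothetical and chosen purely for exposition.

    #include <stdint.h>

    /*
     * Hypothetical per-stream TX accounting. The flow control "length" of a
     * stream is the highest offset + length we have ever sent for it; bytes
     * in retransmitted STREAM frames never increase it.
     */
    typedef struct tx_stream_acct_st {
        uint64_t len; /* max(offset + len) over all STREAM frames sent */
    } TX_STREAM_ACCT;

    /*
     * Called whenever a STREAM frame carrying [offset, offset + num_bytes)
     * is sent. Returns the number of new controlled bytes, i.e. the amount
     * by which the stream's logical length grew (zero for a pure
     * retransmission).
     */
    static uint64_t tx_stream_acct_on_frame(TX_STREAM_ACCT *a,
                                            uint64_t offset,
                                            uint64_t num_bytes)
    {
        uint64_t end        = offset + num_bytes;
        uint64_t controlled = 0;

        if (end > a->len) {
            controlled = end - a->len;
            a->len     = end;
        }

        return controlled;
    }

The value returned by such a helper is the quantity which would be charged
against the stream-level and connection-level flow control credit described
below.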
QUIC flow control at connection level is based on the sum of all the logical
bytes transmitted across all streams since the start of the connection.

Connection-level flow control is controlled by the `MAX_DATA` frame;
stream-level flow control is controlled by the `MAX_STREAM_DATA` frame.

The `DATA_BLOCKED` and `STREAM_DATA_BLOCKED` frames defined by RFC 9000 are less
important than they first appear, as peers are not allowed to rely on them. (For
example, a peer is not allowed to wait until we send `DATA_BLOCKED` to increase
our connection-level credit, and a conformant QUIC implementation can choose to
never generate either of these frame types.) These frames instead serve two
purposes: to enhance flow control performance, and to act as a debugging aid.
However, their implementation is not critical.

Note that it follows from the above that the CRYPTO-frame stream is not subject
to flow control.

Note that flow control and congestion control are completely separate
mechanisms. In a given circumstance, either or both mechanisms may restrict our
ability to transmit application data.

Consider the following diagram:

        RWM             SWM  SWM'             CWM   CWM'
         |               |    |                |     |
         |               |<----- credit ------>|     |
         |    <---------- threshold ---------->|     |
                              |<--- window size ---->|

We introduce the following terminology:

- **Controlled bytes** refers to any byte which counts for purposes of flow
  control. A controlled byte is any byte of application data in a STREAM frame
  payload, the first time it is sent (retransmissions do not count).

- (RX side only) **Retirement**, which refers to the point at which we dequeue
  one or more controlled bytes from a QUIC stream and hand them to the
  application, meaning we are no longer responsible for them.

  Retirement is an important factor in our RX flow control design, as we want
  peers to transmit not just at the rate that our QUIC implementation can
  process incoming data, but also at a rate the application can handle.

- (RX side only) The **Retired Watermark** (RWM), the total number of retired
  controlled bytes since the beginning of the connection or stream.

- The **Spent Watermark** (SWM), which is the number of controlled bytes we have
  sent (for the TX side) or received (for the RX side). This represents the
  amount of flow control budget which has been spent. It is a monotonic value
  and never decreases. On the RX side, such bytes have not necessarily been
  retired yet.

- The **Credit Watermark** (CWM), which is the number of bytes which have
  been authorized for transmission so far. This count is a cumulative count
  since the start of the connection or stream and thus is also monotonic.

- The available **credit**, which is always simply the difference between
  the CWM and the SWM.

- (RX side only) The **threshold**, which is how close we let the RWM
  get to the CWM before we choose to extend the peer more credit by bumping the
  CWM. The threshold is relative to (i.e., subtracted from) the CWM.

- (RX side only) The **window size**, which is the amount by which we or a peer
  choose to bump the CWM each time, as we reach or exceed the threshold. The new
  CWM is calculated as the SWM plus the window size (note that it is added to
  the SWM, not the old CWM).

Note that:

- If the available credit is zero, the TX side is blocked due to a lack of
  credit.

- If any circumstance occurs which would cause the SWM to exceed the CWM,
  a flow control protocol violation has occurred and the connection
  should be terminated.

Connection-Level Flow Control - TX Side
---------------------------------------

TX side flow control is exceptionally simple. It can be modelled as the
following state machine:

    ---> event: On TX (numBytes)
    ---> event: On TX Window Updated (numBytes)
    <--- event: On TX Blocked
    Get TX Window() -> numBytes

The On TX event is passed to the state machine whenever we send a packet.
`numBytes` is the total number of controlled bytes we sent in the packet (i.e.,
the number of bytes of STREAM frame payload which are not retransmissions). This
value is added to the TX-side SWM value. Note that this may be zero, though
there is no need to pass the event in this case.

The On TX Window Updated event is passed to the state machine whenever we have
our CWM increased. In other words, it is passed whenever we receive a `MAX_DATA`
frame, with the integer value contained in that frame (or when we receive the
`initial_max_data` transport parameter).

The On TX Window Updated event expresses the CWM (that is, the cumulative
number of controlled bytes we are allowed to send since the start of the
connection), thus it is monotonic and may never regress. If an On TX Window
Updated event is passed to the state machine with a value lower than that passed
in any previous such event, it indicates a peer protocol error or a local
programming error.

The Get TX Window function returns our credit value (that is, it returns the
number of controlled bytes we are allowed to send). This value is reduced by the
On TX event and increased by the On TX Window Updated event. It is simply the
difference between the value of the last On TX Window Updated event and the sum
of the `numBytes` arguments of all On TX events so far.

The On TX Blocked event is emitted at the time of any edge transition where the
value which would be returned by the Get TX Window function changes from
non-zero to zero. This always occurs during processing of an On TX event. (This
event is intended to assist in deciding when to generate `DATA_BLOCKED`
frames.)

We must not exceed the flow control limits, else the peer may terminate the
connection with an error.

An initial connection-level credit is communicated by the peer in the
`initial_max_data` transport parameter. All subsequent credit is granted via
`MAX_DATA` frames.
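A minimal sketch of such a TX-side state machine is given below. The type and
function names are illustrative only; the events above map onto the functions
as indicated in the comments.

    #include <stdint.h>

    /*
     * Hypothetical TX-side flow controller (connection or stream level;
     * both behave identically at this layer).
     */
    typedef struct tx_fc_st {
        uint64_t swm;  /* Spent Watermark: controlled bytes sent so far */
        uint64_t cwm;  /* Credit Watermark: cumulative credit granted   */
    } TX_FC;

    /* Get TX Window: remaining credit in bytes. */
    static uint64_t tx_fc_get_window(const TX_FC *fc)
    {
        return fc->cwm - fc->swm;
    }

    /*
     * On TX Window Updated: cwm is the new cumulative credit, taken from
     * the initial_max_data transport parameter or a MAX_DATA frame (or the
     * stream-level equivalents). Returns 0 if the value regresses, which
     * indicates a peer protocol error or a local programming error.
     */
    static int tx_fc_on_window_updated(TX_FC *fc, uint64_t cwm)
    {
        if (cwm < fc->cwm)
            return 0;

        fc->cwm = cwm;
        return 1;
    }

    /*
     * On TX: account for num_bytes newly-sent controlled bytes. Returns 1
     * on the non-zero to zero edge transition (the point at which On TX
     * Blocked is emitted and a DATA_BLOCKED or STREAM_DATA_BLOCKED frame
     * might be queued), 0 otherwise, and -1 if the caller tried to
     * overspend (a local programming error).
     */
    static int tx_fc_on_tx(TX_FC *fc, uint64_t num_bytes)
    {
        if (num_bytes > tx_fc_get_window(fc))
            return -1;

        fc->swm += num_bytes;
        return num_bytes > 0 && tx_fc_get_window(fc) == 0;
    }

When sending stream data, the usable window would be the lesser of the values
returned by the connection-level and stream-level instances, as noted in the
next section.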
Stream-Level Flow Control - TX Side
-----------------------------------

Stream-level flow control works exactly the same as connection-level flow
control for the TX side.

The On TX Window Updated event occurs in response to the `MAX_STREAM_DATA`
frame, or based on the relevant transport parameter
(`initial_max_stream_data_bidi_local`, `initial_max_stream_data_bidi_remote`,
`initial_max_stream_data_uni`).

The On TX Blocked event can be used to decide when to generate
`STREAM_DATA_BLOCKED` frames.

Note that the number of controlled bytes we can send in a stream is limited by
both connection and stream-level flow control; thus the number of controlled
bytes we can send is the lesser of the values returned by the Get TX Window
function on the connection-level and stream-level state machines.

Connection-Level Flow Control - RX Side
---------------------------------------

    ---> event: On RX Controlled Bytes (numBytes) [internal event]
    ---> event: On Retire Controlled Bytes (numBytes)
    <--- event: Increase Window (numBytes)
    <--- event: Flow Control Error

RX side connection-level flow control provides an indication of when to generate
`MAX_DATA` frames to bump the peer's connection-level transmission credit. It is
somewhat more involved than the TX side.

The state machine receives On RX Controlled Bytes events from stream-level flow
controllers. Callers do not pass the event themselves. The event is generated by
a stream-level flow controller whenever we receive any controlled bytes.
`numBytes` is the number of controlled bytes we received. (This event is
generated by stream-level flow control because retransmitted stream data must be
counted only once, and stream-level flow control is therefore in the best
position to determine how many controlled bytes (i.e., new, non-retransmitted
stream payload bytes) have been received.)

If we receive more controlled bytes than we authorized, the state machine emits
the Flow Control Error event. The connection should be terminated with a
protocol error in this case.

The state machine emits the Increase Window event when it thinks that the peer
should be advanced more flow control credit (i.e., when the CWM should be
bumped). `numBytes` is the new CWM value, and is monotonic with regard to all
previous Increase Window events emitted by the state machine.

The state machine is passed the On Retire Controlled Bytes event when one or
more controlled bytes are dequeued from any stream and passed to the
application.

The state machine uses the cadence of the On Retire Controlled Bytes events it
receives to determine when to increase the flow control window. Thus, the On
Retire Controlled Bytes event should be sent to the state machine when
processing of the received controlled bytes has been *completed* (i.e., passed
to the application).
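A minimal sketch of such an RX-side connection-level controller is given below.
The names are illustrative, and it is assumed that the window size is at least
as large as the threshold, which keeps the emitted CWM values monotonic.

    #include <stdint.h>

    /*
     * Hypothetical RX-side connection-level flow controller. How
     * window_size is chosen is discussed under "RX Window Sizing" below.
     */
    typedef struct rx_fc_st {
        uint64_t swm;         /* controlled bytes received so far        */
        uint64_t rwm;         /* controlled bytes retired to application */
        uint64_t cwm;         /* credit granted to the peer so far       */
        uint64_t threshold;   /* bump CWM once cwm - rwm <= threshold    */
        uint64_t window_size; /* credit added per bump; >= threshold     */
    } RX_FC;

    /*
     * On RX Controlled Bytes: generated internally by stream-level flow
     * control when new (non-retransmitted) stream payload bytes arrive.
     * Returns 0 on a flow control violation by the peer, in which case the
     * connection should be terminated with a protocol error (the Flow
     * Control Error event); returns 1 otherwise.
     */
    static int rx_fc_on_rx_bytes(RX_FC *fc, uint64_t num_bytes)
    {
        fc->swm += num_bytes;
        return fc->swm <= fc->cwm;
    }

    /*
     * On Retire Controlled Bytes: called when controlled bytes are dequeued
     * from any stream and handed to the application. If the RWM has come
     * within `threshold` bytes of the CWM, the CWM is bumped and the new
     * value is returned; it should then be sent to the peer in a MAX_DATA
     * frame (the Increase Window event). Returns 0 if no update is needed.
     */
    static uint64_t rx_fc_on_retire(RX_FC *fc, uint64_t num_bytes)
    {
        fc->rwm += num_bytes;

        if (fc->cwm - fc->rwm <= fc->threshold) {
            fc->cwm = fc->swm + fc->window_size; /* new CWM = SWM + window */
            return fc->cwm;
        }

        return 0;
    }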
Stream-Level Flow Control - RX Side
-----------------------------------

RX-side stream-level flow control works similarly to RX-side connection-level
flow control. There are a few differences:

- There is no On RX Controlled Bytes event.

- When the On Retire Controlled Bytes event is passed to a stream-level flow
  controller, it may optionally be forwarded automatically to a
  connection-level flow controller (an implementation decision), as these
  events should always occur at the same time.

- An additional event is added, which replaces the On RX Controlled Bytes event:

      ---> event: On RX Stream Frame (offsetPlusLength, isFin)

  This event should be passed to the state machine when a STREAM frame is
  received. The `offsetPlusLength` argument is the sum of the offset field of
  the STREAM frame and the length of the frame's payload in bytes. The `isFin`
  argument should specify whether the STREAM frame had the FIN flag set.

  This event is used to generate the internal On RX Controlled Bytes event to
  the connection-level flow controller. It is also used by stream-level flow
  control to determine if flow control limits are violated by the peer.

  The state machine handles `offsetPlusLength` monotonically and ignores the
  event if a previous such event already had an equal or greater value. The
  reason this event is used instead of an `On RX (numBytes)` style event is that
  this API can be monotonic and thus easier to use (the caller does not need to
  remember whether they have already counted a specific controlled byte in a
  STREAM frame, which may after all duplicate some of the controlled bytes in a
  previous STREAM frame).

RX Window Sizing
----------------

For RX flow control we must determine our window size. This is the value we add
to the peer's current SWM to determine the new CWM each time the RWM reaches the
threshold. The window size should be adapted dynamically according to network
conditions.

Many implementations choose to have a mechanism for increasing the window size
but not decreasing it, a simple approach which we adopt here.

The common algorithm is a so-called auto-tuning approach in which the rate of
window consumption (i.e., the rate at which the RWM approaches the CWM after the
CWM is bumped) is measured and compared to the measured connection RTT. If the
time it takes to consume one window size is less than a fixed multiple of the
RTT (i.e., the window is being consumed quickly), the window size is doubled, up
to an implementation-chosen maximum window size.

Auto-tuning occurs in 'epochs'. At the end of each auto-tuning epoch, a decision
is made on whether to double the window size, and a new auto-tuning epoch is
started.

For more information on auto-tuning, see [Flow control in
QUIC](https://docs.google.com/document/d/1F2YfdDXKpy20WVKJueEf4abn_LVZHhMUMS5gX6Pgjl4/edit#heading=h.hcm2y5x4qmqt)
and [QUIC Flow
Control](https://docs.google.com/document/d/1SExkMmGiz8VYzV3s9E35JQlJ73vhzCekKkDi85F1qCE/edit#).
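To make the epoch mechanism concrete, a rough sketch of such an auto-tuner is
given below. The use of twice the RTT as the comparison point, the maximum
window size and all names here are illustrative assumptions rather than fixed
parts of this design.

    #include <stdint.h>

    /* Illustrative constants; tune to taste. */
    #define RXFC_RTT_MULTIPLIER  2                    /* compare to 2 * RTT  */
    #define RXFC_MAX_WINDOW      (16u * 1024 * 1024)  /* growth cap in bytes */

    typedef struct rx_fc_tuner_st {
        uint64_t window_size;    /* current window size in bytes           */
        uint64_t epoch_start_us; /* time (microseconds) the epoch started  */
    } RX_FC_TUNER;

    /*
     * Called at the end of an auto-tuning epoch, i.e. each time the CWM is
     * about to be bumped. now_us is the current time and rtt_us the current
     * smoothed RTT estimate, both in microseconds. If the previous window
     * was consumed in under RXFC_RTT_MULTIPLIER * RTT, the peer is being
     * limited by our window rather than by its own send rate, so the window
     * size is doubled, up to the cap. A new epoch is then started.
     */
    static void rx_fc_tuner_on_epoch_end(RX_FC_TUNER *t,
                                         uint64_t now_us, uint64_t rtt_us)
    {
        uint64_t elapsed_us = now_us - t->epoch_start_us;

        if (elapsed_us < RXFC_RTT_MULTIPLIER * rtt_us
                && t->window_size <= RXFC_MAX_WINDOW / 2)
            t->window_size *= 2;

        t->epoch_start_us = now_us;
    }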