Flow Control
============

Introduction to QUIC Flow Control
---------------------------------

QUIC flow control acts at both connection and stream levels. At any time,
transmission of stream data could be prevented by connection-level flow control,
by stream-level flow control, or both. Flow control uses a credit-based model in
which the relevant flow control limit is expressed as the maximum number of
bytes allowed to be sent on a stream, or across all streams, since the beginning
of the stream or connection. This limit may be periodically bumped.

It is important to note that both connection and stream-level flow control
relate only to the transmission of QUIC stream data. QUIC flow control at stream
level counts the total number of logical bytes sent on a given stream. This
count does not include retransmissions; thus, if a byte is sent, lost, and sent
again, it still only counts as one byte for the purposes of flow control. The
total number of logical bytes sent on a given stream is therefore equivalent to
the current “length” of the stream. In essence, the relevant quantity is
`max(offset + len)` for all STREAM frames `(offset, len)` we have ever sent for
the stream.

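As an illustration, the per-stream count can be computed directly from a log of sent frames. This is a minimal Python sketch. It assumes every STREAM frame ever sent, retransmissions included, is recorded as an `(offset, len)` pair. The function `stream_controlled_bytes` is hypothetical.

```python
from typing import Iterable, Tuple


def stream_controlled_bytes(frames: Iterable[Tuple[int, int]]) -> int:
    """Return the number of controlled bytes sent on one stream.

    `frames` lists every (offset, len) STREAM frame ever sent on the stream,
    retransmissions included. Retransmitted ranges never extend the stream,
    so they do not change max(offset + len).
    """
    return max((offset + length for offset, length in frames), default=0)


# Bytes 0-999 sent, bytes 500-999 lost and resent, then bytes 1000-1199 sent.
sent = [(0, 500), (500, 500), (500, 500), (1000, 200)]
print(stream_controlled_bytes(sent))  # 1200, not the 1700 bytes put on the wire
```

Using the maximum end offset, rather than a running sum, is what makes retransmissions free for flow control purposes.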
(It is essential that this quantity be determined correctly. If we believe we
have exhausted our flow control credit but the peer believes we have not,
deadlock may occur: the peer may wait indefinitely for us to send more data
before advancing us more flow control credit.)

QUIC flow control at connection level is based on the sum of all the logical
bytes transmitted across all streams since the start of the connection.

Connection-level flow control is controlled by the `MAX_DATA` frame;
stream-level flow control is controlled by the `MAX_STREAM_DATA` frame.

The `DATA_BLOCKED` and `STREAM_DATA_BLOCKED` frames defined by RFC 9000 are less
important than they first appear, as peers are not allowed to rely on them. (For
example, a peer is not allowed to wait until we send `DATA_BLOCKED` to increase
our connection-level credit, and a conformant QUIC implementation can choose to
never generate either of these frame types.) Rather, these frames serve two
purposes: to enhance flow control performance, and as a debugging aid. Their
implementation is therefore not critical.

Since flow control applies only to stream data, it follows that the
CRYPTO-frame stream is not subject to flow control.

Note that flow control and congestion control are completely separate
mechanisms. In a given circumstance, either or both mechanisms may restrict our
ability to transmit application data.

Consider the following diagram:

    RWM   SWM           SWM'   CWM         CWM'
     |     |             |      |           |
     |     |<--    credit|   -->|           |
     |   <-|- threshold -|----->|           |
                          ----------------->
                                 window size

We introduce the following terminology:

- **Controlled bytes** refers to any byte which counts for purposes of flow
  control. A controlled byte is any byte of application data in a STREAM frame
  payload, the first time it is sent (retransmissions do not count).

- (RX side only) **Retirement**, which refers to the point at which we dequeue
  one or more controlled bytes from a QUIC stream and hand them to the
  application, meaning we are no longer responsible for them.

  Retirement is an important factor in our RX flow control design, as we want
  peers to transmit not just at the rate at which our QUIC implementation can
  process incoming data, but also at a rate the application can handle.

- (RX side only) The **Retired Watermark** (RWM), the total number of retired
  controlled bytes since the beginning of the connection or stream.

- The **Spent Watermark** (SWM), which is the number of controlled bytes we have
  sent (for the TX side) or received (for the RX side). This represents the
  amount of flow control budget which has been spent. It is monotonic and never
  decreases. On the RX side, such bytes have not necessarily been retired yet.

- The **Credit Watermark** (CWM), which is the number of bytes which have
  been authorized for transmission so far. It is a cumulative count since the
  start of the connection or stream and thus is also monotonic.

- The available **credit**, which is always simply the difference between
  the CWM and the SWM (i.e., CWM minus SWM).

- (RX side only) The **threshold**, which is how close we let the RWM
  get to the CWM before we choose to extend the peer more credit by bumping the
  CWM. The threshold is relative to (i.e., subtracted from) the CWM.

- (RX side only) The **window size**, which is the amount by which we or a peer
  choose to bump the CWM each time the RWM reaches or exceeds the threshold. The
  new CWM is calculated as the SWM plus the window size (note that it is added
  to the SWM, not the old CWM).

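A small worked example makes the relationships between these quantities concrete. The following Python sketch simply encodes the definitions above for the RX side. The class and method names are hypothetical, not taken from any implementation. Clamping the bumped CWM with `max()` is an assumption added to keep the CWM monotonic.

```python
from dataclasses import dataclass


@dataclass
class RxWatermarks:
    """Hypothetical snapshot of the RX-side flow control quantities."""

    rwm: int          # Retired Watermark: bytes handed to the application
    swm: int          # Spent Watermark: controlled bytes received so far
    cwm: int          # Credit Watermark: bytes authorized so far
    threshold: int    # how close the RWM may get to the CWM before a bump
    window_size: int  # amount added to the SWM when bumping

    @property
    def credit(self) -> int:
        # Credit still available to the peer: CWM minus SWM.
        return self.cwm - self.swm

    def threshold_reached(self) -> bool:
        # The threshold is subtracted from the CWM.
        return self.rwm >= self.cwm - self.threshold

    def bumped_cwm(self) -> int:
        # The new CWM is the SWM plus the window size (not the old CWM plus
        # the window size); max() keeps the CWM monotonic.
        return max(self.cwm, self.swm + self.window_size)


w = RxWatermarks(rwm=700, swm=900, cwm=1000, threshold=400, window_size=1000)
print(w.credit)               # 100
print(w.threshold_reached())  # True, because 700 >= 1000 - 400
print(w.bumped_cwm())         # 1900
```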
Note that:

- If the available credit is zero, the TX side is blocked due to a lack of
  credit.

- If any circumstance occurs which would cause the SWM to exceed the CWM,
  a flow control protocol violation has occurred and the connection
  should be terminated.

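Both rules reduce to a comparison of the SWM against the CWM. Here is a minimal sketch. The helper name `flow_control_state` and its string results are hypothetical.

```python
def flow_control_state(swm: int, cwm: int) -> str:
    """Classify a (SWM, CWM) pair using the two rules above (hypothetical helper)."""
    if swm > cwm:
        # More controlled bytes than were ever authorized: protocol violation,
        # and the connection should be terminated.
        return "violation"
    if swm == cwm:
        # Zero credit left: the TX side is blocked until the CWM is bumped.
        return "blocked"
    return "ok"


print(flow_control_state(500, 1000))   # ok
print(flow_control_state(1000, 1000))  # blocked
print(flow_control_state(1001, 1000))  # violation
```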
Connection-Level Flow Control - TX Side
---------------------------------------

TX side flow control is exceptionally simple. It can be modelled as the
following state machine:

        ---> event: On TX (numBytes)
        ---> event: On TX Window Updated (numBytes)
        <--- event: On TX Blocked
        Get TX Window() -> numBytes

The On TX event is passed to the state machine whenever we send a packet.
`numBytes` is the total number of controlled bytes we sent in the packet (i.e.,
the number of bytes of STREAM frame payload which are not retransmissions). This
value is added to the TX-side SWM value. Note that this may be zero, though
there is no need to pass the event in this case.

The On TX Window Updated event is passed to the state machine whenever our CWM
is increased. In other words, it is passed whenever we receive a `MAX_DATA`
frame, with the integer value contained in that frame (or when we receive the
`initial_max_data` transport parameter).

The On TX Window Updated event expresses the CWM (that is, the cumulative
number of controlled bytes we are allowed to send since the start of the
connection); thus it is monotonic and may never regress. RFC 9000 requires a
sender to ignore any `MAX_DATA` frame that does not increase the limit (such
frames can legitimately arrive out of order), so these should be filtered out
before the event is passed. If an On TX Window Updated event is nevertheless
passed to the state machine with a value lower than that passed in any previous
such event, it indicates a local programming error.

The Get TX Window function returns our credit value (that is, it returns the
number of controlled bytes we are allowed to send). This value is reduced by the
On TX event and increased by the On TX Window Updated event. It is simply the
difference between the last On TX Window Updated value and the sum of the
`numBytes` arguments of all On TX events so far.

The On TX Blocked event is emitted at the time of any edge transition where the
value which would be returned by the Get TX Window function changes from
non-zero to zero. This always occurs during processing of an On TX event. (This
event is intended to assist in deciding when to generate `DATA_BLOCKED`
frames.)

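The TX-side state machine is small enough to sketch in full. The Python class below is a hypothetical illustration of the events above, not OpenSSL's API. The On TX Blocked event is modelled as an optional callback. Raising `ValueError` for a regressing CWM or for overspending is an assumption about how errors might be reported.

```python
from typing import Callable, Optional


class TxFlowController:
    """Hypothetical sketch of the TX-side flow control state machine.

    Tracks the SWM (controlled bytes sent) and the CWM (credit granted by the
    peer). Works the same way at connection and stream level.
    """

    def __init__(self, initial_cwm: int,
                 on_tx_blocked: Optional[Callable[[], None]] = None) -> None:
        self.swm = 0
        self.cwm = initial_cwm  # e.g. the peer's initial_max_data
        self.on_tx_blocked = on_tx_blocked

    def get_tx_window(self) -> int:
        """Credit: controlled bytes we may still send (CWM - SWM)."""
        return self.cwm - self.swm

    def on_tx(self, num_bytes: int) -> None:
        """Record num_bytes newly sent controlled bytes (no retransmissions)."""
        if num_bytes > self.get_tx_window():
            raise ValueError("local error: sent more than the available credit")
        was_open = self.get_tx_window() > 0
        self.swm += num_bytes
        # Emit On TX Blocked only on the non-zero -> zero edge.
        if was_open and self.get_tx_window() == 0 and self.on_tx_blocked is not None:
            self.on_tx_blocked()

    def on_tx_window_updated(self, new_cwm: int) -> None:
        """Apply a new CWM, e.g. the value carried by a MAX_DATA frame."""
        if new_cwm < self.cwm:
            raise ValueError("CWM must never regress")
        self.cwm = new_cwm


events = []
fc = TxFlowController(initial_cwm=1000,
                      on_tx_blocked=lambda: events.append("blocked"))
fc.on_tx(600)
fc.on_tx(400)                  # credit drops to zero: On TX Blocked fires
fc.on_tx_window_updated(1500)  # MAX_DATA received
print(fc.get_tx_window(), events)  # 500 ['blocked']
```

Because the blocked notification is edge-triggered, repeated zero-byte On TX events while already blocked do not emit it again.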
We must not exceed the flow control limits, or the peer may terminate the
connection with an error.

An initial connection-level credit is communicated by the peer in the
`initial_max_data` transport parameter. All other credits occur as a result of a
`MAX_DATA` frame.

Stream-Level Flow Control - TX Side
-----------------------------------

Stream-level flow control works exactly the same as connection-level flow
control for the TX side.

The On TX Window Updated event occurs in response to the `MAX_STREAM_DATA`
frame, or based on the relevant transport parameter
(`initial_max_stream_data_bidi_local`, `initial_max_stream_data_bidi_remote`,
or `initial_max_stream_data_uni`).

The On TX Blocked event can be used to decide when to generate
`STREAM_DATA_BLOCKED` frames.

Note that the number of controlled bytes we can send in a stream is limited by
both connection and stream-level flow control; thus the number of controlled
bytes we can send is the lesser of the values returned by the Get TX Window
function on the connection-level and stream-level state machines.

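The sketch below shows how the two levels combine. It caps a send by both windows and charges the bytes to both controllers, because every controlled byte counts at both levels. The names are hypothetical, and each controller is reduced to its SWM and CWM.

```python
from dataclasses import dataclass


@dataclass
class TxCredit:
    """Minimal TX-side flow controller state: just the CWM and SWM."""

    cwm: int
    swm: int = 0

    def window(self) -> int:
        # Equivalent of Get TX Window: CWM - SWM.
        return self.cwm - self.swm


def send_on_stream(conn: TxCredit, stream: TxCredit, wanted: int) -> int:
    """Send up to `wanted` new controlled bytes on one stream.

    The amount is capped by both windows, and both SWMs advance by the
    same amount because each controlled byte counts at both levels.
    """
    n = min(wanted, conn.window(), stream.window())
    conn.swm += n
    stream.swm += n
    return n


conn = TxCredit(cwm=1000)      # connection-level credit
stream_a = TxCredit(cwm=800)   # per-stream credit
stream_b = TxCredit(cwm=800)
sent_a = send_on_stream(conn, stream_a, 900)  # 800: limited by stream A
sent_b = send_on_stream(conn, stream_b, 900)  # 200: limited by the connection
print(sent_a, sent_b)  # 800 200
```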
Connection-Level Flow Control - RX Side
---------------------------------------

        ---> event: On RX Controlled Bytes (numBytes)       [internal event]
        ---> event: On Retire Controlled Bytes (numBytes)
        <--- event: Increase Window (numBytes)
        <--- event: Flow Control Error

RX side connection-level flow control provides an indication of when to generate
`MAX_DATA` frames to bump the peer's connection-level transmission credit. It is
somewhat more involved than the TX side.

The state machine receives On RX Controlled Bytes events from stream-level flow
controllers. Callers do not pass the event themselves. The event is generated by
a stream-level flow controller whenever we receive any controlled bytes.
`numBytes` is the number of controlled bytes we received. (This event is
generated by stream-level flow control because retransmitted stream data must be
counted only once, and the stream-level flow controller is therefore in the best
position to determine how many controlled bytes, i.e., new, non-retransmitted
stream payload bytes, have been received.)

If we receive more controlled bytes than we authorized, the state machine emits
the Flow Control Error event. The connection should be terminated with a
protocol error (`FLOW_CONTROL_ERROR`) in this case.

The state machine emits the Increase Window event when it determines that the
peer should be advanced more flow control credit (i.e., when the CWM should be
bumped). `numBytes` is the new CWM value, and is monotonic with regard to all
previous Increase Window events emitted by the state machine.

The state machine is passed the On Retire Controlled Bytes event when one or
more controlled bytes are dequeued from any stream and passed to the
application.

The state machine uses the cadence of the On Retire Controlled Bytes events it
receives to determine when to increase the flow control window. Thus, the On
Retire Controlled Bytes event should be sent to the state machine when
processing of the received controlled bytes has been *completed* (i.e., passed
to the application).

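Here is a minimal sketch of the RX-side connection-level state machine. It assumes the threshold and window size are fixed values supplied by the caller, since window sizing is a separate concern. The Increase Window event is modelled as a callback and the Flow Control Error event as an exception. All names are hypothetical.

```python
from typing import Callable


class FlowControlError(Exception):
    """The peer sent more controlled bytes than we authorized."""


class RxConnFlowController:
    """Hypothetical sketch of the RX-side connection-level state machine."""

    def __init__(self, initial_cwm: int, window_size: int, threshold: int,
                 increase_window: Callable[[int], None]) -> None:
        self.swm = 0                # controlled bytes received
        self.rwm = 0                # controlled bytes retired
        self.cwm = initial_cwm      # e.g. our initial_max_data
        self.window_size = window_size
        self.threshold = threshold
        # Increase Window event, e.g. used to queue a MAX_DATA frame.
        self.increase_window = increase_window

    def on_rx_controlled_bytes(self, num_bytes: int) -> None:
        """Internal event, generated by stream-level flow controllers."""
        self.swm += num_bytes
        if self.swm > self.cwm:
            # Flow Control Error event: terminate the connection.
            raise FlowControlError(f"SWM {self.swm} exceeds CWM {self.cwm}")

    def on_retire_controlled_bytes(self, num_bytes: int) -> None:
        """Call once the bytes have been handed to the application."""
        self.rwm += num_bytes
        if self.rwm >= self.cwm - self.threshold:
            new_cwm = self.swm + self.window_size
            if new_cwm > self.cwm:  # Increase Window values stay monotonic
                self.cwm = new_cwm
                self.increase_window(new_cwm)


bumps = []
fc = RxConnFlowController(initial_cwm=1000, window_size=1000, threshold=500,
                          increase_window=bumps.append)
fc.on_rx_controlled_bytes(800)
fc.on_retire_controlled_bytes(300)  # RWM 300 < 1000 - 500: no bump yet
fc.on_retire_controlled_bytes(300)  # RWM 600 >= 500: new CWM = 800 + 1000
print(bumps)  # [1800]
```

The bump is driven only by retirement. Bytes that are received but not yet consumed by the application therefore never earn the peer more credit.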
Stream-Level Flow Control - RX Side
-----------------------------------

RX-side stream-level flow control works similarly to RX-side connection-level
flow control. There are a few differences:

- There is no On RX Controlled Bytes event.

- A stream-level flow controller may optionally forward each On Retire
  Controlled Bytes event it receives to the connection-level flow controller
  (an implementation decision), as these events should always occur at the
  same time.

- An additional event is added, which replaces the On RX Controlled Bytes event:

        ---> event: On RX Stream Frame (offsetPlusLength, isFin)

  This event should be passed to the state machine when a STREAM frame is
  received. The `offsetPlusLength` argument is the sum of the offset field of
  the STREAM frame and the length of the frame's payload in bytes. The `isFin`
  argument should specify whether the STREAM frame had the FIN flag set.

  This event is used to generate the internal On RX Controlled Bytes event to
  the connection-level flow controller. It is also used by stream-level flow
  control to determine if flow control limits are violated by the peer.

  The state machine handles `offsetPlusLength` monotonically and ignores the
  event if a previous such event already had an equal or greater value. This
  event is used instead of an `On RX (numBytes)`-style event because a monotonic
  API is easier to use: the caller does not need to remember whether it has
  already counted a specific controlled byte in a STREAM frame, which may after
  all duplicate some of the controlled bytes in a previous STREAM frame.

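The sketch below shows the monotonic handling of On RX Stream Frame and how it produces the internal On RX Controlled Bytes event. The names are hypothetical, and the connection-level controller is represented only by a callback. `isFin` is merely recorded; final-size validation, which RFC 9000 also covers, is omitted.

```python
from typing import Callable, Optional


class FlowControlError(Exception):
    """The peer exceeded the stream-level credit we authorized."""


class RxStreamFlowController:
    """Hypothetical sketch of On RX Stream Frame handling at stream level."""

    def __init__(self, cwm: int,
                 on_rx_controlled_bytes: Optional[Callable[[int], None]] = None
                 ) -> None:
        self.swm = 0              # highest offset + length seen so far
        self.cwm = cwm            # credit authorized for this stream
        self.final_size: Optional[int] = None
        # Forwards the internal On RX Controlled Bytes event to the
        # connection-level controller.
        self.on_rx_controlled_bytes = on_rx_controlled_bytes

    def on_rx_stream_frame(self, offset_plus_length: int, is_fin: bool) -> None:
        if offset_plus_length > self.cwm:
            raise FlowControlError("stream-level credit exceeded")
        if is_fin and self.final_size is None:
            # Record the final size; final-size validation is omitted here.
            self.final_size = offset_plus_length
        if offset_plus_length <= self.swm:
            return  # nothing new: duplicate or retransmitted data
        new_bytes = offset_plus_length - self.swm
        self.swm = offset_plus_length
        if self.on_rx_controlled_bytes is not None:
            self.on_rx_controlled_bytes(new_bytes)


counted = []
s = RxStreamFlowController(cwm=1000, on_rx_controlled_bytes=counted.append)
s.on_rx_stream_frame(400, False)  # bytes 0-399: 400 new controlled bytes
s.on_rx_stream_frame(400, False)  # retransmission: ignored
s.on_rx_stream_frame(700, True)   # bytes 300-699 with FIN: 300 new bytes
print(counted, s.final_size)      # [400, 300] 700
```

The FIN is recorded before the duplicate check. This means that an empty FIN-only frame at the current stream length is still noticed.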
RX Window Sizing
----------------

For RX flow control we must determine our window size. This is the value we add
to the peer's current SWM to determine the new CWM each time the RWM reaches the
threshold. The window size should be adapted dynamically according to network
conditions.

Many implementations choose to have a mechanism for increasing the window size
but not decreasing it, a simple approach which we adopt here.

A common algorithm is a so-called auto-tuning approach in which the rate of
window consumption (i.e., the rate at which the RWM approaches the CWM after the
CWM is bumped) is measured and compared to the measured connection RTT. If the
time it takes to consume one window size is less than a fixed multiple of the
RTT, flow control rather than the application is what limits throughput, so the
window size is doubled, up to an implementation-chosen maximum window size.

Auto-tuning occurs in 'epochs'. At the end of each auto-tuning epoch, a decision
is made on whether to double the window size, and a new auto-tuning epoch is
started.

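The per-epoch decision can be sketched as a pure function. This illustration assumes the caller measures how far the RWM advanced during the epoch and how long the epoch lasted. `rtt_multiple` (4 here) and `max_window` are hypothetical tuning values, not figures taken from any implementation. The window grows only when a full window would be consumed in less than `rtt_multiple` RTTs, that is, when flow control is what limits throughput.

```python
def autotune_window(window_size: int, bytes_consumed: int, epoch_secs: float,
                    rtt_secs: float, max_window: int,
                    rtt_multiple: float = 4.0) -> int:
    """Return the window size to use for the next auto-tuning epoch.

    bytes_consumed: how far the RWM advanced during the epoch.
    epoch_secs: how long the epoch lasted.
    rtt_multiple, max_window: assumed tuning parameters.
    """
    if bytes_consumed == 0:
        return window_size  # nothing consumed: no evidence to act on
    # Projected time to consume one full window at the observed rate.
    time_per_window = epoch_secs * window_size / bytes_consumed
    if time_per_window < rtt_multiple * rtt_secs and window_size < max_window:
        # Flow control is the bottleneck: double, capped at max_window.
        return min(window_size * 2, max_window)
    # Otherwise keep the current size; the window is never shrunk.
    return window_size


MAX_WINDOW = 16 * 1024 * 1024  # hypothetical cap
# 64 KiB consumed in 50 ms on a 20 ms RTT path: 50 ms < 4 * 20 ms, so grow.
grown = autotune_window(65536, 65536, 0.050, 0.020, MAX_WINDOW)
# The same amount consumed over 1 s: the application is slower, so keep it.
kept = autotune_window(65536, 65536, 1.0, 0.020, MAX_WINDOW)
print(grown, kept)  # 131072 65536
```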
For more information on auto-tuning, see [Flow control in
and [QUIC Flow