Error handling in QUIC code
===========================

Current situation with TLS
--------------------------

The errors are put on the error stack (strictly a queue, but the term "error
stack" is used throughout the code base) during the libssl API calls. In most
(if not all) cases they should appear there only if the API call returns an
error return value. The `SSL_get_error()` call depends on the stack being
clean before the API call to be properly able to determine if the API
call caused a library or system (I/O) error.

The error stacks are thread-local. Libssl API calls from separate threads
push errors to these separate error stacks. It is unusual to invoke libssl
APIs with the same SSL object from different threads, but even if it happens,
it is not a problem as applications are supposed to check for errors
immediately after the API call on the same thread. In TLS there is no such
thing as a thread-assisted mode of operation.

Constraints
-----------

We need to keep using the existing ERR API as doing otherwise would
complicate the existing applications and break our API compatibility promise.
Even the ERR_STATE structure is public, although deprecated, and thus its
structure and semantics cannot be changed.

The error stack access is not under a lock (because it is thread-local).
This complicates _moving errors between threads_.

Error stack entries contain allocated data, so copying entries between
threads implies either duplicating that data or losing it.

Assumptions
-----------

This document assumes the actual error state of the QUIC connection (or stream
for stream level errors) is handled separately from the auxiliary error reason
entries on the error stack.

We can assume the internal assistance thread is well behaved with regard
to the error stack.

We assume there are two types of errors that can be raised in the QUIC
library calls and in the subordinate libcrypto (and provider) calls.
The first type is an intermittent error that does not really affect the state
of the QUIC connection, for example EAGAIN returned on a syscall, or
unavailability of some algorithm where there are other algorithms to try. The
second type is a permanent error that affects the error state of the QUIC
connection. Operations on QUIC streams (SSL_write(), SSL_read()) can also
trigger errors. Depending on their effect, they are either permanent, if they
cause the QUIC connection to enter an error state, or, if they just affect the
stream, they are left on the error stack of the thread that called SSL_write()
or SSL_read() on the stream.

Design
------

The return value of SSL_get_error() on QUIC connections or streams does not
depend on the error stack contents.

Intermittent errors are handled within the library and cleared from the
error stack before returning to the user.

Permanent errors happening within the assist thread, within SSL_tick()
processing, or when calling SSL_read()/SSL_write() on a stream need to be
replicated for SSL_read()/SSL_write() calls on other streams.

Implementation
--------------

There is an error stack in QUIC_CHANNEL which serves as temporary storage
for errors happening in the internal assistance thread. When a permanent error
is detected, the error stack entries are moved to this error stack in
QUIC_CHANNEL.

When returning to an application from an SSL_read()/SSL_write() call with
a permanent connection error, entries from the QUIC_CHANNEL error stack
are copied to the thread-local error stack. They are always kept on
the QUIC_CHANNEL error stack as well for possible further calls from
an application. An additional error reason
SSL_R_QUIC_CONNECTION_TERMINATED is added to the stack.

SSL_tick() return value
-----------------------

The return value of SSL_tick() does not depend on whether there is
a permanent error on the connection.
The only case when SSL_tick() may return an error is when a fatal error
occurred while processing it, such as a memory allocation error, after which
no further SSL_tick() calls make any sense.

Multi-stream-multi-thread mode
------------------------------

There is nothing particular that needs to be handled specially for
multi-stream-multi-thread mode as the error stack entries are always
copied from the QUIC_CHANNEL after the failure. So if multiple threads
are calling SSL_read()/SSL_write() simultaneously they all get
the same error stack entries to report to the user.
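
The move-then-copy behaviour described in the Implementation section, and its
consequence for multiple threads, can be illustrated with a small model. This
is only a sketch under stated assumptions: it does not use the real ERR API or
the real QUIC_CHANNEL structure. All `DEMO_*`/`demo_*` names are hypothetical,
and an error entry is reduced to a bare reason code, whereas real entries
carry allocated data that would have to be duplicated when copied.

```c
#include <stddef.h>

#define DEMO_MAX_ERRS 8
/* Hypothetical stand-in for SSL_R_QUIC_CONNECTION_TERMINATED. */
#define DEMO_R_CONNECTION_TERMINATED 1000

/* Hypothetical model of an error stack: reason codes only. */
typedef struct {
    int reasons[DEMO_MAX_ERRS];
    size_t count;
} DEMO_ERR_STACK;

/* Hypothetical model of the per-connection state kept in QUIC_CHANNEL. */
typedef struct {
    DEMO_ERR_STACK saved; /* error stack stored in the channel */
    int terminated;       /* set once a permanent error was detected */
} DEMO_CHANNEL;

/* Append a reason code; returns 0 if the stack is full, 1 otherwise. */
int demo_push(DEMO_ERR_STACK *st, int reason)
{
    if (st->count >= DEMO_MAX_ERRS)
        return 0;
    st->reasons[st->count++] = reason;
    return 1;
}

/*
 * A permanent error was detected: MOVE the entries from the stack of the
 * thread that raised them (e.g. the assist thread) into the channel.
 */
void demo_channel_save_errors(DEMO_CHANNEL *ch, DEMO_ERR_STACK *src)
{
    size_t i;

    for (i = 0; i < src->count; i++)
        demo_push(&ch->saved, src->reasons[i]);
    src->count = 0; /* moved, so the source stack is left clean */
    ch->terminated = 1;
}

/*
 * Returning from a read/write call with a permanent connection error:
 * COPY the channel entries to the caller's stack and add the extra
 * "connection terminated" reason. The channel keeps its own entries.
 */
void demo_channel_report_errors(const DEMO_CHANNEL *ch, DEMO_ERR_STACK *dst)
{
    size_t i;

    if (!ch->terminated)
        return;
    for (i = 0; i < ch->saved.count; i++)
        demo_push(dst, ch->saved.reasons[i]);
    demo_push(dst, DEMO_R_CONNECTION_TERMINATED);
}
```

Because the reporting step copies rather than moves, calling
`demo_channel_report_errors()` for two different per-thread stacks yields the
same entries on both, which is the property the multi-stream-multi-thread
section relies on. The model deliberately omits locking of the channel, which
a real implementation would need to consider.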