| # Considerations around Event Loops |
| |
| Much of the software we use is written around an **event loop**. Some examples |
| |
| - Chrome / Chromium, transmission, tmux, ntp SNTP... [libevent](https://libevent.org/) |
| - node.js / cjdns / Julia / cmake ... [libuv](https://archive.is/64pOt) |
| - Gstreamer, Gnome / GTK apps ... [glib](https://people.gnome.org/~desrt/glib-docs/glib-The-Main-Event-Loop.html) |
| - systemd ... sd-event |
| - OpenWRT ... uloop |
| |
| Many applications roll their own event loop using poll() or epoll() or similar, |
| using the same techniques. Another set of apps use message dispatchers that |
| take the same approach, but are for cases that don't need to support sockets. |
| Event libraries provide crossplatform abstractions for this functionality, and |
| provide the best backend for their event waits on the platform automagically. |
| |
| libwebsockets networking operations require an event loop. It provides a default |
| one for the platform (based on poll() for Unix) if needed, but can also natively |
| use any of the event loop libraries listed above, including "foreign" loops |
| already created and managed by the application. |
| |
| ## What is an 'event loop'? |
| |
| Event loops have the following characteristics: |
| |
| - they have a **single thread**, therefore they do not require locking |
| - they are **not threadsafe** |
| - they require **nonblocking IO** |
| - they **sleep** while there are no events (aka the "event wait") |
| - if one or more events are seen, they call back into user code to handle each |
| in turn and then return to the wait (ie, "loop") |
| |
| ### They have a single thread |
| |
| By doing everything in turn on a single thread, there can be no possibility of |
| conflicting access to resources from different threads. If the single thread |
| is in callback A, it cannot also be in callback B accessing the same thing at |
| the same time: it can never run any other code concurrently, only |
| sequentially, by design. |
| |
| It means that all mutexes and other synchronization and locking can be |
| eliminated, along with the many kinds of bugs related to them. |
| |
| ### They are not threadsafe |
| |
| Event loops mandate doing everything in a single thread. You cannot call their |
| apis from other threads, since there is no protection against reentrancy. |
| |
| Lws apis cannot be called safely from any thread other than the event loop one, |
| with the sole exception of `lws_cancel_service()`. |
| |
| ### They have nonblocking IO |
| |
| With blocking IO, you have to create threads in order to block them to learn |
| when your IO could proceed. In an event loop, all descriptors are set to use |
| nonblocking mode, and we only attempt to read or write when we have been |
| informed by an event that there is something to read, or it is possible to write. |
| |
| So sacrificial, blocking discrete IO threads are also eliminated, we just do |
| what we should do sequentially, when we get the event indicating that we should |
| do it. |
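
As a minimal sketch of what "nonblocking mode" means at the POSIX level, the helpers below switch a descriptor to `O_NONBLOCK` and attempt a read that never sleeps. The names `set_nonblocking()` and `try_read()` and their return conventions are hypothetical, chosen for illustration; they are not lws apis.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into nonblocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;

	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Try to read without sleeping. Returns the number of bytes read (> 0),
 * 0 on EOF, -1 if nothing is available yet (the caller just returns to
 * the event wait), or -2 on a real error.
 */
int try_read(int fd, void *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n >= 0)
		return (int)n;

	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return -1;

	return -2;
}
```

In an event loop, `try_read()` would normally only be called after the wait reported the descriptor readable, so the -1 case is the exception rather than the rule.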
| |
| ### They sleep while there are no events |
| |
| An OS "wait" of some kind is used to sleep the event loop thread until there is |
| something to do. There's an explicit wait on file descriptors that have pending read or |
| write, and also an implicit wait for the next scheduled event. Even if idle for |
| descriptor events, the event loop will wake and handle scheduled events at the |
| right time. |
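
One way to picture the "implicit wait for the next scheduled event" is as the timeout handed to the event wait. The sketch below is an illustration under assumptions, not how lws or any particular event library schedules: `next_wait_ms()` is a hypothetical helper, deadlines are absolute millisecond timestamps, and the result follows the `poll()` timeout convention.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the current time and a list of absolute deadlines (both in ms),
 * return the timeout for the event wait: 0 if something is already due,
 * -1 (poll()'s "wait indefinitely") if nothing is scheduled, otherwise
 * the number of ms until the soonest deadline.
 * A deadline of UINT64_MAX is treated as "not scheduled".
 */
int next_wait_ms(uint64_t now_ms, const uint64_t *deadline_ms, size_t count)
{
	uint64_t soonest = UINT64_MAX;
	size_t i;

	for (i = 0; i < count; i++)
		if (deadline_ms[i] < soonest)
			soonest = deadline_ms[i];

	if (soonest == UINT64_MAX)
		return -1;
	if (soonest <= now_ms)
		return 0;
	if (soonest - now_ms > (uint64_t)INT_MAX)
		return INT_MAX; /* clamp very distant deadlines */

	return (int)(soonest - now_ms);
}
```

With no deadlines and no descriptor activity the wait never times out, which is why an idle loop costs no CPU.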
| |
| In an idle system, the event loop stays in the wait and takes 0% CPU. |
| |
| ### If one or more event, they handle them and then return to sleep |
| |
| As you can expect from "event loop", it is an infinite loop alternating between |
| sleeping in the event wait and sequentially servicing pending events, by calling |
| callbacks for each event on each object. |
| |
| The callbacks handle the event and then "return to the event loop". The state |
| of things in the loop itself is guaranteed to stay consistent while in a user |
| callback, until you return from the callback to the event loop, when socket |
| closes may be processed and lead to object destruction. |
| |
| Event libraries like libevent operate the same way: once you start the |
| event loop, it sits in an infinite loop in the library, calling back on events |
| until you "stop" or "break" the loop by calling apis. |
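
The wait / service / return cycle described above can be sketched with plain `poll()`. This is a toy under assumptions: `struct mini_loop`, `mini_loop_watch()`, `mini_loop_service()`, `mini_loop_run()` and `count_byte()` are hypothetical names, only readable events are handled, and it is not how lws or libevent implement their loops.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WATCHES 8

/* Hypothetical callback type: called when fd becomes readable. */
typedef void (*readable_cb)(int fd, void *user);

struct mini_loop {
	struct pollfd pfd[MAX_WATCHES];
	readable_cb cb[MAX_WATCHES];
	void *user[MAX_WATCHES];
	nfds_t count;
	int stop; /* set from a callback to leave mini_loop_run() */
};

/* Register fd for "readable" events; returns 0, or -1 if the table is full. */
int mini_loop_watch(struct mini_loop *l, int fd, readable_cb cb, void *user)
{
	if (l->count == MAX_WATCHES)
		return -1;

	l->pfd[l->count].fd = fd;
	l->pfd[l->count].events = POLLIN;
	l->pfd[l->count].revents = 0;
	l->cb[l->count] = cb;
	l->user[l->count] = user;
	l->count++;

	return 0;
}

/*
 * One pass: sleep in the event wait, then service each pending event in
 * turn. Returns the number of callbacks made, or -1 on a poll() error.
 */
int mini_loop_service(struct mini_loop *l, int timeout_ms)
{
	int handled = 0, n = poll(l->pfd, l->count, timeout_ms);
	nfds_t i;

	if (n < 0)
		return errno == EINTR ? 0 : -1;

	for (i = 0; i < l->count && n > 0; i++) {
		if (!l->pfd[i].revents)
			continue;
		n--;
		if (l->pfd[i].revents & (POLLIN | POLLHUP)) {
			l->cb[i](l->pfd[i].fd, l->user[i]);
			handled++;
		}
	}

	return handled;
}

/* The "infinite loop": alternate between waiting and servicing. */
void mini_loop_run(struct mini_loop *l)
{
	while (!l->stop)
		if (mini_loop_service(l, -1) < 0)
			break;
}

/* Example callback: consume one byte and count it in *user. */
void count_byte(int fd, void *user)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)user)++;
}
```

Every callback runs on the thread that called `mini_loop_run()`, so `count_byte()` can touch its counter without any locking.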
| |
| ## Why are event libraries popular? |
| |
| Developers prefer an external library solution for the event loop because: |
| |
| - the quality is generally higher than self-rolled ones. Someone else is |
| maintaining it, a fulltime team in some cases. |
| - the event libraries are crossplatform, they will pick the most effective |
| event wait for the platform without the developer having to know the details. |
| For example most libs can conceal whether the platform is windows or unix, |
| and use native waits like epoll() or WSA accordingly. |
| - If your application uses an event library, it is possible to integrate very |
| cleanly with other libraries like lws that can use the same event library. |
| That is extremely messy or downright impossible to do with hand-rolled loops. |
| |
| Compared to just throwing threads at the problem: |
| |
| - thread lifecycle has to be closely managed, threads must start and must be |
| brought to an end in a controlled way. Event loops may end and destroy |
| objects they control at any time a callback returns to the event loop. |
| |
| - threads may do things sequentially or genuinely concurrently; this requires |
| locking and careful management so only deterministic and expected things |
| happen to the user data. |
| |
| - threads do not scale well to, eg, serving tens of thousands of connections; |
| web servers use event loops. |
| |
| ## Multiple codebases cooperating on one event loop |
| |
| The ideal situation is all your code operates via a single event loop thread. |
| For lws-only code, including lws_protocols callbacks, this is the normal state |
| of affairs. |
| |
| When there is other code that also needs to handle events, say already existing |
| application code, or code handling a protocol not supported by lws, there are a |
| few options to allow them to work together. Which is "best" depends on the |
| details of what you're trying to do and what the existing code looks like. |
| In descending order of desirability: |
| |
| ### 1) Use a common event library for both lws and application code |
| |
| This is the best choice for Linux-class devices. If you write your application |
| to use, eg, a libevent loop, then you only need to configure lws to also use |
| your libevent loop for them to be able to interoperate perfectly. Lws will |
| operate as a guest on this "foreign loop", and can cleanly create and destroy |
| its context on the loop without disturbing the loop. |
| |
| In addition, your application can merge and interoperate with any other |
| libevent-capable libraries the same way, and compared to hand-rolled loops, the |
| quality will be higher. |
| |
| ### 2) Use lws native wsi semantics in the other code too |
| |
| Lws supports raw sockets and file fd abstractions inside the event loop. So if |
| your other code fits into that model, one way is to express your connections as |
| "RAW" wsis and handle them using lws_protocols callback semantics. |
| |
| This ties the application code to lws, but it has the advantage that the |
| resulting code does not depend on the underlying event loop implementation and |
| will work no matter what it is. |
| |
| ### 3) Make a custom lws event lib shim for your custom loop |
| |
| Lws provides an ops struct abstraction in order to integrate with event |
| libraries; you can find it in ./include/libwebsockets/lws-eventlib-exports.h. |
| |
| Lws uses this interface to implement its own event library plugins, but you can |
| also use it to make your own customized event loop shim, in the case there is |
| too much written for your custom event loop to be practical to change it. |
| |
| In other words this is a way to write a customized event lib "plugin" and tell |
| the lws_context to use it at creation time. See [minimal-http-server.c](https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/http-server/minimal-http-server-eventlib-custom/minimal-http-server.c). |
| |
| ### 4) Cooperate at thread level |
| |
| This is less desirable because it gives up on unifying the code to run from a |
| single thread, which means the codebases cannot call each other's apis directly. |
| |
| In this scheme the existing threads do their own thing, lock a shared |
| area of memory and list what they want done from the lws thread context, before |
| calling `lws_cancel_service()` to break the lws event wait. Lws will then |
| broadcast a `LWS_CALLBACK_EVENT_WAIT_CANCELLED` protocol callback, the handler |
| for which can lock the shared area and perform the requested operations from the |
| lws thread context. |
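
The shape of this handoff can be sketched without lws: a locked shared area that other threads append to, plus a wake-up for the loop thread. Everything named here (`struct shared_requests`, `request_post()`, `request_drain()`, the pipe used as the wake-up) is a hypothetical stand-in. In real lws code the wake-up step is `lws_cancel_service()` and the drain step runs in the `LWS_CALLBACK_EVENT_WAIT_CANCELLED` handler.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 16

/* Hypothetical shared area: requests queued by non-loop threads. */
struct shared_requests {
	pthread_mutex_t lock;
	int pending[MAX_PENDING];
	int count;
	int wake_fd; /* write end of a pipe that the loop thread waits on */
};

/* Called from any non-loop thread: queue a request, then wake the loop. */
int request_post(struct shared_requests *s, int req)
{
	char c = 0;
	int ok = 0;

	pthread_mutex_lock(&s->lock);
	if (s->count < MAX_PENDING) {
		s->pending[s->count++] = req;
		ok = 1;
	}
	pthread_mutex_unlock(&s->lock);

	if (!ok)
		return -1;

	/* Stand-in for lws_cancel_service(): break the loop's event wait. */
	return write(s->wake_fd, &c, 1) == 1 ? 0 : -1;
}

/*
 * Called only on the loop thread after it has been woken: copy out up to
 * max queued requests so they can be performed in the loop thread context.
 * Returns how many were taken; anything that did not fit stays queued.
 */
int request_drain(struct shared_requests *s, int *out, int max)
{
	int n;

	pthread_mutex_lock(&s->lock);
	n = s->count < max ? s->count : max;
	memcpy(out, s->pending, (size_t)n * sizeof(s->pending[0]));
	memmove(s->pending, s->pending + n,
		(size_t)(s->count - n) * sizeof(s->pending[0]));
	s->count -= n;
	pthread_mutex_unlock(&s->lock);

	return n;
}
```

The lock is held only while the list is touched, and the requested work itself runs after `request_drain()` returns, on the loop thread.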
| |
| ### 5) Glue the loops together to wait sequentially (don't do this) |
| |
| If you have two or more chunks of code with their own waits, it may be tempting |
| to have them wait sequentially in an outer event loop. (This is only possible |
| with the lws default loop and not the event library support, event libraries |
| have this loop inside their own `...run(loop)` apis.) |
| |
| ``` |
| while (1) { |
| do_lws_wait(); /* interrupted at short intervals */ |
| do_app_wait(); /* interrupted at short intervals */ |
| } |
| ``` |
| |
| This never works well, either: |
| |
| - the whole thing spins at 100% CPU when idle, or |
| |
| - the waits have timeouts where they sleep for short periods, but then the |
| latency to service one set of events is increased by the idle timeout period |
| of the wait for the other set of events |
| |
| ## Common Misunderstandings |
| |
| ### "Real Men Use Threads" |
| |
| Sometimes you need threads or child processes. But typically, whatever you're |
| trying to do does not literally require threads. Threads are an architectural |
| choice that can go either way depending on the goal and the constraints. |
| |
| Any thread you add should have a clear reason to specifically be a thread and |
| not done on the event loop, without a new thread or the consequent locking (and |
| bugs). |
| |
| ### But blocking IO is faster and simpler |
| |
| No, blocking IO has a lot of costs to conceal the event wait by blocking. |
| |
| For any IO that may wait, you must spawn an IO thread for it, purely to handle |
| the situation where you get blocked in read() or write() for an arbitrary amount of |
| time. It buys you a simple story in one place, that you will proceed on the |
| thread if read() or write() has completed, but costs threads and locking to get |
| to that. |
| |
| Event loops dispense with the threads and locking, and still provide a simple |
| story, you will get called back when data arrives or you may send. |
| |
| Event loops can scale much better, a busy server with 50,000 connections active |
| does not have to pay the overhead of 50,000 threads and their competing for |
| locking. |
| |
| With blocked threads, the thread can do no useful work at all while it is stuck |
| waiting. With event loops the thread can service other events until something |
| happens on the fd. |
| |
| ### Threads are inexpensive |
| |
| In the cases you really need threads, you must have them, or fork off another |
| process. But if you don't really need them, they bring with them a lot of |
| expense, some of which you may only notice when your code runs on constrained targets: |
| |
| - threads have an OS-side footprint both as objects and in the scheduler |
| |
| - thread context switches are not slow on modern CPUs, but have side effects |
| like cache flushing |
| |
| - threads are designed to be blocked for arbitrary amounts of time if you use |
| blocking IO apis like write() or read(). Then how much concurrency is really |
| happening? Since blocked threads just go away silently, it is hard to know |
| when in fact your thread is almost always blocked and not doing useful work. |
| |
| - threads require their own stack, which on embedded typically suffers from |
| a dedicated worst-case allocation where the headroom is usually idle |
| |
| - locking must be handled, and missed locking or lock order bugs found |
| |
| ### But... what about latency if only one thing happens at a time? |
| |
| - Typically, at CPU speeds, nothing is happening at any given time on most |
| systems, the event loop is spending most of its time in the event wait |
| asleep at 0% cpu. |
| |
| - The POSIX sockets layer is disjoint from the actual network device driver. |
| It means that once you hand off the packet to the networking stack, the POSIX |
| api just returns and leaves the rest of the scheduling, retries etc to the |
| networking stack and device, descriptor queuing is driven by interrupts in |
| the driver part completely unaffected by the event loop part. |
| |
| - Passing data around via POSIX apis between the user code and the networking |
| stack tends to return almost immediately since its onward path is managed |
| later in another, usually interrupt, context. |
| |
| - So long as enough packets-worth of data are in the network stack ready to be |
| handed to descriptors, actual throughput is completely insensitive to jitter |
| or latency at the application event loop |
| |
| - The network device itself is inherently serializing packets, it can only send |
| one thing at a time. The networking stack locking also introduces hidden |
| serialization by blocking multiple threads. |
| |
| - Many user systems are decoupled like the network stack and POSIX... the user |
| event loop and its latencies do not affect backend processes occurring in |
| interrupt or internal thread or other process contexts |
| |
| ## Conclusion |
| |
| Event loops have been around for a very long time and are in wide use today due |
| to their advantages. Working with them successfully requires understanding how to |
| use them and why they have the advantages and restrictions they do. |
| |
| The best results come from all the participants joining the same loop directly. |
| Using a common event library in the participating codebases allows completely |
| different code to call each other's apis safely without locking. |