The way to async I/O

Concurrency vs Parallelism

  • Concurrency: make progress on multiple tasks over the same period of time; on a single core the tasks may simply be interleaved rather than literally running simultaneously.

  • Parallelism: work on one task, but split it into multiple sub-tasks that can be executed at the same time, e.g. on separate cores.
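A minimal Python sketch of the distinction, with placeholder file names and a toy sum standing in for real work: threads let two independent "downloads" make progress concurrently, while a process pool splits one computation into sub-tasks that run in parallel.

    import threading
    import multiprocessing

    def download(name):
        # Stand-in for a real I/O-bound task.
        print(f"downloading {name}")

    def partial_sum(bounds):
        lo, hi = bounds
        return sum(range(lo, hi))

    if __name__ == "__main__":
        # Concurrency: two independent tasks progress in overlapping time,
        # even if a single core just switches between them.
        threads = [threading.Thread(target=download, args=(n,)) for n in ("a.txt", "b.txt")]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Parallelism: one task (a big sum) is split into sub-tasks that run
        # simultaneously on separate cores.
        chunks = [(0, 250_000), (250_000, 500_000), (500_000, 750_000), (750_000, 1_000_000)]
        with multiprocessing.Pool(4) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)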

CPU intensive vs I/O intensive

If your program spends its time computing rather than interacting with disks, the network, or other peripherals, it is CPU intensive; if it spends most of its time waiting on those devices, it is I/O intensive.
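For a concrete (if contrived) contrast, here is a small Python sketch: the first function is dominated by computation, the second by waiting on the network. The URL is just a placeholder argument.

    import hashlib
    import urllib.request

    def cpu_bound(data: bytes) -> str:
        # CPU intensive: all the time goes into hashing; the CPU is busy throughout.
        for _ in range(100_000):
            data = hashlib.sha256(data).digest()
        return data.hex()

    def io_bound(url: str) -> int:
        # I/O intensive: almost all the time is spent waiting for the network;
        # the CPU is idle while the request is in flight.
        with urllib.request.urlopen(url) as resp:
            return len(resp.read())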

This distinction really affects performance. For example, here is a rough way to estimate how many requests per second (RPS) your program can handle:

  • For CPU intensive: RPS ≈ (number of cores) / time_to_complete_a_request_in_seconds

  • For I/O intensive: RPS ≈ (RAM / memory per worker) / time_to_complete_a_request_in_seconds

    In I/O-intensive scenarios the CPU is mostly idle, so throughput is limited by how many workers can be waiting at the same time, which is really a question of memory.
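A back-of-the-envelope calculation shows how the two formulas play out. All the numbers here (8 cores, 16 GB of RAM, 250 MB per blocking worker, and the per-request times) are assumptions purely for illustration.

    # Assumed numbers, purely for illustration.
    cores = 8
    cpu_seconds_per_request = 0.05                       # a CPU-bound handler
    cpu_bound_rps = cores / cpu_seconds_per_request      # 8 / 0.05 = 160 RPS

    ram_gb = 16
    worker_memory_gb = 0.25                              # one blocking worker per in-flight request
    io_seconds_per_request = 0.5                         # mostly waiting on a database or upstream call
    max_workers = ram_gb / worker_memory_gb              # 16 / 0.25 = 64 workers
    io_bound_rps = max_workers / io_seconds_per_request  # 64 / 0.5 = 128 RPS

    print(cpu_bound_rps, io_bound_rps)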

Blocking vs Non-blocking

When a program spends most of its time waiting on I/O and doing nothing else, it is blocked by those I/O operations: the CPU just sits there idle.

To reuse the CPU during that wait, we need to make the I/O non-blocking.

Essentially, instead of waiting, the program periodically checks the status of the I/O operation and comes back to handle it only once it finishes; in between checks, the system is free to do other work.
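A minimal sketch of that polling style with a raw non-blocking socket. The host example.com and the do_other_work function are placeholders, not part of any real project.

    import socket

    def do_other_work():
        pass  # stand-in for whatever else the program could usefully do

    sock = socket.create_connection(("example.com", 80))
    sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    sock.setblocking(False)            # recv() now returns immediately instead of waiting

    chunks = []
    while True:
        try:
            data = sock.recv(4096)     # data is ready: grab what has arrived
            if not data:               # an empty read means the server closed the connection
                break
            chunks.append(data)
        except BlockingIOError:
            do_other_work()            # nothing ready yet, so the thread stays busy elsewhere
    sock.close()
    print(len(b"".join(chunks)), "bytes received")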

Synchronous vs Asynchronous

Blocking and synchronous are almost the same thing: a thread focuses on one task at a time, with no distractions.

The difference between non-blocking and asynchronous, however, is sometimes harder to grasp; the two look almost identical until you dig into the details.

At the thread level, asynchronous means a task can be delegated to a different thread, with the result coming back through another channel such as an event or a callback; with non-blocking, the original thread has to keep polling until the task finishes.
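A sketch of that delegate-and-callback style using Python's standard thread pool; fetch here is a placeholder for a real I/O call, and the URL is made up.

    from concurrent.futures import ThreadPoolExecutor

    def fetch(url: str) -> str:
        return f"response from {url}"   # placeholder for a real network request

    def on_done(future):
        # The pool invokes this callback when fetch() finishes; the caller never polls.
        print("callback got:", future.result())

    with ThreadPoolExecutor(max_workers=2) as pool:
        future = pool.submit(fetch, "https://example.com")  # delegate the task to another thread
        future.add_done_callback(on_done)
        # The main thread is free to do other work here while the task runs.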

Different models

You can design a program as:

  • synchronous, non-blocking I/O: the same thread still has to poll the task status, so concurrency can only be achieved by spawning more threads, which adds context-switching overhead.

  • asynchronous, non-blocking I/O: this is the model preferred by modern web servers. For example, a single thread can drive many connections with an event loop (see the sketch below). Real-life examples include Twisted in Python and Netty in Java.
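As a rough illustration of the single-threaded event-loop model, here is a sketch using Python's asyncio rather than Twisted or Netty themselves, with asyncio.sleep standing in for real non-blocking network I/O.

    import asyncio

    async def handle(name: str, delay: float) -> str:
        await asyncio.sleep(delay)       # stands in for a non-blocking network call
        return f"{name} done"

    async def main():
        # Three "requests" wait concurrently inside one thread: each await hands
        # control back to the event loop so the others can make progress.
        results = await asyncio.gather(
            handle("request-1", 1.0),
            handle("request-2", 1.0),
            handle("request-3", 1.0),
        )
        print(results)                   # finishes in about 1 second total, not 3

    asyncio.run(main())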