Multithreading and More Cores Don’t Lead to a Pot of Performance Gold

At least, not all the time.

While I’ll try not to make this sound like a ranting blog, a lot of times my inspirations to write come from when I read stuff around tech communities and whatnot. In this case, the inspiration for this post:

Can’t wait for more programs to look into utilising those new thread counts

From the internet

The context of the quote is a response to someone pointing out how CPU cores are in processors, and thus how many threads a CPU can run, have increased more or less dramatically over the past 3 years. My issue with the response is the person doesn’t seem to be aware of the complexities and limitations of multithreading.

So what exactly are said complexities and limitations?

Some basics before I get started

Before I get started though, it’s helpful to lay out a couple of basics:

There are software threads and hardware threads. A software thread is a basic unit of work that can be scheduled by the OS to run. A hardware thread is more about how many of these software threads a processor can run. If a processor has a specification of being an 8-thread processor, it can service 8 software threads. So there can be any number of software threads, but hardware threads are limited.

Most modern multitasking OSes schedule work based on software threads. These threads are scheduled via a queue with priority. In other words, the OS puts threads to run on the processor in the order that they came in when they’re ready to run.

The most basic limitation: is the problem CPU bound or I/O bound?

The most basic limitation of whether or not a program will scale up with more cores is down to what the program is waiting on most of the time. If the program is stalling because the CPU is too busy running it, then the program is CPU bound. Otherwise, if the program is stalling because it’s waiting for something (data from somewhere, a user input, an event, etc.), then the program is I/O bound.

If the program is waiting for something and isn’t running on the CPU most of the time, adding more threads to the program isn’t going to boost its performance. Imagine if you have a factory with four workers assembling something. Would adding more workers raise the average output of your factory over time if the amount of raw materials you had only satisfied those four workers?

But some programs, simply by design, are stuck where they’re at. Basic text editors like Notepad and nano simply cannot be made to perform better by having it spawn more threads. On the timescale of the CPU, it’s waiting forever for the human to input something. Spawning more threads would be like hiring 12 workers when one would get the job done easily.

The laws that govern multithreading improvement

There are two laws that govern how much improvement multithreading adds to a program: Amdahl’s Law and Gustafson’s Law.

Amdahl’s Law describes an upper limit of how much of a performance benefit you’ll gain depending on how much of the program can run in parallel. The example Wikipedia gives is a good one. Given a task that runs in 20 hours on a single thread, if 1 hour cannot be ran in parallel, then at the minimum, no matter how many processing cores you throw at this problem, you’ll have 1 hour of work.

Gustafson’s Law on the other hand describes how many processing units you need to keep a target time of execution for a given task. It’s another way of looking at performance with the amount of processing power you have.

Both laws can be seen as an application of the Law of Diminishing Returns. Also note that it assumes data flow in and out of the processor scale with its needs. If RAM cannot feed the processor with enough data, then the processor will run into a bottleneck.

Some tasks by design may not allow for sufficient parallelization

A common one I see is games. In the past, games were designed with a single core processor in mind. Recently, games are starting to be designed with multiple cores in mind. However, there’s still a limit because the design itself doesn’t lend the game to be completely parallelized. Let’s think about some things that need to be done in a game loop:

  • Handle player input
  • Process physical interactions, like hitbox collisions and physics.
  • Run the AI
  • Mix audio or send audio samples and what to do with it to the sound chip
  • Begin the process of rendering the graphics

Not all of this can run at the same time in parallel. For example, if the graphics rendering task was done in the beginning, what you’d get is the last state of the game world because that’s the only information it can go off of. Player input probably should go before physics are applied so that the calculation of the current game world state accounts for that.

So while each individual task may have components you can parallelize, such as processing 20 AI agents or running a dozen independent physics simulations, as a whole, the operation of running a game is not.

Pitfalls when making multithreaded programs

Another thing to consider with multithreaded programs is there are many pitfalls that could break the program if not considered carefully. A common factor with all of these is essentially resource contention. One example is in a system like my PC. It has a Ryzen 2700X, a 16-thread processor, but all those threads have one pool of memory to work with. On top of this, RAM can only service memory requests in the order that they come in.

Resource contention does come in many flavors, these are:

Deadlock

  • Thread A has resource A and it wants resource B. Thread B has resource B, but it wants resource A. Neither will give up the resource they own until they acquire the one they want.
  • Real-life example: Two nations doing a prisoner exchange meet at a bridge, but the nations won’t give up their prisoner until the other one does first.

Livelock

  • Thread A and B want a resource, but Thread A wants Thread B to have it while Thread B wants Thread A to have it.
  • Real-life example: Two people walking into their own paths in a hallway, then play the shuffle game as they try to move past each other.

Starvation

  • Thread A is a low-priority thread that wants Resource A. Thread B is a higher-priority thread that also wants Resource A. Since Thread B is higher priority, it always gets to run before Thread A and so Thread A never runs and gets the resource it wants.
  • Real-life example: You’re in the lowerclassmen group in college and you’re trying to get into a class that’s required, but all the upperclassmen and other “priority registration” groups are taking the available seats.

Priority Inversion

  • Thread A is a higher priority thread that wants Resource A. Thread B is a lower priority thread that currently has Resource A. Thread A can’t run because it’s waiting on Resource A to be free, even though it’s a higher priority thread.
  • Real-life example: You need to use the computer to write up a school report but someone else is using it to play video games.

Race Condition

  • Thread A and Thread B are updating a resource and both threads’ outputs are valid. However depending on who gets done first, one thread overwrites the other’s, rather than both existing on the resource.
  • Real-life example: Two people are adding something to a Wikipedia page. When they both save, the one who saved last has their content posted instead. Note that this has been fixed by way of notifying the person who submitted their edit later that someone else edited the article

Non Atomic Operations

  • An atomic operation is one that cannot be interrupted, either because the operation itself is the most basic task or the processor was told to not allow for interrupts. If an task is not atomic, then something can interrupt the task , modify the resource in question, then when control is handed back to the task, it may not realize the resource was modified.
  • Real-life example: A captain of an airplane is reading off a navigation chart, but has to use the restroom. The co-pilot takes over but receives new information that alters the current plan. When the captain comes back, if they were not aware of the change and they weren’t told, then something bad could happen.

Maybe we could use those cores for more things (but why?)

Maybe in the future, more cores will allow us to run more distinct apps and programs at once. To which I ask… why? Having the capability to run more things at once does seem nice, but at the same I ask myself, do I really need all of those things ran at the same time? “Requiring” more apps to run increases the complexity of the system, increasing the likelihood of errors or other issues like the increasing attack surface for vulnerabilities (which also means more things to update and maintain).

It’s good to be excited for the future, but don’t get too excited

There’s almost never been a free lunch in the computer world. When you introduce one thing that sounds good, there’s always a drawback or few with it. So while it’s good to be excited, it might be a good idea to tame that excitement lest your own hype leads to disappointment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create your website at WordPress.com
Get started
%d bloggers like this: