Why is my Java software only using one core, on my multi-core AWS ECS Docker instance?

There is no way to get Java to automatically find out how many cores are available and use them all on AWS ECS using Docker. You have to specify how many cores you want to use via the -XX:ActiveProcessorCount option.

On JDK >= 10, use the following JDK options:

-XX:ActiveProcessorCount=2

On JDK >= 8, use the following JDK options:

-XX:+UnlockExperimentalVMOptions
-XX:ActiveProcessorCount=2

Discussion

Having optimized my CPU-bound code to make use of multi-core processors, I was disappointed to deploy the software on AWS's Docker implementation, ECS, while paying for an expensive 8-core server, and see all the work happen in sequence on a single core.

The good news is that if you just start threads, they will be scheduled across multiple cores without any problem. A test that starts a number of threads, each doing the same amount of CPU-bound work, produces the following output on a 4-core server. Up to 4 threads, the elapsed time stays roughly constant, so throughput scales (approximately) linearly as the work is executed in parallel.

1 threads: 1.149 sec   <-- 1 core  (of 4) in use
2 threads: 1.178 sec   <-- 2 cores (of 4) in use
3 threads: 1.294 sec   <-- 3 cores (of 4) in use
4 threads: 1.363 sec   <-- 4 cores (of 4) in use
5 threads: 2.141 sec   <-- more work than cores => takes longer

So having threads assigned to cores is not the problem.
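
For reference, a test of this kind can be sketched roughly as follows. This is a minimal reconstruction and not the exact code behind the numbers above. The class name ThreadScalingTest, the burnCpu workload, and the iteration count are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadScalingTest {

    // Written by the worker threads so the JIT cannot discard the loop as dead code.
    static volatile long sink;

    // Hypothetical CPU-bound workload: a fixed amount of integer arithmetic.
    static long burnCpu(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += (i * 31) ^ (acc >>> 3);
        }
        return acc;
    }

    // Starts threadCount threads that each do the same amount of work,
    // waits for all of them, and returns the elapsed wall-clock seconds.
    static double timeThreads(int threadCount, long iterationsPerThread) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> sink = burnCpu(iterationsPerThread));
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Same shape as the output above: 1 to 5 threads, one line each.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d threads: %.3f sec%n", n, timeThreads(n, 500_000_000L));
        }
    }
}
```

Note that this test chooses the thread count itself and never asks the JVM how many processors exist, which is why it is unaffected by the problem described next.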

However, when a thread pool starts up, with the intent to start one thread per CPU core, it typically uses the Runtime.getRuntime().availableProcessors() call to determine how many CPU cores are available. It starts that many threads, to get the work distributed over all the cores.

In my case, the Runtime.getRuntime().availableProcessors() call was returning 1, even on my expensive multi-core server. Therefore only one thread was started, and all my code ran sequentially, even though multiple cores were available (and would have been used if the threads had been started).
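
The pattern in question looks roughly like this. It is a minimal sketch using the standard java.util.concurrent executors; the class name PoolSizing and the helper newCpuSizedPool are made up for illustration, and real thread pools may size themselves in slightly different ways.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolSizing {

    // Builds a fixed-size pool with one thread per processor reported by the JVM.
    // If availableProcessors() returns 1, this pool has exactly one worker thread.
    static ExecutorService newCpuSizedPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cores);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = newCpuSizedPool();
        // Submit more jobs than there are likely to be threads, and show which thread ran each.
        for (int i = 0; i < 8; i++) {
            int job = i;
            pool.submit(() ->
                System.out.println("job " + job + " ran on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

When the reported processor count is 1, every job is printed with the same thread name, which is the sequential behaviour described above.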

To understand why, you have to understand that Docker and AWS ECS share CPU resources between tasks differently from the way a normal operating system shares its CPU resources.

Imagine if you bought a dual-core laptop. You started Word. You started Excel. Then you tried to start a browser and Windows told you "you can't start any more tasks. Word might use the whole of a single core. Excel might use the whole of the other single core. You only have two cores. If I allowed you to start the browser as well and it also used the whole of a single core, your computer wouldn't have enough cores to prevent slowdown."

But that's not how time-sharing operating systems such as Unix, Linux, and Windows work. They assume it's unlikely that all tasks will be doing CPU-intensive activities all the time. Excel might need to do a recalculation every now and again, and it'll use all cores to do that as fast as possible; Word probably won't be doing anything at the same time. If Word is doing something at the same time, the operating system shares the cores between the processes via time-slicing.

Docker on AWS ECS doesn't work that way. If you have a dual-core server (= 2048 cpu units in Docker's terminology), and you say one Docker task needs one core (= 1024 cpu units), and another Docker task needs one core, and you try to start a third task, ECS won't allow that, on the grounds the server is full.

service xxx was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xxx has insufficient CPU units available. For more information, see the Troubleshooting section.

I do see merit in Docker's approach. If you really are sharing a multi-core server between different tasks, and they really are all CPU-bound, then it's good to know when your server is "full". (In my case, my server processes only need CPU time occasionally to recalculate some things, so I want my processes to have access to all the cores. And in the unlucky scenario of two processes needing to recalculate at the same time, I'm fine with timesharing via the normal mechanisms provided by the operating system.)

And I do also understand that, if you're running Word and Excel on your laptop, they are "cooperating" processes. They both serve you. Excel is not thinking "If I use lots of resources, I can starve out Word, nice!" But if you have two Docker containers running, if they're from different sources (e.g. different customers), maybe one has the fancy idea to mine Bitcoins at the expense of the other, so limiting the CPU might be necessary. (That's not my scenario either, all my processes are started by me.)

By default, when you start a process on ECS, without the "cpu" option specified, Runtime.getRuntime().availableProcessors() always returns 1. So you won't get the benefit of your multi-core server at all. That was what was happening to me.

If you specify an amount of "cpu", then availableProcessors() will return a corresponding value. So if you have a 4-core server and two tasks, each running with "cpu":2048, then availableProcessors() will return 2 for each process. So each process still won't be able to use all CPU cores.
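
The arithmetic behind those numbers can be sketched as follows. This is only a model of the behaviour described in this post, assuming 1024 cpu units per core, rounding up, and a floor of 1. It is not the JVM's actual implementation, which differs between JDK versions. The class name CpuUnits and the method processorsFor are hypothetical.

```java
class CpuUnits {

    // 1024 cpu units correspond to one core, as in the ECS example above.
    static final int UNITS_PER_CORE = 1024;

    // Assumed model: divide the task's cpu units by 1024, round up,
    // never report fewer than 1 processor or more than the host has.
    static int processorsFor(int cpuUnits, int hostCores) {
        int fromUnits = (cpuUnits + UNITS_PER_CORE - 1) / UNITS_PER_CORE; // ceiling division
        return Math.max(1, Math.min(fromUnits, hostCores));
    }

    public static void main(String[] args) {
        int hostCores = 4;
        for (int units : new int[] {0, 1024, 2048, 4096}) {
            System.out.println("cpu=" + units + " -> " + processorsFor(units, hostCores));
        }
    }
}
```

Under this model, a task with no "cpu" value on a 4-core server sees 1 processor, and a task with "cpu":2048 sees 2, matching what I observed.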

Java has an option to ignore this whole Docker mechanism, and override how many cores should be returned from availableProcessors(). You can specify the value you want to return with the -XX:ActiveProcessorCount=nn JDK option, as described here.
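
A quick way to check that the option has taken effect is to print what the JVM reports. The following is a minimal sketch; the class name CpuCheck and the describe helper are just for illustration.

```java
class CpuCheck {

    // Formats the processor count for printing.
    static String describe(int processors) {
        return "availableProcessors() = " + processors;
    }

    public static void main(String[] args) {
        // Reports the value thread pools will see, including any -XX:ActiveProcessorCount override.
        System.out.println(describe(Runtime.getRuntime().availableProcessors()));
    }
}
```

Assuming the file is saved as CpuCheck.java and a JDK that can launch single-file programs (11 or later), running "java -XX:ActiveProcessorCount=2 CpuCheck.java" inside the container should report 2 regardless of the "cpu" setting of the task.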

Alas, there is no option to allow the process running in Docker to see all the cores via availableProcessors() (which works out-of-the-box if you run Java directly on the host). You have to specify the number of cores it sees. So, each time you change your instance size, you also need to remember to update this parameter, which is annoying.
