(1) Here is the single-threaded (Main only) version of the sim: https://github.com/MrGrak/Gameloops/blob/master/CsharpConsoleAppExamples/main%2B0.cs
This is an example of how to write a game loop on a single thread, Main() only. The actors in the sim are class instances stored on a list in a public static Pool object. The loop iterates over that list, each actor getting a move phase and then a collision phase. I consider the move and collision phases to be the 'work' being done, and each pass of the while loop over the actor list to be a 'frame'. That gives me two measurements: how many frames the simulation took to complete, and how long it took. Tracking both seems redundant here, but when I move the 'work' to another thread, as in the next example, the main thread won't be doing much work, so the frame count will skyrocket while the simulation time drops. For this 'baseline' example, the sim times are around 1300ms and the sim always finishes on frame 2040.
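A minimal sketch of that loop shape, not the code from the linked file. The `Actor` type, the "step right" move rule, the end condition, and running all moves before all collisions are assumptions made only to keep the example short; the point is the frame counter and the Stopwatch around the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical actor type; only the fields this sketch needs.
public class Actor
{
    public int X, Y;
    public bool Active = true;
}

public static class SingleThreadSketch
{
    // Runs the whole sim on the calling thread and returns the frame count.
    public static int Run(List<Actor> actors, int targetX)
    {
        int frames = 0;
        bool done = false;
        while (!done)
        {
            // Move phase (placeholder rule): each active actor steps right.
            foreach (Actor a in actors)
                if (a.Active) a.X++;

            // Collision phase: every actor is checked against every other actor.
            for (int i = 0; i < actors.Count; i++)
                for (int j = 0; j < actors.Count; j++)
                    if (i != j && actors[i].X == actors[j].X && actors[i].Y == actors[j].Y)
                    { /* placeholder: a real sim would resolve the overlap here */ }

            frames++; // one pass over the actor list = one frame
            // Placeholder end condition: stop once the first actor reaches targetX.
            done = actors[0].X >= targetX;
        }
        return frames;
    }

    public static void Main()
    {
        var actors = new List<Actor>();
        for (int i = 0; i < 256; i++) actors.Add(new Actor { X = 0, Y = i });

        Stopwatch timer = Stopwatch.StartNew();
        int frames = Run(actors, 100);
        timer.Stop();
        Console.WriteLine($"frames: {frames}, ms: {timer.ElapsedMilliseconds}");
    }
}
```

With everything on one thread, the frame count is fixed by the sim itself, which is why the baseline always ends on the same frame.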
(2) Which is exactly what happens here: https://github.com/MrGrak/Gameloops/blob/master/CsharpConsoleAppExamples/main%2B1.cs
All simulation work has been moved out of Main and into a public static method of Program that Main runs on a separate thread. The frame count at completion jumps into the millions, while simulation time (on my computer) drops to around 550ms. I've only moved the work to a different thread than Main, so nothing too special here, but I feel it's important for people to understand what's happening step by step. One important thing to note is that I've 'locked' the work method using a boolean. When work completes, that boolean is unlocked and work is free to run again. Main runs constantly, checking whether that boolean is in the ready position, and if it is, work is called on a separate thread. This makes sure that calls to work happen sequentially, not concurrently.
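The boolean gate can be sketched roughly like this. The names `workReady`, `workPasses`, and `Run` are made up for the illustration, starting a fresh `Thread` per pass is an assumption about how the work gets launched, and marking the fields `volatile` is my addition so that Main reliably sees the worker's write.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class BooleanGateSketch
{
    // true = no Work call in flight. volatile (an assumption, see above) so the
    // spinning Main thread observes the worker's update.
    static volatile bool workReady = true;
    static volatile int workPasses = 0;

    public static int Passes => workPasses;

    // Stand-in for the move + collision phases.
    static void Work()
    {
        workPasses++;      // safe: the gate allows only one Work call at a time
        workReady = true;  // unlock, so Main may start the next pass
    }

    // Main's loop: spins until `passes` work passes finish, returns its frame count.
    public static long Run(int passes)
    {
        workReady = true;
        workPasses = 0;
        long frames = 0;
        while (workPasses < passes || !workReady)
        {
            if (workReady && workPasses < passes)
            {
                workReady = false;           // lock before the thread starts
                new Thread(Work).Start();
            }
            frames++;                        // Main keeps counting frames while it waits
        }
        return frames;
    }

    public static void Main()
    {
        Stopwatch timer = Stopwatch.StartNew();
        long frames = Run(100);
        Console.WriteLine($"frames: {frames}, passes: {Passes}, ms: {timer.ElapsedMilliseconds}");
    }
}
```

Because Main does nothing but check the flag, its frame count is far larger than the number of work passes, which matches the jump in frames described above.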
(3) Now, I'm going to split the work being done in half: https://github.com/MrGrak/Gameloops/blob/master/CsharpConsoleAppExamples/main%2B2.cs
This is where I get a little fuzzy about how the data is flowing to the CPU cores, and how efficient this design is in reality, because I don't know how to inspect the flow of data through the chipset to the cores. I can look at my CPU core usage and see more cores being used, and I see the sim times drop and the frame counts on Main drop too, so I'm assuming that the OS is scheduling these threads in at least a rational manner. In this example, the actor object is decomposed into lists, one pair of lists per field of the actor class from the previous examples. The x, y, aiType, and active fields still exist for each actor; they are just split into lists, two lists per value. One list is read from, and the other is written to. The move phase of work has been put into its own method of Program, which locks itself using the boolean strategy from the previous example. This means there are two booleans, each locking a different Program method, and together those two methods make up the move and collision phases of the game loop. Main now checks these two booleans to see if they are ready, starts the move work on one thread and the collision work on a different thread, then waits for both to complete. An interesting note here: move does 256 iterations of work, while collisions do 256*256 iterations, so collisions have a ton of work to do in comparison to move. Because of that difference in the size of work, in the next example I split the collision work in half, with very minimal effort.
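To make the read/write list idea concrete, here is a rough sketch with only an x field and a hypothetical `hit` flag. It is not the linked code: it waits with `Thread.Join` instead of the two boolean locks, the swap of the read and write lists after each frame is an assumption about how the pair is used, and the move and collision rules are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class DoubleBufferSketch
{
    const int Count = 256;

    // Two lists for the x field: phases read one and write the other.
    static List<int> xRead, xWrite;
    // Written only by the collision thread (hypothetical field for this sketch).
    static List<bool> hitWrite;

    // Move phase: 256 iterations, reads xRead, writes xWrite.
    static void Move()
    {
        for (int i = 0; i < Count; i++)
            xWrite[i] = xRead[i] + 1;
    }

    // Collision phase: 256*256 iterations, reads xRead only, writes hitWrite.
    static void Collide()
    {
        for (int i = 0; i < Count; i++)
        {
            bool hit = false;
            for (int j = 0; j < Count; j++)
                if (i != j && xRead[i] == xRead[j]) hit = true;
            hitWrite[i] = hit;
        }
    }

    public static void Run(int frames)
    {
        // Actors start in pairs that share an x value: 0,0,1,1,2,2,...
        xRead = Enumerable.Range(0, Count).Select(i => i / 2).ToList();
        xWrite = new List<int>(xRead);
        hitWrite = Enumerable.Repeat(false, Count).ToList();

        for (int f = 0; f < frames; f++)
        {
            var move = new Thread(Move);
            var collide = new Thread(Collide);
            move.Start();
            collide.Start();
            move.Join();      // wait for both phases before touching the lists
            collide.Join();

            // Swap: what was written this frame is read next frame.
            List<int> tmp = xRead;
            xRead = xWrite;
            xWrite = tmp;
        }
    }

    public static int X(int i) => xRead[i];
    public static bool Hit(int i) => hitWrite[i];

    public static void Main()
    {
        Run(3);
        Console.WriteLine($"x[0]: {X(0)}, hit[0]: {Hit(0)}");
    }
}
```

The two threads never write to a list the other one touches, which is what lets move and collision run at the same time without a lock around each element.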
(4) Here the collision work is divided between two threads, bringing the total to 3 worker threads plus Main, which is a design for 4-core systems: https://github.com/MrGrak/Gameloops/blob/master/CsharpConsoleAppExamples/main%2B3.cs
Here I simply copied the collision work method and divided the iterators so that each copy works on one half of the collision work. For (4), the average sim completion time was 259.3ms. For (3), it was 297.9ms. So I see improvement, which leads me to believe "things are working as expected".
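As an illustration of "copy the method and divide the iterators", the sketch below splits the outer loop of a 256*256 check into two copies and counts the checks each copy performs. The method names and counters are hypothetical, and the check itself is a placeholder (every actor sits at x = 0, so every comparison counts).

```csharp
using System;
using System.Threading;

public static class SplitCollisionSketch
{
    const int Count = 256;
    static readonly int[] x = new int[Count];  // all zero: every pair compares equal
    static long checksA, checksB;               // one counter per thread, nothing shared

    // First copy: outer loop covers actors 0..127.
    static void CollideFirstHalf()
    {
        for (int i = 0; i < Count / 2; i++)
            for (int j = 0; j < Count; j++)
                if (x[i] == x[j]) checksA++;
    }

    // Second copy: outer loop covers actors 128..255.
    static void CollideSecondHalf()
    {
        for (int i = Count / 2; i < Count; i++)
            for (int j = 0; j < Count; j++)
                if (x[i] == x[j]) checksB++;
    }

    // Runs both halves on their own threads and returns the total check count.
    public static long Run()
    {
        checksA = 0;
        checksB = 0;
        var a = new Thread(CollideFirstHalf);
        var b = new Thread(CollideSecondHalf);
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return checksA + checksB;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 65536, i.e. 256*256
    }
}
```

Each half does 128*256 = 32768 checks, so the two collision threads are balanced against each other, though each still does far more than the 256 iterations of the move thread.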
(5) Finally, I jobify the work being done so I can easily scale this strategy up for systems with more than 4 cores available. This is done by associating indices with job instances. For this example, the collision work is divided between two threads, with move on one thread, just like in (4), but it's very easy to add more jobs and threads and divide that work up using just a few lines of code in C#: https://github.com/MrGrak/Gameloops/blob/master/CsharpConsoleAppExamples/main%2B3asJobs.cs
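One way to picture jobs tied to indices is a small class that owns a range of outer-loop indices, as sketched below. `CollisionJob`, `MakeJobs`, and `RunAll` are hypothetical names, not the ones in the linked file, and one thread per job is an assumption; the collision check is the same placeholder count as before.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical job type: owns the half-open index range [Start, End).
public class CollisionJob
{
    public int Start, End;
    public long Checks;  // written only by the thread running this job

    public void Run(int[] x)
    {
        for (int i = Start; i < End; i++)
            for (int j = 0; j < x.Length; j++)
                if (x[i] == x[j]) Checks++;
    }
}

public static class JobSketch
{
    // Splits `count` actors into `jobCount` near-equal index ranges.
    public static List<CollisionJob> MakeJobs(int count, int jobCount)
    {
        var jobs = new List<CollisionJob>();
        for (int k = 0; k < jobCount; k++)
            jobs.Add(new CollisionJob
            {
                Start = count * k / jobCount,
                End = count * (k + 1) / jobCount
            });
        return jobs;
    }

    // Runs one thread per job and returns the total number of checks made.
    public static long RunAll(int[] x, int jobCount)
    {
        List<CollisionJob> jobs = MakeJobs(x.Length, jobCount);
        var threads = new List<Thread>();
        foreach (CollisionJob job in jobs)
        {
            CollisionJob captured = job;
            var worker = new Thread(() => captured.Run(x));
            threads.Add(worker);
            worker.Start();
        }
        foreach (Thread thread in threads) thread.Join();

        long total = 0;
        foreach (CollisionJob job in jobs) total += job.Checks;
        return total;
    }

    public static void Main()
    {
        var x = new int[256];              // all zero, so every check counts
        Console.WriteLine(RunAll(x, 2));   // two collision jobs, as in (4)
        Console.WriteLine(RunAll(x, 4));   // scaling up is a one-argument change
    }
}
```

Changing the job count only changes how the same 256*256 checks are sliced; whether more slices actually finish sooner still depends on how the threads get scheduled onto cores.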
Between (4) and (5), sim completion times are similar. So, based on how .NET and the OS decide to schedule the threads, this design either scales up to 4+ core systems, or it doesn't, and that's what I'm a bit fuzzy on. Anyone care to weigh in?