Project Description
This was a group project where the goal was to create a zombie survival game.
The player would be placed in random rooms with a random challenge. If the player survives and completes the challenge they will get the ability to pick between 3 stat-boosts.
My role in this project was to create the Engine/Framework for the game.
Gameplay was not the focal point of this project; the engine was, so the game itself is basic and not polished or refined.
Video of Gameplay
Overall, there is a lot that could be added to the gameplay to improve how the game feels: more feedback, more balancing, and some overall refinement. However, my goal was the backend, so that was my priority. I briefly discuss the engine in more detail below.
Engine Architecture and Iterations
For this project I created a hybrid between an Entity Component System (ECS) and an Entity System (ES). The very first iteration of the engine was a simple inheritance-based ES built on 3 concepts:
Scene:
A scene is a collection of Entities and their Components.
Entity:
An abstract concept of an "object" in the world. An Entity has no behaviour. The behaviours are added with Components. Entities always have a Transform by default (position, rotation and scale).
Component:
A behaviour that is added to an entity.
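As a rough illustration of these three concepts, here is a minimal C++ sketch of an inheritance-based ES. Every name in it (Transform, Component, Entity, Scene, MoveRight, AddComponent) is hypothetical and chosen for the example; it is not the engine's actual API.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// All names here are hypothetical and only illustrate the three concepts.
struct Transform {
    float x = 0.0f, y = 0.0f;            // position
    float rotation = 0.0f;
    float scaleX = 1.0f, scaleY = 1.0f;
};

class Entity;

// A behaviour that is added to an entity; concrete behaviours derive from it.
class Component {
public:
    virtual ~Component() = default;
    virtual void Update(Entity& owner, float dt) = 0;
};

// An entity has no behaviour of its own, only a Transform and its components.
class Entity {
public:
    Transform transform;

    template <typename T, typename... Args>
    T& AddComponent(Args&&... args) {
        auto component = std::make_unique<T>(std::forward<Args>(args)...);
        T& ref = *component;
        components_.push_back(std::move(component));
        return ref;
    }

    void Update(float dt) {
        for (auto& c : components_) c->Update(*this, dt);
    }

private:
    std::vector<std::unique_ptr<Component>> components_;
};

// A scene is a collection of entities and, through them, their components.
class Scene {
public:
    Entity& CreateEntity() {
        entities_.push_back(std::make_unique<Entity>());
        return *entities_.back();
    }

    void Update(float dt) {
        assert(dt >= 0.0f);  // delta time should never be negative
        for (auto& e : entities_) e->Update(dt);
    }

private:
    std::vector<std::unique_ptr<Entity>> entities_;
};

// Example behaviour: moves its owner along x at a constant speed.
class MoveRight : public Component {
public:
    explicit MoveRight(float speed) : speed_(speed) {}
    void Update(Entity& owner, float dt) override {
        owner.transform.x += speed_ * dt;
    }

private:
    float speed_;
};
```

Each component lives in its own heap allocation behind a virtual call, which is quick to write but scatters the data in memory. That is the cost the later ECS-style iteration tries to reduce.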
As the project progressed I started trying to convert the ES into more of an ECS, mainly for a performance boost. With an ECS, a system can request all components of a given type and iterate over them in order, which helps minimise the number of cache misses. Additionally, retrieving a component becomes constant time: essentially a method call, a mask check and an array lookup. No matter the component or entity, this procedure is the same.
In the second iteration of the engine, adding and retrieving components is constant time; however, removing an entity involves copying the last component in use in the array into the removed position. I made this design decision because I wanted each component array to be non-sparse, in order to fully utilise contiguous memory when component systems iterate.
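The dense storage and the "copy the last component into the removed position" step can be sketched as follows. This is a minimal sketch under assumptions, not the engine's code: the names ComponentPool, EntityId and kInvalid are hypothetical, and a per-entity index table with a sentinel value stands in for the mask check described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using EntityId = std::uint16_t;              // assumption: ids fit in 16 bits
constexpr std::size_t kMaxEntities = 65535;  // valid ids are 0..65534
constexpr std::uint16_t kInvalid = 0xFFFF;   // sentinel: "no component"

template <typename T>
class ComponentPool {
public:
    ComponentPool() : sparse_(kMaxEntities, kInvalid) {}

    // Stand-in for the mask check: does this entity own a component here?
    bool Has(EntityId e) const {
        return e < kMaxEntities && sparse_[e] != kInvalid;
    }

    // Appends to the dense array (amortised constant time).
    T& Add(EntityId e, const T& value) {
        assert(e < kMaxEntities && !Has(e));
        sparse_[e] = static_cast<std::uint16_t>(dense_.size());
        dense_.push_back(value);
        owners_.push_back(e);
        return dense_.back();
    }

    // Constant time: one check and one array lookup.
    T& Get(EntityId e) {
        assert(Has(e));
        return dense_[sparse_[e]];
    }

    // Copies the last component into the removed position so that the
    // dense array never has gaps.
    void Remove(EntityId e) {
        assert(Has(e));
        const std::uint16_t hole = sparse_[e];
        const std::size_t last = dense_.size() - 1;
        dense_[hole] = dense_[last];
        owners_[hole] = owners_[last];
        sparse_[owners_[hole]] = hole;  // the moved component has a new index
        sparse_[e] = kInvalid;
        dense_.pop_back();
        owners_.pop_back();
    }

    // Systems iterate this contiguous array directly.
    const std::vector<T>& Dense() const { return dense_; }

private:
    std::vector<std::uint16_t> sparse_;  // entity id -> index into dense_
    std::vector<T> dense_;               // tightly packed components
    std::vector<EntityId> owners_;       // dense index -> owning entity id
};
```

The trade-off is that removal changes the order of the remaining components, so nothing should hold on to a dense index or a reference across a Remove call.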
I started with an ES approach as this is a very easy and quick method to get something working. This enabled my team members to start with the gameplay programming.
Rendering
The renderer had a few iterations to make it more efficient throughout the project. The renderer was written in OpenGL.
The engine uses a batch renderer to reduce the number of draw calls, and additionally uses a texture array to avoid needing a separate batch per texture.
Additionally, to achieve layering I used the z-axis, which again avoids needing multiple batches or sorting before uploading to the GPU. The ECS design also helps performance because components are stored contiguously, reducing cache misses during iteration.
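To make the batching idea concrete, here is a minimal CPU-side sketch of how quads might be accumulated into one vertex and index buffer. It contains no OpenGL calls. The names (Vertex, QuadBatch, PushQuad) and the vertex layout are assumptions made for the example, not the renderer's real structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout for the sketch.
struct Vertex {
    float x, y, z;    // z carries the layer, so the depth test orders sprites
    float u, v;       // texture coordinates
    float texLayer;   // which layer of the texture array to sample
};

class QuadBatch {
public:
    explicit QuadBatch(std::size_t maxQuads) : maxQuads_(maxQuads) {
        assert(maxQuads > 0);
        vertices_.reserve(maxQuads * 4);
        indices_.reserve(maxQuads * 6);
    }

    // Returns false when the batch is full; the caller would then upload the
    // buffers, issue one draw call, Clear() the batch and push again.
    bool PushQuad(float x, float y, float w, float h, int layer, int texLayer) {
        if (QuadCount() == maxQuads_) return false;
        const std::uint32_t base = static_cast<std::uint32_t>(vertices_.size());
        const float z = static_cast<float>(layer);
        const float t = static_cast<float>(texLayer);
        vertices_.push_back({x,     y,     z, 0.0f, 0.0f, t});
        vertices_.push_back({x + w, y,     z, 1.0f, 0.0f, t});
        vertices_.push_back({x + w, y + h, z, 1.0f, 1.0f, t});
        vertices_.push_back({x,     y + h, z, 0.0f, 1.0f, t});
        // Two triangles per quad, sharing two corners.
        static const std::uint32_t kCorner[6] = {0, 1, 2, 2, 3, 0};
        for (std::uint32_t c : kCorner) indices_.push_back(base + c);
        return true;
    }

    void Clear() {
        vertices_.clear();
        indices_.clear();
    }

    std::size_t QuadCount() const { return vertices_.size() / 4; }
    const std::vector<Vertex>& Vertices() const { return vertices_; }
    const std::vector<std::uint32_t>& Indices() const { return indices_; }

private:
    std::size_t maxQuads_;
    std::vector<Vertex> vertices_;
    std::vector<std::uint32_t> indices_;
};
```

Because the layer travels with each vertex as a z value and the texture as an array index, quads with different textures and layers can share a single draw call, provided depth testing is enabled on the GPU side.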
Collision
The collision system supports three shapes (Boxes, Circles, and Capsules) as well as two body types (Kinematic and Static). Each shape can collide with all the other shapes and supports a callback system.
Static bodies do not move and are not checked for collision against each other. Walls are a good example: in most games they are fixed and do not move. Another way to look at a static body is that it creates the "push" force and cannot be "pushed" back.
Kinematic bodies move around in the world and collide with both Static and Kinematic bodies; however, they do not experience forces such as gravity.
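The body-type rule and one of the shape pairs can be sketched as below. This is an illustrative sketch only: the names BodyType, ShouldTest, Circle, Box and Overlaps are hypothetical, and the box is assumed to be axis-aligned, which the description above does not state.

```cpp
#include <algorithm>
#include <cassert>

enum class BodyType { Static, Kinematic };

// Static bodies never move, so a static/static pair can be skipped entirely.
constexpr bool ShouldTest(BodyType a, BodyType b) {
    return a == BodyType::Kinematic || b == BodyType::Kinematic;
}

struct Circle { float x, y, radius; };
struct Box { float minX, minY, maxX, maxY; };  // assumption: axis-aligned

// Circle vs box: clamp the circle centre onto the box to find the closest
// point, then compare the squared distance against the squared radius.
inline bool Overlaps(const Circle& c, const Box& b) {
    assert(b.minX <= b.maxX && b.minY <= b.maxY);
    const float px = std::max(b.minX, std::min(c.x, b.maxX));
    const float py = std::max(b.minY, std::min(c.y, b.maxY));
    const float dx = c.x - px;
    const float dy = c.y - py;
    return dx * dx + dy * dy <= c.radius * c.radius;
}
```

Comparing squared distances avoids a square root per check, which matters when the test runs for many pairs each frame.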
Due to the multiple shapes and body types, brute force (iterating over each type) became too slow (please refer to the testing section below and the image for the collision manager), and so one layer of a quadtree was added. Because our game has a camera that follows the player, centring the quadtree on the centre of the screen means that the most relevant collision checks are the ones that get sorted. For instance, if the quadtree were centred on the world origin and the level were entirely in Quadrant 4, this would not optimise the collision checks at all; centring the quadtree on the camera ensures that the bodies are actually split between quadrants.
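A single quadtree layer centred on the camera amounts to sorting bodies into four buckets around the camera position. The sketch below shows one way this might look. The names Aabb and SplitAroundCamera, the quadrant numbering, and the choice to place a straddling box in every quadrant it touches are all assumptions for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Aabb { float minX, minY, maxX, maxY; };  // bounding box of a body

// Sorts body indices into four buckets around the camera position.
// Quadrant numbering here is arbitrary: 0 = right/above, 1 = left/above,
// 2 = left/below, 3 = right/below. A box that straddles a dividing line is
// stored in every quadrant it touches.
inline std::array<std::vector<std::size_t>, 4> SplitAroundCamera(
        const std::vector<Aabb>& bounds, float camX, float camY) {
    std::array<std::vector<std::size_t>, 4> quadrants;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        const Aabb& b = bounds[i];
        assert(b.minX <= b.maxX && b.minY <= b.maxY);
        const bool left = b.minX < camX;
        const bool right = b.maxX >= camX;
        const bool below = b.minY < camY;
        const bool above = b.maxY >= camY;
        if (right && above) quadrants[0].push_back(i);
        if (left && above) quadrants[1].push_back(i);
        if (left && below) quadrants[2].push_back(i);
        if (right && below) quadrants[3].push_back(i);
    }
    return quadrants;
}
```

Collision checks then only run between bodies that share a bucket. With the split point at the world origin and a level that lies entirely on one side of it, every body would land in the same bucket and nothing would be saved, which is the case the camera-centred split avoids. One caveat of storing straddling boxes in several buckets is that the same pair can be tested more than once.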
Testing and Refactoring
The engine went through multiple iterations in order to try and make it more efficient. The main systems that were refactored were the Architecture (trying to convert from ES to ECS), Renderer and Collision.
To find the bottlenecks of the engine and execution I used Superluminal Performance to profile our project.
This was very useful in helping me find which parts of the code actually bottlenecked the engine and took the most time. Once I had identified a bottleneck, I could optimise that specific problem. The collision system and the renderer are examples of where this helped. While I was adding the different bodies and shapes, performance dropped significantly. With the aid of Superluminal I found that the issue was the brute-force "every object vs every object" approach, as well as the callback system taking a lot of computation time. To counteract this I added the quadtree, as well as a way to "deactivate" entities so that they are ignored during collision checking.
Please refer to the images below for some screenshots of the profiling with Superluminal (please select the image for a better size preview if needed).
A stress test of the renderer rendering 5000 quads. This helped identify that most of the bottleneck came from a shader being parsed and created per quad.
Comparison of times before and after optimising texture loading by using texture caching. Left is before and right is after.
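Texture caching of the kind mentioned in this comparison usually means loading each file once and reusing the resulting handle afterwards. A minimal sketch, assuming a hypothetical loader callback in place of the real disk read and GPU upload (TextureCache, TextureId and Loader are illustrative names, not the engine's):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using TextureId = unsigned int;  // stand-in for a GPU texture handle

class TextureCache {
public:
    // The loader is a hypothetical callback that does the expensive work
    // (reading the file and uploading it) and returns the new handle.
    using Loader = std::function<TextureId(const std::string& path)>;

    explicit TextureCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ != nullptr);
    }

    // Returns the cached handle if the path was loaded before; otherwise
    // loads it once and remembers the result.
    TextureId Get(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) return it->second;
        const TextureId id = loader_(path);
        cache_.emplace(path, id);
        return id;
    }

private:
    Loader loader_;
    std::unordered_map<std::string, TextureId> cache_;
};
```

With this in place, many entities that share one sprite sheet trigger a single load rather than one load each.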
After adding multiple bodies and shapes, collision checking became a massive bottleneck. Going further into Superluminal's breakdown showed that this was due to the brute-force method of checking each object against all other objects. Additionally, iterating over each object to determine the callback method took a good portion of the time.
A stress test after adding batch rendering and some optimisations to the renderer. It renders 5.2 million quads at an average of 4 frames per second. The next section will discuss CPU/GPU balance as a way to reduce the 61% of compute time spent on the CPU and render at a higher FPS.
The performance of the final version of the game while running. The time in SleepEx shows that overall performance was sufficient to run the game at a stable 120 FPS and still leave room for additions at a later stage.
Issues and Improvements
ES/ECS:
To achieve the full benefit of an ECS, pooling entities is a requirement, as is an ID-based entity design instead of a class. However, most of the gameplay code already stored pointers to entities and called methods on them (for adding components etc.), so changing entities to IDs would have required a lot of refactoring for very little performance gain (there were no performance bottlenecks with the current system).
Additionally, an improvement would be to fully convert the engine into an ECS system and remove most of the unneeded "OnComponentAdd" and "OnComponentRemove" methods.
Scenes:
Due to changing to a more data-oriented design for the components, changing scenes raises the question “What happens to the current scene's components?”. An approach that I would like to try is serializing the scenes, so that the entities and their components are stored on disk instead. However, that would not benefit this game, as we only have around 3 scenes and never get anywhere near the maximum of 65,535 entities. The best solution was therefore not to reset components on scene changes, and to treat a scene as more of a collection/grouping of entities.
An example of how I would like to explore serialization for components is shown below:
Game: //< Scene name.
  Player: //< Entity name.
    Transform: //< Components.
      Position: 1.0 2.0 0.0 //< Component data.
      Rotation: 0.0
    Animation:
      Frames: 4
      TimeBetweenFrames: 0.15
This would represent an entity with the name "Player" in the "Game" scene. The entity has the components Transform and Animation which hold the data of each of the components.
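A writer for a layout like the one above could be sketched as follows. This is only an illustration: the SceneData, EntityData, ComponentData and Field types are hypothetical, values are kept as pre-formatted strings, and the two-space indentation per nesting level is an assumption about how the hierarchy would be marked.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory description of a scene, used only for this sketch.
struct Field { std::string name, value; };
struct ComponentData { std::string name; std::vector<Field> fields; };
struct EntityData { std::string name; std::vector<ComponentData> components; };
struct SceneData { std::string name; std::vector<EntityData> entities; };

// Writes scene -> entity -> component -> field, indenting one level
// (two spaces, an assumption) per step down the hierarchy.
inline std::string Serialize(const SceneData& scene) {
    assert(!scene.name.empty());
    std::ostringstream out;
    out << scene.name << ":\n";
    for (const EntityData& e : scene.entities) {
        out << "  " << e.name << ":\n";
        for (const ComponentData& c : e.components) {
            out << "    " << c.name << ":\n";
            for (const Field& f : c.fields) {
                out << "      " << f.name << ": " << f.value << "\n";
            }
        }
    }
    return out.str();
}
```

The string returned by Serialize could then be written to one file per scene, so that only the active scene's entities and components need to be resident in memory.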
Rendering (Textures and GPU/CPU balance):
The current renderer can have up to 32 textures bound. Once that limit is exceeded, a draw call should happen, the batch should be flushed and the next batch started; however, because our project uses so few textures, thanks to sprite sheets, the upper bound of 32 was sufficient. This is still a limitation of the renderer that could be improved upon, and doing so would make the engine more versatile.
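One way the flush-on-overflow behaviour might be handled is sketched below, again without any OpenGL calls. TextureSlots, SlotFor and the flush callback are hypothetical names. The callback stands in for "upload the batch, issue the draw call, start the next batch".

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using TextureId = unsigned int;        // stand-in for a GPU texture handle
constexpr std::size_t kMaxSlots = 32;  // the limit described above

class TextureSlots {
public:
    // Returns the slot already holding `id`, or assigns the next free one.
    // Returns -1 when every slot is taken by a different texture.
    int Acquire(TextureId id) {
        for (std::size_t i = 0; i < count_; ++i) {
            if (slots_[i] == id) return static_cast<int>(i);
        }
        if (count_ == kMaxSlots) return -1;
        slots_[count_] = id;
        return static_cast<int>(count_++);
    }

    void Reset() { count_ = 0; }
    std::size_t Used() const { return count_; }

private:
    std::array<TextureId, kMaxSlots> slots_{};
    std::size_t count_ = 0;
};

// If no slot is free, flush the current batch and start a new one in which
// the requested texture takes the first slot.
template <typename FlushFn>
int SlotFor(TextureSlots& slots, TextureId id, FlushFn&& flush) {
    int slot = slots.Acquire(id);
    if (slot < 0) {
        flush();
        slots.Reset();
        slot = slots.Acquire(id);
    }
    assert(slot >= 0);
    return slot;
}
```

The linear search is acceptable for 32 entries. The cost of this approach is one extra draw call each time a batch runs out of slots.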
Additionally, further testing would be needed to find the maximum number of quadrilaterals per batch that balances the time spent on the GPU and the CPU, so that neither is waiting on the other.
For example, when stress testing the renderer with 5.2 million quads, most of the time is spent doing the vertex calculations per quad, which is a bottleneck on the CPU side; if optimised, this would allow the 5.2 million quads to be rendered at a higher frame rate.