When the DreamBuildPlay 2017 challenge was announced, I immediately put aside all the personal projects I was working on at the time (they were all rather boring anyway). I created a new instance of my game engine, copied some animations from my previous games, baked some draft 3D models and used all that to build a quick prototype. Even though everything was a placeholder, when the main character started kicking butts (literally), I got pretty excited. As I went further, I created more animation sequences as well as a new “combo attack” manager, and I was in the middle of adding some fancy graphics… when I noticed a small stutter every now and then. I got somewhat alarmed, since having performance problems this early in development is pretty much a dark omen.
Wondering what on Earth I had done wrong this time, I ran the Graphic Debugger included in Visual Studio. The numbers I got were unexpected, but they pointed straight at my problem. I fine-tuned my code and soon enough I had the performance issue under control (I hope). The process was so useful and smooth that I thought about writing a quick blog post about it to share the experience.
Microsoft’s Graphic Debugger
To give a quick bit of history, the Graphic Debugger is a feature of Microsoft’s Visual Studio that was introduced in the 2012 Professional Edition and became widely available with Visual Studio 2015 Community Edition. The early version was a little bit confusing. The new version, however, is so easy to use that in just a few clicks I got an analysis of the tasks run by the computer’s GPU, along with the time it took to execute each and every one of them.
So, without further ado, here is a quick account of what I did, in the hope that the same process is useful to other indie devs. With my project loaded in Visual Studio, I went to the “Debug” pull-down menu, then “Graphics”, and then “Start Graphics Debugging”.
The Graphic Debugger asked to run with elevated privileges, and then my prototype was launched. As the game session went on, a number in the top left corner kept me informed of the time it took to draw the current frame. This number never went below about 17 milliseconds (roughly 60 frames per second), which is consistent with the fact that my game engine synchronizes with the display’s refresh rate (as we all should, since there is no point in presenting frames that the player will never see). I pressed the “Print Screen” key a couple of times to capture snapshots of the current frame, and then I quit the prototype.
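For reference, the 17 ms figure is simply the frame budget at a 60 Hz refresh rate, rounded up; strictly speaking it is 1000 / 60 ≈ 16.67 ms. A tiny Python sketch of that arithmetic (the function names are hypothetical, just for illustration):

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame when presentation is synced to the display."""
    return 1000.0 / refresh_hz


def fps_from_frame_time(frame_ms: float) -> float:
    """Frames per second implied by a frame time in milliseconds (the reciprocal)."""
    return 1000.0 / frame_ms


print(f"{frame_budget_ms(60):.2f} ms per frame at 60 Hz")  # 16.67 ms per frame at 60 Hz
print(f"{fps_from_frame_time(17):.1f} fps")               # 58.8 fps
```

Any frame that takes noticeably longer than the budget misses a refresh, which is exactly what a visible stutter is.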
The next screen showed me a summary of the collected data.
- The first graphic is a plot of the time the GPU took to draw each requested frame. Overall, each frame took around 17 milliseconds to draw, with the exception of six instances:
  - The biggest spike (which can be read as a drop in performance) happens at the beginning of the session. This is expected, since that is when the game loads most of its assets into memory, leaving the game a little unresponsive for about 10 seconds just before the main menu is shown.
  - The second biggest spike happens when the game loads all the assets needed for the first level. Likewise, this drop is expected.
  - The next two spikes, marked with an orange triangle on top, are the screenshots taken for the frame analysis. These two drops are also expected.
  - The last two spikes, and this is something I need to work on, occurred when the game compiled the main character’s animation sequences at run time. In other words, this is a real problem, and I had better add some code to cache those animations.
- The second graphic on the Report tab shows the same information as the previous plot, expressed as frames per second. In other words, it is the mathematical reciprocal of the previous plot (frames per second = 1000 / frame time in milliseconds).
- The bottom section has a list of the individual frames captured by the diagnostic tool.
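Caching the animations, as mentioned for the last two spikes, is essentially memoization: compile each sequence once, preferably while a loading screen is up, and reuse the result afterwards. Below is a minimal Python sketch under that assumption; `AnimationCache`, `slow_compile` and the string "frames" are hypothetical stand-ins, not the engine's actual code.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for whatever the engine produces when it "compiles"
# an animation sequence (e.g. a baked list of keyframe poses).
CompiledAnimation = List[str]


class AnimationCache:
    """Compile each animation sequence once and reuse the result afterwards."""

    def __init__(self, compile_fn: Callable[[str], CompiledAnimation]) -> None:
        self._compile = compile_fn
        self._cache: Dict[str, CompiledAnimation] = {}
        self.compile_count = 0  # how many expensive compilations actually ran

    def get(self, name: str) -> CompiledAnimation:
        # Only compile on a cache miss; later calls return the stored result.
        if name not in self._cache:
            self._cache[name] = self._compile(name)
            self.compile_count += 1
        return self._cache[name]

    def preload(self, names: List[str]) -> None:
        """Warm the cache during a loading screen instead of mid-gameplay."""
        for name in names:
            self.get(name)


def slow_compile(name: str) -> CompiledAnimation:
    # Placeholder for the expensive run-time work that caused the frame spikes.
    return [f"{name}:frame{i}" for i in range(3)]


cache = AnimationCache(slow_compile)
cache.preload(["kick", "combo_punch"])  # paid once, at level load
cache.get("kick")                       # served from the cache, no compile
print(cache.compile_count)              # 2
```

Calling `preload` during a level load moves the compilation cost into a moment where a frame-time spike is expected anyway, much like the level-loading spike described above.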
From the frames listed at the bottom, I clicked on the second frame (the first one usually is not that accurate). The next screen provided a detailed list of steps executed by the GPU for that given frame.
Tasks marked with a black paintbrush icon have a screenshot attached, so clicking on one shows, on the right, what was being drawn at that point.
To the right of the “Render Target” tab there is a tab called “Frame Analysis”, which offers two options: “Quick Analysis” and “Extended Analysis”. The latter fails quite often, but the former is good enough for most diagnoses.
After a quick analysis, the application shows a report of the collected data for the given frame.
The first section of the report plots the collected data per task as a bar chart, focusing on the time it took to process each one. Seen this way, problematic draws are very easy to spot. The second part of the report lists the same data as numbers, and again the potentially problematic draws are highlighted in orange (actually, it’s salmon, but this is a technical post; the graphic design ones come later).
In the screenshot, both sections show two tasks that stand out: tasks 467 and 470. Going back to the “Render Target” tab and clicking on the suspected tasks, I found out that the little trees in the background were hogging the GPU. Together, these two tasks alone were consuming about a third of the 17 ms budget of a 60 fps game.
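The report does this ranking for you, but the idea behind it is easy to reproduce: compare each draw’s GPU time with the frame budget and sort. A minimal Python sketch, assuming the per-draw timings have already been exported into a dictionary; the 10% threshold and the sample numbers are arbitrary assumptions for illustration, not the criterion Visual Studio uses nor the measurements from this session:

```python
from typing import Dict, List, Tuple

FRAME_BUDGET_MS = 1000.0 / 60  # 60 fps target


def flag_expensive_draws(
    draw_times_ms: Dict[int, float],
    frame_budget_ms: float = FRAME_BUDGET_MS,
    share_threshold: float = 0.10,
) -> List[Tuple[int, float, float]]:
    """Return (draw_id, time_ms, share_of_budget) for every draw whose share of
    the frame budget is at least share_threshold, most expensive first."""
    flagged = []
    for draw_id, time_ms in draw_times_ms.items():
        share = time_ms / frame_budget_ms
        if share >= share_threshold:
            flagged.append((draw_id, time_ms, share))
    flagged.sort(key=lambda row: row[1], reverse=True)
    return flagged


# Made-up timings in milliseconds, keyed by a hypothetical draw id.
sample = {1: 0.05, 2: 2.9, 3: 0.2, 4: 2.7, 5: 0.4}
for draw_id, time_ms, share in flag_expensive_draws(sample):
    print(f"draw {draw_id}: {time_ms:.2f} ms ({share:.0%} of the frame budget)")
```

Expressing each draw as a share of the budget, rather than in raw milliseconds, makes it obvious when two draws together eat a third of the frame.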
Although I did implement a custom routine to draw these trees in different colors (part of the fancy graphics feature I was working on), the shaders I created were not complex in any way. Moreover, these trees have a very low polygon count (according to the report, each one has about 174 polygons), which is why I was so surprised that they turned out to be the culprit.
Anyway, long story short, the problem can be summarized as follows: these trees are huge. It’s not quite evident in the screenshot, but these trees are instances of a 3D model, which is why they look different depending on the viewing angle. This means that some branches are drawn on top of each other with a transparency effect, so the color of a given pixel is not final until all the branches are drawn (not to mention that the trees overlap each other at some points). Now, given that the pixel shader runs once for every pixel a draw covers, and again every time another overlapping branch covers that same pixel, and that the screen resolution is currently 1680 x 1050 pixels (making each tree about 500 x 500 pixels), this adds up to A LOT of calculations, especially texture sampling. Moreover, as the report shows, the GPU tried to draw 18 trees (9 on each pass), yet only 8 of them are actually visible, which means that almost half of that time was pretty much wasted (I say almost half because I still need two pairs of trees on each side in case the player moves from side to side).
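A back-of-the-envelope sketch makes the cost easier to see. It approximates pixel-shader invocations as covered area times the number of overlapping layers, and shows how much of that work a simple visibility test would remove. This is a sketch under stated assumptions: `TreeSprite`, the three-layer overdraw and the tree positions are hypothetical, the test is horizontal-only, and culling off-screen instances is a different technique from the pixel shader optimization I actually applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeSprite:
    """Hypothetical screen-space description of one background tree."""
    x: int       # left edge in screen pixels (negative = left of the screen)
    width: int   # on-screen width in pixels
    height: int  # on-screen height in pixels
    layers: int  # assumed number of overlapping branch layers per covered pixel


def shaded_fragments(tree: TreeSprite) -> int:
    """Rough pixel-shader invocations: covered area times overdraw layers."""
    return tree.width * tree.height * tree.layers


def is_visible(tree: TreeSprite, screen_width: int, margin: int) -> bool:
    """Horizontal-only test; the margin keeps trees just off-screen alive."""
    return tree.x + tree.width >= -margin and tree.x <= screen_width + margin


def total_fragments(trees: List[TreeSprite], screen_width: int,
                    margin: Optional[int] = None) -> int:
    """Sum over every tree, or only over (nearly) visible ones when a margin is given."""
    if margin is None:
        return sum(shaded_fragments(t) for t in trees)
    return sum(shaded_fragments(t) for t in trees
               if is_visible(t, screen_width, margin))


trees = [
    TreeSprite(x=-2000, width=500, height=500, layers=3),  # far off-screen
    TreeSprite(x=100, width=500, height=500, layers=3),
    TreeSprite(x=900, width=500, height=500, layers=3),
]
print(total_fragments(trees, screen_width=1680))             # 2250000
print(total_fragments(trees, screen_width=1680, margin=250))  # 1500000
```

The `margin` parameter plays the role of the extra trees kept on each side, so that they are already drawn when the player moves from side to side.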
Usually, every time I work with custom shaders, I run the Graphic Debugger to see how many things I broke in the process. Knowing where the time goes when drawing a given frame helps me focus my attention on specific shaders. In this case, after reviewing the code I had written, I did find a way to optimize the background trees’ pixel shader, and that took care of the stuttering, for now. To be fair, my development system has an NVIDIA GeForce 8500 GT, which performs rather poorly (its PassMark score of 139 puts it pretty far down the list), so if my game performs well on this system, it should run fine on most computers out there.
I will continue to use the Graphic Debugger as often as I can. However, if in the final product you see brick walls instead of trees, then you know what happened.