Diagnostics – Napland Games

Ever have a bright idea and think to yourself “oh, man, this is going to be so much faster”? Well, I do often, and often I find out that I’m a dummy. But that’s OK! We need to learn. So how do you find out your bright idea is as bright as a black hole? Diagnostics. The tool I tend to use most is simply the Stopwatch provided by the System.Diagnostics namespace. It’s pretty simple and provides that much-needed insight to the age-old question: “Is this method better than that method”. In this article I’ll take you through a couple of methods that are commonly frowned upon because they are slow with some simple usage examples of the Stopwatch, then I’ll show you how to make a more generalized diagnostics tool that’s more flexible and is good for your toolbox (I call it the DiagnosticsRecorder). Let’s begin!

Simple Usage of Stopwatch

We’re often told that such and such method is slow and should be avoided at all costs. Two common ones are using Vector magnitudes vs. square magnitudes and repeated calls to Unity’s GetComponent() method. Let’s start off by testing the difference between Vector3.magnitude vs. Vector3.sqrMagnitude. Place your bets now on which will be faster and what the time difference is!

First simply create a MonoBehaviour derived script with just an empty Start() method. Then pop in the following method.

The code should be self-explanatory. Basically we just set up a bunch of random Vector3s and record the time it takes to do a bunch of iterations of sqrMagnitude and magnitude. Now for the results! On my system (AMD 8350 CPU and Unity 5.3.4) the results are:
Magnitude test results:
total sqrMagnitude time: 14ms
average sqrMagnitude time: 1.4E-05ms
total magnitude time: 35ms
average magnitude time: 3.5E-05ms

So, as expected, magnitude calls are much slower than sqrMagnitude, approximately 3 times slower. However, this is 1 million iterations! In conclusion, although magnitude is definitely slower than square magnitude, you’d have to call it over 70,000 times to see an impact of 1ms. Though I’ll probably still suggest usage of square magnitude, the difference is not as big as what people make of it. Let’s check something else.

Another commonly frowned upon method call is gameObject.GetComponent<T>(). This is considered quite slow and we’re often told avoid doing this in methods that are called frequently, like Update(). But how slow is it exactly? Place the following code into your testing script to find out.

The above code is basically the same concept as the magnitude test method, but here I’m calling GetComponent. Not only do I call GetComponent, but I do a a trivial setting (enable = true) just to make the call relevant. In the comparison test I only call GetComponent once then perform the setting of enable = true on that component repeatedly. The results are as follows:
GetComponent test results:
total GetComponent time: 211ms
average GetComponent time: 0.000211ms
total single GetComponent time: 60ms
average single GetComponent time: 6E-05ms

As we would expect, calling GetComponent is indeed pretty slow. Like our comparison of sqrMagnitude and magnitude, GetComponent also seems to be about 3x slower than a single call to GetComponent . What’s a bit unexpected is how slow the loop with the single call to GetComponent is. After some more testing I found that about 50% of the time spent in the second loop is due to the conditional if(al != null). Wow! That’s sort of huge and leads to an important lesson: The results are only as good as the test. So in my test method, there is a flaw that skews the results. However, I usually check if a GameObject or component is null before accessing it to prevent those lovely null reference exception errors. Maybe I should be doing try-catches instead? It’s another test for another day. In conclusion: GetComponent is pretty slow, don’t use it in Update methods and cache it whenever possible. How slow? It’ll take about 17,000 calls per frame to see a 1ms impact. But when we’re trying to get over 60fps we only get 16ms to spend per frame, so beware!

As you can see, Stopwatch is extremely useful in testing out your ideas on how to optimize your code. However, imagine the pain of setting this up for 15 methods over 100 classes. Ouch! So let’s explore a simpler and more valuable class for diagnostics.

DiagnosticsRecorder Class

The DiagnosticsRecorder class was made to ease my pain of checking the performance of multiple methods in a class. It also provides more valuable statistics such as min, max, and average. I also find myself in need of a simple counter class to tell me how often calls to a particular section of code are happening. So I added in a counter recorder that works similarly to the stopwatch recorder. One caveat though – these classes do have a cost! The DiagnosticsRecorder class makes use of Dictionaries and key lookups (Dictionary.ContainsKey()) which can be slow if the dictionary is large. So best to be cautious of how many diagnostics to run in a single instance of DiagnosticsRecorder. Also, remember to keep in mind that tests are best as relative comparisons. So look for your slow points in comparison to other tests.

On to the code! I’m not going to embed the entire code here, but instead simply provide a link to each of the three classes.

DiagnosticsRecorder.cs

StopwatchRecorder.cs

Counter.cs

The DiagnosticsRecorder class requires both StopwatchRecorder and Counter, but the other two can stand on their own. The code is documented so it should be pretty self-explanatory, but I’ll talk a bit about the classes and how to use them.

The StopwatchRecorder and Counter record times and counts respectively. They record these into lists so that they can be summed easily later when you want the test results. If the list overflows then the sums are stored in another list. So you can run these for quite a while without issue.

DiagnosticsRecorder allows us to name our StopwatchRecorders and Counters so that we have a compact way of recording a bunch of tests and outputting the results.

Let’s take a look at how to use them. Below is the complete code for the two tests we ran earlier, but making use of the DiagnosticsRecorder instead.

As you may see, the setup per test is a lot more compact and calling the results is a single method call instead of the need to set up string.formats for every test. I run through all of the tests 5 times each to give a nice average instead of a one-off because averages are much more representative. The results I got are as follows:
Timers (ms)
Square Magnitude
Last: 17 Min: 14 Max: 17 Avg: 14.8 Total: 74
Test Magnitude
Last: 36 Min: 36 Max: 37 Avg: 36.2 Total: 181
GetComponent repeat
Last: 196 Min: 190 Max: 196 Avg: 192.4 Total: 962
GetComponent once
Last: 27 Min: 26 Max: 27 Avg: 26.8 Total: 134

As you can see they are mostly the same as the original tests, with some variance of course. But so much less code setup to run the timers and get the results.

Your challenge, should you accept it, is to show us some results for method calls that are commonly frowned upon. Are for loops actually faster than for each? Are string comparisons slower than int comparisons? Try out your ideas for optimizing your code and share what you find out.

As always, thanks for reading!

Stopwatch image courtesy of Designed by Freepik