June 18, 2018 - Optimizing code: The Profiler is my new best friend!

First I'll start with my latest playtest of the lightsaber game which now has much faster cutting (no more lag between swinging the sword and seeing the mesh cut) and a few other optimizations and improvements.

The last tweak I made before capturing this video was to encode the "heat" signal into the uv2 coordinate of each vertex so I no longer have to assign a different shader to each cut face. To my surprise this did not result in a drastic improvement in render speed which I learned literally minutes ago might be the result of using too many pixel lights.

Anyway, things are looking good and running very smoothly right now and that's all that matters... Check it out!


The Profiler!

I recently opened up the Unity profiler to identify bottlenecks that I could potentially open up by using the new job system to take advantage of multiple threads. But what I ended up doing was much simpler and the speed increase was massive!

Here's a before and after video showing some delayed cuts in the old version and nearly instantaneous cuts in the new version:


So, what I did was to play in the Unity editor and cut the sphere which is the heaviest piece of geometry in the scene. I did this with the profiler window open and deep profiling enabled. Then I was able to click through each frame until I found the "tooluser" script and dive into the meshcut class and its methods.

By the way, if any of these class and method names sound familiar it's because my meshcut script is a modified version of the class written by Dustin Whirle. Check out his youtube videos here and you'll find links to his code in the video descriptions.

So, here are some images taken from the profiler.

Frame 781:
Cutting starts on frame 781 where in the hierarchy since we're sorting by time, right off the bat you can see Mesh.get_vertices() taking up 12.98 ms and I dove down into it to find all these CopyChannels calls which interested me at the time but probably doesn't matter...



Another side-note, this mesh cutting operation will span multiple frames because I'm limiting the amount of time the cutting process can take up per frame to maintain a high frame rate for VR. So, in this case the cut spanned 18 frames

I like looking at the hierarchy view sorted by time to see where the most time-consuming operations are. Then I switch to the timeline so you can get a big picture of where the processing is taking place.
It's also satisfying for me to see my multithreaded code working properly here.


Armed with this information I started looking through my code for calls to Mesh.get_vertices(). What I ended up finding deep into the mesh cutting loop was this method call paraphrased here:

Cut_this_Face (
    new Vector3[]{ testmesh.vertices [p1], testmesh.vertices [p2], testmesh.vertices [p3] },
    new Vector3[]{ testmesh.normals [p1], testmesh.normals [p2], testmesh.normals [p3] },
    new Vector2[]{ testmesh.uv [p1], testmesh.uv [p2], testmesh.uv [p3] },
    new Vector4[]{ testmesh.tangents [p1], testmesh.tangents [p2], testmesh.tangents [p3] }
);

In a nutshell, after identifying which tris need to get split I looped through that list and sent them to Cut_this_Face(), and in doing so I was making multiple calls to get_vertices or get_normals, etc... Anyway, they all looked guilty to me so I decided to cache them all above the loop like this:

Vector3[] testmesh_vertices = testmesh.vertices;
Vector3[] testmesh_normals = testmesh.normals;
Vector2[] testmesh_uv = testmesh.uv;
Vector4[] testmesh_tangents = testmesh.tangents;

And the new Cut_this_Face() call looked something like this:

Cut_this_Face (
    new Vector3[]{ testmesh_vertices [p1], testmesh_vertices [p2], testmesh_vertices [p3] },
    new Vector3[]{ testmesh_normals [p1], testmesh_normals [p2], testmesh_normals [p3] },
    new Vector2[]{ testmesh_uv [p1], testmesh_uv [p2], testmesh_uv [p3] },
    new Vector4[]{ testmesh_tangents [p1], testmesh_tangents [p2], testmesh_tangents [p3] }
);

So, I had mentioned above that the cut I was profiling took 18 frames to process. After the optimization I tested a similar cut and it spanned only 4 frames. The frame times vary so this isn't solid data, but as you can see from the video above it made a difference you can see/hear and in the case of playing in VR, you can feel it!

All I'm trying to do in this post other than to celebrate this optimization victory is to inspire anyone out there to get to know the profiler and subsequently get to know how Unity's C sharp implementation has many hidden behaviors worth investigating. And occasionally you will get much faster code from a very small change.

Inspiration

I've posted this before but this video explains a similar process of using the profiler to get faster frame rates. Check out the segment given by Søren Trautner Madsen.







Comments

Popular Posts