Recently, I have been helping a former colleague and friend with a personal project of his, a close-to-being-finished sci-fi full-CG short film (quite a mouthful!). This collaboration came to be after a series of conversations and exchanges of ideas about how to take advantage of diverse existing technologies that are not commonly used together, or at least not to their full potential (although this landscape is already changing, and doing so rapidly). At the time, I was starting to explore Unreal Engine as the cornerstone of a cost-effective toolset for Virtual Production with a focus on keeping a strong connection with other authoring software. My goal was simple: making the process of producing the content as rewarding as possible, ideally just as fun as consuming it. If I am to invest my spare time in producing something, it had better be fun!
However, in this particular instance, the main workflows for the movie had already been established as it was well into production. It was important to decide which problems or inconveniences of the more traditional VFX pipeline would make sense to tackle at that point, keeping in mind that these new workflows would need to co-exist with what was already being used… and so the conversations about defining a “hybrid pipeline” began: Where can GPU rasterisation be used without falling short of the target quality/look? What is easier/harder to achieve with a real-time renderer vs. offline rendering commonly used in VFX? Some aspects might seem obvious, some others not so much (but then again, this is also changing as you are reading this).
An area where we identified a clear benefit in going for a hybrid approach was crowds. More often than not, this is a discipline where resources are not used as effectively as possible: think bloated assets, large cache files (slow to read and write) or slow rendering times that don’t really have a real impact on the final result (just to name a few!). It is one thing to have crowd characters half the size of the screen, and quite another to have them in the background, just a few pixels tall, defocused, motion-blurred, behind a cloud of dust (or even totally occluded the whole time!). Of course these are extreme examples at an extreme end of the spectrum, but you get the point… Yes, LODs are a thing and, truth be told, they partially patch up the situation. However, they generally do not help much in preventing having to package up the agents and push them through a lot of the same bottlenecks as they would face without LODing (more on this later).
The goals established for this project aimed to tackle all the aforementioned problems. These are the main points:
- Reduce the number of software packages involved: from mocap data to published crowd, all within Houdini (processing and setting up the assets, selecting and retargeting the motion, authoring the crowd, and a solution to export background crowds to Unreal).
- Super-lightweight data exchange files (we needed to share these via email at times!).
- Fast real-time crowd rendering of up to 10,000 agents.
- Fast iteration cycles.
- Not closing the door to rendering midground and foreground crowds with an offline renderer, but establishing an alternative path to render background crowds effectively.
- Make it fun.
As mentioned above, a Houdini toolset was developed for this project as suites of nodes that would be used at different stages to realise the crowd. The choice of Houdini was one of the easiest to make. The power of Houdini is that, if it doesn’t offer a solution out of the box, a bit of coding and wiring things around can get you there. The different tools developed could be used in a single live network in a working scene (this could be useful when the complexity of the setup is not too high) but, preferably, a pipeline could be established by using the tools in a set of Houdini template scenes, approximately as the following diagram depicts:

I’ll only go through some of the steps, the criterion being to touch upon the main elements that define the fast track between the source assets and motion and a real-time render in Unreal Engine.
Asset Import [2]: Not a lot of surprises here. This is the stage where the assets (the characters) were brought into the system. The only assumption is that these assets are rigged. LODs could be provided but, failing that, they can be generated easily within Houdini (as well as a super-lightweight version of the agents for working in the viewport and achieving the fastest possible interactivity). A special emphasis on the “budget” was put throughout the whole system, and this proves particularly beneficial when it comes to exporting and rendering the crowds. Three kinds of LODing or complexity are considered: geometry (polygon count), motion (skeleton joint count) and shading (especially important in real-time graphics, related to the number of instructions in the shaders). The different geometry LODs were generated by simplifying the geometry, as one might expect. Motion LODs required stripping out some joints (think removing the finger joints and re-weighting the geometry so it is deformed by the hand joint, that kind of thing). This is desirable in some circumstances, but it did not make a difference in this case due to the way crowds were exported (which will be explained a bit further down). Shading LODs could also be defined here.
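For illustration, here is roughly what generating the geometry LODs could look like when scripted with Houdini’s Python API. The network layout, node names and the polyreduce parameter name are assumptions on my part (and may vary between Houdini versions); the actual tools on the project wrap a lot more logic than this:

```python
# Minimal sketch: generate a few geometry LODs for an imported character,
# assuming a /obj/character geo network containing a full-res mesh node "skin".
import hou

geo = hou.node("/obj/character")       # container with the character's skin geometry
skin = geo.node("skin")                # full-resolution skin mesh

# Target polygon budgets per LOD, as a percentage of the source mesh.
lod_percentages = {"lod1": 50.0, "lod2": 15.0, "lod3": 3.0}

for name, pct in lod_percentages.items():
    reduce_node = geo.createNode("polyreduce", node_name="reduce_" + name)
    reduce_node.setInput(0, skin)
    reduce_node.parm("percentage").set(pct)    # keep roughly pct% of the polygons
    out = geo.createNode("null", node_name="OUT_" + name)
    out.setInput(0, reduce_node)

geo.layoutChildren()
```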
In any event, let me introduce you to the agents:

A few tools were put in place to define the different look variations of the agents, as well as to define some properties that would be useful later for retargeting, mirroring motion, etc.
Motion Selection [6]: Fortunately, the motion library for this project was quite extensive. A blessing that could quickly turn into a curse without a bit of infrastructure in place. Once again, the goal was to limit the number of software packages used as much as possible, to prevent round trips, loss of information, buying licenses for other software, etc. It was extremely important to have an effective and easy-to-use solution for selecting bits of motion from the source mocap library and retargeting them onto the crowd agents. Selection happened within Houdini, with a simple tool to determine which sections would be turned into clips, either as singles (for individual agents) or vignettes (groups of agents performing a joint action).
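Just to give an idea of the kind of data the selection tool needs to capture, here is a minimal sketch of clip selections written out to a side-car file; the field names are hypothetical, and the real tool keeps more metadata than this:

```python
# Minimal sketch: record which frame ranges of which takes become clips,
# flagged as singles or vignettes. Purely illustrative field names.
import json

clips = [
    {"name": "walk_casual_01", "source": "take_012", "frame_range": [110, 230], "kind": "single"},
    {"name": "argue_pair_03",  "source": "take_047", "frame_range": [60, 420],  "kind": "vignette", "actors": 2},
]

with open("clip_selection.json", "w") as f:
    json.dump(clips, f, indent=2)
```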

Selection session for singles and vignettes.
Motion Retargeting [8]: After clips were selected, the motion would already be in a native Houdini format (bclip), although we would also save out an FBX for reference and potential future use. However, the clips would still be on the source skeleton, and an intermediate retargeting step would be necessary. Retargeting can be seen as translating motion across skeletons and is tricky business, as much art as science. There are multiple software packages to do this (HumanIK, MotionBuilder, IKinema), but stubbornness and the scope of this project called for the development of a toolset to do motion retargeting live in Houdini. The biggest benefit of this approach was how easy automation became, given the procedural nature of the nodes. Of course, there are some limitations when you compare it against what specialised packages such as any of the aforementioned can do. Therefore, we left the door open to using those (even though we didn’t end up needing them for crowds) and to importing that nicely polished motion back into the system easily (just the same way we do with the mocap).

The retargeting solution provided was mainly based on skeleton FK bone chain correspondences and some secondary IK adjustments, which means that the skeleton hierarchies could have different numbers of joints and their own naming conventions. A bit out of the scope of this post, but a useful workflow is using this suite of nodes to bring the mocap motion easily into Houdini, do some procedural choreography, quick terrain adaptation, ragdoll physics, etc., and get FBX files out so an animator has an advanced starting point from which to further refine the motion.
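To make the chain-correspondence idea a bit more concrete, here is a small sketch of the kind of mapping table such a retargeter could be driven by, together with a crude resampling step for chains with different joint counts. The joint names are made up, and the simple slerp below only stands in for the real solver, which also layers IK adjustments on top:

```python
# Minimal sketch: FK chain correspondences between a source (mocap) skeleton
# and a target (agent) skeleton with different joint counts and naming.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

chain_map = {
    "spine": (["spine_01", "spine_02", "spine_03", "spine_04"],   # source chain
              ["spineA", "spineB", "spineC"]),                    # target chain
    "l_arm": (["l_shoulder", "l_elbow", "l_wrist"],
              ["lShldr", "lElbow", "lWrist"]),
}

def resample_chain(source_rots, n_target):
    """Crudely resample per-joint local rotations along a chain by slerping
    between neighbouring source joints at evenly spaced parameters."""
    t_src = np.linspace(0.0, 1.0, len(source_rots))
    slerp = Slerp(t_src, Rotation.concatenate(source_rots))
    return slerp(np.linspace(0.0, 1.0, n_target))

# e.g. squeeze a 4-joint source spine onto a 3-joint target spine:
# target_spine_rots = resample_chain(source_spine_rots, len(chain_map["spine"][1]))
```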
Character-Motion Setup and Procedural Choreography [10, 12]: Not much to say here. A few tools to load certain clips onto the different agents. At this point, we are ready to do our crowd layout using either the off-the-shelf Houdini solution or any additional procedural choreography setups or tools (some were put in place to work faster and have more control). Beyond the crowd behaviour, at this point you are just where you want to be if you want to achieve all sorts of cool effects with the agents (or put them through a world of pain). Do you want to blow them up? Need to simulate their muscles or cloth? Maybe make them furry? All of that is very well covered thanks to the multiple solvers available in Houdini. In our case, since we are looking at large background crowds to render in real time, let’s not get too excited with the Creature FX stuff and focus on getting the agents into Unreal.
Export and Import to Unreal [13, 14, 15]: Once the crowd is ready, one could decide it’s best to render it in Mantra (Houdini’s renderer, soon to be replaced by Karma). Choosing ray-tracing, for whatever reason, remains a possibility. However, if the crowd needs to make it to Unreal, this is the point where it would be run through the exporter. From the moment the assets are brought into the system, they are provided with some data and different representations or LODs (of the different kinds mentioned earlier). It is really beneficial to have all this information worked out beforehand since, at this point, we only need to consider the budget for the shot to decide among our export options. It is no secret that we live in times of converging technologies in the field of Computer Graphics (e.g. Virtual Production making use of technologies employed in video games), and it is now more than ever that it makes sense to take a step back and rethink some workflows. Now, probably more than has been the case for a while, it is harder to find universal solutions to some problems, and adaptability is key.
The notion of hardware allowance, or what I am calling here “the budget”, is often overlooked in VFX. The industry is accustomed to making use of vast amounts of resources in the form of large clusters of computers to exploit parallel execution of tasks: render farms. When your pipeline rests on top of such infrastructure, it is easy to lose track of where the real bottlenecks or resource wasters are since “you throw it to the farm” and eventually the result comes back. That is a luxury that will come back to bite a lot of companies if they do not adapt their pipelines in time.
Following this idea, and for the needs of each shot, we can quickly adapt the export to different budgets. Especially if you are using a graphics card (whether to produce or consume content), the idea of a budget becomes very tangible: GPUs have a limited amount of memory and, once that is exceeded, performance drops drastically. Are the contents going to be watched by the public in real time? What is the target frame-rate in that case? Otherwise… are we talking about hard real-time? Are relatively interactive frame-rates enough for the artist to work on the contents? Does a lighter need the same frame-rates as an animator and, hence, do they need the same data complexity?
On a case-by-case basis, we would determine if and how we needed to split the crowd into different packages. As a general rule of thumb (a tiny decision helper is sketched right after this list):
- < 10: Hero or semi-hero; these could be exported as FBX or complex rigs.
- ~10 to 100 (optional, if necessary for the shot): “close crowd”. FBX or Alembic.
- ~100 to 10,000s: Crowd. Exported as texture memory buffers.
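A tiny decision helper capturing that rule of thumb could look like the following; the thresholds are just the ones listed here, and a production tool would also weigh screen coverage, shot length and so on:

```python
# Minimal sketch: pick an export path from the agent count alone.
def export_format(agent_count):
    if agent_count < 10:
        return "fbx_or_rig"        # hero / semi-hero
    elif agent_count <= 100:
        return "fbx_or_alembic"    # "close crowd", only if the shot needs it
    else:
        return "pose_textures"     # background crowd as texture memory buffers
```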
The third scenario is the one we are going through in this use case, since this memory format can easily be read in Unreal via the Material Editor. Even in the event that a crowd was going to be partially rendered as “close crowd” using Alembic, there is still a clear benefit in exporting the whole crowd as texture memory buffers. This format is extremely compact in data size but also leads to really high frame-rates, so it is really useful to work with in Unreal since it allows for interactive set-dressing with the crowd in place, responsive lighting, etc. One can decide later, once everything in the shot has been worked out at interactive frame-rates, to bring in the different types of layers.
One of the biggest performance killers in real-time applications is sending data from main memory to the graphics card. If we think about it naively, it sounds like rendering a crowd on the GPU would imply sending all the skeleton matrices for all agents to the GPU every frame, and that is a lot of data being sent repeatedly. It could also mean a huge amount of data replication being pushed to the GPU over and over (multiple agents playing the same frame of a given clip). What if we could pack everything we need to represent the crowd on any frame of the shot and push it to GPU memory once?
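To put some numbers on the naive approach, here is a quick back-of-the-envelope estimate of what it would push to the GPU every frame, assuming 4x4 float32 matrices and a made-up joint count:

```python
# Naive per-frame upload: one 4x4 float32 matrix per joint per agent.
agents, joints = 10_000, 30
bytes_per_frame = agents * joints * 16 * 4     # 16 floats per matrix, 4 bytes each
print(bytes_per_frame / 2**20)                 # ~18 MB uploaded every single frame
print(bytes_per_frame * 30 / 2**20)            # ~550 MB/s at 30 fps
```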
Packing everything up front is exactly the idea this approach exploits. On the Houdini side, the exporter analyses the agents, determines the motion catalog and renders it to a texture (a format that is extremely convenient for GPUs). The other bits of information we need baked onto a texture are those that represent each agent’s position and orientation at any given frame. In doing so, we are effectively saving thousands of agents as lightweight PNGs that take under three seconds to write and very little space on disk. And as a bonus, a variety of colourful works of fine tapestry (a small sketch of the baking step follows the images below):


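For the curious, here is a minimal sketch of what baking per-frame agent positions into a 16-bit PNG could look like. The layout (one column per agent, one row per frame) and the quantisation against a bounding box are assumptions for illustration; the real exporter also bakes the motion catalog and orientations, with its own layout and encoding:

```python
# Minimal sketch: quantise world-space agent positions into a 16-bit PNG.
import numpy as np
import cv2

def bake_positions(positions, bbox_min, bbox_max, path="crowd_positions.png"):
    """positions: float array of shape (frames, agents, 3) in world space."""
    span = np.maximum(bbox_max - bbox_min, 1e-6)
    normalised = (positions - bbox_min) / span              # 0..1 inside the bounding box
    quantised = np.clip(normalised, 0.0, 1.0) * 65535.0     # spread over the 16-bit range
    texture = quantised.astype(np.uint16)                    # frames x agents x 3 channels
    cv2.imwrite(path, texture)                                # OpenCV writes 16-bit PNGs natively
    return bbox_min, span                                     # needed to decode on the GPU side
```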
Of course, the whole idea relies on packaging the information in blocks of memory that match the agent geometry LOD reserved for this purpose, so it can be deformed in Unreal via the powerful Vertex Animation feature in the Material Editor. With a bit of Blueprinting and by exploiting Instanced Static Meshes, it was easy to create the Crowd Reader component used in the project. Another key element was making use of Unreal Engine’s Sequencer to sync the crowds with the timeline, which was pretty straightforward. At this point, bringing the crowds into the Unreal scene becomes a drag-and-drop task.
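For reference, the lookup the material ends up doing boils down to a bit of index math, sketched here in Python for clarity. The half-texel offset convention and the one-column-per-agent / one-row-per-frame layout are assumptions; the actual material graph on the project may differ:

```python
# Minimal sketch: which texel of the baked texture to sample for a given
# instance (agent) index and the current frame of the shot.
def pose_texture_uv(agent_index, frame, num_agents, num_frames):
    u = (agent_index + 0.5) / num_agents   # one column per agent
    v = (frame + 0.5) / num_frames         # one row per frame
    return u, v
```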

Having made it to this point, the benefits of using a real-time renderer such as Unreal’s become apparent: Interactive lighting, motion blur, depth of field effects, real-time shadows, you name it…
If there is “budget” for it, instead of saving out the whole motion catalog in a single texture (to prevent sending information to the GPU every frame), we could have a sequence of files. That would open the door to having unique poses per agent via texture memory. This could be used for crowds that have been put through a ragdoll physics simulation and, hence, have skeleton poses that are unique to every agent on each frame. The performance impact would need to be weighed against having a secondary cache in a different format for those agents (probably the preferred option).
Does this development present a universal solution to exporting crowds? No. But as I mentioned earlier, I believe these are times for rethinking workflows, exploring possibilities and technologies, and working with systems that can adapt easily to different scenarios. By opening the door to using the GPU to render crowds in a real-time engine, the goals that were set out for this project were more easily reached. On a different note, staying as contained as possible within the chosen software packages required some development time but, ultimately, made the workflow easier. Furthermore, the whole idea is not too far from being extended to open the possibility of authoring the crowd directly in the real-time engine, or adapted to different scenarios such as simulated vegetation. It also represents a cost-effective pathway to high-performance visualisation for virtual production in VR or on low-end devices. In a way, this was an exploratory exercise aimed at mitigating some of the problems that arise during CG projects inside and outside of a big company. But more importantly, it was fun.
Please let me know if you found this interesting or would like to know more, and I might try to put together a tutorial when I have a bit of time.