Not that long ago I started building some Machine Learning (ML) toys in Houdini. The main idea was to implement everything from scratch (using VEX) which meant that I had to invest some time upfront laying the groundwork to have the fundamentals in place and start developing the more interesting stuff later on. I confess: I had been looking forward to getting to the point where I had all the building blocks to be able to deal natively inside Houdini with one of the most fascinating areas of ML: Neural Networks (NNs).
A couple of quick notes:
The toolset described in this post does not aim to compete with any of the existing mature ML frameworks. However, it strives to present a great way to play with, visualise and understand what goes on under the hood of these networks. It is important to note that the type of NNs that I considered in this study is limited to Feedforward NNs (for now).
What I will be doing in this article is going through the thought process and describing the overall approach I took. Although these tools are generic, I will lean on a specific example (Digit Recognition) to explain some design ideas. What I will not be doing is getting into too much detail about common ML and NN concepts (feel free to check the hyperlinks provided).
Just to put things in context and also as a way to summarise the action plan I followed, here’s a list of the main steps that I took:
- Built some utilities to load and visualise data.
- Built Linear Regression and Logistic Regression toolsets.
- Implemented a Linear Algebra toolset to manipulate and operate on large matrices (used by other algorithms or techniques such as the Normal Equation).
- Trained a Digit Recognition NN model in Octave to be used as ground truth in the following steps.
- Defined the Neural Network architecture representation in Houdini.
- Imported the parameters learnt in Octave, fed them to the NN representation in Houdini and used them to implement and validate the Feed-Forward propagation technique.
- Implemented the Back-Propagation and Gradient-Descent algorithms to train the Neural Network natively in Houdini.
- Compared and validated results against those achieved in Octave.
- Used the same dataset to train different NN architectures/hyperparameters (hidden layer and neuron counts, learning rates and regularisation values).
- Built a procedural geometry representation of the Digit Recognition NN example to present this work (what you see in most of the images attached to this post).
In this article, I will focus (for the most part) on the fifth, sixth and seventh steps of this list: the NN representation in Houdini, Forward Propagation, and training with Back-Propagation and Gradient Descent.
The Neural Network Architecture:
One of the questions that needed an answer first was how to represent the NN “object” in Houdini. Apart from presenting a data structure that made sense for the purpose, I also wanted to make sure the choice allowed for easy inspection and visualisation of the data flowing through the network at any given point in time.
Since the underlying geometry structure of a NN is a Directed Graph, an obvious representation choice inside Houdini is a geometry where its points are the neurons and primitives (single-edge polylines) constitute the synapses. Luckily this is straightforward to accomplish given a few input parameters (the number of layers and the neuron count for each of them).
In the animated image below you can see a few different NN architectures (3 hidden layers, input layer on the left, output layer on the right) resulting in a varying number of neurons. The red points on the image represent the bias unit (or neuron) for each layer that needs it.
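As a rough sketch of this representation (written in Python rather than VEX, and using made-up names such as `build_network`, `points` and `prims`), the graph can be generated from nothing more than the neuron count of each layer. The assumption here is that every layer except the output one gets a bias unit and that bias units have no incoming synapses:

```python
def build_network(layer_sizes):
    """Return (points, prims) for a fully connected feedforward graph.

    points: list of dicts, one per neuron (bias units included).
    prims:  list of (src_point, dst_point) pairs, one per synapse.
    """
    points, prims, layers = [], [], []
    last = len(layer_sizes) - 1
    for layer, size in enumerate(layer_sizes):
        ids = []
        # Assumption: every layer except the output one starts with a bias unit.
        if layer != last:
            ids.append(len(points))
            points.append({"layer": layer, "bias": True})
        for _ in range(size):
            ids.append(len(points))
            points.append({"layer": layer, "bias": False})
        layers.append(ids)
    # Connect every neuron of a layer to every non-bias neuron of the next one.
    for src_ids, dst_ids in zip(layers, layers[1:]):
        for dst in dst_ids:
            if points[dst]["bias"]:
                continue  # bias units have no incoming synapses
            for src in src_ids:
                prims.append((src, dst))
    return points, prims


# 3 input neurons, one hidden layer of 4 neurons, 2 output neurons.
points, prims = build_network([3, 4, 2])
print(len(points), "points,", len(prims), "primitives")
```

In Houdini terms, each entry of `points` would become a point and each pair in `prims` a two-point polyline.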
Needless to say the connectivity structure is not enough to implement the algorithms that are described in the following sections. Luckily we can store extra data as geometry attributes of the points and primitives:
- Parameters of the NN (also referred to as the “weights”): A parameter (commonly named “theta”) on each of the primitives (connections) between neurons of different layers that represents how much the activation of one of them contributes to the activation of the next.
- Activations: A value on each of the points (neurons) that represents its activation state. From layer to layer, this activation will be cumulatively propagated through an activation function (e.g. the sigmoid).
There are some other attributes that will make things easier when implementing algorithms that operate on NNs. These will be stored temporarily as supporting geometry attributes on points or primitives alongside the aforementioned parameters and activation data.
There is also something important to note here. Especially during the training phase, a large amount of data needs to be propagated forwards and backwards through the network iteratively in the process of optimising the parameters. For this reason, the NN has been implemented so that its attributes can hold arrays of data (for instance, one value per training sample) instead of single values, which allows computations to happen in parallel at a large scale. In a Houdini implementation of the forward/backwards propagation algorithms we will exploit the SIMD nature of Attribute Wrangle nodes (more on this later).
The Neural Network Parameters:
There is little to no use in an untrained NN (which is the same as saying a NN with uninitialised or random parameters). As mentioned in the previous section, these are the theta or “weight” values stored in the synapses between the neurons (the primitives connecting points in our Houdini Geometry). These parameters are what make a NN architecture “work” for a given task (activate the output layer neurons as expected) when we propagate data forwards from the input layer all the way through to the output layer.
I initially relied on externally trained parameters that could just be plugged into the NN. The reason for doing this was simply to be able to make sure the feed-forward propagation solution that I would implement shortly was 100% correct and free of any silly bugs. Gradient descent with backwards propagation can be a bit finicky to implement and it could have gotten really confusing otherwise, as the training process that produces these parameters relies on both forwards and backwards propagation. In layman’s terms: it was a bit of a chicken and egg situation.
I used Octave (a.k.a. the “mostly compatible” alternative to Matlab), a well-known software package for numerical computation, to calculate the parameters for a specific NN architecture that I would use as my example during development. Data was written out from Octave in csv format, which made it really easy to import into Houdini with a couple of utility nodes developed for the purpose that can be seen in the image below:
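To give an idea of what such an import involves, here is a minimal Python sketch. The csv layout is an assumption (one row per destination neuron, one column per source neuron, bias weight first), and `load_theta` is a hypothetical helper, not one of the utility nodes mentioned above:

```python
import csv
import io


def load_theta(csv_text):
    """Parse one weight matrix from csv text.

    Assumed layout: one row per neuron of the next layer, one column per
    neuron of the current layer, with the bias weight in the first column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [[float(value) for value in row] for row in reader if row]


# Hypothetical file contents for a layer with 2 inputs and 2 neurons.
theta1 = load_theta("0.1,-0.4,0.7\n-0.3,0.2,0.5\n")
print(len(theta1), "rows x", len(theta1[0]), "columns")
```

Each value would then be written to the “theta” attribute of the primitive connecting the corresponding pair of neurons.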
Not that it tells us much, but one can visualise the weight parameters on the primitives of the NN geometry by just remapping them to a colour attribute.
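The remapping itself can be as simple as a linear fit of the weight range into a colour ramp. The sketch below is in Python, with a hypothetical two-colour ramp (blue for the lowest weight, red for the highest) chosen purely for illustration:

```python
def fit(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly remap value from [old_min, old_max] to [new_min, new_max]."""
    if old_max == old_min:
        return new_min  # degenerate range: avoid dividing by zero
    t = (value - old_min) / (old_max - old_min)
    return new_min + t * (new_max - new_min)


def weights_to_colours(weights):
    """Return one (r, g, b) tuple per weight: blue = lowest, red = highest."""
    lo, hi = min(weights), max(weights)
    colours = []
    for w in weights:
        t = fit(w, lo, hi)
        colours.append((t, 0.0, 1.0 - t))
    return colours


print(weights_to_colours([-2.0, 0.0, 2.0]))
```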
Do not confuse the aforementioned colours with the activation values of the neurons. The latter are determined by a linear combination of the NN parameters with the activation values of the neurons connected to the other end of each synapse. Doing this layer by layer, starting at the input layer and finishing at the output layer, is known as Forward Propagation.
The Feedforward algorithm:
The Feedforward algorithm or Forward Propagation is useful in two different contexts:
- On a previously trained NN: To predict the outputs given some input values.
- While the NN is being trained: Like the previous case but done prior to running Back Propagation on the NN (in order to calculate “how wrong” the Forward Propagation process got it and adjust the parameters accordingly). This sequence is repeated over and over again during training to reduce the error rate with each iteration (see next section for details).
There are plenty of really great resources online about this technique (so this section will not be a deep dive into the algorithm). At the end of the day, propagating the activations of the neurons forward is pretty straightforward (no pun intended). The somewhat trickier bit is to think about implementation details to make this process as fast as possible, as it will need to run over and over again on large amounts of data, especially during training.
The most common approach to compute as much as possible in parallel is to use vectorisation. There are a few options that one can consider for doing this in VEX, as Attribute Wrangle nodes allow you to write snippets of code that run in parallel for all points or primitives (the two relevant options in our case). My initial implementation for the forward propagation node relied on a Linear Algebra toolset I worked on not too long ago. See image below:
For those familiar with Houdini, you will immediately spot the inner loop on feedback mode iterating over the layers of the NN (sequentially, because the activations of any specific layer’s neurons are determined by the activations of the previous layer). However, in this implementation, the data that is forwarded from one iteration to the next on the aforementioned Houdini loop is not the NN’s geometry itself but a “matrix geometry” (using a custom Linear Algebra suite of nodes developed for such purpose).
At each iteration, we will multiply the activation matrix of the previous layer by the matrix of parameters corresponding to the current one, which will give us the propagated activation values. This process will be repeated (result of previous iteration will be forwarded to the next) until we calculate the activations of the output layer (at which point the loop ends and the values are fed back into the NN geometry).
Note the sigmoid operation that takes place before forwarding the activation matrix for the next iteration of the loop. The sigmoid is a commonly used activation function in NNs. All it does is to turn the numerical values resulting from the linear combinations for each neuron into a logistic value (a number between 0 and 1).
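The loop described above boils down to very little code. Here is a minimal, pure-Python sketch of the matrix-based version (not the Houdini node network itself). The shapes are assumptions: `samples` holds one row per input sample, and each matrix in `thetas` has one row per neuron of the next layer and one column per neuron of the current layer, with the bias weight first:

```python
import math


def sigmoid(z):
    """Numerically safe logistic function: maps any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def feedforward(samples, thetas):
    """Propagate all samples through the network, one layer per iteration."""
    activations = samples
    for theta in thetas:
        with_bias = [[1.0] + list(row) for row in activations]  # prepend bias unit
        theta_t = [list(col) for col in zip(*theta)]            # transpose
        z = matmul(with_bias, theta_t)                          # linear combination
        activations = [[sigmoid(v) for v in row] for row in z]  # activation function
    return activations


# Two samples with two features each, a hidden layer of two neurons, one output.
thetas = [[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], [[0.7, -0.8, 0.9]]]
print(feedforward([[0.5, -1.0], [1.0, 2.0]], thetas))
```

Note that only the final `activations` matrix survives the loop, which is exactly the limitation of this approach discussed next.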
This Houdini implementation works well and is quite performant but, unfortunately, it has a problem: all the intermediate data (the activations of the hidden layers) is discarded along the way. This happens because the for loop forwards matrix geometries (Linear Algebra toolset’s matrices) and not the NN itself. If we were only forwarding the NN we would be forced to extract and update the values on the NN geometry every iteration, which would make the whole process slower and defeat the purpose of using the matrix multiplication approach.
Ideally there would be some way in a “Houdini for loop” to forward two different geometries resulting in two different output nodes so we could update all the intermediate data in the NN as we iterate through the layers. If anyone knows how to do this without merging geometries, I am all ears! 🙂
In any event, keeping all the intermediate values on the hidden layers is quite important, not only for inspection and visualisation purposes but also because this data will be necessary for the Back-propagation algorithm (see next section). Because of this, I implemented a second version of the Feedforward algorithm that essentially does the same thing as the previous one but does not use secondary Matrix objects to calculate the neuron activations. As a result, the inner loop forwards the NN itself, so it is a lot easier to keep as much data as required in its geometry.
Not using the Linear Algebra toolset for Matrix multiplication is not that bad after all, as Houdini Attribute Wrangle nodes can do a very similar job by relying on the connectivity data of the geometry instead of constructing and multiplying matrices. In fact, when we are forwarding a relatively small amount of data (e.g. predicting the output for a single input sample), this is a faster method with fewer intermediate steps and lower memory allocation requirements than the previous option.
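A hedged Python sketch of this connectivity-based variant follows. It mimics what a wrangle would do by accumulating `theta * activation` along each primitive into its destination point. The names (`layer_of`, `is_bias`, `prims`) and the data layout are made up for illustration, and each point holds an array with one activation per sample:

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward_by_connectivity(layer_of, is_bias, prims, inputs):
    """layer_of[p]: layer index of point p. is_bias[p]: True for bias units.
    prims: list of (src, dst, theta). inputs: {input point: [value per sample]}.
    Returns {point: [activation per sample]} for every point of the network."""
    n_samples = len(next(iter(inputs.values())))
    act = {p: list(values) for p, values in inputs.items()}
    for p, bias in enumerate(is_bias):
        if bias:
            act[p] = [1.0] * n_samples  # bias units are always fully active
    # Layers must be processed in order: each one depends on the previous one.
    for layer in range(1, max(layer_of) + 1):
        z = {}
        for src, dst, theta in prims:
            if layer_of[dst] != layer:
                continue
            acc = z.setdefault(dst, [0.0] * n_samples)
            for s in range(n_samples):
                acc[s] += theta * act[src][s]
        for dst, values in z.items():
            act[dst] = [sigmoid(v) for v in values]
    return act


# Tiny network: point 0 = bias, point 1 = input, point 2 = output.
layer_of = [0, 0, 1]
is_bias = [True, False, False]
prims = [(0, 2, 1.0), (1, 2, -1.0)]
act = forward_by_connectivity(layer_of, is_bias, prims, {1: [1.0, 3.0]})
print(act[2])
```

Because every activation stays on its point, the hidden layer values remain available afterwards for inspection and for Back-Propagation.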
The Back-Propagation algorithm:
The Forward Propagation (or Feedforward) algorithm computes the output layer’s neuron activations given some data on the input layer’s neurons (it propagates the activations forwards). The Back-Propagation algorithm works in a similar fashion but in reverse: Given the output layer’s neuron activations, it propagates them “backwards” towards the input layer. However, the purpose of doing so is to calculate how much the NN “got it wrong” for the given output; therefore, it is expected that the input corresponding to the propagated output data is also provided (and propagated forwards beforehand). This algorithm is at the core of the learning process and the idea is to run it over and over again while the parameters of the network are repeatedly adjusted to get it “less and less wrong” with each iteration (see next section).
Just like in the Forward Propagation section, I will not be getting into the details of this technique as there are already fantastic resources online that describe all the implementation details. Instead, I will only be concerned here with discussing, at a high level, how this algorithm can be implemented in Houdini. Unsurprisingly, it looks very similar to the Feedforward network shown in the previous section:
These are the biggest differences between the networks of each of the two algorithms (forward/backwards propagation):
- An extra attribute is necessary on the neurons (sigma). Intuitively we can think about sigma as the activation error propagated backwards.
- Sigma data on the last layer is simply the difference between the output activations and the expected result (hence, we calculate it outside of the main loop).
- The sigmoid activation function is replaced by the Sigmoid Gradient Function.
- At the end of the loop, the sigma and activation values are used to update the parameters of the NN.
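For reference, these are the textbook formulas that the differences listed above correspond to, assuming sigmoid activations and a regularised cross-entropy cost (the cost function is not spelled out in this post, so that part is an assumption). Here $a^{(l)}$ are the activations of layer $l$, $\Theta^{(l)}$ the parameters between layers $l$ and $l+1$, $y$ the expected output, $L$ the output layer, $m$ the number of samples, $k$ the sample index and $\lambda$ the regularisation value:

```latex
\sigma^{(L)} = a^{(L)} - y

\sigma^{(l)} = \left( (\Theta^{(l)})^{T} \sigma^{(l+1)} \right) \odot a^{(l)} \odot \left( 1 - a^{(l)} \right),
\qquad l = L-1, \dots, 2

\frac{\partial J}{\partial \Theta^{(l)}_{ij}} =
\frac{1}{m} \sum_{k=1}^{m} \sigma^{(l+1)}_{i,k} \, a^{(l)}_{j,k}
\; + \; \frac{\lambda}{m} \, \Theta^{(l)}_{ij} \quad (\text{regularisation term only for } j \geq 1)
```

The factor $a^{(l)} \odot (1 - a^{(l)})$ is the Sigmoid Gradient evaluated on the stored activations, and the entry of $(\Theta^{(l)})^{T} \sigma^{(l+1)}$ that corresponds to the bias unit is dropped before moving on to the previous layer.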
An important thing to note is the fact that Back-propagation needs the activation values for all hidden layers to be available. This is because, intuitively, we are calculating the errors on the hidden layer activations, which would only be there if there has been a Forward Propagation pass beforehand. We can make sure of this by checking whether or not that’s the case and stopping execution at the error node if it is not.
Yes, there are a lot of math details to keep in mind when one implements Back-Propagation. It can be a bit daunting if not taken one bit at a time. However, with a good skeleton network, it was a lot easier to nail down the implementation details, which basically meant translating math formulas into VEX code.
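To show how little code the core of it takes once the skeleton is in place, here is a minimal Python sketch (not the VEX implementation) for a single sample. It assumes sigmoid activations, a cross-entropy cost and no regularisation. The function names are made up, and `sigma` follows the naming used above:

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, thetas):
    """Return the activations of every layer (bias units excluded)."""
    acts = [list(x)]
    for theta in thetas:
        prev = [1.0] + acts[-1]
        acts.append([sigmoid(sum(w * a for w, a in zip(row, prev))) for row in theta])
    return acts


def backprop(x, y, thetas):
    """Gradient of the cross-entropy cost w.r.t. every parameter, one sample."""
    acts = forward(x, thetas)  # the hidden activations must exist beforehand
    sigma = [a - t for a, t in zip(acts[-1], y)]  # error on the output layer
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        prev = [1.0] + acts[l]
        grads[l] = [[s * a for a in prev] for s in sigma]
        if l > 0:
            # Propagate the error backwards (skipping the bias column) and
            # scale it by the sigmoid gradient a * (1 - a).
            sigma = [
                sum(thetas[l][i][j + 1] * sigma[i] for i in range(len(sigma)))
                * acts[l][j] * (1.0 - acts[l][j])
                for j in range(len(acts[l]))
            ]
    return grads


def cost(x, y, thetas):
    """Cross-entropy cost of one sample, used here only to check the gradients."""
    out = forward(x, thetas)[-1]
    return -sum(t * math.log(a) + (1 - t) * math.log(1 - a) for a, t in zip(out, y))


thetas = [[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]], [[0.7, -0.8, 0.9]]]
grads = backprop([0.5, -1.0], [1.0], thetas)
print(grads)
```

A useful habit when translating the formulas is to compare these gradients against finite differences of `cost`; the two should agree to several decimal places.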
The Gradient Descent Algorithm:
Back-Propagation provides us with a method to calculate the gradients of the cost with respect to the NN parameters. Intuitively one can think about these gradients (or rather their negatives) as a pointer in the direction towards a model that better fits the training data. However, gradients are local derivatives, which means that they only tell us in which direction (and how steeply) the cost changes around the current parameters, not how far the parameters need to move to reach the best fit. Since the size of the total adjustment is not known in advance, it becomes necessary to repeat this process over and over again, taking small steps that, one bit at a time, improve the model’s performance (with each iteration the NN should become a better model for the data provided).
Updating the parameters, calculating how they perform and repeating these steps over and over again is most of what the Gradient Descent algorithm is: an iterative optimisation method frequently used to train NNs. There are some important concepts to consider:
- How large/small the adjustments to the NN parameters are is controlled by a step value (commonly referred to as alpha).
- It is necessary to have some way to measure the performance of the model (commonly referred to as the cost or loss function).
- To help prevent overfitting, an extra value will be introduced to control the amount of regularisation (commonly referred to as lambda).
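To make the roles of alpha, the cost function and lambda concrete, here is a minimal Python sketch of the loop. To keep it short it trains a single sigmoid neuron on a toy dataset instead of a full NN, so `cost_and_gradient` is a stand-in for the Forward Propagation plus Back-Propagation pass. The regularised cross-entropy cost is an assumption:

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cost_and_gradient(theta, samples, labels, lam):
    """Regularised cross-entropy cost and its gradient for one sigmoid neuron.
    theta[0] is the bias weight and is left out of the regularisation."""
    m = len(samples)
    cost = 0.0
    grad = [0.0] * len(theta)
    for x, y in zip(samples, labels):
        xb = [1.0] + list(x)
        a = sigmoid(sum(t * v for t, v in zip(theta, xb)))
        cost -= (y * math.log(a) + (1 - y) * math.log(1 - a)) / m
        for j, v in enumerate(xb):
            grad[j] += (a - y) * v / m
    cost += lam / (2 * m) * sum(t * t for t in theta[1:])
    for j in range(1, len(theta)):
        grad[j] += lam / m * theta[j]
    return cost, grad


def gradient_descent(theta, samples, labels, alpha, lam, iterations):
    """Repeatedly step against the gradient, recording the cost history."""
    history = []
    for _ in range(iterations):
        cost, grad = cost_and_gradient(theta, samples, labels, lam)
        history.append(cost)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta, history


# Toy data: the logical OR of two inputs.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]
theta, history = gradient_descent([0.0, 0.0, 0.0], samples, labels,
                                  alpha=0.5, lam=0.01, iterations=200)
print("first cost:", history[0], "last cost:", history[-1])
```

With a step value that is too large the cost can oscillate or grow instead of shrinking, which is why alpha is worth exposing as a parameter.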
In the image below you can see the interface of the Gradient Descent tool, which takes the NN and the training data samples (features and label indices) as inputs, and provides the trained NN in its first output as well as history data of the cost values over the course of the training process.
A thought for future work: It could make sense to provide in the tool’s UI a parameter to control the percentage of the training data to hold out for validation, which is really important to detect overfitting (your model becoming too specific a representation of the training data and not performing well with samples that have never been seen before).
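Such a parameter could drive something along these lines. This is only a sketch of the idea in Python, with a hypothetical `split_training_data` helper and a fixed random seed so that the split is repeatable:

```python
import random


def split_training_data(features, labels, validation_ratio, seed=0):
    """Shuffle the sample indices and hold out a fraction for validation.

    Returns (train_features, train_labels, val_features, val_labels)."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(indices) * validation_ratio))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(labels, train_idx),
            pick(features, val_idx), pick(labels, val_idx))


features = [[i] for i in range(10)]
labels = list(range(10))
train_x, train_y, val_x, val_y = split_training_data(features, labels, 0.2)
print(len(train_x), "training samples,", len(val_x), "validation samples")
```

The cost on the held-out samples can then be tracked next to the training cost: if the former starts rising while the latter keeps falling, the model is overfitting.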
In the image below you can see the inner contents of the tool:
As mentioned before (and as can be seen in the previous image), the cost function is applied each time the parameters are adjusted inside a for loop (which runs as many times as indicated in the UI of the node). A NN can be trained in stages by using multiple Gradient Descent nodes sequentially as long as they do not have the initial parameter randomisation option enabled.
With each iteration, the loss function value for the NN should decrease, as shown in the image below:
Putting it all together:
We have covered all the main building blocks that will allow us to carry out real ML tasks inside Houdini just by building simple networks. In this section, we can use the “Hello World” of NNs and train a model for Digit Recognition. The MNIST dataset will be used in this example.
With a bit of procedural geometry generation and manipulation we can present the aforementioned use-case in a slightly more visually appealing way:
What the previous animation aims to show is the activations for the different neurons as the data is propagated through the network. Pay attention to the predicted value in green and how it is determined by the highest activation value neuron of the output layer.
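Picking the predicted value is the simplest step of all. In the Python sketch below, the assumption is that output neuron `i` stands for digit `i` (the actual label mapping depends on how the training labels were encoded):

```python
def predict(output_activations):
    """Return the index of the output neuron with the highest activation."""
    return max(range(len(output_activations)), key=output_activations.__getitem__)


activations = [0.01, 0.02, 0.91, 0.05, 0.00, 0.03, 0.10, 0.02, 0.07, 0.01]
print(predict(activations))
```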
Another thing to test is live recognition of digits that have never been seen before. The most immediate option that comes to mind is using the DrawCurve SOP to create the input features to be used for prediction by feeding the data forward through the NN. It is quite interesting to see how the neuron activation patterns change with the stroke (lighter neurons have a higher activation value):
We can also easily display the features that the neurons have learnt to recognise (and hence cause them to activate) by normalising the parameter values of the input connections of any given neuron.
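As a small illustration of that normalisation (a Python sketch with a hypothetical `neuron_feature_image` helper), the incoming weights of one neuron, bias excluded, are remapped to the 0 to 1 range and laid out on the same grid as the input image:

```python
def neuron_feature_image(incoming_weights, width):
    """Normalise the weights into [0, 1] and reshape them into rows of pixels.

    incoming_weights: weights from every input pixel to one neuron, bias excluded.
    width: number of pixels per row of the input images."""
    lo, hi = min(incoming_weights), max(incoming_weights)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all weights are equal
    normalised = [(w - lo) / span for w in incoming_weights]
    return [normalised[i:i + width] for i in range(0, len(normalised), width)]


# A 2x2 "image" worth of weights.
print(neuron_feature_image([-1.0, 0.0, 1.0, 3.0], width=2))
```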
In the previous image, that particular neuron learnt to recognise input data that looks somewhat similar to the displayed picture. Predictions emerge as independent neurons, each in charge of recognising a different set of features, combine their activations while the data propagates through the NN.
Conclusion & Future Work:
This has certainly been quite an interesting project.
Machine Learning and, in particular, Neural Networks (even in their simplest form) are really fascinating. Unfortunately, the entry barrier is often too high if one is not familiar with their key concepts.
Certainly, even if just as a learning exercise, I believe there is great value in implementing the algorithms and data structures I described in this post. More importantly, once the building blocks are ready, it opens up the possibility to do little experiments and play around with more fun and visual examples. I believe it is by creating and interacting with these little toys that it becomes a lot easier to intuitively get a grasp of what’s going on under the hood of a NN. Certainly, Houdini sounds like a great option to do this due to the transparency and openness of the platform, but also because of how easy it is to generate and manipulate data, run simulations, etc. One can think of it as their very own virtual laboratory 🙂
It is also quite interesting to have two NN models running side by side on different platforms (Octave and Houdini) performing exactly the same way as long as they are both using the same parameters and architecture of the network. This makes perfect sense, as that is pretty much what a NN model is (just a NN architecture and its parameters) but it still amazes me. It makes it so much easier to port tools between platforms without having to worry about problem-specific implementation details (which sounds quite empowering, as it has been demonstrated that NNs can approximate any continuous function given enough hidden neurons!)
I see this exercise as a proof of concept of a more sophisticated and mature toolset for Machine Learning inside Houdini. I imagine one could have very similar utilities with the very same interface and visual representation on the viewport but using a well-established ML framework under the hood. That would probably be the best way to go to get user-friendly access to many state-of-the-art implementations for many modern ML techniques such as convolutional NNs.
To wrap it up: I decided to create this article as it felt like I had reached a milestone that made it worth sharing. However, there are a couple of areas that I am quite keen to keep exploring:
- ML-assisted creative content creation: I believe this is one of the areas with the greatest potential for bringing AI into Houdini. In other words: making the content creation process a more collaborative and exploratory joint effort between the artist and the machine. I have been thinking of some small examples of this that build on top of the toolset presented in this article. I am hoping to have a bit of time soon to prototype them.
- Hyperparameter exploration with PDG/TOPs: Finding good hyperparameters for your ML models is often a quite exploratory endeavour where it becomes necessary to sample the space of possibilities by means of trial and error. It is natural to think about using PDG (or TOPs inside Houdini) to carry out this exploration. For instance, one could put a procedural pipeline in place to explore a given set of NN architectures (or even carry out an automatic exploration of different options) to evaluate which hyperparameters (layer and neuron counts, learning rates, regularisation parameters, etc) work best for a given problem. As there are so many possible combinations that lead to different models, it makes perfect sense to automate this as much as possible. Another area that PDG could be leveraged for is data synthesis and augmentation.
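To give a feel for how quickly that space of possibilities grows, here is a small Python sketch that enumerates a grid of candidate configurations. The value ranges are made up, and in a PDG setup each dictionary would roughly map to one work item:

```python
from itertools import product


def hyperparameter_grid(hidden_layer_counts, neuron_counts, alphas, lambdas):
    """Yield one configuration per combination of the given options."""
    for layers, neurons, alpha, lam in product(
            hidden_layer_counts, neuron_counts, alphas, lambdas):
        yield {"hidden_sizes": [neurons] * layers, "alpha": alpha, "lambda": lam}


grid = list(hyperparameter_grid([1, 2], [25, 50], [0.1, 1.0], [0.0, 1.0]))
print(len(grid), "configurations to train and compare")
```

Even with only two options per hyperparameter this already means sixteen models to train, which is the kind of workload that is worth distributing.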
In summary: there is so much more to do and so much more fun to be had! 🙂
I hope you found some value in the article and do not hesitate to reach out if you have any questions!