Please note: This master’s thesis presentation will be given online.
Wei
Sun, Master’s
candidate
David
R.
Cheriton
School
of
Computer
Science
Predictive Coding is a hierarchical model of neural computation that approximates backpropagation using only local computations and local learning rules. An important aspect of Predictive Coding is the presence of feedback connections between layers. These feedback connections allow Predictive Coding networks to potentially be generative as well as discriminative. However, Predictive Coding networks trained on supervised classification tasks cannot generate accurate input samples close to the training inputs from the class vectors alone.
This problem arises from the fact that generating inputs from classes requires solving an underdetermined system, which contains an infinite number of solutions. Generating the correct inputs involves reaching a specific solution in that infinite solution space. But by imposing a minimum-norm constraint on the state nodes and the synaptic weights of a Predictive Coding network, the solution space collapses to a unique solution that is close to the training inputs. This minimum-norm constraint can be enforced by adding decay to the Predictive Coding equations.
Decay is implemented in the form of weight decay as well as a new type of decay, \textit{activity decay}. Analyses done on linear Predictive Coding networks show that applying weight decay during training helps the network learn weights that can generate the correct input samples from the class vectors, while applying activity decay during input generation helps to lower the variance in the network's generated samples. Additionally, weight decay regularizes the values of the network weights, avoiding extreme values, and improves the rate at which the network converges to equilibrium by regularizing the eigenvalues of the Jacobian at the equilibrium.
Experiments on the MNIST dataset of handwritten digits provide evidence that decay makes Predictive Coding networks generative even when the network contains deep layers and uses nonlinear $\tanh$ activations. A Predictive Coding network equipped with weight and activity decay successfully generates images resembling MNIST digits from the class vectors alone.