Nolan Shaw, Master’s candidate
David R. Cheriton School of Computer Science
In this work, I study the relationship between a local, intrinsic update mechanism and a synaptic, error-based learning mechanism in ANNs. I present a local intrinsic rule that I developed, dubbed IP, that was inspired by the Infomax rule. Like Infomax, the IP rule works by adjusting the gain and bias of a neuron to regulate its firing rate. I discuss the biological plausibility of this rule and compare it to batch normalisation.
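To make the gain-and-bias mechanism concrete, below is a minimal sketch of the classical Infomax update for a single logistic unit, following Bell and Sejnowski's single-unit entropy-maximisation gradient. This is not the thesis's IP rule (which, per the abstract, uses Adamised updates); the function names and learning rate are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def infomax_gain_bias_step(a, b, x, lr=1e-3):
    """One Infomax-style update of a neuron's gain `a` and bias `b`
    for a logistic unit y = sigmoid(a*x + b). This follows the
    Bell & Sejnowski (1995) single-unit rule, used here only to
    illustrate how gain and bias can be driven to maximise output
    entropy; it is not the thesis's IP update.
    """
    y = sigmoid(a * x + b)
    # Gradient ascent on the output entropy of the logistic unit.
    da = 1.0 / a + x * (1.0 - 2.0 * y)
    db = 1.0 - 2.0 * y
    return a + lr * da, b + lr * db

# Toy usage: adapt gain and bias on a stream of Gaussian inputs so the
# unit's output distribution moves towards high entropy.
a, b = 1.0, 0.0
for x in np.random.randn(10000):
    a, b = infomax_gain_bias_step(a, b, x, lr=1e-3)
```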
This work demonstrates that local information maximisation can work in conjunction with synaptic learning rules to improve learning. I show that the IP rule makes deep networks more robust to increases in the synaptic learning rate and that it increases the average slope of the activation functions. I also compare IP to batch normalisation and Infomax, which are shown to share the same family of solutions.
In addition, an alternative rule is developed that has many of the same properties as IP but uses a weighted moving average, rather than the Adamised update rules used by IP, to compute the desired values for the neuronal gain and bias. This rule, dubbed WD, demonstrates universally superior performance compared to both IP and standard networks. In particular, it learns faster and is more robust to increases in the synaptic learning rate. The gradients of the activation function are compared to those in standard networks, and the WD method shows drastically larger gradients on average, suggesting that this intrinsic, information-theoretic rule solves the vanishing gradient problem. The WD method also outperforms Infomax and a weighted-moving-average version of batch normalisation.
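The abstract does not give the exact WD update, so the following is only a plausible sketch of a weighted-moving-average intrinsic rule: gain and bias are derived from exponential moving averages of the pre-activation mean and variance, in the spirit of batch normalisation with running statistics. The class name, momentum value, and layer structure are assumptions for illustration.

```python
import numpy as np

class RunningGainBias:
    """Illustrative weighted-moving-average intrinsic rule (not the
    thesis's exact WD update): track the pre-activation mean and
    variance with an exponential moving average, then set the neuron's
    gain and bias so the activation stays centred and well-scaled.
    """
    def __init__(self, n_units, momentum=0.99, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.mean = np.zeros(n_units)
        self.var = np.ones(n_units)

    def update(self, pre_activation):
        # pre_activation: (batch, n_units) array of summed synaptic input.
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * pre_activation.mean(axis=0)
        self.var = m * self.var + (1 - m) * pre_activation.var(axis=0)

    def gain_bias(self):
        # Gain rescales the input to roughly unit variance; bias recentres it.
        gain = 1.0 / np.sqrt(self.var + self.eps)
        bias = -gain * self.mean
        return gain, bias

# Usage sketch inside a forward pass (assumed layer structure):
#   stats.update(z); g, b = stats.gain_bias(); y = np.tanh(g * z + b)
```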
Supplementary analysis is done to reinforce the relationship between intrinsic plasticity and batch normalisation. Specifically, the IP method centers its activation over the median of its input distribution, which, for symmetric input distributions, is equivalent to centering it over the mean. This analysis is intended as a contribution to the theory of deep ANNs.
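The equivalence can be stated briefly: for a distribution symmetric about a point, the median and the mean (when it exists) coincide, so median-centering and mean-centering produce the same bias. The note below spells this out; it is a standard fact, not an argument taken from the thesis itself.

```latex
% If the input density is symmetric about c, i.e. f(c + t) = f(c - t)
% for all t, and the mean exists, then
\[
  \operatorname{median}(X) \;=\; c \;=\; \mathbb{E}[X],
\]
% so a bias that centres the activation on the median of its input is,
% for symmetric inputs, also centring it on the mean -- the same
% centring performed by batch normalisation.
```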
Analysis is also provided demonstrating that the IP rule produces neuronal activities with entropy levels similar to those of Infomax when tested on a fixed input distribution. The same analysis shows that the WD version of intrinsic plasticity also improves information potential, but fails to reach the levels achieved by IP and Infomax. Interestingly, batch normalisation was also observed to improve information potential, suggesting that this may be a cause of its efficacy, which remains an open problem at the time of this writing.
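The abstract does not state how the entropy of neuronal activities was measured; a common approach is a binned (histogram) estimate over sampled activations, sketched below under that assumption. The function name, bin count, and toy inputs are illustrative.

```python
import numpy as np

def activation_entropy(activations, n_bins=50):
    """Histogram-based estimate of the entropy (in nats) of a neuron's
    activation distribution. Illustrative only: this is one common way
    to estimate entropy from sampled activities, not necessarily the
    measure used in the thesis.
    """
    counts, _ = np.histogram(activations, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # drop empty bins (0 log 0 := 0)
    return -np.sum(p * np.log(p))

# Usage sketch: compare activation entropy of units with different gains
# on the same fixed input distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
print(activation_entropy(np.tanh(x)))        # baseline unit
print(activation_entropy(np.tanh(2.0 * x)))  # higher-gain unit
```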