Neural Arithmetic Logic Unit
August 29, 2018
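Neural networks are being widely adopted across many disciplines. One fundamental weakness, however, is that they struggle to represent and manipulate numbers systematically: even a basic numerical task such as counting does not generalize reliably once the numbers fall outside the range seen during training. The neural arithmetic logic unit (NALU) was created to extend the neural accumulator (NAC) model so that networks can perform arithmetic more reliably.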
Neural networks can fit the data they are trained on very well. Outside the range of that training data, however, they often fail at even simple tasks such as learning the scalar identity function f(x) = x. NACs and NALUs were introduced to improve this: they help neural networks learn numerical relationships that keep holding for inputs outside their original training range.
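As a hedged illustration of the extrapolation problem (a hand-built example, not a trained network and not taken from the paper), consider a single tanh unit scaled as y = 5 * tanh(x / 5). It tracks the identity closely near zero, but because tanh is bounded, its output can never exceed 5, so it cannot represent f(x) = x for large inputs. The scale of 5 is an arbitrary choice for this sketch.

```python
import math


def tanh_unit(x, scale=5.0):
    """Hand-picked (untrained) tanh unit: y = scale * tanh(x / scale).

    For small |x|, tanh(x / scale) is close to x / scale, so y is close to x.
    Because tanh is bounded by 1, y can never exceed `scale`, however large x is.
    """
    return scale * math.tanh(x / scale)


for x in [0.5, 1.0, 10.0, 100.0]:
    # Compare the unit's output with the identity target f(x) = x.
    print(f"x = {x:>6}: tanh unit = {tanh_unit(x):.4f}, identity = {x}")
```

For inputs up to 1 the error stays within about 1.5%, but at x = 100 the output is pinned at 5: the bounded activation, not a lack of data, is what breaks extrapolation here.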
Building a NALU starts with building a NAC. A NAC outputs a linear transformation of its inputs, and because its weights are pushed toward -1, 0, and 1, that transformation amounts to adding and subtracting inputs. The following Python code implements the NAC. W_vector and M_vector are the two learned parameter matrices, W = tanh(W_vector) * sigmoid(M_vector) is the effective transformation matrix, and A = xW is the accumulated output. A third matrix, G, holds the weights of the learned sigmoidal gate; it is only needed once the NAC is extended into a NALU. The standard deviation used to initialize the weights can be chosen freely. Note that you will need TensorFlow (which itself depends on NumPy) for the implementation:
```python
import tensorflow as tf


def vector_calc(layer, sdev):
    # Create a trainable matrix of shape `layer`, initialized from a truncated
    # normal distribution with standard deviation `sdev`. (stddev must be passed
    # by keyword: the second positional argument is the mean.)
    return tf.Variable(tf.random.truncated_normal(layer, stddev=sdev))


def W_calc(W_vector, M_vector):
    # Effective NAC weights from the paper: W = tanh(W_hat) * sigmoid(M_hat),
    # applied elementwise, which biases every entry toward -1, 0, or 1.
    return tf.tanh(W_vector) * tf.sigmoid(M_vector)


def nac(sdev, array_layers, outputs):
    # Weight shape is (number of input features, number of outputs).
    layer = (int(array_layers.shape[-1]), outputs)
    W_vector = vector_calc(layer, sdev)
    M_vector = vector_calc(layer, sdev)
    W = W_calc(W_vector, M_vector)
    # Accumulated output: sums and differences of the inputs.
    A = tf.matmul(array_layers, W)
    return A
```
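To see why these weights perform addition and subtraction, here is a small framework-free sketch of the same NAC forward pass in plain Python. The parameters are hand-picked rather than trained, and the helper names (`nac_weights`, `nac_forward`) are hypothetical, chosen only for this illustration: a large positive or negative W_hat saturates tanh at +1 or -1, W_hat = 0 gives a weight of 0, and a large M_hat pushes the sigmoid toward 1.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nac_weights(w_hat, m_hat):
    """Elementwise W = tanh(W_hat) * sigmoid(M_hat) for nested-list matrices."""
    return [
        [math.tanh(w) * sigmoid(m) for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(w_hat, m_hat)
    ]


def nac_forward(x, W):
    """a = x W, where x has n_in entries and W is an n_in x n_out matrix."""
    n_out = len(W[0])
    return [sum(x_i * W[i][j] for i, x_i in enumerate(x)) for j in range(n_out)]


# Hypothetical hand-picked parameters for a 3-input, 1-output NAC.
W_hat = [[10.0], [-10.0], [0.0]]
M_hat = [[10.0], [10.0], [10.0]]
W = nac_weights(W_hat, M_hat)  # roughly [[1], [-1], [0]]

print(nac_forward([7.0, 3.0, 100.0], W))      # close to 7 - 3 = 4
print(nac_forward([7000.0, 3000.0, 1e6], W))  # close to 4000
```

The third input is ignored entirely, and the same weights give the right answer at a much larger scale. The small residual error grows in proportion to the inputs, which is why a trained NAC needs weights very close to exactly -1, 0, or 1 to extrapolate well.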
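Now for the NALU code. The nalu function takes the same standard deviation, input tensor, and number of outputs as the NAC, plus a small constant ep. It contains two subcells that share the NAC weights W: the additive path A = xW, and a multiplicative path M = exp(log(|x| + ep) W), which applies the NAC in log space so that sums become products and differences become quotients. A learned sigmoidal gate g = sigmoid(xG) then weights the two subcells, and the NALU outputs g * A + (1 - g) * M.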
```python
import tensorflow as tf


def vector_calc(layer, sdev):
    # Trainable matrix initialized from a truncated normal with std. dev. `sdev`.
    return tf.Variable(tf.random.truncated_normal(layer, stddev=sdev))


def W_calc(W_vector, M_vector):
    # NAC weights: W = tanh(W_hat) * sigmoid(M_hat).
    return tf.tanh(W_vector) * tf.sigmoid(M_vector)


def nalu(ep, sdev, array_layers, outputs):
    layer = (int(array_layers.shape[-1]), outputs)

    # NAC cell, as in nac() above; its weights W are shared by both paths.
    W_vector = vector_calc(layer, sdev)
    M_vector = vector_calc(layer, sdev)
    W = W_calc(W_vector, M_vector)
    A = tf.matmul(array_layers, W)

    # Weight matrix of the learned sigmoidal gate.
    G = vector_calc(layer, sdev)

    # Multiplicative path: the NAC applied in log space, M = exp(log(|x| + ep) W).
    # The small constant `ep` keeps the logarithm finite when an input is 0.
    abs_val = tf.abs(array_layers) + ep
    M_out = tf.exp(tf.matmul(tf.math.log(abs_val), W))

    # Learned sigmoidal gate, one value per output unit.
    G_out = tf.sigmoid(tf.matmul(array_layers, G))

    # NALU output as per the paper: gate between the additive and multiplicative paths.
    out = G_out * A + (1 - G_out) * M_out
    return out
```
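The three steps of the NALU (additive path, log-space multiplicative path, sigmoidal gate) can be checked by hand with a framework-free sketch. It assumes a single output unit and passes hand-picked values for W and G directly instead of training them or computing W through tanh and sigmoid; the function name `nalu_forward` is hypothetical. With W = [1, 1], the two paths compute the sum and the product of the same inputs, and the gate decides which one reaches the output.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nalu_forward(x, W, G, ep=1e-7):
    """Single-output NALU forward pass.

    x: list of inputs; W: one NAC weight per input, shared by both paths;
    G: one gate weight per input; ep: keeps log() finite for zero inputs.
    """
    a = sum(x_i * w_i for x_i, w_i in zip(x, W))  # additive path: x W
    # Multiplicative path: exp(log(|x| + ep) W), i.e. the NAC in log space.
    m = math.exp(sum(math.log(abs(x_i) + ep) * w_i for x_i, w_i in zip(x, W)))
    g = sigmoid(sum(x_i * g_i for x_i, g_i in zip(x, G)))  # gate in (0, 1)
    return g * a + (1.0 - g) * m


x = [6.0, 7.0]
print(nalu_forward(x, W=[1.0, 1.0], G=[10.0, 10.0]))    # gate open: about 6 + 7
print(nalu_forward(x, W=[1.0, 1.0], G=[-10.0, -10.0]))  # gate closed: about 6 * 7
print(nalu_forward(x, W=[1.0, -1.0], G=[-10.0, -10.0])) # gate closed: about 6 / 7
```

Because the weights are shared, a single learned W can express either a sum or a product, and the network only has to learn the gate to choose between them.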
This NALU can now be used as a building block in Python-based models and trained for a range of tasks. The NALU implemented here is based on this Python implementation. As a first check, the NALU can be trained on the scalar identity function and then tested on inputs outside the training range. If you want to learn more about NALUs, check out the paper by Andrew Trask and colleagues. These concepts may be useful for challenges related to machine learning and neural networks.
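The identity experiment can be tried without TensorFlow. The sketch below is a simplification: it trains only the additive NAC cell with a single weight w = tanh(w_hat) * sigmoid(m_hat), not a full NALU, uses full-batch gradient descent with hand-derived gradients, and starts from zero instead of a random initialization so the run is deterministic. The function name, training inputs, learning rate, and step count are all hypothetical choices for this illustration.

```python
import math


def sigmoid(z):
    # Plain logistic sigmoid; m_hat only grows from 0 here, so exp(-z) cannot overflow.
    return 1.0 / (1.0 + math.exp(-z))


def train_identity_nac(xs, steps=5000, lr=1.0):
    """Fit y = w * x, with w = tanh(w_hat) * sigmoid(m_hat), to the target y = x.

    Minimizes the mean squared error over `xs` with full-batch gradient descent.
    """
    w_hat, m_hat = 0.0, 0.0  # deterministic start for this sketch
    for _ in range(steps):
        t, s = math.tanh(w_hat), sigmoid(m_hat)
        w = t * s
        # dL/dw for L = mean((w*x - x)^2)
        grad_w = sum(2.0 * (w * x - x) * x for x in xs) / len(xs)
        # Chain rule through w = tanh(w_hat) * sigmoid(m_hat).
        grad_w_hat = grad_w * (1.0 - t * t) * s
        grad_m_hat = grad_w * t * s * (1.0 - s)
        w_hat -= lr * grad_w_hat
        m_hat -= lr * grad_m_hat
    return math.tanh(w_hat) * sigmoid(m_hat)


train_xs = [0.5, 1.0, 1.5, 2.0]  # training range: 0.5 to 2.0
w = train_identity_nac(train_xs)
for x in [1.0, 100.0, 1000.0]:  # the last two lie far outside the training range
    print(f"x = {x}: NAC output = {w * x:.3f}")
```

Since tanh and sigmoid only approach 1 as their arguments grow, w ends up slightly below 1, so the prediction drifts in proportion to x. It nevertheless remains a linear function of x rather than saturating at some bound.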