July 19, 2019 Introduction to Deep Learning using PyTorch
The main focus of this blog post is to introduce readers to Deep Learning and to show how PyTorch, a popular deep learning framework, is used in the Deep Learning era.
So first things first, let’s get introduced to Deep Learning.
What is Deep Learning:
Are Machine Learning and Deep Learning the same or different? Let’s first understand what these two terms mean. In layman’s terms, Machine Learning is a process in which a machine learns from experience, with minimal human intervention, rather than from explicitly programmed rules. How does the machine learn? We feed it data described by features or attributes, choose an algorithm to construct a classifier/model, train the model, and finally use the trained model to make predictions. In traditional Machine Learning, the feature extraction step is done manually.
Traditional Machine Learning Process:
Example: Suppose you want to identify whether an image is of a cat or not. Using the Machine Learning approach you can classify the image.
Deep Learning, on the other hand, is a subset of Machine Learning in which the learning happens through a neural network, also called an Artificial Neural Network (ANN). The structure of a neural network is inspired by the human brain: just as a brain consists of neurons, an ANN consists of artificial neurons. The artificial neuron is the basic unit. An ANN contains many neurons organized in layers, and in a densely connected network each neuron is connected to every neuron in the adjacent layers.
A Simple Artificial Neural Network
In the above image the network consists of an input layer, a hidden layer with 4 neurons, and an output layer with a single output. The term “deep” refers to the number of hidden layers in the network, i.e. the more hidden layers a neural network has, the deeper it is and the more complex the problems it can model. A simple neural network may consist of 2-3 layers, whereas a deep neural network can consist of hundreds of hidden layers. Deep Learning models are trained on large sets of labeled data, and the network learns features directly from the data, without manual feature extraction.
Popular Deep Learning methods
- Convolutional Neural Networks
- Recurrent Neural Networks
- Recursive Neural Networks
- Long Short-Term Memory (LSTM)
So now we can say that both Machine Learning and Deep Learning fall under the umbrella of Artificial Intelligence. Even though Deep Learning is a subset of Machine Learning, it is best suited to problems that are more complex and involve more diverse, unstructured data. Feature extraction is automatic in Deep Learning, whereas traditional Machine Learning requires manual feature extraction. Deep Learning also tends to perform better on larger datasets, whereas traditional Machine Learning works well on small or medium sized datasets.
“The analogy to deep learning is that the rocket engine is the deep learning model and the fuel is the huge amounts of data we can feed to these algorithms.” (source: Andrew Ng)
How Deep Learning works:
The basic building block of a neural network is an artificial neuron, also known as a perceptron. A deep neural network consists of the following:
- An input layer (X)
- One or more hidden layers
- An output layer (y)
- A set of weights and biases between each pair of layers, denoted W and b
- An activation function for each hidden layer
Every connection from an input node/neuron carries an associated weight, which signifies the importance of that feature to the output prediction. Each input feature is multiplied by its weight and the products are summed (for a whole layer, this is a matrix multiplication). The sum is then passed to an activation function to get the predicted output. Activation functions introduce non-linearity into neural networks. Real-world data is most often nonlinear, and without an activation function a stack of layers could only represent a linear function of its inputs, so activation functions are what allow the network to learn such data.
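The weighted sum followed by an activation can be sketched for a single neuron in plain Python. This is a minimal illustration, not a definitive implementation: the helper names `sigmoid` and `neuron`, and the input values and weights, are hypothetical and chosen only for the example.

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, sum the products, add a bias,
    and pass the result through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical feature values and weights, chosen only for illustration.
output = neuron([0.5, -1.0], [0.4, 0.3])
print(output)
```

Swapping `sigmoid` for another activation function changes only the last step. The weighted sum stays the same.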
Mathematical notation of the same is as follows:
In the above image the first layer contains 2 input units/neurons (x1, x2) with weights (w1, w2). The arrows, known as synapses, each take an input and multiply it by its weight. The products are summed and assigned to the hidden unit h. The activation function is then applied to get the predicted output.
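Written out, the computation described for this small network might look as follows. This is a reconstruction from the description rather than the original figure: it assumes no bias term, writes the activation function as S, and uses the symbol ŷ for the predicted output, which is notation introduced here.

```latex
h = w_1 x_1 + w_2 x_2, \qquad \hat{y} = S(h)
```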
Let’s walk through the computation of a neural network. We have 2 input values (x1, x2), which the sums below take to be 1 and 2, with their associated weights (w11, w21, w31) and (w12, w22, w32), 3 nodes in the hidden layer (h1, h2, h3), and a target output value of 0.
Sum of products
h1 = 1*0.8+2*0.2 = 0.8+0.4 = 1.2
h2 = 1*0.4 + 2*0.9 = 0.4 +1.8 = 2.2
h3 = 1*0.3 + 2*0.1 = 0.3 + 0.2 = 0.5
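The three sums above are a matrix-vector product, which can be checked with a short plain-Python sketch. The variable names `x` and `W_hidden` are illustrative. The numbers are read off the sums above.

```python
# Input values and weights taken from the sums above.
x = [1, 2]

# One row per hidden node: [weight applied to x1, weight applied to x2].
W_hidden = [
    [0.8, 0.2],  # h1
    [0.4, 0.9],  # h2
    [0.3, 0.1],  # h3
]

# Each hidden sum is the dot product of one weight row with the inputs.
h = [sum(w * xi for w, xi in zip(row, x)) for row in W_hidden]
print([round(v, 2) for v in h])  # prints [1.2, 2.2, 0.5]
```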
Now apply the activation function S(x) to these sums. There are many activation functions. For this example let’s take the sigmoid activation function.
Equation of the sigmoid function: S(x) = 1 / (1 + e^(-x))
S(1.2) = 0.76852 rounded to 0.77
S(2.2) = 0.90025 rounded to 0.90
S(0.5) = 0.62246 rounded to 0.62
Now we take the dot product of the hidden layer values with the second set of weights (0.3, 0.5, 0.9), i.e. we multiply each value by its weight and sum the results.
0.77*0.3 + 0.90*0.5 + 0.62*0.9 = 1.239
Finally, apply the activation function to get the predicted output:
S(1.239) = 0.77539
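The whole hand calculation can be reproduced in a few lines of plain Python. This is a sketch under the same assumptions as the worked example: no bias terms, and hidden activations rounded to two decimals before the output layer, as done by hand above. The variable names are illustrative.

```python
import math

def sigmoid(z):
    """Sigmoid activation, S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-z))

x = [1, 2]                                        # input values
W_hidden = [[0.8, 0.2], [0.4, 0.9], [0.3, 0.1]]   # one row per hidden node
w_out = [0.3, 0.5, 0.9]                           # hidden-to-output weights

# Hidden layer: weighted sums, then sigmoid, rounded to 2 decimals
# to match the hand calculation.
hidden_sums = [sum(w * xi for w, xi in zip(row, x)) for row in W_hidden]
hidden_act = [round(sigmoid(s), 2) for s in hidden_sums]

# Output layer: dot product with the second set of weights, then sigmoid.
output_sum = sum(a * w for a, w in zip(hidden_act, w_out))
prediction = sigmoid(output_sum)

print(hidden_act)
print(round(output_sum, 3))
print(round(prediction, 5))
```

The output sum comes out at 1.239 and the prediction at roughly 0.77539, matching the values above. The gap between this prediction and the target of 0 is what training would reduce by adjusting the weights.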
What is PyTorch:
PyTorch is a popular Deep Learning library. It is a DL research platform designed for speed and flexibility. PyTorch was developed as an open source library by the Facebook research team in October, 2016 and was publicly released in January, 2017. PyTorch gives us the power of scientific computation similar to NumPy, with strong GPU acceleration.
PyTorch follows an imperative programming style. Its key feature is the dynamic computation graph, a capability also known as Define by Run: the graph is built as the code executes. This gives us the flexibility to change how the network behaves on the fly, which in turn makes debugging a lot easier.
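A small sketch can make the Define by Run idea concrete. Here ordinary Python control flow decides which operations are recorded, and autograd then differentiates through whichever branch actually ran. The tensor values are arbitrary and chosen only for illustration.

```python
import torch

# A tensor that tracks the operations applied to it.
x = torch.tensor([1.0, 2.0], requires_grad=True)

# The graph is built while this code runs, so a plain Python `if`
# chooses which operations become part of it.
if x.sum() > 0:
    y = (x * 2).sum()
else:
    y = (x ** 2).sum()

# Backpropagate through the branch that was actually executed.
y.backward()
print(x.grad)
```

With these inputs the first branch runs, so the gradient of `y` with respect to each element of `x` is 2.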
The tensor is the primary data structure in PyTorch and is similar to an array in NumPy. Unlike NumPy arrays, PyTorch tensors can also run on a GPU for faster computation. A tensor can be a scalar, a vector, a matrix, or an n-dimensional array.
- A scalar is a 0-dimensional tensor
- A vector is a 1-dimensional tensor
- A matrix is a 2-dimensional tensor
- An nd-array is an n-dimensional tensor
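The four cases can be created and inspected directly. This is a minimal sketch with arbitrary example values. The final lines move a tensor to a GPU only when one is available, so the snippet also runs on a CPU-only machine.

```python
import torch

scalar = torch.tensor(3.14)                       # 0-dimensional
vector = torch.tensor([1.0, 2.0, 3.0])            # 1-dimensional
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2-dimensional
nd_array = torch.zeros(2, 3, 4)                   # 3-dimensional

# dim() gives the number of dimensions, shape gives the size of each one.
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("nd_array", nd_array)]:
    print(name, t.dim(), tuple(t.shape))

# Use the GPU if there is one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
matrix_on_device = matrix.to(device)
print(matrix_on_device.device)
```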
Building Neural Networks using PyTorch
# Import the PyTorch library
import torch

# Define the activation function; this example uses sigmoid
def activation(x):
    """Sigmoid activation function.

    Argument: x = torch.Tensor
    """
    return 1 / (1 + torch.exp(-x))

# Initialize the features: 3 random normal variables.
# torch.randn((1, 3)) creates a tensor of shape [1, 3]: one row and 3 columns.
features = torch.randn((1, 3))

# Sizes of the input, hidden and output layers
ninput = features.shape[1]  # number of input units must match the number of features
nhidden = 2
noutput = 1

# Weights for the input to hidden layer
W1 = torch.randn(ninput, nhidden)
# Weights for the hidden to output layer
W2 = torch.randn(nhidden, noutput)

# Bias terms for the hidden and output layers
B1 = torch.randn(1, nhidden)
B2 = torch.randn(1, noutput)

# Compute the output by applying the activation function to each layer's
# weighted sum; torch.mm performs the matrix multiplication
h = activation(torch.mm(features, W1) + B1)
predicted_output = activation(torch.mm(h, W2) + B2)
print(predicted_output)
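In one run, the predicted output was tensor([[0.3171]]). Because the features, weights and biases are random, the value will differ from run to run.
Our target output was 0 and the predicted output we got is 0.3171.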
In the above code the weights are initialized randomly. The above neural network consists of 3 input units, 2 hidden units and an output unit.
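Because everything is initialized randomly, each run prints a different number. One way to make a run repeatable is to fix the random seed first. The sketch below assumes the same 3-2-1 layout as the code above. The `forward` helper and the seed value 7 are hypothetical, and `torch.sigmoid` stands in for the hand-written activation.

```python
import torch

def forward(seed):
    """Run the 3-2-1 network once, with all random values drawn after seeding."""
    torch.manual_seed(seed)          # fix the random number generator state
    features = torch.randn(1, 3)
    W1, B1 = torch.randn(3, 2), torch.randn(1, 2)
    W2, B2 = torch.randn(2, 1), torch.randn(1, 1)
    h = torch.sigmoid(torch.mm(features, W1) + B1)
    return torch.sigmoid(torch.mm(h, W2) + B2)

first = forward(7)
second = forward(7)
print(first)
print(torch.equal(first, second))  # same seed, same output
```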
I hope you enjoyed learning about the difference between traditional Machine Learning and Deep Learning, what a Neural Network is and how it works, why we call Deep Learning “Deep”, and this gentle introduction to PyTorch.