CNN with PyTorch via Anaconda
Finally, I’ve got some time to write something about PyTorch, a popular deep learning tool. We assume you have a fundamental understanding of Anaconda Python, have created an Anaconda virtual environment (in my case, it’s named condaenv), and have successfully installed PyTorch under this Anaconda virtual environment condaenv.
Since I’m using Visual Studio Code to test my Python code (of course, you can use whichever coding tool you like), I assume you already have your own coding tool configured. Now, you are ready to go!
In my case, I’m giving a tutorial rather than just coding for myself, so Jupyter Notebook is my presentation tool of choice. I’ll demonstrate everything both as .py files and as .ipynb files. All the code can be found at Longer Vision PyTorch_Examples; however, ONLY the Jupyter Notebook presentation is given in my blogs. Therefore, I assume you’ve already successfully installed Jupyter Notebook, as well as any other required packages, under your Anaconda virtual environment condaenv.
Now, let’s fire up the Jupyter Notebook server.
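If the server is not running yet, the commands look roughly like the following (assuming your virtual environment is named condaenv, as above; on newer conda versions you may use `conda activate condaenv` instead):

```bash
# Activate the Anaconda virtual environment (named condaenv in this tutorial)
source activate condaenv

# Launch the Jupyter Notebook server from the folder containing your notebooks
jupyter notebook
```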
Clearly, Anaconda comes with the NEWEST Python version, which at the time of writing is Python 3.6.4.
PART A: Hello PyTorch
Our very FIRST test code consists of only 6 lines, including an empty line:
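A minimal sketch of such a 6-line cell (assuming it simply prints a greeting and the installed torch and torchvision versions, consistent with the output described below) might be:

```python
import torch
import torchvision

print("Hello PyTorch")
print(torch.__version__)
print(torchvision.__version__)
```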
After starting the Jupyter Notebook server and clicking Run, you will see this view:
Clearly, the current torch version is 0.3.1, and the torchvision version is 0.2.0.
PART B: Convolutional Neural Network in PyTorch
1. General Concepts
1) Online Resource
We are NOT going to discuss the background details of Convolutional Neural Networks. A lot of online frameworks/courses are available for you to catch up:
Several fabulous blogs are strongly recommended, including:
2) Architecture
One picture for all (cited from Convolutional Neural Network)
3) Back Propagation (BP)
The ONLY concept in CNN we want to emphasize here is Back Propagation, which has been widely used in traditional neural networks, whose structure is similar to that of the final Fully Connected Layer of a CNN. You are welcome to get some more details from https://brilliant.org/wiki/backpropagation/.
Pre-defined Variables
Training Dataset
- \(X=\{(\vec{x_1},\vec{y_1}),(\vec{x_2},\vec{y_2}),\dots,(\vec{x_N},\vec{y_N})\}\): the training dataset \(X\) is composed of \(N\) pairs of training samples \((\vec{x_i},\vec{y_i}), 1 \le i \le N\)
- \((\vec{x_i},\vec{y_i}),1 \le i \le N\): the \(i\)th training sample pair
- \(\vec{x_i}\): the \(i\)th input vector (can be an original image, can also be a vector of extracted features, etc.)
- \(\vec{y_i}\): the \(i\)th desired output vector (can be a one-hot vector, can also be a scalar, which is a 1-element vector)
- \(\hat{\vec{y_i} }\): the \(i\)th output vector produced by the neural network from the \(i\)th input vector \(\vec{x_i}\)
- \(N\): the size of the dataset, i.e., the number of training samples
- \(w_{ij}^k\): in the neural network’s architecture, the weight at layer \(k\) connecting the \(i\)th input node and the \(j\)th output node
- \(\theta\): a generic notation for any parameter inside the neural network, which can be regarded as an element of the set of all \(w_{ij}^k\).
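To make the notation concrete, here is a small, purely illustrative PyTorch sketch of a training set of \(N\) pairs, where each \(\vec{x_i}\) is a feature vector and each \(\vec{y_i}\) is a one-hot vector (all sizes are hypothetical):

```python
import torch

N, num_features, num_classes = 4, 3, 2    # hypothetical sizes, for illustration only

x = torch.randn(N, num_features)          # the N input vectors x_i
labels = torch.tensor([0, 1, 1, 0])       # the true class of each sample
y = torch.zeros(N, num_classes)           # the N desired outputs y_i, as one-hot vectors
y[torch.arange(N), labels] = 1.0

print(x.shape, y.shape)                   # torch.Size([4, 3]) torch.Size([4, 2])
```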
Evaluation Function
- Mean squared error (MSE): \[E(X, \theta) = \frac{1}{2N} \sum_{i=1}^N \left(||\hat{\vec{y_i} } - \vec{y_i}||\right)^2\]
- Cross entropy: \[E(X, \mathrm{prob}) = - \frac{1}{N} \sum_{i=1}^N \log_2\left(\mathrm{prob}(\vec{y_i})\right)\]
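As a quick sanity check, both definitions can be reproduced in PyTorch (the sketch below uses the current PyTorch API, which is newer than the 0.3.1 release shown above, and purely hypothetical data). Note that torch.nn.MSELoss averages over every element and has no factor of 1/2, and torch.nn.CrossEntropyLoss uses the natural logarithm rather than \(\log_2\):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D = 4, 3                                      # hypothetical: 4 samples, 3-dim outputs / 3 classes

# --- Mean squared error ---
y_hat = torch.randn(N, D)                        # network outputs \hat{y}_i
y = torch.randn(N, D)                            # desired outputs y_i
mse_formula = (y_hat - y).pow(2).sum(dim=1).mean() / 2    # (1/2N) * sum_i ||y_hat_i - y_i||^2
mse_builtin = nn.MSELoss()(y_hat, y)             # mean over all N*D elements, no 1/2 factor
print(mse_formula.item(), mse_builtin.item() * D / 2)     # the two printed values match

# --- Cross entropy ---
logits = torch.randn(N, D)                       # unnormalized scores over D classes
targets = torch.tensor([0, 2, 1, 0])             # true class index of each sample
prob = nn.Softmax(dim=1)(logits)                 # predicted class probabilities
ce_formula = -torch.log(prob[torch.arange(N), targets]).mean()   # natural-log version of the formula
ce_builtin = nn.CrossEntropyLoss()(logits, targets)
print(ce_formula.item(), ce_builtin.item())      # the two values match
```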
In particular, for binary classification, logistic regression is often adopted. The logistic function is defined as:
\[f(x)=\frac{1}{(1+e^{-x})}\]
In such a case, the loss function can easily be derived as:
\[E(X,W) = - \frac{1}{N} \sum_{i=1}^N [y_i log({\hat{y_i} })+(1-y_i)log(1-\hat{y_i})]\]
where
\[y_i \in \{0, 1\}\]
\[\hat{y_i} \equiv g(\vec{w} \cdot \vec{x_i}) = \frac{1}{(1+e^{-\vec{w} \cdot \vec{x_i} })}\]
Some PyTorch explanation can be found at torch.nn.CrossEntropyLoss.
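For the binary case above, torch.nn.BCEWithLogitsLoss (which fuses the sigmoid with the binary cross entropy) plays the corresponding role. A rough sketch, assuming the \(\log\) in the formula is the natural logarithm and using hypothetical data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D = 4, 3                                   # hypothetical: 4 samples, 3 features

w = torch.randn(D)                            # weight vector w
x = torch.randn(N, D)                         # input vectors x_i
y = torch.tensor([0., 1., 1., 0.])            # binary targets y_i in {0, 1}

z = x.matmul(w)                               # w . x_i
y_hat = torch.sigmoid(z)                      # logistic function g(w . x_i)

# Hand-written loss, exactly as in the formula above
loss_formula = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

# Built-in equivalent: sigmoid + binary cross entropy in a single call
loss_builtin = nn.BCEWithLogitsLoss()(z, y)

print(loss_formula.item(), loss_builtin.item())   # the two values match
```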
BP Deduction Conclusions
Only MSE is considered here (Please refer to https://brilliant.org/wiki/backpropagation/):
\[ \frac {\partial {E(X,\theta)} } {\partial w_{ij}^k} = \frac {1}{N} \sum_{d=1}^N \frac {\partial}{\partial w_{ij}^k} \Big( \frac{1}{2} \left(||\hat{\vec{y_d} } - \vec{y_d}||\right)^2 \Big) = \frac{1}{N} \sum_{d=1}^N {\frac{\partial E_d}{\partial w_{ij}^k} } \]
The weight update is then determined as:
\[ \Delta w_{ij}^k = - \alpha \frac {\partial {E(X,\theta)} } {\partial w_{ij}^k} \]
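In PyTorch, none of this differentiation needs to be done by hand: autograd computes \(\partial E / \partial w_{ij}^k\) for every parameter, and the update above is exactly one step of plain gradient descent with learning rate \(\alpha\). A minimal sketch, assuming a single linear layer, the MSE loss from above, and hypothetical data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, D_in, D_out = 8, 5, 2                 # hypothetical sizes
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = nn.Linear(D_in, D_out)           # its weights play the role of w_ij^k
criterion = nn.MSELoss()
alpha = 0.1                              # learning rate

y_hat = model(x)
loss = criterion(y_hat, y)               # E(X, theta)
loss.backward()                          # autograd fills p.grad with dE/dw for every parameter p

with torch.no_grad():
    for p in model.parameters():
        p -= alpha * p.grad              # delta w = -alpha * dE/dw
        p.grad.zero_()                   # clear gradients before the next step

# torch.optim.SGD(model.parameters(), lr=alpha) performs exactly the same update.
```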