-by gagan
in this blog we are gonna build a neural network to classify handwritten digits from the MNIST dataset, but instead of using any ML libraries like tensorflow or pytorch, we’ll be coding it completely from scratch using only numpy. this is a continuation of the knowledge we built up in the last blog.
the goal is to understand what really happens behind the scenes when we train a neural net – the feedforward, the activations, softmax, loss, backprop and all that stuff.
if you have basic knowledge of numpy and a little linear algebra, you’re good to go, although I highly recommend going through that blog first. also check out this video, which inspired me to code this up.
you can find the entire code here https://github.com/Gagancreates/mnist-from-scratch
I recommend running the notebook within kaggle itself.
we’ll be using the kaggle MNIST dataset (train.csv). each row has a label column at the start followed by 784 pixel values (a flattened 28x28 image). as usual we will keep the input in the shape $(\text{sample size},\ \text{number of features})$, where each pixel value is a feature.
import numpy as np
import pandas as pd

data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
data = np.array(data)      # convert the dataframe into a numpy array
np.random.seed(42)         # fix the seed so the shuffle is reproducible
np.random.shuffle(data)    # shuffle the rows so the dev split isn't ordered
we split the data into training and development sets (the first 4000 shuffled samples go to dev, the rest to training):
data_dev = data[0:4000]
Y_dev = data_dev[:, 0]                       # first column is the label
X_dev = data_dev[:, 1:].astype(np.float32)   # remaining 784 columns are the pixels
X_dev /= 255.   # pixel values go from 0 (black) to 255 (white), so divide by 255 to normalise them to [0, 1]
data_train = data[4000:]
Y_train = data_train[:, 0]
X_train = data_train[:, 1:].astype(np.float32)
X_train /= 255.
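a quick sanity check on the shapes, rows are samples and columns are features (the exact numbers below assume kaggle's train.csv with 42,000 rows):

print(X_train.shape, Y_train.shape)   # (38000, 784) (38000,)
print(X_dev.shape, Y_dev.shape)       # (4000, 784) (4000,)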
our model has 2 layers: a hidden layer with a ReLU activation, and an output layer with a softmax activation that gives one probability per digit class (0–9).
def relu(x):
    return np.maximum(x, 0)   # keep positive values, zero out the negatives

def softmax(x):
    # subtract the row-wise max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)   # each row now sums to 1
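before we can do a forward pass, we need weights and biases for both layers. here's a minimal sketch of how you could initialise them, assuming a hidden layer of 10 units (the hidden size and the name init_params are my choices, not fixed by anything above):

def init_params(n_features=784, n_hidden=10, n_classes=10):
    rng = np.random.default_rng(42)
    W1 = rng.normal(0, 0.01, (n_features, n_hidden))   # weights for layer 1
    b1 = np.zeros((1, n_hidden))                        # bias for layer 1
    W2 = rng.normal(0, 0.01, (n_hidden, n_classes))     # weights for layer 2
    b2 = np.zeros((1, n_classes))                       # bias for layer 2
    return W1, b1, W2, b2

since X has shape (samples, features), the first layer is just X @ W1 + b1, which is why W1 has shape (features, hidden).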
our model gives probabilities like [0.1, 0.2, ..., 0.05], but our labels are just numbers (like 3). to compare the two, we convert each label into a one-hot vector: only the correct index is 1, the rest are 0.
example:
3 → [0 0 0 1 0 0 0 0 0 0]
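here's a small numpy helper that does this conversion (the name one_hot is my choice):

def one_hot(y, num_classes=10):
    # y is a 1-D array of integer labels, e.g. array([3, 0, 7, ...])
    out = np.zeros((y.size, num_classes))
    out[np.arange(y.size), y] = 1   # put a 1 at the correct class index in each row
    return out

Y_train_oh = one_hot(Y_train)   # shape: (number of samples, 10)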