-by gagan
in this blog we are gonna build a neural network to classify handwritten digits from the MNIST dataset, but instead of using any ML libraries like tensorflow or pytorch, we’ll be coding it completely from scratch using only numpy. this is a continuation of the knowledge we built up in the last blog.
the goal is to understand what really happens behind the scenes when we train a neural net – the feedforward, the activations, softmax, loss, backprop and all that stuff.
if you have basic knowledge of numpy and a little linear algebra, you’re good to go, although I highly recommend going through that blog first. also check out this video, which inspired me to code this up.
you can find the entire code here https://github.com/Gagancreates/mnist-from-scratch
I recommend running the notebook within kaggle itself.
we’ll be using the kaggle MNIST dataset (train.csv). each row has a label column at the start followed by 784 pixel values (a flattened 28x28 image). as usual we will keep the input in the shape $(\text{sample size},\ \text{number of features})$, where each pixel value is a feature.
import numpy as np
import pandas as pd

data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
data = np.array(data)      # convert the dataframe into a numpy array
np.random.seed(42)         # fix the seed so the shuffle is reproducible
np.random.shuffle(data)    # shuffle the rows so the dev split isn't ordered
we split the data into training and development sets (the first 4000 shuffled samples go to dev, the rest to training):
data_dev = data[0:4000]
Y_dev = data_dev[:, 0]                       # first column is the label
X_dev = data_dev[:, 1:].astype(np.float32)   # remaining 784 columns are the pixels
X_dev /= 255.   # pixel values go from 0 (black) to 255 (white), so divide by 255 to normalise them to [0, 1]
data_train = data[4000:]
Y_train = data_train[:, 0]
X_train = data_train[:, 1:].astype(np.float32)
X_train /= 255.
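a quick sanity check on the shapes, rows are samples and columns are features (the exact numbers below assume kaggle's train.csv with 42,000 rows):

print(X_train.shape, Y_train.shape)   # (38000, 784) (38000,)
print(X_dev.shape, Y_dev.shape)       # (4000, 784) (4000,)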
our model has 2 layers: a hidden layer with a ReLU activation, and an output layer with a softmax activation that gives one probability per digit class (0–9).
def relu(x):
    return np.maximum(x, 0)   # keep positive values, zero out the negatives

def softmax(x):
    # subtract the row-wise max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)   # each row now sums to 1
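before we can do a forward pass, we need weights and biases for both layers. here's a minimal sketch of how you could initialise them, assuming a hidden layer of 10 units (the hidden size and the name init_params are my choices, not fixed by anything above):

def init_params(n_features=784, n_hidden=10, n_classes=10):
    rng = np.random.default_rng(42)
    W1 = rng.normal(0, 0.01, (n_features, n_hidden))   # weights for layer 1
    b1 = np.zeros((1, n_hidden))                        # bias for layer 1
    W2 = rng.normal(0, 0.01, (n_hidden, n_classes))     # weights for layer 2
    b2 = np.zeros((1, n_classes))                       # bias for layer 2
    return W1, b1, W2, b2

since X has shape (samples, features), the first layer is just X @ W1 + b1, which is why W1 has shape (features, hidden).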
our model gives probabilities like [0.1, 0.2, ..., 0.05], but our labels are just numbers (like 3). to compare the two, we convert each label into a one-hot vector: only the correct index is 1, the rest are 0.
example:
3 → [0 0 0 1 0 0 0 0 0 0]
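here's a small numpy helper that does this conversion (the name one_hot is my choice):

def one_hot(y, num_classes=10):
    # y is a 1-D array of integer labels, e.g. array([3, 0, 7, ...])
    out = np.zeros((y.size, num_classes))
    out[np.arange(y.size), y] = 1   # put a 1 at the correct class index in each row
    return out

Y_train_oh = one_hot(Y_train)   # shape: (number of samples, 10)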