This is gonna be a very short blog about KNN, aka K-Nearest Neighbors. Before we go in depth, here's a brief overview that sums KNN up: it literally means “you are who you hang out with”. Unlike other ML algorithms, which update their weights by training on data, KNN just memorises the data. Let's say we want to predict the gender of a person from two features, height and weight (a classic binary classification problem).

In logistic regression we label females as 0 and males as 1, then compute a linear combination of the features:

$$ z = w_1 \cdot \text{height} + w_2 \cdot \text{weight} + b $$

Then pass it through the sigmoid:

$$ \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}} $$

Where:

- $w_1, w_2$ are the weights learned during training,
- $b$ is the bias term, and
- $\hat{y}$ is the predicted probability that the person is male (class 1).
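
To make this concrete, here's a minimal sketch of that forward pass in Python. The weights and bias below are made-up numbers chosen purely for illustration; in practice they'd come out of training.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_gender(height, weight, w1, w2, b):
    """Logistic regression forward pass: linear combination, then sigmoid."""
    z = w1 * height + w2 * weight + b
    y_hat = sigmoid(z)
    # Threshold the probability at 0.5: 1 = male, 0 = female
    return 1 if y_hat >= 0.5 else 0

# Hypothetical, untrained weights -- for illustration only
print(predict_gender(height=170, weight=65, w1=0.04, w2=0.05, b=-10.0))  # -> 1
```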
If we had to do the same thing using KNN, it would look something like this:

Let's say we have a new data point we want to classify, and we know its features (height and weight). In KNN, we simply (see the code sketch after these steps):

  1. Calculate the distance between this new point and every example in our training data
  2. Pick the K closest neighbors (those with the smallest distances)
  3. Look up the Y_label associated with each of these neighbors
  4. Take a vote among these K neighbors
  5. Assign the majority class as our prediction
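
Here's a minimal from-scratch sketch of those five steps, assuming Euclidean distance (other distance metrics work too). The training set below is made up purely for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from x_new to every training example
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: look up the labels of those k neighbours
    labels = y_train[nearest]
    # Steps 4-5: majority vote decides the predicted class
    return Counter(labels).most_common(1)[0][0]

# Made-up training set: [height in cm, weight in kg], 0 = female, 1 = male
X_train = np.array([[160, 55], [165, 60], [158, 52], [180, 85], [175, 78], [183, 90]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([162, 58]), k=5))  # -> 0 (female)
```

Notice there's no training step at all: the “model” is just the stored data, and all the work happens at prediction time.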

For instance, if we set K=5 and 3 of the 5 nearest neighbors are female (0) while 2 are male (1):

$k\text{ closest neighbours} = [0, 0, 1, 0, 1]$

We'd classify our new data point as female.
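
That majority vote is the whole decision rule, so in code it's one line with `collections.Counter` (same numbers as above):

```python
from collections import Counter

k_closest = [0, 0, 1, 0, 1]  # labels of the 5 nearest neighbours
print(Counter(k_closest).most_common(1)[0][0])  # -> 0, i.e. female wins 3 to 2
```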