by Gagan
Gradient descent is one of the most fundamental algorithms in machine learning. It is used to optimize a machine learning model so that its output is as close to the true value as possible. But how exactly does it optimize? To answer this question, let us first look at the output function, also known as the hypothesis.
Let us take the standard example of predicting house prices based on size. An input on which the output depends is known as a feature. Let us define a simple function that represents the output:
$f_{w,b}(x) = w_1x_1 + w_2x_2 + w_3x_3 + \dots + b$

Here $x_1, x_2, \dots$ are the input features, $w_1, w_2, \dots$ are the parameters or weights, and $b$ is known as the bias.
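As a quick illustration, here is a minimal sketch of this hypothesis in Python; the feature values, weights, and bias below are hypothetical numbers chosen only for illustration, not fitted parameters:

```python
# Minimal sketch of the hypothesis f_{w,b}(x) = w1*x1 + w2*x2 + ... + b.
# All numbers below are hypothetical, used only to show the computation.

def predict(x, w, b):
    """Weighted sum of the input features plus the bias."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b

x = [1.5, 3.0, 20.0]     # hypothetical features: size, bedrooms, age
w = [200.0, 10.0, -1.0]  # hypothetical weights, one per feature
b = 100.0                # hypothetical bias
print(predict(x, w, b))  # 410.0
```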
Since we are using only one feature, i.e. the size of the house, the output function becomes:

$f_{w,b}(x) = wx + b$
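For instance, with hypothetical values $w = 200$ and $b = 100$ (price in thousands of dollars, size in thousands of square feet), a house of size $x = 1.5$ would be predicted at $f_{w,b}(1.5) = 200 \cdot 1.5 + 100 = 400$.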
Now the most important thing is to find the values of $w$ and $b$ such that our prediction is as close to the true value as possible. But how do we find these values?
This is where Gradient Descent comes into the picture.
Now that we know gradient descent is what finds the parameters $w$ and $b$, let us understand how it works. But before we move on, we must learn about something known as the cost function.
The cost function, denoted $J(w,b)$, measures how well or how poorly a model performs: the lower the cost, the better the model.
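Before writing out the formula, here is a minimal sketch of the idea in Python, assuming the usual squared-error cost for a one-feature linear model; the data and parameter values below are hypothetical:

```python
# Minimal sketch of a squared-error cost J(w, b) = (1/(2m)) * sum((f(x_i) - y_i)^2),
# assuming the standard mean-squared-error form; all data below is hypothetical.

def cost(x, y, w, b):
    """Average squared gap between predictions w*x_i + b and true values y_i."""
    m = len(x)
    return sum((w * x_i + b - y_i) ** 2 for x_i, y_i in zip(x, y)) / (2 * m)

sizes = [1.0, 1.5, 2.0]         # hypothetical house sizes
prices = [300.0, 400.0, 500.0]  # hypothetical true prices
print(cost(sizes, prices, w=200.0, b=100.0))  # 0.0: these w, b fit exactly
```

With a perfect fit the cost is zero; gradient descent adjusts $w$ and $b$ to push this number as low as possible. The cost function is given by: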