
Kajal Pawar

a year ago

In statistics,
the **mean squared error** (**MSE**) or **mean squared deviation** (**MSD**) of an estimator
(of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that
is, the average squared difference between the estimated values and the actual
value. MSE is a risk function,
corresponding to the expected value of the squared error loss. The fact that MSE is almost
always strictly positive (and not zero) is because of randomness
or because the estimator does not account for
information that could produce a
more accurate estimate.

Now let me give you a simpler definition.

Mean
squared error (MSE) is one of the most commonly used loss functions for
regression problems.

Let’s first look at the equation and understand what it actually means.

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2$$

- The character that looks like an E is the Greek capital letter sigma (Σ) and denotes summation. It is the sum of a sequence of numbers, from i=1 to i=n.

- Here, y represents the actual values and y’ represents the predicted values. We subtract y’ from y, square each difference, and take the sum of all the (y-y’)² values.

- Then we divide this sum by n, where n is the number of data points, to get the mean, which is known as the mean squared error (MSE).
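The three steps above map directly onto a few lines of Python. This is only a minimal sketch: the function name `mean_squared_error` and the sample values are assumptions chosen for illustration, not something taken from the article.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    # Step 1: subtract each prediction from the actual value and square it.
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    # Step 2: sum the squared errors. Step 3: divide by n to get the mean.
    return sum(squared_errors) / n


# Illustrative values (hypothetical, not taken from the article).
y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.5, 5.0, 8.0, 8.5]
print(mean_squared_error(y_actual, y_predicted))  # 0.375
```

The squared errors here are 0.25, 0, 1 and 0.25, so the sum is 1.5 and the mean over the four points is 0.375.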

Let’s take an example and see why we actually need mean squared error.

I will draw a regression line between different data points. Don’t consider it the best-fit regression line; I am only using it as an example to show how it actually works.

Now you might be wondering why I am plotting this graph. Let me explain.

Here I have taken 10 data points at random and plotted them on a graph.

- The blue points are our data points, each with an x and a y coordinate.

- The line drawn through the data points is called the prediction line or regression line. Many different prediction lines are possible, but the one that fits all the data points best is called the best-fit regression line.

- The vertical lines between the data points and the prediction line are called errors, also known as residuals.
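To make the idea of residuals concrete, the sketch below computes them for one candidate line. The data points and the line `y = 0.5x + 1` are hypothetical values picked so the arithmetic is easy to follow; they are not the 10 points from the plot.

```python
# Hypothetical data points (x, y) and a hypothetical candidate line y = m*x + b.
points = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.0), (4.0, 2.5)]
m, b = 0.5, 1.0


def residuals(points, m, b):
    """Signed vertical difference between each data point and the line."""
    return [y - (m * x + b) for x, y in points]


errors = residuals(points, m, b)
mse = sum(e ** 2 for e in errors) / len(errors)
print(errors)  # [0.5, -0.5, 0.5, -0.5]
print(mse)     # 0.25
```

Positive residuals belong to points above the line and negative ones to points below it; squaring them removes the sign before averaging.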

Most of us will remember the equation of a straight line from our school days:

$$y = mx + b$$

where m is the slope and b is the y-intercept.

To get the **best-fit regression line** we want to
minimize the error value.

Now let me walk you through the mathematics behind the **mean squared error (MSE)** equation.

As you know, the straight-line equation is **y=mx+b**, where m is the slope and b is the y-intercept of the straight line.

So, for our data points, we can write the MSE of the line as follows:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (mx_i + b)\bigr)^2$$

We can simplify the above equation by expanding the square for each data point.

Now, let’s open all the brackets of the above equation and write it out term by term.

Next, let’s rearrange it to simplify it further. We take each kind of term, all the y² terms, all the (-2ymx) terms, and so on, and put them side by side.

At this point things are getting messy, so we will take the means of y², xy, y, x and x² respectively.

We will introduce a new symbol for each one to represent that mean.

For example, to take the mean of y we sum all the y values, divide by n, and call the result **ȳ**.

Multiplying both sides of that definition by n gives the sum of the y values as n times ȳ, and the same holds for the other means.

Finally, substituting these means back in gives the MSE written in terms of m, b and the means.
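The algebra described in the steps above can be written out explicitly. This is a reconstruction from the text, not a copy of the original figures, and the bar notation for the means is an assumed notation:

$$
\begin{aligned}
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (mx_i + b)\bigr)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i^2 - 2mx_iy_i - 2by_i + m^2x_i^2 + 2mbx_i + b^2\bigr) \\
&= \overline{y^2} - 2m\,\overline{xy} - 2b\,\bar{y} + m^2\,\overline{x^2} + 2mb\,\bar{x} + b^2
\end{aligned}
$$

Here $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, $\overline{xy} = \frac{1}{n}\sum x_iy_i$, $\overline{x^2} = \frac{1}{n}\sum x_i^2$ and $\overline{y^2} = \frac{1}{n}\sum y_i^2$.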

We can see from the above equation that m and b are the only unknowns; everything else is computed from the data.

Now our aim is to find the values of m and b that minimize this function.

So, how do we find them?

We will take the partial derivative with respect to m and the partial derivative with respect to b. Since we are looking for a minimum, we set both partial derivatives equal to 0.

Taking the two equations we obtained above, we isolate the variable b in both and then subtract the first equation from the second. This eliminates b and leaves an equation for m alone.
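Written out, the two conditions and their solution look as follows. This is a reconstruction of the standard least-squares derivation under an assumed notation, in which a bar denotes a mean over the n data points (for example $\overline{xy} = \frac{1}{n}\sum x_iy_i$):

$$
\begin{aligned}
\frac{\partial\,\text{MSE}}{\partial m} &= -2\,\overline{xy} + 2m\,\overline{x^2} + 2b\,\bar{x} = 0 \\
\frac{\partial\,\text{MSE}}{\partial b} &= -2\,\bar{y} + 2m\,\bar{x} + 2b = 0
\end{aligned}
$$

Isolating b in each equation (the first step assumes $\bar{x} \neq 0$):

$$
b = \frac{\overline{xy} - m\,\overline{x^2}}{\bar{x}}, \qquad b = \bar{y} - m\,\bar{x}
$$

Subtracting the first from the second and multiplying through by $\bar{x}$ gives $0 = \bar{x}\,\bar{y} - m\,\bar{x}^2 - \overline{xy} + m\,\overline{x^2}$, so

$$
m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad b = \bar{y} - m\,\bar{x}
$$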

Now, let me simplify these equations for you so that you are not left wondering what each element represents:

- $\bar{x}$: sum of x divided by n

- $\overline{x^2}$: sum of x² divided by n

- $\overline{xy}$: sum of xy divided by n

- $\bar{y}$: sum of y divided by n

By now, you may feel quite comfortable with the equation and concepts of the MSE.

To make it clearer and give you a deeper understanding, let’s take an example.

Let’s take the 3 points (1,2), (2,1), (4,3) and plot them on a graph.

Let’s try to
find the value of **slope m** and **intercept b** for the equation **y=mx+b.**

First, sum the x values and divide by n to get the mean of x:

$$\bar{x} = \frac{1 + 2 + 4}{3} = \frac{7}{3}$$

Next, sum the y values and divide by n:

$$\bar{y} = \frac{2 + 1 + 3}{3} = 2$$

Then sum the xy products and divide by n:

$$\overline{xy} = \frac{1\cdot 2 + 2\cdot 1 + 4\cdot 3}{3} = \frac{16}{3}$$

Finally, sum the x² values and divide by n:

$$\overline{x^2} = \frac{1^2 + 2^2 + 4^2}{3} = 7$$

Now that we have calculated the relevant parts, let’s put them into the equations for m and b to get the slope and the y-intercept:

$$m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{16/3 - 14/3}{7 - 49/9} = \frac{3}{7} \approx 0.43$$

$$b = \bar{y} - m\,\bar{x} = 2 - \frac{3}{7}\cdot\frac{7}{3} = 1$$

Putting these values into the line equation y=mx+b gives:

$$y = \frac{3}{7}x + 1$$

So, this is the equation of the **best-fit
regression line.**
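The worked example can be checked with a short script. This is a sketch under stated assumptions: the helper names `fit_line` and `mse` are made up for illustration, and the slope and intercept use the standard closed-form least-squares formulas built from the four means.

```python
def fit_line(points):
    """Least-squares slope and intercept computed from the four means."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mean_xy = sum(x * y for x, y in points) / n
    mean_x2 = sum(x * x for x, _ in points) / n
    m = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    b = mean_y - m * mean_x
    return m, b


def mse(points, m, b):
    """Mean squared error of the line y = m*x + b over the points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


points = [(1, 2), (2, 1), (4, 3)]  # the three points from the example
m, b = fit_line(points)
print(f"y = {m:.4f}x + {b:.4f}")  # y = 0.4286x + 1.0000

# Nudging the slope or intercept away from the fitted values increases the MSE.
print(mse(points, m, b) < mse(points, m + 0.1, b))  # True
print(mse(points, m, b) < mse(points, m, b - 0.1))  # True
```

The slope 0.4286 is 3/7 rounded to four decimals, and the MSE of the fitted line on these three points is 8/21, roughly 0.38.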

Let’s draw the line using the above equation and see how
it passes between the points in such a way that it minimizes the squared
distances and gives us the best-fit regression line.

We can use **MSE** for regression problems when we believe that the target, conditioned on the input, is normally distributed,
and when we want large errors to be penalized significantly more than small ones.
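The effect of squaring on large errors can be seen with a small comparison. In this sketch the values are hypothetical: two sets of predictions have the same total absolute error, but one spreads it over four small misses and the other concentrates it in a single large miss.

```python
def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 10.0, 10.0, 10.0]
spread_out = [9.0, 11.0, 9.0, 11.0]      # four errors of size 1
one_big_miss = [10.0, 10.0, 10.0, 14.0]  # a single error of size 4

for name, predicted in [("spread_out", spread_out), ("one_big_miss", one_big_miss)]:
    total_abs_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    print(name, total_abs_error, mse(actual, predicted))
# spread_out 4.0 1.0
# one_big_miss 4.0 4.0
```

Both cases have a total absolute error of 4, yet the single large miss produces four times the MSE.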

```
# mlp for regression with mse loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
# recent Keras versions use `learning_rate`; the older `lr` argument is deprecated
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.title('Loss / Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()
```

The code above first prints the mean squared error of the model on the train and test
datasets, then plots the training and test loss over the epochs.

After reading this article,
you should now understand the importance of **mean
squared error (MSE)**. For more blogs and courses on data science, machine
learning, artificial intelligence and new technologies, do visit us at InsideAIML.

Thanks for reading…
