Regression Loss: Mean Squared Error

Kajal Pawar

a year ago

According to the Wikipedia definition:

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.
Now let me give you a simpler definition.
Mean squared error (MSE) is one of the most commonly used loss functions for regression problems.
It’s the mean of the squared differences between the actual values and the predicted values, and the formula can be given as follows:
MSE = (1/n) Σ (y − y’)², summed from i=1 to i=n
Let’s first understand what this equation means.
  • The character that looks like an E is the summation symbol, the Greek capital letter sigma (Σ). It denotes the sum of a sequence of numbers, from i=1 to i=n.
  • Here, y represents the actual values and y’ represents the predicted values. We subtract y’ from y, square each difference, and take the sum of all the (y-y’)² terms.
  • Then we divide this sum by n, where n is the number of data points, to get the mean, which is known as the mean squared error (MSE).
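The three steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `mean_squared_error` and the sample values are assumptions chosen for the example, not something taken from a particular library.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of the squared differences between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Step 1: subtract each predicted value from the actual value and square the result
    squared_errors = [(y - y_pred) ** 2 for y, y_pred in zip(actual, predicted)]
    # Steps 2 and 3: sum the squared errors and divide by the number of data points
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values, chosen only for illustration
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mean_squared_error(actual, predicted))  # (1 + 0 + 4) / 3
```

A perfect prediction gives an MSE of 0, and any miss pushes the value up.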
Let’s take an example and see why we actually need mean squared error.
I will take an example and draw a regression line through a set of data points. Don’t consider it the best-fit regression line; I am only using it as an example to show how it works.
Now you might be wondering why I am plotting this graph. Let me explain.
Here I have taken 10 data points at random and plotted them on a graph.
  • The blue points are our data points, each with an x and a y coordinate. When we plot them on a graph, they look as shown above.
  • The line drawn through the data points is called the prediction line or regression line. Many different prediction lines are possible, but the one that best fits all the data points is called the best-fit regression line.
  • The vertical distances between the data points and the prediction line are called errors, also known as residuals.
Most of us are already familiar with the equation of a straight line from our school days: y=mx+b.
Here, m is the slope of the line and b is the y-intercept, which describes where the line crosses the y-axis.
To get the best-fit regression line, we want to minimize the error value.
Now let me walk you through the mathematics behind the mean squared error (MSE) equation.
As you know, the straight-line equation is y=mx+b, where m is the slope and b is the y-intercept of the straight line.
So, substituting mx+b for the predicted value y’, we can write the MSE over the data points as follows:
We can simplify the above equation and write it as:
Now, let’s expand all the brackets in the above equation and write it in a simpler way, as shown below.
Now, let’s rearrange it to simplify it further. We group like terms, placing all the y² terms together, all the (-2ymx) terms together, and so on, as shown.
At this point things are getting messy, so we will take the mean of the y², xy, y, x², and x values over all the data points.
We will use a new symbol for each of these means.
For example, to take the mean of the y values, we add up all the y values, divide by n, and call the result ȳ, as shown below.
Multiplying both sides of that equation by n, we get:
Finally, we get the following equation:
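Written out, the steps above can be sketched as follows. This is a reconstruction that follows the description in the text; using an overbar for the mean over the n data points (for example, \overline{xy} is the sum of the xy values divided by n) is an assumed notation.

```latex
\begin{align*}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i^2 - 2 m x_i y_i - 2 b y_i + m^2 x_i^2 + 2 m b x_i + b^2\bigr) \\
&= \overline{y^2} - 2m\,\overline{xy} - 2b\,\overline{y} + m^2\,\overline{x^2} + 2mb\,\overline{x} + b^2
\end{align*}
```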
We can see from the above equation that the only unknowns are m and b.
Our aim is to find the values of m and b that minimize this function.
So, how do we find them?
We take the partial derivative with respect to m and the partial derivative with respect to b. Since we are looking for a minimum, we set each partial derivative equal to 0, as shown below.
Figure: Partial derivatives formula
Figure: Partial derivatives
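As a sketch of this step, assume the MSE has been written in terms of means as MSE = \overline{y^2} - 2m\overline{xy} - 2b\overline{y} + m^2\overline{x^2} + 2mb\overline{x} + b^2, where an overbar denotes the mean over the n data points (an assumed notation). Differentiating with respect to m and b and setting each result to zero gives:

```latex
\begin{align*}
\frac{\partial\,\mathrm{MSE}}{\partial m} &= -2\,\overline{xy} + 2m\,\overline{x^2} + 2b\,\overline{x} = 0 \\
\frac{\partial\,\mathrm{MSE}}{\partial b} &= -2\,\overline{y} + 2m\,\overline{x} + 2b = 0
\end{align*}
```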
Taking the two equations we obtained above, we isolate the variable b in each of them, as shown below.
Now, subtracting the first equation from the second, we get:
This is the final equation for the slope m.
And the final equation for the intercept b can be given as:
So, the equations for the slope and y-intercept are:
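For reference, solving the two zero-derivative conditions for m and b leads to the standard least-squares closed form below. The overbar notation for means over the n data points is an assumption; the result itself follows from setting both partial derivatives of the MSE to zero.

```latex
\begin{align*}
m &= \frac{\overline{xy} - \overline{x}\,\overline{y}}{\overline{x^2} - \overline{x}^{\,2}} \\
b &= \overline{y} - m\,\overline{x}
\end{align*}
```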
Now, let me break these equations down so that you are not left wondering what each element represents.
It’s pretty simple:
  • Mean of x: the sum of the x values divided by n
  • Mean of x²: the sum of the x² values divided by n
  • Mean of xy: the sum of the xy values divided by n
  • Mean of y: the sum of the y values divided by n
By now, you may feel quite comfortable with the equation and concepts of the MSE.
To make it clearer and give you a deeper understanding, let’s take an example.


Let’s take 3 points, (1,2), (2,1), and (4,3), and plot them on a graph. The points look as shown below.
Let’s find the slope m and intercept b for the equation y=mx+b.
Sum the x values and divide by n: (1+2+4)/3 = 7/3.
Sum the y values and divide by n: (2+1+3)/3 = 2.
Sum the xy values and divide by n: (1·2+2·1+4·3)/3 = 16/3.
Sum the x² values and divide by n: (1²+2²+4²)/3 = 7.
Now that we have calculated these means, let’s put them into the equations for m and b to get the slope and y-intercept.
Slope (m) calculation: m = (16/3 − (7/3)·2) / (7 − (7/3)²) = (2/3) / (14/9) = 3/7 ≈ 0.43
y-intercept calculation: b = 2 − (3/7)·(7/3) = 1
Let’s put the calculated values into the line equation y=mx+b: y = (3/7)x + 1.
So, this is the equation of the best-fit regression line.
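The calculation for the three example points can be checked with a short Python sketch. It assumes the standard least-squares closed form, m = (mean(xy) − mean(x)·mean(y)) / (mean(x²) − mean(x)²) and b = mean(y) − m·mean(x), and uses exact fractions to avoid rounding; the variable names are chosen only for illustration.

```python
from fractions import Fraction

# The three example points from the text
points = [(1, 2), (2, 1), (4, 3)]
n = len(points)

# Means used by the slope and intercept formulas (exact fractions avoid rounding)
mean_x = Fraction(sum(x for x, _ in points), n)        # 7/3
mean_y = Fraction(sum(y for _, y in points), n)        # 2
mean_xy = Fraction(sum(x * y for x, y in points), n)   # 16/3
mean_x2 = Fraction(sum(x * x for x, _ in points), n)   # 7

# Closed-form least-squares slope and intercept
m = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
b = mean_y - m * mean_x

print(m, b)  # 3/7 1
```

So the fitted line for these points is y = (3/7)x + 1, or roughly y = 0.43x + 1.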
Let’s draw the line using the above equation and see how it passes through the points in such a way that it minimizes the squared distances and gives us the best-fit regression line.
Regression line that minimizes the MSE.

When to use mean squared error

We can use MSE for regression problems when we believe that the target, conditioned on the input, is normally distributed, and when we want large errors to be penalized significantly more than small ones.
Example: Suppose you want to predict a future stock price. The price is a continuous value, so this is a regression problem, and MSE can be used as the loss function.
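To see what "penalizing large errors more" means in practice, here is a small Python sketch. It compares MSE with the mean absolute error (MAE), which is used here only as a comparison baseline; the helper names and the error values are hypothetical.

```python
def mse(errors):
    """Mean of the squared errors."""
    return sum(e ** 2 for e in errors) / len(errors)


def mae(errors):
    """Mean of the absolute errors (comparison baseline only)."""
    return sum(abs(e) for e in errors) / len(errors)


small_errors = [1, 1, 1, 1]  # four small errors of size 1
one_large = [0, 0, 0, 4]     # same total absolute error, concentrated in one point

print(mae(small_errors), mae(one_large))  # 1.0 1.0
print(mse(small_errors), mse(one_large))  # 1.0 4.0
```

Both sets of errors have the same MAE, but the MSE is four times larger when the error is concentrated in a single large miss.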

Implementation of MSE using Python

# mlp for regression with mse loss function
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot

# generate regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=1)

# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]

# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

# define model
model = Sequential()
model.add(Dense(25, input_dim=20, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='linear'))
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))

# plot loss during training
pyplot.title('Loss / Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()


The above code first outputs the mean squared error for the model on the train and test datasets (your exact values may vary slightly because of random weight initialization):
Train: 0.000, Test: 0.001
Then it plots the training and testing loss, as shown below:
In this article, we covered what mean squared error (MSE) is, how the best-fit regression line is derived from it, and when to use it. For more blogs and courses on data science, machine learning, artificial intelligence, and new technologies, visit us at InsideAIML.
Thanks for reading…
