Kajal Pawar

a year ago

Classification is a type of supervised learning problem,
which involves the prediction of a class label using one or more input
variables.

Classification problems that have just two labels for the target variable
are referred to as **binary classification problems**, and problems with
more than two labels/classes are referred to as **categorical or
multi-class classification problems**.

For example, we might be classifying:

- Will it rain today? Yes or No

- Is the email spam? Yes or No

Let me take
a simple classification example and explain to you how it actually works.

Let’s take 10 random data points, as shown below, and call them the
feature x.

Now, let’s represent these data points on a number line as shown below.

Feature X

Let’s assign different colors to the data points to represent their
different classes, or labels, as shown below.

Data points with different colors

We can see
that our classification task here is quite simple and straight-forward.

From the above, we can see that this is a **binary classification task**.
We can also phrase the task as **“Is the data point green?”** or, better,
**“What is the probability of the point being green?”**

Ideally, **green points** would have a probability of **1** of being green,
and **red points** would have a probability of **0** of being green.

In the
above-mentioned scenario, green points belong to the positive class, i.e., Yes,
they are green, while the red points belong to the negative class, i.e., No,
they are not green.
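As a small illustration, the colors can be encoded as target values like this. The `colors` list is an assumption, written to match the labels used in the scikit-learn example later in this article.

```python
# Colors of the 10 points (assumed here to match the y array used in the
# scikit-learn example later in this article).
colors = ["red", "red", "green", "red", "green",
          "green", "green", "green", "green", "green"]

# Positive class (green) -> 1.0, negative class (red) -> 0.0.
# Each value is the ideal probability of the point "being green".
y = [1.0 if c == "green" else 0.0 for c in colors]
print(y)
```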

Now, our task is to build a model to perform this classification. It will
predict a probability of being green for each of the data points. Given what
we know about the color of the points, how can we evaluate how good or bad
the predictions made by our model are?

This is where the **loss function** comes into the picture. It helps us
check whether our model is performing well or badly: it returns **high
values** for **bad predictions** and **low values** for **good
predictions**.

For binary classification, the typical loss function is the binary
cross-entropy, also called log loss.

The binary cross-entropy loss function calculates the average cross-entropy
across all examples.

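The formula of this loss function can be given by:

$$
\mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log\big(p(y_i)\big) + (1 - y_i) \log\big(1 - p(y_i)\big) \Big]
$$

Binary Cross-Entropy / Log loss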

- Here, y represents the label / class (1 for the green points and 0 for the red points)

- p(y) represents the predicted probability of the data point being green, and N is the number of data points.

Let me explain what the formula given above actually tells you.

For each green point (y=1), it adds log(p(y)) to the sum, that is, the log
probability of the point being green. For each red point (y=0), it adds
log(1-p(y)), that is, the log probability of the point being red. The sum is
then averaged over all N points and negated, which gives a positive loss.
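The description above can be sketched directly in NumPy. This is a minimal illustration, not a library implementation: the function name `binary_cross_entropy`, the clipping constant `eps`, and the toy inputs are all assumptions made for this example.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-15):
    """Mean binary cross-entropy for labels y (0 or 1) and predicted P(green) p."""
    y = np.asarray(y, dtype=float)
    # Clip so that log(0) never occurs; eps is an illustrative choice.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    # Green points contribute log(p), red points contribute log(1 - p).
    per_point = y * np.log(p) + (1 - y) * np.log(1 - p)
    # Average over all points and negate to get a positive loss.
    return -np.mean(per_point)

# Toy predictions: one green point and one red point.
good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and right
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good, bad)
```

The confident, correct predictions give a small loss, while the confident, wrong ones give a much larger loss.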

Let’s first see how we can compute binary cross-entropy in a visual way, and
then I will take you through how we can implement it using Python.

Let’s continue with the example above.

Data points with different colors

First, let’s
split these data points according to their respective classes, i.e., positive
and negative as shown below.

Next, let’s build a **Logistic Regression** model to classify the given data
points. The fitted logistic regression model is a **sigmoid curve** that
represents the probability of a point being green for any given x. It is
shown below:
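As a rough sketch of what such a curve computes, a logistic regression model passes a linear function of x through the sigmoid function. The coefficients `w` and `b` below are hypothetical values chosen only for illustration; they are not the coefficients fitted in this article.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, for illustration only (not the fitted values).
w, b = 1.0, 0.0

x = np.array([-2.0, 0.0, 2.0])
# Predicted probability of each point being green.
p_green = sigmoid(w * x + b)
print(np.round(p_green, 3))
```

With these assumed coefficients, points far to the left get a probability near 0, points far to the right get a probability near 1, and x = 0 sits exactly at 0.5.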

Now, you might be wondering what probabilities our model predicts for the
points belonging to the positive class (green) and the negative class (red).

Let me show you how they actually look.

For the positive class (green), these are the green bars under the sigmoid
curve, at the x coordinates corresponding to the points, as shown in the
figure below.

And for the negative class (red), it looks as shown below:

Combining both of the above figures, we get:

Now that we have the predicted probabilities, let’s calculate the **binary
cross-entropy / log loss**.

We will keep only the bars, since the probabilities are all we need. This
can be shown as below:

As we are trying to compute a loss, we need to penalize the bad predictions.
But how? Let’s see.

If the probability associated with the true class is **1.0**, we need its
loss to be zero. On the other hand, if that probability is **low**, say
0.02, we need its loss to be large.

Taking the (negative) log of the probability suits this purpose well. Since
the log of a value between 0 and 1 is negative, we take the negative log to
obtain a positive value for the loss. For a deeper understanding, we would
have to look at the math behind it, which comes from cross-entropy.
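A quick way to see this penalty is to evaluate the negative log at a few probabilities, including the 1.0 and 0.02 mentioned above. The helper name `neg_log` and the intermediate probabilities 0.9 and 0.5 are assumptions made for this sketch.

```python
import math

def neg_log(p_true):
    """Loss for a single point, given the probability assigned to its true class."""
    # Written as 0.0 - log(p) so that p = 1.0 yields 0.0 rather than -0.0.
    return 0.0 - math.log(p_true)

for p_true in (1.0, 0.9, 0.5, 0.02):
    print(f"p(true class) = {p_true:<4}  loss = {neg_log(p_true):.4f}")
```

The natural log is used here, which is also what scikit-learn's `log_loss` uses.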

From the plot below, we can see that as the predicted probability of the
true class gets closer to zero, the loss increases sharply and without
bound.

So, now let’s take the (negative) log of the probabilities. These are the
corresponding losses of each point.

Finally, we compute the mean of all these losses, as shown below.

So, the binary cross-entropy / log loss for this example comes out to be
**0.3329**.

Let’s implement it in Python.

```
# import libraries
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
import numpy as np
x = np.array([-2.2, -1.4, -.8, .2, .4, .8, 1.2, 2.2, 2.9, 4.6])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
logr = LogisticRegression(solver='lbfgs')
logr.fit(x.reshape(-1, 1), y)
y_pred = logr.predict_proba(x.reshape(-1, 1))[:, 1].ravel()
loss = log_loss(y, y_pred)
print('x = {}'.format(x))
print('y = {}'.format(y))
print('p(y) = {}'.format(np.round(y_pred, 2)))
print('Log Loss / Cross Entropy = {:.4f}'.format(loss))
```

```
x = [-2.2 -1.4 -0.8 0.2 0.4 0.8 1.2 2.2 2.9 4.6]
y = [0. 0. 1. 0. 1. 1. 1. 1. 1. 1.]
p(y) = [0.19 0.33 0.47 0.7 0.74 0.81 0.86 0.94 0.97 0.99]
Log Loss / Cross Entropy = 0.3329
```
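As a sanity check, the same number can be approximated by hand from the printed output: take the probability assigned to each point's true class, take its negative log, and average. This sketch uses the rounded probabilities printed above, so the result is only expected to be close to 0.3329, not identical.

```python
import numpy as np

# Labels and rounded probabilities copied from the output above.
y = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 1])
p_green = np.array([0.19, 0.33, 0.47, 0.7, 0.74, 0.81, 0.86, 0.94, 0.97, 0.99])

# Probability assigned to the true class: p for green points, 1 - p for red.
p_true = np.where(y == 1, p_green, 1 - p_green)

# Per-point loss, then the mean over all points.
losses = -np.log(p_true)
approx_loss = losses.mean()

print(np.round(losses, 2))
print(round(float(approx_loss), 4))
```

The largest single loss comes from the red point at x = 0.2, to which the model assigns a probability of about 0.7 of being green.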

Let’s take another example and write full Python code that trains a small
neural network using the binary cross-entropy / log loss.

```
# mlp for the circles problem with cross entropy loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=1000, noise=0.1, random_state=1)
# split into train and test
n_train = 500
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=200, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
```

The above code first prints the classification accuracy of the model on the
train and test datasets (the loss value returned by `model.evaluate` is
discarded by assigning it to `_`):

```
Train: 0.840, Test: 0.853
```
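Because the script passes `metrics=['accuracy']`, the second value returned by `model.evaluate` is accuracy, while the first is the binary cross-entropy loss. The two measure different things. The sketch below uses made-up predictions (the arrays `confident` and `hesitant` and both helper functions are assumptions for illustration) to show two sets of predictions with the same accuracy and different losses.

```python
import numpy as np

def accuracy(y, p, threshold=0.5):
    """Fraction of points whose thresholded prediction matches the label."""
    return float(np.mean((p >= threshold) == (y == 1)))

def bce(y, p):
    """Mean binary cross-entropy of predicted probabilities p for labels y."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1, 1, 0, 0])
# Two made-up sets of predictions that classify every point correctly.
confident = np.array([0.95, 0.90, 0.05, 0.10])
hesitant = np.array([0.55, 0.60, 0.45, 0.40])

for name, p in (("confident", confident), ("hesitant", hesitant)):
    print(f"{name}: accuracy = {accuracy(y, p):.2f}, loss = {bce(y, p):.4f}")
```

Both sets of predictions are fully accurate, but the hesitant ones have a noticeably higher loss, which is why the loss is the quantity being minimized during training.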

The Keras script then plots the training and test loss, followed by the
training and test accuracy, over the training epochs, as shown below:

After reading this article, you should now understand the importance of
binary cross-entropy / log loss. For more blogs/courses on data science,
machine learning, artificial intelligence and new technologies, do visit us
at InsideAIML.

Thanks for reading…
