World's Best AI Learning Platform with Profoundly Demanding Certification Programs
Designed by IITians, only for AI learners.
You are given a data set for credit card fraud detection. You have built a classifier model and achieved a performance score of 98.3%. Is this a good model? If yes, justify; if not, what can you do about it?
Accuracy is not the best metric for evaluating a model trained on an imbalanced dataset. In the credit card fraud detection task, we have a large number of normal transactions and very few fraud data points. Accuracy counts only true positives and true negatives, so a model that predicts every data point as non-fraud would still achieve a high accuracy. Since our main aim is to detect the fraud data points, precision, recall, and the ROC curve with its AUC are better-suited evaluation metrics for this task.
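This can be illustrated with a minimal sketch (the counts are hypothetical: 1,000 transactions, 17 of them fraudulent) in which a useless model that flags nothing as fraud still scores 98.3% accuracy:

```python
# Hypothetical toy data: 1,000 transactions, 17 fraudulent (label 1), rest normal (label 0).
y_true = [1] * 17 + [0] * 983
y_pred = [0] * 1000  # a useless model: predicts "not fraud" for every transaction

# Confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity / true positive rate
precision = tp / (tp + fp) if tp + fp else 0.0     # defined as 0 when nothing is flagged
specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
# accuracy=0.983 precision=0.000 recall=0.000 specificity=1.000
```

Accuracy comes out at 0.983 even though the model catches zero fraud cases (recall on the fraud class is 0), which is exactly why per-class metrics such as precision and recall must be inspected on imbalanced data.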
The credit card fraud detection data set is not balanced, i.e. it is imbalanced. On such a data set, the accuracy score cannot be the measure of performance, as the model may predict only the majority class label correctly, while our point of interest is predicting the minority label. Minority examples are often treated as noise and ignored, so the probability of misclassifying the minority label is much higher than for the majority label. To evaluate model performance on imbalanced data sets, we should use Sensitivity (true positive rate) and Specificity (true negative rate) to determine the class-wise performance of the classification model. If the minority class label's performance is not good enough, we could do the following: