How Our Model Performs
We evaluate our model with several metrics. First, we look at its TNR (True Negative Rate), TPR (True Positive Rate), and a few other threshold-dependent measures.
- Another way we could have dealt with the imbalanced data is by choosing a different threshold for separating the 1 and 0 outcomes. However, the default threshold of 0.5 actually works well for our purpose of predicting defaults: it gives us a 77% TPR.
- This plot demonstrates the tradeoff between the different metrics: as TNR goes up, TPR goes down. The F1 score (a metric that takes into account both precision and recall) reaches its maximum at a threshold of around 0.15.
- Even though the default threshold of 0.5 gives us a lower F1 score, thresholds with higher F1 would have to sacrifice TPR. Since we care most about catching positives (defaults), we are okay with sacrificing a little F1 to gain a higher TPR.
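The threshold tradeoff above can be sketched directly from predicted probabilities. The sketch below uses a small toy array of hypothetical scores (not our model's actual output) and a helper, `rates_at_threshold`, that computes TPR, TNR, and F1 at a given cutoff, so you can see how lowering the threshold trades TNR for TPR:

```python
import numpy as np

def rates_at_threshold(y_true, y_prob, threshold):
    """Compute TPR, TNR, and F1 for a given probability cutoff."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tpr = tp / (tp + fn) if tp + fn else 0.0          # recall on positives
    tnr = tn / (tn + fp) if tn + fp else 0.0          # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, tnr, f1

# Hypothetical labels and scores, standing in for real model output.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])
y_prob = np.array([0.05, 0.1, 0.2, 0.4, 0.3, 0.55, 0.6, 0.7, 0.8, 0.9])

for t in (0.15, 0.5):
    tpr, tnr, f1 = rates_at_threshold(y_true, y_prob, t)
    print(f"threshold={t:.2f}  TPR={tpr:.2f}  TNR={tnr:.2f}  F1={f1:.2f}")
```

Even on this toy data, the lower cutoff catches more positives (higher TPR) at the cost of flagging more negatives incorrectly (lower TNR), which is exactly the tradeoff the plot shows.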
The ROC curve is another way of assessing our model. We want a curve that hugs the upper-left corner, representing high sensitivity and high specificity at the same time. Our curve for the weighted_tuned_logistic appears to do a good job of that.
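An ROC curve is built by sweeping the classification threshold and recording (FPR, TPR) at each cutoff; the area under it (AUC) summarizes how well the model ranks positives above negatives. Here is a minimal sketch using the same hypothetical toy scores as above (the helper name `roc_points` and the data are my own, not from our pipeline):

```python
import numpy as np

def roc_points(y_true, y_prob):
    """Sweep thresholds from high to low and collect (FPR, TPR) pairs."""
    thresholds = np.unique(y_prob)[::-1]
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # start at the origin (everything predicted 0)
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        tpr.append(np.sum((y_pred == 1) & (y_true == 1)) / pos)
        fpr.append(np.sum((y_pred == 1) & (y_true == 0)) / neg)
    return np.array(fpr), np.array(tpr)

# Hypothetical labels and scores, standing in for real model output.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 1, 1])
y_prob = np.array([0.05, 0.1, 0.2, 0.4, 0.3, 0.55, 0.6, 0.7, 0.8, 0.9])

fpr, tpr = roc_points(y_true, y_prob)
# AUC via the trapezoidal rule; 0.5 is random ranking, 1.0 is perfect.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(f"AUC = {auc:.3f}")
```

A curve that climbs steeply before bending right gives an AUC close to 1, which is the "up and to the left" shape we see for the weighted_tuned_logistic.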
Our model looks pretty good. Now let's see what it actually predicts and examine the common features of loans that default.