The Most Accurate Model
The model that gave the best balanced accuracy is the tuned weighted logistic model.
observations
- This graph shows the most important predictors from this model. Where is the credit score? Annual income? When people think of loans we usually think of credit score as the natural predictor of if the person can pay back the loan or not. But as we can see that is one of the 10 most important factors.
- Interest rate plays a significant role in prediction if the person will default or not. This was actually one of the most important observations that people did not understand before the crash in 2008. People signed up for loans with variable interest, and did not realize that the moment the rates go up they will lose EVERYTHING. Maybe if they would have seen this report in a few years ago they would have been saved.
- A nice thing about Logistic is that it uses a more diverse array of predictors, rather than depending a lot on one predictor, like random forest does. So in this case, even though logistic takes more time, it may be more robust for out-of-sample predictions.
Tuning
We iterated through many of sklearn's parameters for logistic regression. We found that the highest accuracy comes when we use the Newton CG optimization algorithm (which automatically uses the L2 penalty) with 300 iterations. Additionally, having a smaller C values (of 0.5) helps a lot.
Our model's accuracy is great. Let's get a bit more in-depth about its performance.