Data Exploration: Loan Statistics

Basic stats about the LendingClub data.

 

First steps

We analyze the loan data that LendingClub makes available on its website. LendingClub does not provide a single collated file, only separate files for different time frames. For example, the general loan data is split into files for 2007-2011, 2012-2013, 2014, 2015, 2016 Q1, and 2016 Q2. These periods are uneven in length, which makes data exploration slightly harder. The data on declined loan applications is split into a different set of periods; we do not focus on it here.

Our first task was therefore to import all the files into our workspace and append them into a single dataframe. We then looked at the loan attributes described in the data.
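As a rough illustration of the append step, here is a minimal sketch in Python with pandas. The choice of pandas, the helper name `load_loans`, and the column names `loan_amnt` and `loan_status` are assumptions made for this example; the in-memory strings stand in for the per-period files and are not real data.

```python
import io

import pandas as pd


def load_loans(sources):
    """Read each CSV source and stack the results into one DataFrame.

    `sources` can hold file paths or file-like objects.
    (Hypothetical helper, not part of the original analysis.)
    """
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True renumbers rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)


# Tiny made-up stand-ins for two of the per-period files
file_a = io.StringIO("loan_amnt,loan_status\n5000,Fully Paid\n12000,Current\n")
file_b = io.StringIO("loan_amnt,loan_status\n8000,Current\n")

loans = load_loans([file_a, file_b])
print(loans.shape)
```

With real downloads, the list passed to `load_loans` would hold the paths of the individual files instead of the in-memory strings.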

Loan Statistics

The first item we looked at was the distribution of the loan statuses.

[Figure: distribution of loan statuses]

The most common status is “current,” meaning the loan is still active and its final outcome is not yet known. The second most common status is “fully paid,” which is encouraging.
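A status distribution like this one can be tabulated in a few lines. The sketch below assumes pandas and a column named `loan_status`; the six rows are made up for illustration and do not reflect the actual counts.

```python
import pandas as pd

# Made-up rows standing in for the combined loan data
loans = pd.DataFrame({
    "loan_status": [
        "Current", "Fully Paid", "Current",
        "Charged Off", "Current", "Fully Paid",
    ]
})

# Number of loans per status, most common first
status_counts = loans["loan_status"].value_counts()

# Same breakdown expressed as a fraction of all loans
status_share = loans["loan_status"].value_counts(normalize=True)

print(status_counts)
print(status_share)
```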

We next explored the amount of money loaned for each loan status. The distributions have similar shapes across statuses: loans in the $5,000-$10,000 range are the most common, while much larger loans (above $30,000) are the least common.
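One way to compare loan amounts across statuses is to bin the amounts and count loans per bin and status. This is a sketch only: pandas, the column names `loan_amnt` and `loan_status`, the bin edges, and the sample rows are all assumptions chosen for illustration.

```python
import pandas as pd

# Made-up rows standing in for the combined loan data
loans = pd.DataFrame({
    "loan_status": ["Current", "Current", "Current",
                    "Fully Paid", "Fully Paid", "Fully Paid"],
    "loan_amnt": [6000, 9000, 32000, 7500, 8000, 15000],
})

# Illustrative bin edges in dollars; each bin includes its upper edge
bins = [0, 5000, 10000, 20000, 30000, 40000]
labels = ["up to 5k", "5k-10k", "10k-20k", "20k-30k", "over 30k"]
loans["amount_band"] = pd.cut(loans["loan_amnt"], bins=bins, labels=labels)

# Rows are loan statuses, columns are amount bands, cells are loan counts
band_counts = pd.crosstab(loans["loan_status"], loans["amount_band"])

# Most common band across all statuses
top_band = loans["amount_band"].value_counts().idxmax()

print(band_counts)
print(top_band)
```

Plotting one histogram of `loan_amnt` per status would show the same comparison visually.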

Borrower Statistics

Why are people taking out loans? Below are the reported reasons for the loans:

However, we caution against taking this data at face value: these are only the reasons that borrowers reported.

 

Now that we've explored the general layout of our data, let's take a closer look at some of the predictors, like income or interest rate.
