How to Better Unlock Untapped Potential: Harnessing the Predictive Power of Existing Data
The power of existing data
In recent years, a handful of terms have come to dominate everyday conversation across the globe. Whilst artificial intelligence, machine learning and data science top the list when it comes to data, I want to dedicate this blog to the endless pursuit of predictive power through any data source available. Although I am all for alternative data, I find it odd that organisations are not making fuller use of the data that already exists within their systems and processes.
Before we talk about existing data, let’s go through some of the most popular alternative data sources. Alternative data here means data that is not readily available or has never been used in a specific use case, e.g., application accept/reject, collections, cross-selling, etc.
Telco account and SIM data: This is, for me, the ultimate winner when it comes to alternative data. Several studies have shown that this data type is extremely predictive of risk. Examples of the predictive features include not only how much the customer tops up their pre-paid balance, but also how many calls the customer initiates, how many texts they receive… the list is endless. In segments that don’t hold credit bureau information, and specifically in developing countries where the use of wallets as a payment mechanism has exceeded that of traditional banking products, Telcos that are able to mine this data are overtaking banks in the credit granting arena. Unfortunately, the Telcos hold a monopoly on this data and not everyone is allowed access to it, making it a challenge to compete in these markets.
Mobile/Web metadata: Several companies have tried to tap into this source, some of them successfully, achieving Gini coefficients comparable to those of geo-demographic data. This data certainly has a place in decision-making, whether for risk or fraud prediction, when used in combination with other sources.
Utilities, government information, etc.: This is, again, powerful in predicting risk, but it is difficult to obtain and always ‘unclean’.
Psychometric data: As a modeller, I am sceptical when it comes to this data type. It is used in predicting risk in many countries, but the jury is still out on how much it adds to the prediction, and whether this balances how much it worsens the customer journey.
Social Media: What can I say that hasn’t been mentioned already… I was so excited when this data source came out. Not so much to predict risk, but to be able to define the customer life cycle and marketing potential. It’s now so limited that I haven’t heard anyone talk about this for a long time.
The list above is not meant to be comprehensive; just to provide an idea of what’s out there and how it can be used. From this, we can conclude:
Alternative data is not available to everyone, and it can be expensive when it is.
Not all types of alternative data provide benefits in each use case. When they do, they should not necessarily be used in isolation, as their predictive power is still limited to the information they hold about a customer.
Organisations seeking the use of alternative data should first ask themselves whether they are currently optimising the use of what they already have. It is sometimes easier to tap into external sources, but in some use cases, it is still expensive and not as predictive.
Internal data sources
So, let’s discuss the top internal sources and how they can be used:
Application data
When predicting risk, it is important to examine the applicant’s ability and willingness to pay, as well as their stability. Application data has a bit of everything and can predict risk whether or not the applicant has a bureau record. This is key for financial inclusion and to avoid reckless decisions.
In many countries, parts of this data are not allowed because they are deemed discriminatory: don’t use gender or age! My question is, why not? One thing that’s important to understand is that data is correlated. Whether you use gender or not, it is intrinsic in the performance of the accounts, so the overall score will (or should) end up higher for those accounts that perform better.
If gender is predictive and shows women being better risk, the overall score will show women have higher scores. If you don’t use the data, you are, in my opinion, making a worse decision for the wrong reasons, and in fact, may end up discriminating against the wrong people.
Application data should be used in combination with other sources if they exist, as on its own it provides Gini coefficients of only around 30%.
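For readers who have not met the Gini coefficient, here is a minimal sketch of how it can be calculated from a set of scores and known outcomes. It uses the common relationship Gini = 2 × AUC − 1, where AUC is the chance that a randomly chosen good account scores higher than a randomly chosen bad one. The function name, the scores and the good/bad flags are all made up for illustration, and the pairwise loop is only sensible for small samples.

```python
def gini_coefficient(scores, is_bad):
    """Gini = 2 * AUC - 1 for a score where higher means lower risk.

    AUC is estimated by comparing every good account with every bad
    account; a tie counts as half a win.
    """
    goods = [s for s, bad in zip(scores, is_bad) if not bad]
    bads = [s for s, bad in zip(scores, is_bad) if bad]
    if not goods or not bads:
        raise ValueError("need at least one good and one bad account")

    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5

    auc = wins / (len(goods) * len(bads))
    return 2.0 * auc - 1.0


# Illustrative data only: eight accounts, three of which went bad.
scores = [720, 680, 650, 610, 590, 560, 540, 500]
is_bad = [False, False, True, False, False, True, False, True]
example_gini = gini_coefficient(scores, is_bad)
print(f"Gini: {example_gini:.0%}")
```

A score that ranks goods and bads at random lands near 0%, and one that separates them perfectly reaches 100%, which puts the figures quoted in this blog into context.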
Account data
So many organisations pay a fortune to go to the credit bureau for external information, yet they already hold so much on their customers’ history. Most clients I work with struggle to connect their application and account systems, so predictive features such as customer time on books, worst performance, customer holding, etc. are overlooked.
Whilst it is true that some of this data may reside in the credit bureaux, it is mixed with everyone else’s and lacks the capacity to include loyalty and the organisation’s experience with that customer.
If I were to add something to the application data when making decisions for new products, this would be it. The uses of account data are endless throughout the full credit cycle and yet so few organisations use it to its full potential. With a predictive power of around 60% Gini, it seems such a waste not to invest in it.
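To make the account features mentioned above a little more concrete, the sketch below derives time on books, worst performance and customer holding for an applicant from a list of existing accounts. The Account record, its field names and the sample data are hypothetical; in practice the layout depends on how your application and account systems are joined.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    customer_id: str
    product: str
    opened: date
    worst_days_past_due: int  # worst arrears ever observed on the account


def internal_features(accounts, customer_id, as_of):
    """Summarise a customer's existing relationship at application time."""
    own = [a for a in accounts
           if a.customer_id == customer_id and a.opened <= as_of]
    if not own:
        # New-to-organisation customer: nothing to summarise.
        return {"months_on_books": 0,
                "worst_days_past_due": None,
                "products_held": 0}

    first_opened = min(a.opened for a in own)
    # Whole calendar months since the oldest account was opened.
    months = ((as_of.year - first_opened.year) * 12
              + (as_of.month - first_opened.month))
    return {
        "months_on_books": months,
        "worst_days_past_due": max(a.worst_days_past_due for a in own),
        "products_held": len({a.product for a in own}),
    }


# Illustrative data only.
accounts = [
    Account("C1", "card", date(2019, 3, 15), 30),
    Account("C1", "loan", date(2021, 7, 1), 0),
    Account("C2", "card", date(2022, 1, 10), 90),
]
features = internal_features(accounts, "C1", date(2023, 9, 1))
print(features)
```

Even a summary this simple captures the organisation’s own experience with the customer, which is precisely what gets lost when the same history is pooled at the bureau.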
Credit Bureau data
This is a powerful source as an early alert indicator and policy knockout, but it is only applicable to the credit-aware population. If your portfolio holds a significant percentage of customers with no bureau record, you need to consider adding other sources.
Credit bureau information should be used in different ways to accept/reject, collect, cross-sell, but always in combination with the application and account data. Credit bureau scores tend to be correlated to postal codes in some countries, but not much else in the application data, so using this source with application and account data is a proven strategy for all portfolios.
Bank statements
Although some would not classify bank statements as existing data, they are. Most credit grantors would request copies of statements. All one needs to do is convert the PDF into features and there you have it, an improved affordability calculation.
I would have liked to add items like purchasing behaviour or digital data, which can improve the decision across the credit life cycle, but perhaps we leave that for another day.
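As a rough illustration of the affordability idea, the sketch below assumes the statement has already been converted into a list of dated transactions (the PDF extraction itself is out of scope here). The function name, the sign convention and the sample figures are my own assumptions, and treating every credit as income is a deliberate simplification; a real calculation would need to exclude transfers, refunds and the like.

```python
from collections import defaultdict
from datetime import date


def affordability(transactions):
    """Average monthly income, expenses and surplus.

    transactions: iterable of (date, amount) pairs, where credits are
    positive and debits are negative.
    """
    income = defaultdict(float)
    expenses = defaultdict(float)
    for day, amount in transactions:
        month = (day.year, day.month)
        if amount >= 0:
            income[month] += amount
        else:
            expenses[month] += -amount

    months = set(income) | set(expenses)
    if not months:
        raise ValueError("no transactions supplied")

    n = len(months)
    avg_income = sum(income.values()) / n
    avg_expenses = sum(expenses.values()) / n
    return {
        "avg_monthly_income": avg_income,
        "avg_monthly_expenses": avg_expenses,
        "avg_monthly_surplus": avg_income - avg_expenses,
    }


# Illustrative data only: two months of a made-up statement.
statement = [
    (date(2024, 1, 25), 3000.0),   # salary
    (date(2024, 1, 28), -1200.0),  # rent
    (date(2024, 1, 30), -800.0),   # other spending
    (date(2024, 2, 25), 3000.0),   # salary
    (date(2024, 2, 27), -1500.0),  # rent and other spending
]
result = affordability(statement)
print(result)
```

The surplus figure can then sit alongside the application and account features rather than relying on income and expenses as declared by the applicant.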
The main point of this article was to re-assess the power of the data that organisations hold and handle every day. Whilst using alternative data can be beneficial, it is best combined with the full power of what you already have and brought in only when it adds to that.
Until next time…