Published: 08 July 2015
What if your Facebook friends and professional contacts on LinkedIn were used to determine your creditworthiness for that new car loan or mortgage you applied for? Hopefully your friends are wealthy, or at least keep current on their payments, with no defaults or fraudulent activity. A 2013 article on economist.com gave examples of companies using social media to assess potential clients, with LinkedIn contacts treated as the gold standard in determining “character and capacity” to repay.
To take it a bit further, an estimate of how quickly you will find a new job after being laid off can be derived from rating your contacts. Kreditech, a start-up based in Hamburg, uses Facebook data to help determine who qualifies for its small loans; as one of its founders noted, “much is revealed by your friends”. The online loan process requires applicants to grant access to their Facebook or another social network account for a limited time. Applicants whose friends “appear to have well-paid jobs and live in nice neighbourhoods are more likely to secure a loan”, while having “a friend who has defaulted on a Kreditech loan” increases the likelihood of your loan being rejected.
We understand the challenges that companies, especially small start-ups, face when trying to assess the credit risk of their customers; differentiating the good applicants from the bad is not an exact science. The danger, however, lies in erroneously categorising a customer as high risk, thereby damaging their overall credit profile, and in relying so heavily on algorithms to do all the work that there is little or no human oversight. This presents a major risk to applicants who are blacklisted and unfairly disqualified. It makes you wonder if your mother’s constant reminder to be careful of the friends you associate with has come back to haunt you somehow.
Big Bad Data Challenges
Sadly, your friends may be the least of your concerns, and the idea that you need to keep your friends close but your enemies closer may just apply to your pharmacist. In the US, insurance companies have access to commercial prescription databases which provide quick and affordable digital data on potential clients. Bloomberg published a damning article on how prescription data was used to deny individual health insurance coverage; applicants had no idea that their drug records influenced the assessment. IntelliScript, one of the companies providing prescription data, says “it sells prescription data to more than 75 health, life, and long-term-care insurance companies”. In some cases the data provided was linked to the wrong patient, and the implications are even more damaging for the unsuspecting consumer who seeks health insurance coverage without knowing the real reason behind a rejection. While government calls for disclosure, there are glaring privacy and security concerns that have to be addressed.
These challenges and concerns will be magnified as we move full speed ahead into the Big Data cornucopia. We must be mindful that not all data is good data, and bad analytics can be more damaging than having no data at all. When working with big data, analysts will have to contend with the statistical maxim that correlation does not imply causation. Some correlations are spurious: the two series move together in the data, but there is no real relationship behind the pattern, so it should not be explored further and no sensible decision should be made on the basis of it. The number of films Nicolas Cage appeared in correlates with the number of people who drowned by falling into a pool, but that is coincidence; it would not be valid to derive the hypothesis that Nicolas Cage movies are linked to deaths in swimming pools. A correlation between two things does not always mean that one causes the other, as a third factor may be involved or the pattern may be simple random chance.
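To see how easily chance alone produces an impressive-looking correlation, here is a minimal Python sketch. It is only an illustration under stated assumptions: the series are simulated random walks, not real film or drowning figures, and the function names, the series length of 11 points and the 1,000 candidates are all made up for the example. It compares one random series against many unrelated ones and reports the strongest match it finds.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """A series that drifts up and down by pure chance (hypothetical data)."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


def strongest_spurious_match(n_candidates=1000, length=11, seed=0):
    """Correlate one random series with many unrelated random series
    and return the correlation with the largest absolute value."""
    rng = random.Random(seed)
    target = random_walk(rng, length)
    best = 0.0
    for _ in range(n_candidates):
        r = pearson(target, random_walk(rng, length))
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(round(strongest_spurious_match(), 2))
```

None of the simulated series has anything to do with the others, yet searching through enough of them turns up a near-perfect match. Big data makes this kind of search cheap, which is why a strong correlation on its own is not evidence of a real relationship.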
Bad vs. Good Analytics
Let us take an example of bad analytics from a big data project that used data from Twitter and other social media platforms to predict the U.S. unemployment rate. The researchers used sentiment analysis (also known as opinion mining) to look for correlations between the monthly count of certain words and the monthly unemployment rate. The words included “unemployment”, “jobs” and “classifieds”. Sentiment analysis uses natural language processing (NLP), text analysis and computational linguistics to identify and extract subjective information from source materials (in this case, social media data). The researchers noticed a huge spike in tweets containing one of the keywords. The project attracted more sponsorship in the belief that there was some relationship worth exploring, but the researchers had not noticed that Steve Jobs had died. Professor Gary King of Harvard University highlighted this example and noted that such errors happen all the time in sentiment analysis by word count and in other analytics programs that have not been customised or where human oversight is missing.
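The failure described above is easy to reproduce with a toy keyword counter. The sketch below is an assumption-laden illustration, not the researchers' actual method: the sample tweets are invented, the function names are hypothetical, and the exclusion list stands in for the kind of manual check a human reviewer might add.

```python
import re

# Keywords taken from the example in the text.
KEYWORDS = {"unemployment", "jobs", "classifieds"}


def naive_count(tweets):
    """Count tweets containing any keyword, ignoring context entirely."""
    count = 0
    for tweet in tweets:
        words = set(re.findall(r"[a-z]+", tweet.lower()))
        if words & KEYWORDS:
            count += 1
    return count


def count_with_exclusion(tweets, exclude_phrases=("steve jobs",)):
    """Same count, after dropping tweets that match a hand-written
    exclusion list (a crude stand-in for human oversight)."""
    kept = [
        t for t in tweets
        if not any(phrase in t.lower() for phrase in exclude_phrases)
    ]
    return naive_count(kept)


# Invented sample data, for illustration only.
SAMPLE_TWEETS = [
    "Still checking the classifieds every morning",
    "RIP Steve Jobs, you changed everything",
    "Steve Jobs has died. Hard to believe.",
    "Any jobs going near me?",
    "Lovely weather today",
]

if __name__ == "__main__":
    print(naive_count(SAMPLE_TWEETS))           # 4
    print(count_with_exclusion(SAMPLE_TWEETS))  # 2
```

On this sample, half of the naive "signal" comes from tweets about a person rather than about employment. The exclusion rule is deliberately simple and would miss other ambiguities, which is the point: someone has to look at what the counter is actually counting.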
Some companies have achieved tremendous success harvesting big data and applying analytics and statistical models to understand, predict or even change the behavioural habits of their customers. Target’s “pregnancy prediction” model, based on a deep understanding of the habit loop, could estimate an expectant mother’s due date to within a small window, allowing Target to send coupons tied to very specific stages of her pregnancy. The accuracy of the model initially made expectant mothers uncomfortable because this was not information they had shared with the company. Imagine an expectant mother entering her second trimester and receiving coupons for nothing but baby-related items. Target revised its marketing strategy to overcome this, but there are certainly lessons to learn from the success of Target’s good data analytics.
Companies undertaking any big data project must be mindful of the impact bad analytics can have on business decisions. There is a major risk when we allow algorithms and software to make decisions with no human involvement. The reality is that we have more success stories than failures, but the consequences of decisions made from bad analytics can be severe.
About the author: Raquel Seville [@quelzseville] is a Business Intelligence Professional, SAP Mentor, BI Evangelist, Founder: exportBI | Co-Founder: eatoutjamaica. To find out more, please visit her about me page.