
Empowering Borrowers in their Choice of Lenders: Decoding Service Quality from Customer Complaints

Aniruddha M. Godbole
[email protected]
School of Informatics, Computing, and Engineering
Indiana University
Bloomington, Indiana

David J. Crandall
[email protected]
School of Informatics, Computing, and Engineering
Indiana University
Bloomington, Indiana

ABSTRACT
When shopping for lenders, most consumers choose a financial institution based on just a few key factors: the interest rate, the distance to the lender's nearest branch, an existing relationship with the lender, and the reputation of that lender. But most consumers fail to consider an important element that will be key to their long-term satisfaction: whether the customer service provided by the lender is commensurate with the price. Our underlying assumption in this paper is that a consumer's personality traits are associated with the issues they will face. We use state-of-the-art cross-domain word vector space mapping and representative trait vectors in this space to estimate ten personality traits corresponding to each text, and use topic modeling to find the topics in a complaint. We then use two modified collaborative topic regression methods to create two complaint topic trait spaces for each lender, and test our underlying assumption by using statistical tests for this unsupervised learning problem in three cases: mortgage loans, student loans, and payday loans. We propose that lenders could be recommended for a specific user by analyzing this space, recommending the lender with the fewest complaints per retail customer in the complaint space neighborhood of the customer. We suggest future work that may be undertaken for the three types of loans, including the possibility that lenders evaluate their service from a customer's perspective to track customer satisfaction over time, and extensions to other parts of the service economy.

CCS CONCEPTS
• Information systems → Recommender systems; Data analytics; • Social and professional topics → User characteristics; • Human-centered computing → Collaborative and social computing.

KEYWORDS
Complaint, finance, customer service, personality traits, cross-domain word vector mapping, recommender system, collaborative topic regression, unsupervised learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
WebSci '19, June 30-July 3, 2019, Boston, MA, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6202-3/19/06. $15.00
https://doi.org/10.1145/3292522.3326021

ACM Reference Format:
Aniruddha M. Godbole and David J. Crandall. 2019. Empowering Borrowers in their Choice of Lenders: Decoding Service Quality from Customer Complaints. In 11th ACM Conference on Web Science (WebSci '19), June 30-July 3, 2019, Boston, MA, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3292522.3326021

1 INTRODUCTION
Historically, lenders have either not been evaluated by their customers for quality of service, or they have been assessed using aggregate-level reports based on subjective ratings in surveys. The U.S. Consumer Financial Protection Bureau [13] reports that almost half of all borrowers do not shop around when arranging a mortgage loan. In the absence of an easy way to compare lenders based on their customer service, consumers tend to compare based just on price (interest rate), the existence of a relationship with the lender, the distance to the nearest branch, and the reputation of the lender [13]. In general, whether for mortgages or other types of loans, there is almost no emphasis on customer service [6, 15, 51].

There is thus a need to create tools and information for consumers to make better-informed decisions based on the quality of service provided by a lender, and how it will conform with their own expectations. While machine learning has been applied extensively to the financial sector, we are not aware of any work that analyzes complaints from a customer's perspective. Even the customer-centric use cases of credit scoring, client-facing chatbots, and selling of insurance to customers [11] are actually formulated from a lender's perspective.

Our goal in this paper is to take a first step towards testing whether it is possible to automatically recommend lenders to a particular consumer based on which lenders are least likely to lead to consumer complaints. To do this, we apply machine learning and data mining techniques to a large-scale dataset of consumer complaints, looking for patterns (topics) in them. We also analyze a large-scale dataset of Twitter text, trying to infer personality traits of individual users from it. We then look for connections between these personality profiles and the complaints for different lenders. We are not trying to resolve the complaints [25] better or faster. Rather, our goal is to try to identify lenders that are likely to give fewer reasons for making complaints.

1.1 Research Questions and Scope
In particular, we investigate two specific research questions:

(1) Is the association between the set of personality traits and complaint topics different for each lender?


(2) Is it feasible to make personalized recommendations to customers about suitable lenders based on customer service?

To study these questions, we use three specific datasets: (1) the Consumer Financial Protection Bureau (CFPB) Complaints Dataset (as of 31 July 2018) [2], (2) the TREC 2011 Microblog Dataset [41], and (3) the World Well-Being Project's [50] correlations between personality traits and words. We restrict our attention to three specific types of loans: mortgage loans, student loans, and payday loans. We do not consider temporal modeling aspects, nor do we benchmark the proxy personality trait scores. The impact of lenders selling their loans and of borrowers refinancing their loans is considered to neutralize each other (for groups of customers and groups of lenders) as a simplifying assumption.

1.2 Contributions
Our main contributions are:

(1) We show it is feasible to use machine learning on a publicly available complaints dataset to empower retail customers with personalized recommendations about service providers based on their customer service;

(2) We find a general association between the set of personality traits and complaint topics in the case of payday loans, and show how lenders differ through visualizations of the latent space;

(3) We undertake domain adaptation within the English language using Cross-Domain Word Vector Space Mapping;

(4) We propose to recommend lenders based on analyzing the neighborhood around the customer in latent topic space;

(5) We propose to use a large number of topics that include a few that are semantically interpretable by the consumer;

(6) We compare different techniques for building a joint space between the topics and personality traits; and

(7) We make our code publicly available at https://github.com/godboleam/service-quality.

2 RELATED WORK
So far, the focus in work related to complaint data has generally been on the lender's usage of the data [46, 49]. Financial firms have reportedly lobbied against a publicly available complaints dataset [20]. The focus has been to look at the dataset from the lenders' perspective [35], even though the intent of the CFPB has clearly been in favor of giving primacy to the interests of the borrower. An unspoken adversarial relationship between lenders and customers is unnecessary and perhaps short-sighted [10]. Given that CFPB oversight seems not to have affected the overall volume of mortgage lending [24], there may not be a trade-off between lenders' interests and customers' interests even in the short term.

Customers often regret an interest rate-based selection of a lender, when they exercise a choice at all [30]. J.D. Power reports that top-performing banks have fewer reported complaints and problems [31]; their 2017 study was based on more than 5,784 responses. Deloitte suggests modifying products and processes based on complaint analysis [21], a development that is encouraging even though it is from a lender's perspective.

Despite our best efforts, we could not find prior literature that uses the CFPB complaints dataset either to explore the association between personality traits and complaint topics, or to make personalized recommendations to customers based on a lender's customer service.

3 METHODS
Our goal is to analyze a large-scale, publicly-available dataset of complaints against lenders, and to develop a word embedding space to connect the complaints to personality traits inferred by analyzing consumers' Twitter feeds. A major challenge in making this connection is that the style and vocabulary of complaints are typically very different from those of informal tweets. Moreover, we need complaint topics that are interpretable by humans, and we need to evaluate the proposed methodology without labeled ground truth data.

Our overall approach is as follows. We created word embeddings separately for the complaints text and the tweets text; only the English tweets were considered. We then used a cross-domain word vector space mapping so that domain adaptation helps in two cases: for finding word vectors of words that are correlated with the five personality traits (on bipolar scales), and for finding the proxy traits for a Twitter user. We separately use topic modeling for the complaints dataset (without using the word vectors, as [22] was not conclusive about using word vectors for topic modeling). We use the probability of a topic as the score for that topic in a complaint. We then use modified Collaborative Topic Regression (CTR), by two methods, to build a joint space between complaint topics and personality traits.

We now describe the approach in more detail.

3.1 Definitions
We define a complaint user to be a retail customer who has made a complaint in our dataset. A Twitter user is a prospective borrower or someone who has sought a personalized recommendation in the past. We assume that the text of a user's tweets reflects something about that person's personality. The proxy trait scores are the ten scores associated with the five personality traits, on bipolar scales. The complaint language is based on the complaint narrative texts in the CFPB complaints dataset; this includes all complaints irrespective of the type of loan. The tweet language is based on the tweet text (detected as English) in the TREC 2011 Microblog Dataset. Although complaints and tweets are written in the same language (English), the styles and vocabularies are quite different, as if they were different dialects. We build a Complaint Topic Trait Space to connect complaints and personality traits.

3.2 Data
We use three main datasets, collected as of July 31, 2018.

3.2.1 Complaints data. Around 1.5 million complaints [16] have been sent to the CFPB, and the publicly available dataset has 1,087,269 (1.08 million) of them. This is probably the largest dataset of its kind. About 30% (0.307 million) have accompanying narratives, or unstructured complaint text. We believe this complaints dataset can be considered a reasonable proxy for measuring customer service. It is believed that the pain of a loss is felt much more than the joy felt for a gain of similar magnitude [32]. This asymmetrical relationship implies that complaints are very valuable for evaluating customer service, and a recent J.D. Power survey reached this conclusion for the financial sector in particular [31]. A variety of negative emotions are seen in almost half of the complaint narratives about lenders [23].

We use the complaints data independently to generate word embeddings and for topic modeling. In the case of the word embeddings, we apply the following pre-processing steps: (1) convert to lower case, (2) drop most punctuation and symbols (but retain ', $, and %), (3) replace all numbers with *, (4) replace all tokens such as XXXXX (which indicate private information that was scrubbed by the CFPB before releasing the data) with &, (5) remove extra spaces, and (6) use utf-8 encoding. We then apply the fastText [12] Python wrapper to create the word embeddings.
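A minimal sketch of this preprocessing and embedding step is shown below. The file and column names, the regular expressions, and the choice of skip-gram training are our assumptions for illustration; the text above only specifies the preprocessing rules and the use of the fastText Python wrapper (with 200-dimensional vectors, per Section 3.3).

```python
import re
import fasttext          # pip install fasttext
import pandas as pd

def preprocess_complaint(text: str) -> str:
    text = text.lower()
    text = re.sub(r"x{2,}", "&", text)            # CFPB-scrubbed tokens (XXXX...) -> &
    text = re.sub(r"\d+", "*", text)              # all numbers -> *
    text = re.sub(r"[^a-z'$%&*\s]", " ", text)    # drop most punctuation; keep ', $, %
    return re.sub(r"\s+", " ", text).strip()      # collapse extra spaces

# File and column names are assumptions about the CFPB CSV export.
complaints = pd.read_csv("complaints.csv")
narratives = complaints["Consumer complaint narrative"].dropna()

with open("complaints_clean.txt", "w", encoding="utf-8") as out:
    for narrative in narratives:
        out.write(preprocess_complaint(narrative) + "\n")

# 200-dimensional embeddings for the "complaint language"; skip-gram is our choice,
# as the paper does not state which fastText mode was used.
model = fasttext.train_unsupervised("complaints_clean.txt", model="skipgram", dim=200)
model.save_model("complaint_lang.bin")
```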

The complaints dataset does not include unique identifiers for each customer, so we cannot detect whether a single customer makes multiple complaints. We thus make the (naive) assumption that each customer in the dataset has made exactly one complaint.

3.2.2 World Well-Being Project data. We use the gender- and age-controlled list of 1-gram word correlations for the five personality traits (on bipolar scales) from the World Well-Being Project [50]. Most of these are words used in informal English, like the language often used on Twitter.

3.2.3 Tweets data. We use the TREC 2011 Microblog dataset [41], which had 10,617,146 (10.6 million) tweets as of mid-2018. The tweets corpus was downloaded using Twitter Tools [39] and the TREC 2011 Twitter Collection Downloader [5]. Language detection, done using a port of Google's language detection library to Python [3], indicated that around two-thirds of the tweets are not in English. Only the English tweets were considered for creating the tweet language word embeddings, and we used the following preprocessing before applying fastText [12]: (1) drop most symbols and punctuation (but retain ', !, and @), (2) replace all URLs with ^, (3) replace all numbers with *, (4) replace all Twitter handles with @, (5) remove extra spaces, and (6) use utf-8 encoding. We believe that our collection of over one million tweets will probably suffice for building word embeddings, as more would likely not give significantly superior results [38].
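A comparable sketch for the tweet side is given below, assuming the langdetect port cited above [3]; the file names and regular expressions are illustrative, not the project's code.

```python
import re
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def preprocess_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", "^", text)      # URLs -> ^
    text = re.sub(r"@\w+", "@", text)              # Twitter handles -> @
    text = re.sub(r"\d+", "*", text)               # numbers -> *
    text = re.sub(r"[^A-Za-z'!@^*\s]", " ", text)  # drop most symbols; keep ', !, @
    return re.sub(r"\s+", " ", text).strip()

def english_tweets(raw_tweets):
    """Yield preprocessed tweets that langdetect identifies as English."""
    for tweet in raw_tweets:
        try:
            if detect(tweet) == "en":
                yield preprocess_tweet(tweet)
        except LangDetectException:
            continue                               # empty or undetectable text

with open("tweets_clean.txt", "w", encoding="utf-8") as out:
    for tweet in english_tweets(open("trec2011_tweets.txt", encoding="utf-8")):
        out.write(tweet + "\n")
```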

3.3 Word Vector Space and Cross-Domain Word Vector Space

We address the challenge of inferring personality traits from complaint and Twitter text using cross-domain word vector space mapping, and by using personality trait proxies (see Section 3.5) in this same space. The fastText Python wrapper was used to create a complaint language word vector space and a tweet language word vector space, both of 200 dimensions. Earlier uses of cross-domain vector spaces have focused on unsupervised translation between two languages [9, 37]. We used the state-of-the-art Vecmap open-source project code [1] to create a mapping from the complaint language space to the tweet language space. This algorithm includes Cross-domain Similarity Local Scaling (CSLS), proposed by Lample et al. [37]. We used the 'identical' parameter to specify that words in the tweet text that are common to the complaint text ought to have the same meaning. The Vecmap output word2vec [42] embeddings were consumed using the gensim [45] library.
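As a rough illustration of this step: the mapping itself is done with the Vecmap command-line scripts (only the 'identical' mode is confirmed by the text above; the other arguments and file names are assumptions based on the project's README), and the mapped vectors are then loaded with gensim.

```python
# Vecmap invocation (shell), shown here as a comment:
#   python3 vecmap/map_embeddings.py --identical \
#       complaint_lang.vec tweet_lang.vec \
#       complaint_mapped.vec tweet_mapped.vec
from gensim.models import KeyedVectors

# word2vec-format text files produced by Vecmap (file names are illustrative)
complaint_space = KeyedVectors.load_word2vec_format("complaint_mapped.vec")
tweet_space = KeyedVectors.load_word2vec_format("tweet_mapped.vec")

# After mapping, the two "dialects" share one space, so a complaint-language vector
# can be compared directly against tweet-language vectors ("servicer" is illustrative).
print(tweet_space.similar_by_vector(complaint_space["servicer"], topn=5))
```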

CFPB classification, as available in the dataset, mapped to our classification:

Mortgage loan
  Product: Mortgage. Sub-products: Conventional home mortgage; Conventional fixed mortgage; Conventional adjustable mortgage (ARM).
  Product: Debt collection. Sub-products: Mortgage; Mortgage debt.

Student loan
  Product: Student loan. Sub-products: Federal student loan servicing; Private student loan; Non-federal student loan.
  Product: Debt collection. Sub-products: Federal student loan; Federal student loan debt; Non-federal student loan; Private student loan debt.

Payday loan
  Product: Payday loan. Sub-products: (none).
  Product: Debt collection. Sub-products: Payday loan; Payday loan debt.

Table 1: Mapping from CFPB loan type classifications to our three classifications (mortgage, payday, student loan).

3.4 Topic Modeling of Complaints
Out of the 1.08 million complaints in the dataset, 307,120 have a narrative. Of these, 128,314 are such that the lender's response is not disputed by the customer. We used only the undisputed complaints for topic modeling, since these are more likely to be genuine complaints and thus higher-quality data. We considered three types of loans: mortgage loans, student loans, and payday loans.

The CFPB's classification and nomenclature of products and issue options underwent a change in April 2017 [14]. The CFPB also identifies the product and debt collection separately. The mapping from our classification of the loans to the CFPB classification is given in Table 1. The nomenclature, both prior to and after April 2017, is aligned with how banks are organizationally structured, which means the categories are designed from the perspective of efficient issue resolution by a lender and not from a customer's perspective.

Of the over 100,000 undisputed complaints, we have 12,772 for mortgage loans, 8,687 for student loans, and 2,698 for payday loans. For student loans, we considered only lenders who are involved in both the product and debt collection for both Federal and private student loans, to be able to analyze complaints throughout the life of the product. We assume that a company is involved with a product or with debt collection if there is at least one complaint against it.

About 58% (76) of the mortgage debt collection companies are involved in both the mortgage product and mortgage debt collection. There are 92 companies associated with Federal student loan debt collection, 154 with private student loan debt collection, and 54 that do both. Thirty-six companies are associated with both Federal and private student loans and have at least one undisputed complaint with a narrative. About 60% (243) of the payday product companies are involved in both payday loans and payday loan debt collection.


Figure 1: List of the 222 custom stop words added to the spaCy defaults (frequent but non-distinctive terms such as bank, payment, loan, mortgage, debt, and navient, along with names, dates, and scrubbed tokens).

3.4.1 Preprocessing. For topic modeling, we used spaCy [27] to tokenize complaints. Only the lemmatized versions of tokens tagged as nouns by spaCy were used, as these tokens generally seemed to offer better interpretability [22]. We then applied Hierarchical Dirichlet Process (HDP) topic modeling using the gensim library to identify 150 topics for each of the three types of loans. We found in initial work that the topics were often dominated by frequent but not distinctive words, making it difficult to interpret the topics. To refine our topic models, we used a heuristic procedure that iteratively identified these words and added them to a list of stop words:

(1) We fetched the HDP model's 150 × k inference matrix using the gensim library, where k is the number of complaints sampled from the complaints for that type of loan (sampling was required because spaCy, which we used for tokenization and part-of-speech tagging, by default works with up to a million words). We used the spaCy default stop word list in the first iteration.

(2) The probabilities in this matrix were set to zero if they were in the bottom or top quartiles (otherwise the topics became difficult to interpret at the end of an iteration).

(3) For each topic, the median probability (from among the k complaints) was considered to be representative of that topic.

(4) We then found the top 10 topics.

(5) Based on the keywords in the 10 topics (across mortgage loans, payday loans, and student loans), appropriate additional stop words were added to the stop words list.

(6) The above steps were repeated five times.

In all, 222 additional custom stop words were added to the default spaCy list in the four iterations. The topic modeling was undertaken five times for each of the three types of loans. The list of the custom stop words is shown in Figure 1.
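A condensed sketch of this iterative heuristic, under stated assumptions, is given below. The `sampled_complaints` variable, the manual review helper, and any parameter beyond T = 150 topics are illustrative assumptions rather than the code used for the paper.

```python
import numpy as np
import spacy
from gensim.corpora import Dictionary
from gensim.models import HdpModel

nlp = spacy.load("en_core_web_sm")
stop_words = set(nlp.Defaults.stop_words)         # spaCy defaults in the first iteration

for iteration in range(5):
    # keep only lemmatized nouns that are not (yet) stop words
    docs = [[t.lemma_ for t in nlp(text)
             if t.pos_ == "NOUN" and t.lemma_ not in stop_words]
            for text in sampled_complaints]        # assumed: sampled narratives
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]
    hdp = HdpModel(corpus, id2word=dictionary, T=150)

    # 150 x k matrix of topic probabilities for the k sampled complaints
    probs = np.zeros((150, len(corpus)))
    for j, bow in enumerate(corpus):
        for topic_id, p in hdp[bow]:
            probs[topic_id, j] = p
    lo, hi = np.quantile(probs, 0.25), np.quantile(probs, 0.75)
    probs[(probs < lo) | (probs > hi)] = 0.0       # zero out bottom and top quartiles

    top_topics = np.argsort(np.median(probs, axis=1))[-10:]
    for topic_id in top_topics:
        words = [w for w, _ in hdp.show_topic(topic_id, topn=20)]
        # manual review: dominant but non-distinctive words become stop words
        stop_words.update(review_for_stop_words(words))   # hypothetical review helper
```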

3.4.2 Scoring. A complaint topic score is the probability of a specific topic being associated with a specific complaint. We did not rescale these scores, as this would force all topics to have similar scores, which would implicitly make the unnecessary and unreasonable assumption that all topics are equally important.

Chang et al. [19] found a trade-off between predictive perplexity and interpretability of latent topics. In our work, partial interpretability (i.e., having only a few of the 150 topics be interpretable) is acceptable because we only need a small set of topic importance scores from a Twitter user. The higher the relative importance score, the more important that issue/topic is to the customer.

3.4.3 Sample topics. To give an idea of the topics found by our analysis, we give two examples. The top seven words in Topic #13 in our analysis included the words disbursement, afternon, violationno, confirmation, correspondence, bankrupty, and postcard, all with probabilities of about 0.003. This topic seems to correspond with complaints about rights under bankruptcy not being respected. Topic #141 seems to correspond with fraudulent advertisement, with words heat, fault, 839,i, work, checking, advertisement, and clerk, again with probabilities of about 0.003.

3.5 Personality Trait Proxies
Big5 is a popular and standard personality test [18] that uses bipolar scales for extraversion, neuroticism, agreeableness, conscientiousness, and openness. Correlations between the bipolar scales of the five personality traits and associated words are available [50]. Our intuition behind developing personality trait proxies is to infer personality trait-like information for word embeddings in the cross-domain vector space by considering the similarity of these word embeddings with proxy representative trait vectors. For each of the ten traits (two polarities for each of five scales), we compute a weighted average word vector in the vector space, yielding the ten proxy representative trait vectors. The weights are the correlations between the personality trait and the words associated with that personality trait. For each word in a given text (which could be a complaint narrative or a tweet) for which there exists a word vector in the cross-domain word vector space, we find that word's similarity with a proxy representative trait vector. We then take the average of these similarities to get a single proxy trait score for that text, and repeat for the other nine traits.
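The following sketch shows how such proxy scores could be computed, assuming `wwbp` holds the ten lists of (word, correlation) pairs from [50], `space` is the cross-domain gensim vector space from Section 3.3, and `text_tokens` is the tokenized text to be scored; these names are illustrative.

```python
import numpy as np

def representative_trait_vector(word_correlations, space):
    """Correlation-weighted average of the word vectors available in the space."""
    pairs = [(space[w], c) for w, c in word_correlations if w in space]
    vectors = np.array([v for v, _ in pairs])
    weights = np.array([c for _, c in pairs])
    return np.average(vectors, axis=0, weights=weights)

def proxy_trait_score(tokens, trait_vector, space):
    """Average cosine similarity between a text's word vectors and one trait vector."""
    unit_trait = trait_vector / np.linalg.norm(trait_vector)
    sims = []
    for w in tokens:
        if w in space:
            v = space[w]
            sims.append(float(np.dot(v / np.linalg.norm(v), unit_trait)))
    return float(np.mean(sims)) if sims else 0.0

# ten proxy representative trait vectors, then ten scores for one text
trait_vectors = {trait: representative_trait_vector(pairs, space)
                 for trait, pairs in wwbp.items()}
scores = {trait: proxy_trait_score(text_tokens, vec, space)
          for trait, vec in trait_vectors.items()}
```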

3.6 Modified Collaborative Topic Regression
Human interpretability of topics is useful in practice so that a prospective borrower can give her input on specific topics that are of interest to her, although it is sufficient for only a subset of the 150 topics to be interpretable. We address the challenge of human interpretability with a modification to Collaborative Topic Regression [54]. For each complaint, we compute 160 scores: 150 scores corresponding to the 150 topics and 10 scores for the proxy personality traits. We use root mean squared error (RMSE) as the loss function, given that we are doing a regression [55]. The complaints are mapped to a Complaint Topic Trait Space using the predicted values for the 160 scores. We use the Funk SVD implementation of the Surprise [28] library. Additionally, we use a hybrid (neural network) implementation based on the Spotlight [7] library, which uses a bilinear neural network (built on the PyTorch library), for another such Complaint Topic Trait Space. In all, we have two Complaint Topic Trait Spaces for each of the three types of loans: mortgage loans, student loans, and payday loans.
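A minimal sketch of the Funk SVD variant is shown below: each complaint plays the role of a "user", each of the 160 topic/trait dimensions the role of an "item", and the score the role of a rating. The hyperparameters follow the text (4 latent factors, one epoch); the construction of `score_matrix` is an assumption.

```python
import pandas as pd
from surprise import SVD, Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

# `score_matrix` is assumed: an (n_complaints x 160) NumPy array of topic and trait scores
rows = [(i, j, score_matrix[i, j])
        for i in range(score_matrix.shape[0])
        for j in range(score_matrix.shape[1])]
df = pd.DataFrame(rows, columns=["complaint", "dimension", "score"])

reader = Reader(rating_scale=(float(df.score.min()), float(df.score.max())))
data = Dataset.load_from_df(df[["complaint", "dimension", "score"]], reader)
trainset, testset = train_test_split(data, test_size=0.2)

algo = SVD(n_factors=4, n_epochs=1)   # Funk-style matrix factorization in Surprise
algo.fit(trainset)
accuracy.rmse(algo.test(testset))      # RMSE, as reported in Table 3
```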


Type of loan | Sample size | Complaint Topic Space | RMSE
Mortgage     | 12,772      | Funk SVD              | 0.053
             |             | Hybrid                | 0.054
Student      | 8,687       | Funk SVD              | 0.055
             |             | Hybrid                | 0.055
Payday       | 2,698       | Funk SVD              | 0.059
             |             | Hybrid                | 0.059

Table 3: Results of collaborative topic space regression, for different types of loans and different approaches.

For Funk SVD, we considered 4 latent factors, one epoch, and default values for all other hyperparameters. The hybrid method consists of a set of independent fully-connected layers for the user and a set of independent fully-connected layers for the scores (complaint topics and personality trait proxies), and the outputs of these two sets are combined by a dot product [36]. We increased the regularization parameter (as compared to the default values of the Spotlight library) in order to reduce the instability in the prediction space [17]. Key hyperparameter values for the hybrid method are given in Table 2. Table 3 reports the RMSEs for the different approaches.
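The following PyTorch sketch illustrates the hybrid architecture (layer sizes follow the student/payday row of Table 2, and the one-hot input encoding is our assumption); it is a reconstruction for illustration, not the Spotlight-based implementation used in the experiments.

```python
import torch
import torch.nn as nn

class HybridTopicTraitNet(nn.Module):
    def __init__(self, n_users=2699, n_scores=160, dim=4):
        super().__init__()
        # user tower: (2699,16), (16,4), (4,4) as in Table 2
        self.user_tower = nn.Sequential(
            nn.Linear(n_users, 16), nn.ReLU(),
            nn.Linear(16, 4), nn.ReLU(),
            nn.Linear(4, dim),
        )
        # score tower: (160,4), (4,4) as in Table 2
        self.score_tower = nn.Sequential(
            nn.Linear(n_scores, 4), nn.ReLU(),
            nn.Linear(4, dim),
        )

    def forward(self, user_onehot, score_onehot):
        u = self.user_tower(user_onehot)
        s = self.score_tower(score_onehot)
        return (u * s).sum(dim=-1)     # dot product of the two representations

model = HybridTopicTraitNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)
loss_fn = nn.MSELoss()                  # MSE is optimized; RMSE is what Table 3 reports
```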

Table 4 presents sample Complaint Topic Trait Space visualizations, each corresponding to the first two principal components. Each gray dot corresponds to a complaint. Each red dot corresponds to a complaint about the concerned lender. The blue dot corresponds to a prospective borrower and is discussed in detail later in this paper.

3.7 Challenges in evaluation
The RMSE gives some indication of whether the predicted Complaint Topic Trait Space is an acceptable generalization of the data, but is otherwise of limited utility [8]. It is not a reliable metric for predicting scores for a customer who interacts for another loan or with another lender in the future, or for a new customer [33]. Nevertheless, our RMSE scores, which are small compared with the maximum scores, seem to suggest that the model generalizes from the data.

A major challenge in evaluating our work is that it is a case of unsupervised learning, and we do not have even a small labeled subset of the dataset. Given the large number of lenders, a controlled experiment would not be practical, so the best lender for a specific borrower, from a customer perspective, cannot be known with certainty or with high confidence. While finding a true best lender is impractical, the alternative baseline from a service quality perspective is random selection, so we believe that recommendations to the customer (discussed in the next section) could be empowering even if the improvement were incremental (given that service quality is not the only factor).

3.8 Making personalized recommendations
Table 4 shows sample visualizations of the topic trait embedding spaces. For each of the two techniques (Funk SVD and hybrid), and for each type of loan, the figure shows the spaces created from the complaint data of two (anonymized) banks or lenders. The blue dots correspond to a hypothetical prospective mortgage loan borrower (one of the authors!) using tweets from the hypothetical prospective borrower's Twitter handle. The Python-Twitter library [4] was used to fetch tweets (retweets are excluded) and infer the proxy personality traits. The hypothetical prospective borrower is asked to give relative importance scores for two randomly selected topics from the set of interpretable topics. Modified Collaborative Topic Regression is used to find the predicted 160 scores (predicted scores for the 10 proxy personality traits and the two complaint topics are also included). The first two principal components are used to visualize the hypothetical prospective borrower as the blue dot.
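The following sketch illustrates how a prospective borrower could be placed into an existing space. The credentials, handle, and the two helper functions are placeholders, and `predicted_scores` is assumed to hold the predicted 160-dimensional vectors for all complaints; none of these names come from the paper's code.

```python
import twitter                      # pip install python-twitter
from sklearn.decomposition import PCA

api = twitter.Api(consumer_key="...", consumer_secret="...",
                  access_token_key="...", access_token_secret="...")
statuses = api.GetUserTimeline(screen_name="some_handle", count=200, include_rts=False)
tweet_text = " ".join(s.text for s in statuses)

borrower_traits = infer_proxy_traits(tweet_text)        # hypothetical; see Section 3.5
borrower_scores = predict_160_scores(borrower_traits)   # hypothetical; modified CTR

pca = PCA(n_components=2)
complaint_2d = pca.fit_transform(predicted_scores)            # gray/red dots in Table 4
borrower_2d = pca.transform(borrower_scores.reshape(1, -1))   # the blue dot
```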

We hypothesize that personalized recommendations can be made by finding the lender with the fewest complaints in the neighborhood of the prospective borrower, normalized by the number of customers for that lender. The U.S. Federal Reserve publishes the number of branches of various lenders, and we use the number of branches as a proxy for the number of retail customers. One significant limitation of this proxy is that it ignores other (non-branch) channels of lending, and moreover assumes that all the considered lenders make loans of different types in a similar proportion, which is a significant approximation [52]. Unfortunately, the names of the lenders in the CFPB dataset and in the Federal Reserve Statistical Release [47] are often not identical, so we used the Levenshtein distance [43] to map between lender names. Where a lender is missing, we conservatively consider that lender to have only one branch.
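A sketch of this recommendation rule might look like the following; the neighborhood radius, edit-distance threshold, and data structures are assumptions made for illustration.

```python
from collections import Counter
import numpy as np
import Levenshtein                   # pip install python-Levenshtein

def branch_count(lender, fed_release):
    """Match a CFPB lender name to the Federal Reserve release by edit distance."""
    name, dist = min(((n, Levenshtein.distance(lender.lower(), n.lower()))
                      for n in fed_release), key=lambda x: x[1])
    return fed_release[name] if dist <= 5 else 1   # unmatched lenders get one branch

def recommend(borrower_point, complaint_points, complaint_lenders, fed_release,
              shortlist, radius=1.0):
    """Recommend the shortlisted lender with the fewest nearby complaints per branch."""
    near = np.linalg.norm(complaint_points - borrower_point, axis=1) <= radius
    counts = Counter(l for l, n in zip(complaint_lenders, near) if n and l in shortlist)
    scores = {l: counts.get(l, 0) / branch_count(l, fed_release) for l in shortlist}
    return min(scores, key=scores.get)
```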

These assumptions and approximations are significant limitations of the preliminary work we present here, and they could be addressed with adequate time and effort. Because of this, and because our intent is to present a methodology rather than a production-ready system, we present anonymized lender names in Table 4.

4 RESULTS
We use the Freeman and Halton Exact Test to very conservatively evaluate our proposed methodology. Halkidi et al. [26] note that for an external criteria test the null hypothesis is that the dataset is randomly structured. We apply external criteria to the Complaint Topic Trait Space at the lender level. It is feasible that the sub-space of interest has distinctive frequency distributions across lenders; besides, groups of lenders may be similar at the aggregate level. We consider only the first principal component of the Complaint Topic Trait Space, which we discretize into twenty discrete levels. We then find the frequency distribution based on the discrete level corresponding to each complaint. Because we did not consider other principal components, our application of external validation is a very conservative test. In the case of the Freeman and Halton Exact Test, the alternative hypothesis is about general (and not linear) association. Table 5 presents p-values of the Freeman and Halton Exact Test for the Complaint Topic Trait Spaces specified by the Funk SVD modified Collaborative Topic Regression and the Hybrid modified Collaborative Topic Regression, for the three types of loans.
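For illustration, a Monte Carlo permutation variant of this general-association test could look like the sketch below. It is a simpler stand-in for the Freeman and Halton exact test used in the paper, not the authors' implementation: lender labels are shuffled over complaints and a chi-square statistic on the lender-by-level contingency table is used to score association.

```python
import numpy as np
from scipy.stats import chi2_contingency

def association_pvalue(pc1, lenders, n_levels=20, n_samples=10_000, seed=0):
    """Permutation p-value for association between lender and discretized PC1 level."""
    rng = np.random.default_rng(seed)
    # discretize the first principal component into n_levels quantile bins
    cuts = np.quantile(pc1, np.linspace(0, 1, n_levels + 1)[1:-1])
    levels = np.digitize(pc1, cuts)
    _, lender_idx = np.unique(lenders, return_inverse=True)
    n_lenders = lender_idx.max() + 1

    def statistic(idx):
        table = np.zeros((n_lenders, n_levels))
        np.add.at(table, (idx, levels), 1)
        table = table[:, table.sum(axis=0) > 0]        # drop empty levels
        return chi2_contingency(table)[0]

    observed = statistic(lender_idx)
    hits = sum(statistic(rng.permutation(lender_idx)) >= observed
               for _ in range(n_samples))
    return (hits + 1) / (n_samples + 1)
```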

According to the results, at an aggregate level and at a 0.20 level of significance, the Complaint Topic Trait Space is distinctive for the lenders in the case of payday loans. However, we cannot make such an inference in the case of mortgage loans and student loans. It is interesting that the p-values for the Complaint Topic Trait Space built using the Hybrid method are lower in the case of mortgage loans and student loans, which suggests that additional experimentation with the neural network topology and hyperparameters could perhaps lead to more distinctive spaces for the lenders (at an aggregate level). As discussed above, this test is very conservative.


Type of loan | User layers                        | Item layers    | Regularization | Learning rate | Iterations | Batch size
Mortgage     | (12773,64), (64,16), (16,4), (4,4) | (160,4), (4,4) | 0.01           | 0.01          | 1          | 12773
Student      | (2699,16), (16,4), (4,4)           | (160,4), (4,4) | 0.01           | 0.01          | 1          | 2699
Payday       | (2699,16), (16,4), (4,4)           | (160,4), (4,4) | 0.01           | 0.01          | 1          | 2699

Table 2: Parameter values used for topic regression.

Table 4: Visualizations of complaint topic trait spaces. For each technique (Funk SVD and hybrid) and for each type of loan (mortgage, student loan, payday loan), we show a 2d projection of the embedding space for two sample (anonymized) lenders: H1 and P1 (mortgage loans), G1 and U1 (student loans), and C1 and A1 (payday loans).

Loan type | Sample size | # samples | Space    | p-value
Mortgage  | 12,772      | 1,000,000 | Funk SVD | 0.8029 ± 0.0010
          |             |           | Hybrid   | 0.4962 ± 0.0013
Student   | 8,687       | 200,000   | Funk SVD | 0.6177 ± 0.0028
          |             |           | Hybrid   | 0.5380 ± 0.0029
Payday    | 2,698       | 200,000   | Funk SVD | 0.0985 ± 0.0017
          |             |           | Hybrid   | 0.1593 ± 0.0021

Table 5: Results of topic trait space analysis.


Sample recommendations using this space are shown in Table 6.

5 DISCUSSION
New datasets and open-source algorithms have made it possible to study complaints against financial institutions at a large scale. In particular, the availability of a cross-domain word embedding implementation like Vecmap as an open-source project, in conjunction with the TREC 2011 Microblog Dataset and the WWBP word lists, has made it much easier to explore the utility of machine learning on a large complaints dataset from a customer's perspective. Interpretability of topics is a very significant problem, and the use of modified Collaborative Topic Regression to meaningfully combine interpretable and non-interpretable topic scores makes it possible to go beyond predicting based on personality traits alone. An alternative approach would have been to ignore an explicit consideration of complaint topics and instead use only personality trait embeddings [44]. This would require labeled data and would be an oversimplification, ignoring variations among people with similar personality traits but different appreciation of pain points. Domain adaptation for such an alternative method would have to be implemented in a different way, perhaps as suggested in Rieman et al. [48].


(a) Funk SVD: J1 Bank/Lender, W1 Bank/Lender, C1 Bank/Lender, N1 Bank/Lender, R1 Bank/Lender
(b) Hybrid: W2 Bank/Lender, B1 Bank/Lender, H1 Bank/Lender, C2 Bank/Lender, N2 Bank/Lender

Table 6: Sample recommendations using the complaint topic space with Funk SVD (top) and the Hybrid method (bottom).


The hybrid method for building a Complaint Topic Trait Space seems promising, given its substantially lower Freeman and Halton test p-values than Funk SVD for mortgage and student loans. While the use of complaints as a proxy for customer service is reasonable, it has a limitation: it implicitly ignores positive actions by the lenders, which may have a lesser but nevertheless important role in customer service. In the case of a passive Twitter user, non-text inputs would be useful for inferring personality traits [53]; however, the inference of the personality proxy traits in such a case would be very different. Our method assumes that complaint users are representative of all retail customers.

Our work underscores that publicly available complaints datasets can be a gold mine for customers and service providers alike. Unfortunately, it has been reported that the CFPB complaints dataset may be withdrawn from the public [34]. In our view, in order to further empower the customer, more anonymized data (and not less) could be made available, including data about the originator of the loan, the servicer of the loan, and each lender's number of retail customers for each type of loan by geography on a quarterly basis.

6 CONCLUSION AND FUTURE WORK
We proposed to analyze a publicly available complaints dataset from the retail customer's perspective. We showed that the Complaint Topic Trait Space is distinct for each lender in the case of payday loans, while it was not distinct for mortgage and student loans. Some possible causes for this include a smaller intersection set of lenders originating loans and debt collectors, more frequent transfer of loans (among lenders), more heterogeneity among sub-products, and heterogeneity in service quality in different geographies. We proposed a method to make personalized recommendations about suitable lenders based on the customer service provided by those lenders. The visualizations of the Complaint Topic Trait Spaces suggest that the same method could be suitable in the case of mortgage loans and student loans, depending on the location of the Twitter user in the Complaint Topic Trait Space. To confirm that the frequency distributions of complaints in the Complaint Topic Trait sub-spaces in the neighborhood of Twitter users match the distributions for groups of lenders in those neighborhoods, we will have to test using external criteria such as the Freeman and Halton Exact test.

In line with primum non nocere, and given that the Freeman and Halton Exact test only tells us whether the structure of the dataset is not random (for low p-values), it would be prudent to use our proposed method only on the subset of payday loan lenders (and, based on future work, mortgage loan lenders and student loan lenders) which in a specific customer's eyes are similar. In other words, at least initially, the Twitter user also ought to be asked for her shortlist of potential lenders from whom she is equally inclined to borrow, and our proposed methodology ought to recommend from among that shortlist. In the future, lenders may analyze complaints against them across time to evaluate variation in quality of service. This may be particularly useful for analyzing complaints before and after the launch of new products, and for refining products based on such analyses. Use of anomaly detection (e.g., [40]) for finding both nominal and abnormal points in the Complaint Topic Trait Space could potentially give interesting insights. Metrics based on the proportion of nominal and abnormal points could also be useful.

Competitive pressures that could result from a customer focus on quality are desirable, and could help lenders move away from "a race to the bottom" based primarily on pricing (which also contributes, along with other factors, to unviably low interest rates that may be accompanied by asset price bubbles and hence a less stable economy). Future work could include use of the proposed method on other publicly available complaints datasets, better interpretability of topics by multiple experts, use of bigrams and trigrams, experimentation with the hyperparameters and additional topologies for the hybrid method for building the Complaint Topic Trait Spaces, benchmarking of the proxy personality trait scores, and use of Facebook text [29] or an ensemble of tweets and Facebook posts. Mortgage loan sub-products could be analyzed separately. Federal student loans and private student loans could also be analyzed separately. Analysis may also be undertaken by geography. Under the assumption that it is the meanings of words that are relevant for association with a personality trait, an extension to other languages will also be feasible using cross-lingual word embeddings. Finally, our techniques could be extended and applied to other parts of the service economy.

7 ACKNOWLEDGMENTS
We acknowledge first hearing about the belief regarding the asymmetrical effect of loss/gain associated with pain/joy from A. V. Rajwade (1936-2018). We acknowledge the help of Maciej Kula regarding the appropriate class in the Spotlight library that could be modified for adding more layers to the neural network. This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute, and in part by the Indiana METACyt Initiative. The Indiana METACyt Initiative at IU was also supported in part by Lilly Endowment, Inc.

REFERENCES
[1] [n. d.]. https://github.com/artetxem/vecmap.
[2] [n. d.]. https://www.consumerfinance.gov/data-research/consumer-complaints/.
[3] [n. d.]. Port of Google's Language Detection library to Python. https://pypi.org/project/langdetect/.
[4] [n. d.]. python-twitter: A Python wrapper around the Twitter API. https://python-twitter.readthedocs.io/en/latest/.
[5] [n. d.]. TREC 2011 Twitter Collection Downloader. https://github.com/cmlonder/trec-collection-downloader.
[6] 2017. How To Choose the Right Bank for You. ABC News (2017).
[7] 2017. Spotlight. https://github.com/maciejkula/spotlight.
[8] 2018. Implicit and explicit feedback recommenders: And the curse of RMSE. https://resources.bibblio.org/hubfs/share/2018-01-24-RecSysLDN-Ravelin.pdf.
[9] Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Annual Meeting of the Association for Computational Linguistics.
[10] Ian Ayres, Jeff Lingwall, and Sonia Steinway. 2013. Skeletons In The Database: An Early Analysis Of The CFPB's Consumer Complaints. Fordham Journal of Corporate & Financial Law XIX (2013).
[11] Financial Stability Board. 2017. Artificial intelligence and machine learning in financial services: Market developments and financial stability implications. 11-15 pages.
[12] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching Word Vectors with Subword Information. arXiv:1607.04606 (2016).
[13] Consumer Financial Protection Bureau. 2015. Consumers' mortgage shopping experience: A first look at results from the National Survey of Mortgage Borrowers.
[14] Consumer Financial Protection Bureau. 2017. CFPB Summary of product and sub-product changes.
[15] Consumer Financial Protection Bureau. 2018. Buying a house: Tools and resources for homebuyers. https://www.consumerfinance.gov/owning-a-home/process/compare.
[16] Bureau Of Consumer Financial Protection. 2018. Complaint snapshot: Debt collection.
[17] Antal Buza and Piroska B. Kis. 2014. Instability Of Matrix Factorization Used In Recommender Systems. Annales Univ. Sci. Budapest., Sect. Comp. 42 (2014).
[18] Fabio Celli, Fabio Pianesi, David Stillwell, and Michal Kosinski. 2013. Workshop on Computational Personality Recognition: Shared Task. In ICWSM Workshops.
[19] Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Chong Wang, and David Blei. 2009. Reading Tea Leaves: How Humans Interpret Topic Models. In Neural Information Processing Systems.
[20] Stacy Cowley. 2018. Consumer Bureau Looks to End Public View of Complaints Database. New York Times (2018).
[21] Deloitte. 2015. The power of complaints: Unlocking the value of customer dissatisfaction.
[22] Fabrizio Esposito, Anna Corazza, and Francesco Cutugno. 2016. Topic Modeling with Word Embeddings. In Third Italian Conference on Computational Linguistics.
[23] Pamela Foohey. 2017. Calling On The CFPB For Help: Telling Stories And Consumer Protection. In 80 Law & Contemporary Problems (Forthcoming), Indiana Legal Studies Research Paper No. 356.
[24] Andreas Fuster, Matthew Plosser, and James Vickery. 2018. Does CFPB Oversight Crimp Credit? Federal Reserve Bank of New York Staff Reports 857 (2018).
[25] Boris A. Galitsky and Josep Lluis de la Rosa. 2011. Learning Adversarial Reasoning Patterns in Customer Complaints. In AAAI Workshop on Applied Adversarial Reasoning and Risk Modeling.
[26] Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. 2001. On Clustering Validation Techniques. Journal of Intelligent Information Systems 17, 2/3 (2001).
[27] Matthew Honnibal and Mark Johnson. 2015. An Improved Non-monotonic Transition System for Dependency Parsing. In EMNLP. 1373-1378.
[28] Nicholas Hug. 2017. Surprise, a Python library for recommender systems. http://surpriselib.com.
[29] Kokil Jaidka, Sharath Chandra Guntuku, Anneke Buffone, H. Schwartz, and Lyle Ungar. 2018. Facebook vs. Twitter: Differences in Self-disclosure and Trait Prediction. In ICWSM.
[30] J.D. Power Ratings. 2016. Buyer's Remorse Is Relatively High despite Rising Satisfaction.
[31] J.D. Power Ratings. 2017. Secret to Nationwide, Multiproduct Success in Retail Banking: Not Making Mistakes.
[32] D. Kahneman and A. Tversky. 1979. Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 4 (1979), 263-291.
[33] Frank Kane. [n. d.]. Building Recommender Systems with Machine Learning and AI. https://www.udemy.com/building-recommender-systems-with-machine-learning-and-ai/.
[34] Ted Knutson. 2018. CFPB Chief: I could make consumer complaints secret. Forbes (2018).
[35] KPMG. 2017. KPMG Client Alert, America's FS Regulatory Center of Excellence: Complaints Monitoring and Risk Management.
[36] Maciej Kula. 2016. Hybrid Recommender Systems at PyData Amsterdam 2016. https://speakerdeck.com/maciejkula/hybrid-recommender-systems-at-pydata-amsterdam-2016?slide=21.
[37] Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. Unsupervised Machine Translation Using Monolingual Corpora Only. In ICLR.
[38] Quanzhi Li, Sameena Shah, Xiaomo Liu, and Armineh Nourbakhsh. 2017. Word Embeddings Learned from Tweets and General Data. In ICWSM.
[39] Jimmy Lin. [n. d.]. Twitter tools. https://github.com/lintool/twitter-tools.
[40] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2012. Isolation-based Anomaly Detection. ACM Transactions on Knowledge Discovery from Data 6, 1 (March 2012).
[41] Richard McCreadie, Ian Soboroff, Jimmy Lin, Craig Macdonald, Iadh Ounis, and Dean McCullough. 2012. On building a reliable Twitter corpus. In SIGIR.
[42] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv (2013).
[43] David Necas. [n. d.]. python-Levenshtein: Python extension for computing string edit distances and similarities. https://pypi.org/project/python-Levenshtein/.
[44] Yair Neuman and Yochai Cohen. 2014. A Vectorial Semantics Approach to Personality Assessment. Scientific Reports 4 (2014).
[45] Radim Rehurek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Workshop on New Challenges for NLP Frameworks.
[46] Lucia Rahilly. 2018. McKinsey on Risk.
[47] Federal Reserve Statistical Release. 2018. Large Commercial Banks: Insured U.S.-Chartered Commercial Banks that have Consolidated Assets of $300 million or more, ranked by Consolidated Assets.
[48] Daniel Rieman, Kokil Jaidka, H. Schwartz, and Lyle Ungar. 2017. Domain Adaptation from User-level Facebook Models to County-level Twitter Predictions. In International Joint Conference on Natural Language Processing.
[49] Tom Sabo. 2017. Applying Text Analytics and Machine Learning to Assess Consumer Financial Complaints. Technical Report. SAS Institute.
[50] H. Schwartz, Johannes Eichstaedt, Margaret Kern, Lukasz Dziurzynski, Stephanie Ramones, Megha Agrawal, Achal Shah, Michal Kosinski, David Stillwell, Martin Seligman, and Lyle H. Ungar. 2013. Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLOS ONE 8, 9 (September 2013).
[51] Laura Shin. 2014. How To Choose A Bank Account: 10 Things To Look For. Forbes (2014).
[52] Trefis Group. 2017. A breakdown of the loan portfolios of the largest U.S. banks.
[53] Svitlana Volkova, Yoram Bachrach, and Benjamin Van Durme. 2016. Mining User Interests to Predict Perceived Psycho-Demographic Traits on Twitter. In Second International Conference on Big Data Computing Service and Applications.
[54] Chong Wang and David Blei. 2011. Collaborative Topic Modeling for Recommending Scientific Articles. In KDD.
[55] Yandex. [n. d.]. Big Data Applications: Machine Learning at Scale, Coursera. https://www.coursera.org/lecture/machine-learning-applications-big-data/recsys-mf-ii-Bq6m2.