targeted bank marketing campaign research paper (predictive analytics)
8/12/2019 Targeted Bank Marketing campaign Research Paper (predictive analytics)
ST 635 Statistics Project
Team members:
Chih Ying, Lee
Praveena Mani
Sandipan Sen
Table of Contents
INTRODUCTION
  Objective
  Dataset description
  Data validation
ANALYSIS
  Description of variables
    Age
    Job
    Marital
    Education
    Default
    Balance
    Housing
    Loan
    Contact
    Day
    Month
    Duration
    Campaign
    Pdays
    Previous
    Poutcome
HYPOTHESIS
METHODOLOGY
  Identification of important variables
  Decision rules
    Customers least likely to subscribe
    Customers most likely to subscribe
  Underlying patterns among variables
INTRODUCTION
Objective
We want to predict the probability that a customer will subscribe to a Certificate of Deposit
(CD). A Portuguese banking institution conducted a mass marketing campaign to sell CD
subscriptions between 2008 and 2010. Our goal is to analyze the socio-economic profile of the
customers contacted as part of this campaign and derive a statistical model with which we can
predict the outcome of any similar marketing campaign the bank runs in the future. The idea is to
develop a targeted marketing strategy on the basis of patterns observed in the historical data.
Dataset description
We have identified a dataset related to a direct marketing campaign of a Portuguese banking
institution. The dataset contains detailed information on potential customers who were contacted
as a part of the campaign. Data was randomly collected during the period May 2008 to
November 2010 by making phone calls to the clients. Often more than one contact with the same
client was required to determine whether the client would subscribe to the CD. Our
dataset consists of 17 different attributes, which are a combination of 7 numeric and 10
categorical variables, and has a sample size of 45,211 records. Various attributes such as age,
job, marital status, credit default, education, etc. have been taken into consideration.
Data validation
During our data validation we came across a few variables that had unknown values but weren't
classified as missing by our data source. Let's consider the variable poutcome, which signifies
the outcome of a previous marketing campaign. Values of "success" or "failure" are
self-explanatory. A value of "other" means that a previously contacted customer couldn't decide
whether he would subscribe to a CD. He was probably not sure about subscribing during the
previous campaigns but didn't necessarily rule out the option of doing so at a later point in time.
However, where poutcome takes the value "unknown", none of the above scenarios applies. Our
further investigation revealed a strong association between poutcome and the variable pdays:
except for 5 records, every record with a poutcome of "unknown" has a pdays value of -1.
Pdays indicates the number of days that had passed since the client was last contacted in a
previous campaign, and a value of -1 implies that the client wasn't previously contacted. This
strong association leads us to believe that a poutcome of "unknown" simply means that the client
was not contacted during any previous marketing campaign and is therefore not a missing value.
The 5 records for which poutcome is "unknown" but pdays is not -1 were considered erroneous
entries, and hence we excluded them from our analysis.
An "unknown" value in the job variable indicates that the occupation of the individual doesn't
fall under any of the other 11 categories profiled by the bank.
The variable contact, which denotes the mode of communication the bank used in contacting the
customer, has "unknown" as one of its possible values. It means that people didn't share their
contact information. These people were contacted through other means such as mail offers,
emails or a personal visit by a bank sales representative.
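The consistency check between poutcome and pdays described above can be sketched as follows. This is a minimal illustration and not the procedure the authors ran: the function name, the dictionary-based record layout and the four sample records are assumptions made for the example.

```python
def split_inconsistent(records):
    """Separate records whose poutcome is 'unknown' although pdays is not -1.

    Each record is assumed to be a dict with 'poutcome' and 'pdays' keys
    (a hypothetical layout chosen for this sketch).
    """
    kept, dropped = [], []
    for rec in records:
        inconsistent = rec["poutcome"] == "unknown" and rec["pdays"] != -1
        (dropped if inconsistent else kept).append(rec)
    return kept, dropped


# Made-up records, only to exercise the function.
sample = [
    {"poutcome": "unknown", "pdays": -1},   # never contacted before: keep
    {"poutcome": "success", "pdays": 92},   # previously contacted: keep
    {"poutcome": "unknown", "pdays": 188},  # contradictory entry: drop
    {"poutcome": "other", "pdays": 30},     # previously contacted: keep
]
kept, dropped = split_inconsistent(sample)
print(len(kept), len(dropped))
```

Applied to the full dataset, a check of this kind would be expected to flag the 5 records mentioned in the text.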
ANALYSIS
Description of variables
Our first descriptive analysis was conducted on our dependent variable. The objective of our
research is to define the strategy for a targeted marketing campaign in the future. In order to do
so we first needed to understand how the current marketing campaign performed. A descriptive
analysis of the subscription variable suggests that out of 45,206 customers contacted only 5,287
subscribed to the bank's CD, a success rate of 11.7%. This is quite a low
performance considering the amount of time and money spent contacting these customers
not only once but repeatedly. Were repeated phone calls a good idea? Was the bank able to target
the right set of customers based on their socio-economic behavior? What amount of resources was
wasted on customers who didn't have the potential to subscribe? For us to be able to
answer such questions we had to form various hypotheses, prove or disprove them and finally
collate the results to identify the right customer profile.
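The record counts and the success rate quoted above can be verified with a few lines of arithmetic. The variable names are chosen for this sketch; all figures come from the text.

```python
total_records = 45_211   # sample size reported in the dataset description
excluded_records = 5     # erroneous entries removed during data validation
subscribed = 5_287       # customers who subscribed to the CD

contacted = total_records - excluded_records
success_rate = subscribed / contacted

print(contacted)               # 45206
print(f"{success_rate:.1%}")   # 11.7%
```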
Age
The distribution of age is not normal. However, since our sample size is quite large, the
Central Limit Theorem lets us state with 99% confidence that the average age of targeted
customers was around 40 years (Exhibit 3-b). Performing the descriptive analysis on only the
successful subscription cases didn't change the results much. So it looks like the bank typically
kept targeting people around 40 years old and hence the majority of the subscription cases came
from this target group. However, when we grouped customers into logical age groups we found that
people around the age of 40 are among the least likely to subscribe to a CD. It seems people
between 18 and 27 years of age, which includes undergraduate students, young professionals or
people pursuing their master's, have a good chance of subscription, and beyond 60 years of
age people's tendency to subscribe to a CD also increases (Exhibit 1-a). This seems logical
because people tend to retire after 60 and therefore a CD can become an important source of
income for their family.
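A large-sample confidence interval for a mean, of the kind the Central Limit Theorem argument above relies on, could be computed as in the sketch below. The function name and the ages in the list are made up for illustration; the real calculation would use all customer ages, and the tiny list here only keeps the example runnable.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.99):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    # Two-sided critical value, roughly 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(values) / sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width


# Hypothetical ages, not taken from the dataset.
ages = [25, 31, 34, 38, 39, 40, 41, 44, 47, 52, 58, 31]
low, high = mean_confidence_interval(ages)
print(f"99% CI for the mean age: {low:.1f} to {high:.1f}")
```

With tens of thousands of records the interval becomes very narrow, which is why the mean can be pinned down to "around 40 years" even though the age distribution itself is not normal.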
Job
Analysis of successful subscription by job category revealed that students have the highest
chance of subscription, followed by retired and unemployed people. People working in
management and administrative positions are also quite likely to subscribe. A cross-sectional
analysis of job category vs. age group suggests that management and administrative positions
are mostly filled by people between 28 and 37 years of age (Exhibit 1-b). This is usually the
peak time of one's life, when people form families, have children and look out for additional
sources of income. Therefore management and administrative workers can form a good target group.
Following them is the group of people who are self-employed or have started their own business.
Such individuals are always on the lookout for extra sources of cash, probably because of the
volatility of their business, the need for extra funding in the future or the incentive to save
taxes.
Marital
A person was listed as either single, married or divorced. Among them married people were
targeted most heavily, followed by singles. From an analysis of the success rates achieved in
each of these categories we found that singles were most likely to subscribe to a CD, followed by
people who were divorced. Interestingly, married people, who were the main target customers for
the bank, ranked lowest (Exhibit 1-c).
Education
In Portugal, the education system is divided into three categories: primary, secondary and
tertiary. Primary education is free and compulsory for 9 years. Beyond that starts secondary
education, which is three years of education: 10th, 11th and 12th grade. Higher education after
the 12th grade is classified as tertiary and includes undergraduate, master's or doctoral
programs. Our dataset contains another category called "unknown" for the highest level of
education received by a customer. Our research indicates that such cases occur when the customer
decides not to disclose this information. From our analysis we found that people with tertiary
education had the highest subscription rate compared to the other groups. A general pattern that
can be inferred from the graph is that as the level of education increases the subscription rate
increases (Exhibit 1-d).
Default
The default variable measures whether a customer has defaulted on his/her credit payments; it
gives an overall indication of how well the customer manages his/her credit. About 11.79% of
the customers who haven't defaulted subscribed to a CD, compared to 6.38% of those
who have. Also, very few people with defaulted credit were contacted for the campaign:
about 815, as opposed to 44,391 people who hadn't defaulted (Exhibit 1-e).
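As a quick sanity check, the two group sizes and subscription rates quoted above should combine into the overall success rate of 11.7%. The sketch below does that weighted average; the dictionary layout is an assumption for the example, and the subscriber counts it derives are approximate because the published rates are rounded.

```python
# (group size, subscription rate) as quoted in the text
groups = {
    "no default": (44_391, 0.1179),
    "default": (815, 0.0638),
}

total = sum(size for size, _ in groups.values())
# Approximate subscriber count implied by the rounded rates.
expected_subscribers = sum(size * rate for size, rate in groups.values())
overall_rate = expected_subscribers / total

print(total)                  # 45206
print(f"{overall_rate:.1%}")  # 11.7%
```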
Balance
The distribution of the average yearly balance is not normal. It is highly right skewed, similar
to what we generally observe for the distribution of income among people. The bank's main targets
were people with a lower yearly balance in their account. However, it looks like there wasn't any
significant difference in subscription rates across the other yearly balance categories. We observed
two cases in the 80,000 to 90,000 Euro range, which had a 100% subscription rate; however, we
don't have enough data to rule out that this happened merely by chance (Exhibit 1-f).
Housing
Almost 56% of the people who were approached during the campaign had a housing loan.
However, only 7.7% of them subscribed. By contrast, 16.7% of the people who didn't have a housing
loan subscribed. It seems it is easier for the bank to convince customers who do not
have a housing loan (Exhibit 1-g).
Loan
As with housing loans, customers who do not have any personal liabilities or debts to pay off
are more likely to subscribe to a CD (Exhibit 1-h).
Contact
The majority of the customers were contacted by cellphone. The second most common way of
reaching them was through mail offers, newsletters, or a bank sales representative visiting them
personally. The least used method was to reach them on their landline. Our analysis shows that
people tended to respond more positively when contacted via cellphone than when contacted via
landline, and were least responsive to any other mode of communication. Cellphones and
landlines offer the flexibility of negotiating the terms and conditions of a deal, whereas mail
offers may be too generic. On the other hand, a sales representative visit may look too aggressive.
This leads us to believe that the more reachable and interactive the communication with the
customer is, the more likely he/she is to subscribe to a CD (Exhibit 1-i).
Day
We found that no matter which day of the month the customers were contacted, the
subscription rate remained fairly constant. Day of the month is probably not a very good predictor
in our analysis (Exhibit 1-j).
Month
Subscription is highest during the months of March, September, October and December. These
months are usually the festive seasons in Portugal. The country celebrates Rio-style carnivals
during the month of March, and the year-end is filled with events such as Independence
Day, Christmas, etc. The high subscription rate during the festive season could be because banks
offer attractive interest rates or flexible deposit plans during this period (Exhibit 1-k).
Duration
Duration measures the time spent on the call during the last contact with the customer. The data
reveals that people who spent less than 10 minutes on the call were less likely to subscribe to a
CD. On the other hand, calls that lasted longer than 10 minutes show good subscription rates. In a
typical call, a bank representative may take approximately 5-7 minutes to explain the initial set
of terms and conditions of a plan to the customer. The rest of the time is mostly spent on
the questions customers have. Someone who is not interested in any sort of proposal is less likely
to prolong the call. Longer call durations, however, suggest that customers are more willing to
hear the details of a subscription plan and probably have an interest. Most success came from
calls lasting between 20 and 50 minutes (Exhibit 1-l).
Campaign
The campaign variable gives us the count of phone calls made to the customer during the current
campaign, which lasted from May 2008 to November 2010. Our analysis shows that people who
were contacted 1-5 times during the campaign had a subscription rate of 12.32%, slightly higher
than the overall success rate. Most of the customers fell into the range of 1-5 phone
calls. However, when people were contacted more than 5 times, the subscription rate
fell drastically. Generally, people who are interested in fixed deposits will readily subscribe to
a CD without being urged. For those who are not, repeated calls aren't going to change their
minds all of a sudden; instead they might lead to more frustration and further reduce the chances
of subscription (Exhibit 1-m).
Pdays
Pdays measures the number of days that passed after the client was last contacted in a
previous campaign. We found that new customers were the main target for the bank, comprising
almost 82% of all the people who were contacted. It turns out that the subscription rate among
new customers was considerably lower (below the overall success rate) than among previously
contacted customers. Among the latter, when there was a gap of more than a year since the last
contact, the subscription rate was quite high. This may be because of a good relationship between
the customer and the bank over the past year or because returning customers were happy with
previous subscriptions. It should be noted that customers who were contacted within 3 months also
showed a good subscription rate of 43.25%, which possibly represents an ongoing campaign with
customers in the formal process of subscription (Exhibit 1-n).
Previous
The marketing campaign has primarily focused on new customers who were contacted fewer
than 10 times in previous campaigns. For example, 44,845 customers were contacted 0-9 times,
with a success rate of 11.61%. The subscription rate was higher (23.83%), however, for
customers who were contacted 10-19 times before this campaign. Beyond that the subscription rate
fell again. Generally, in order to maintain a good reputation with customers, an optimum number
of interactions is necessary. Customers can perceive too few calls as a sign of disinterest from
the bank, while too many calls may be thought of as an oversell. The bank should carefully
consider its customer retention strategy. Interestingly, we found a very high
subscription rate of 66.67% among customers who were contacted 50-60 times. However, this
could well be due to chance, because only 3 customers were contacted in that range, of
whom 2 subscribed (Exhibit 1-o).
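To illustrate why a rate based on 3 customers carries little weight, the sketch below computes a Wilson score interval for 2 subscriptions out of 3 contacts. The Wilson interval is a standard choice for small samples, but it is brought in here for illustration and is not a method the analysis itself reports using.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(2, 3)
print(f"2 of 3 subscribed: plausible rate between {low:.2f} and {high:.2f}")
```

The interval spans roughly 0.21 to 0.94, so the observed 66.67% is compatible with almost any underlying subscription rate.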
Poutcome
Analysis of subscription data based on the results of the previous marketing campaign clearly
shows that if the previous campaign was successful for a particular customer, then there is a
higher chance that the customer will subscribe in the current campaign. This can be attributed to
several factors such as trust developed with the bank, higher satisfaction rate, etc. Therefore the
bank could focus its marketing efforts on customers who opened an account during
previous campaigns (Exhibit 1-p).
HYPOTHESIS
Our discussion of the variables has raised a few interesting questions which we would like to
answer. In order to do so we have formulated the following hypotheses, which we will prove or
disprove going forward.
H1: Students or young professionals (18-27) and people on the verge of retiring (60 and above)
have a higher chance of subscription.
H2: Customers who are single are more likely to subscribe than customers who are married.
H3: The chance of subscription increases with a higher degree of education.
H4: People who have a good credit history are good targets.
H5: People with less financial liability, such as a personal loan, are more likely to subscribe.
H6: People who are more reachable are more likely to subscribe.
H7: Chances of subscription increase during the months of the festive season.
H8: The more time spent on the call, the more likely customers are to subscribe. However,
repeated calls reduce that chance.
H9: Returning customers have a higher chance of subscription.
H10: Subscription chances are higher if customers are contacted a year after a previous
campaign.
METHODOLOGY
Identification of important variables
As a first step in evaluating our hypotheses we wanted to understand the
importance of each variable in the marketing campaign. For that we conducted a decision tree
analysis involving all the variables. The dataset was partitioned into two groups, with two-thirds
used as training data and one-third as validation data. The probability chi-square statistic was
used as our splitting criterion. Also, because our dataset is large, we required a group to contain
at least 100 customers to be significant; smaller groups were not considered a
meaningful categorization.
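A chi-square splitting criterion with a minimum group size can be sketched for a single binary split as follows. This is an illustration under stated assumptions and not the tool the authors used: the function names and the counts are hypothetical, the statistic is the plain Pearson chi-square for a 2x2 table (all row and column totals assumed non-zero), and only the threshold of 100 customers comes from the text.

```python
from math import erfc, sqrt

MIN_GROUP_SIZE = 100  # minimum customers per group, as stated in the text


def chi_square_2x2(yes_left, no_left, yes_right, no_right):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    n = yes_left + no_left + yes_right + no_right
    rows = (yes_left + no_left, yes_right + no_right)
    cols = (yes_left + yes_right, no_left + no_right)
    observed = ((yes_left, no_left), (yes_right, no_right))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def split_is_admissible(yes_left, no_left, yes_right, no_right):
    """A split is rejected if either side has fewer than MIN_GROUP_SIZE customers."""
    return min(yes_left + no_left, yes_right + no_right) >= MIN_GROUP_SIZE


# Hypothetical counts: 900 short calls (70 subscribers), 100 long calls (45 subscribers).
stat, p_value = chi_square_2x2(70, 830, 45, 55)
print(f"chi-square = {stat:.1f}, p = {p_value:.2g}")
print(split_is_admissible(70, 830, 45, 55))
```

Among the admissible candidate splits, a tree grown this way would prefer the one with the smallest p-value.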
Analysis reveals that the misclassification rate is quite low. Only about 9% of the observations
were classified erroneously, and our validation dataset follows this statistic very closely
(Exhibit 2-a). It suggests that our decision tree model is reliable. The variable importance chart
reflected some interesting findings. A good number of variables which we thought had
considerable relevance to the bank marketing campaign were deemed unimportant. For example,
we believed that housing and personal loans were bad indicators of subscription. We
thought contacting customers repeatedly would reduce their chance of subscription.
Educational qualification was also thought to be an important consideration, and we
felt that a person's credit history had substantial significance for subscription chances.
None of these variables appears among the important ones. The variables that contributed to the
model, in decreasing order of importance, are duration, poutcome, month, age, marital status and
contact (Exhibit 2-b).
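One way to put a misclassification rate in context is to compare it with a trivial rule that predicts "no subscription" for every customer. The sketch below does this with the rounded figures quoted in the text; the comparison is an added illustration, not part of the reported analysis.

```python
overall_subscription_rate = 0.117  # share of customers who subscribed
tree_misclassification = 0.09      # "about 9%" reported for the decision tree

# Predicting "no" for everyone is wrong exactly for the subscribers.
baseline_misclassification = overall_subscription_rate

relative_reduction = (
    baseline_misclassification - tree_misclassification
) / baseline_misclassification

print(f"baseline error: {baseline_misclassification:.1%}")
print(f"tree error:     {tree_misclassification:.1%}")
print(f"relative error reduction: {relative_reduction:.0%}")
```

Because subscribers are rare, any classifier starts from a low error rate, so the improvement over the baseline is a useful companion figure to the raw 9%.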
On careful investigation of the leaf nodes we found that the decision tree has focused more on
classifying the types of customers who are less likely to subscribe rather than identifying those
cases which are probably going to be a success. This works like a process of elimination, in
which instead of trying to find an ideal match we get rid of cases that are unlikely to subscribe.
So even though we don't know who the right set of customers is, we know with reasonable certainty
who we shouldn't target. This makes sense because there could be several other factors,
not captured in the campaign data, that can increase the probability of
success.
Decision rules
The decision tree in Fig () displays the results of the marketing campaign to sell CD
subscriptions. Customers who subscribed at the end of the campaign were coded 1. The root
node shows that, of the 30,286 customers in the training dataset who were targeted, 11.7%
subscribed to a CD whereas 88.3% did not (coded with 0).
This decision tree could be used by the bank at several different points in making decisions on
which groups of customers they should focus their marketing campaign (Exhibit 2-c).
When the duration was less than 8.68 minutes VERSUS when the duration was greater
than or equal to 8.68 minutes
Under the root node, the first split on subscription was based on duration.
Duration was the most important factor in predicting subscription, and it is
applied several times in the decision tree. It is a general perception that when a
person is interested in a bank product or any other service, they will spend more time on the
call. Comparing durations of less than 8.68 minutes with durations greater than or
equal to 8.68 minutes, people who spend more time on the call (8.68 minutes or more) have a higher
subscription rate of 44.1% than people who spend less than 8.68 minutes, who have a very low
subscription rate of just 7.7%. This also reconfirms the perception we had earlier. The bank
could make use of this rule and focus on improving customer service by making attractive offers
or training the representatives to keep the customers engaged on call for a longer duration.
When the duration was less than 8.68 minutes and poutcome was success VERSUS
when poutcome was failure or unknown
When the time spent on the call was less than 8.68 minutes, customers were further categorized
based on the outcome of the previous marketing campaign. One might expect that customers who spend
less time on the call are not interested in the bank's service. However, the decision tree
clearly shows that customers who subscribed to a CD in the previous campaign (coded
"success") had a high subscription rate of 62.4%, whereas customers who did not subscribe
previously or were never contacted before (coded "failure", "unknown") had a very low
subscription rate of 5.9%. Following this decision rule, even if less time was spent on the call,
the bank should filter for customers who had subscribed previously and direct its marketing
efforts towards them.
When poutcome was success and duration was less than 2.21 minutes VERSUS when
duration was between 2.21 and 8.68 minutes
When the previous outcome was successful, the customers were further categorized once again
based on duration. From the decision tree, it is clear that people with a successful previous
campaign who spent between 2.21 and 8.68 minutes on the call have a good subscription rate of
71.7%, versus people who spent less than 2.21 minutes, who have a low subscription rate of
21.5%. Therefore even if the previous marketing campaign was successful and second-time
customers are easier to target, the bank representatives should try to keep the customers engaged
for at least 2.21 minutes to increase the chances of subscription.
When poutcome was failure, unknown or other, duration was less than 8.68 minutes and
month was October, March, September VERSUS when month was January, February,
April, May, June, July, August, November and December
When poutcome was failure, unknown or other, customers were further categorized by
month. During the months of March, September and October the subscription rate was 37.7%,
which was not distinctive enough to claim that month alone was influential in deciding the
subscription rate. For the rest of the months, however, we can clearly see that the chance of
subscription was quite low, at 4.7%.
March, September and October are the festive months in Portugal. So in order to determine the
subscription chances during these months we would have to look at other factors.
When previous outcome is failure, unknown or other, month is October, March, September
and duration is less than 2.9 minutes VERSUS when duration is greater than 2.9 minutes
but less than 8.7 minutes
As discussed above, other factors needed to be considered for the months of March,
September and October. The split was based on duration, i.e. the less time spent on the call, the
lower the subscription rate. People who spent less than 2.9 minutes had a subscription rate of
17.6%, so we can say with some confidence that subscription is unlikely. Those who
spent more than 2.9 minutes had a subscription rate of 57.4%, but we should carefully consider
other factors when making a decision. Even if more time was spent on the call in these months we
could not tell for sure whether the subscription would happen.
When previous outcome is failure, unknown or other, month is January, February, April,
May, June, July, August, November, December and duration is less than 4.31 minutes
VERSUS when duration is greater than 4.31 minutes but less than 8.7 minutes
People who were contacted during these months had a very low subscription rate of 4.1% as
observed before; and if these people spent less than 4.31 minutes on call their subscription rate
decreased further to 2.6%. However people who spent more than 4.31 minutes have a
subscription rate of 11.0%, which is still low. But we will see in our upcoming analysis how age
could be a deciding factor.
When duration is greater than 4.31 minutes but less than 8.7 minutes, previous outcome is
failure, unknown or other, month is January, February, April, May, June, July, August,
November, December and age is less than or equal to 60.5 years VERSUS age is greater
than 60.5 years
In the above splitting rule, we mentioned that age could be a deciding factor. People who spent
more than 4.31 minutes but less than 8.7 minutes on the call and who were contacted during the
months of January, February, April, May, June, July, August, November or December followed the
same pattern of lower subscription rate if they were 60.5 years old or younger. But for people
who are older than 60.5 years this pattern no longer holds. People beyond that age had fairly
equal chances of subscribing or not subscribing.
When duration was greater than 8.70 minutes but less than 13.8 minutes and poutcome
was success VERSUS when poutcome was unknown or failure
Customers under the node where duration is greater than 8.70 minutes but less than 13.8 minutes
were further categorized based on the outcome of the previous marketing campaign. We could again
predict that people who had subscribed previously with the bank and spent a good amount of
time on the call have a higher chance of subscribing again. The decision tree confirms this:
people who had a successful previous marketing campaign had a higher subscription rate of
83.3%, versus people who didn't respond positively in the previous marketing campaign, who had
a low subscription rate of 33.3%.
When duration is greater than or equal to 13.8 minutes and marital status was single,
divorced VERSUS when marital status was married
The group of customers who spent a considerably high amount of time (13.8 minutes or more) on
the call was further categorized based on marital status. Irrespective of marital status, people
who spent the most time on the call had a decent subscription rate. People who were single or
divorced had a subscription rate of 63.8% and people who were married had a subscription rate
of 54.4%. Given a choice, the bank should focus more on customers who are single or divorced
than on married people.
When duration was greater than or equal to 13.8 minutes, marital status was married and
contact type was cellular VERSUS when contact type was unknown
People who are married and spent more than 13.8 minutes on the call were further
categorized based on the contact method. People who were easily reachable had a higher chance
of subscription. For example, if the contact method was cellphone the subscription rate was 58%,
whereas people who were contacted through snail mail or e-mail had a lower subscription rate of
44.5%. Even though the subscription rate for cellphone contacts was not dramatically higher, it
was still greater than the rate for contacts via mail and e-mail. So it is better to reach
customers through cellphone or landline than by any other method, as it increases the chance of
customers being engaged in a human interaction.
Customers least likely to subscribe
Therefore the cases in which customers are least likely to subscribe are:
1. Customers who have previously subscribed to a CD and spend less than 2.2 minutes on the
call (predictability of 78.5%).
2. Customers who are being contacted for the first time or failed to subscribe during a
previous marketing campaign, spend less than 4.3 minutes on the call and are contacted during
non-festive months (predictability of 97.4%).
3. Customers who are contacted during the non-festive months, either failed to subscribe
during a previous marketing campaign or are first timers, spend between 4.3 and 8.7
minutes on the call and are younger than 60.5 years old (predictability of 89.9%).
4. Customers who are contacted during festive months, spend less than 2.9 minutes on the call
and either did not subscribe during a previous marketing campaign or were being
contacted for the first time (predictability of 82.4%).
Customers most likely to subscribe
Similarly, the cases in which customers are most likely to subscribe are:
1. Customers who spend between 2.2 and 13.8 minutes on the call and had previously accepted a
subscription offer (predictability of 73.3%).
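The rules listed above can be written down as a small lookup, which may help when applying them to new records. The following is only a minimal Python sketch: the function and argument names, the string encodings of poutcome and month, and the treatment of boundary values (strict versus non-strict inequalities) are assumptions and are not taken from the decision tree output.

```python
# Months the tree grouped on the low-subscription ("non-festive") side of the month split.
NON_FESTIVE_MONTHS = {"jan", "feb", "apr", "may", "jun", "jul", "aug", "nov", "dec"}


def least_likely_rule(duration_min, poutcome, month, age):
    """Return the number (1-4) of the matching 'least likely' rule, or None.

    Hypothetical helper: argument names and string encodings are assumptions.
    """
    if poutcome == "success":
        # Rule 1: previously subscribed, but a very short call.
        return 1 if duration_min < 2.2 else None
    if poutcome not in ("failure", "unknown"):
        return None  # the listed rules do not cover other outcomes
    if month in NON_FESTIVE_MONTHS:
        if duration_min < 4.3:
            return 2  # Rule 2: short call in a non-festive month
        if duration_min < 8.7 and age < 60.5:
            return 3  # Rule 3: medium call, non-festive month, younger customer
        return None
    # Festive months. Rule 4: short call.
    return 4 if duration_min < 2.9 else None


def most_likely(duration_min, poutcome):
    """'Most likely' rule: a 2.2-13.8 minute call after a previous success."""
    return poutcome == "success" and 2.2 <= duration_min < 13.8
```

For example, `least_likely_rule(6.0, "failure", "jun", 35)` matches rule 3, while the same call to a 65-year-old customer matches none of the listed rules.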
Underlying patterns among variables
Now that we have identified the set of important variables and rules to be considered for the
bank marketing campaign, we want to validate our analysis by fitting a logistic regression model
to the data. But before we proceed, we would like to take a closer look at the correlation among
the independent variables. This is necessary because we need to prevent multicollinearity issues
from creeping into our model, which can inflate the variance of the coefficient estimates and
make them unstable. Interestingly enough, we did not find any notable correlation among the
numerical variables in our dataset (balance, duration, campaign and previous). The correlation
and scatter plot matrix suggested only very mild association, which is of no concern in our
analysis (Exhibit 4). Below is a tabular representation of the association among the variables:
[Table: correlation matrix of balance, duration, campaign and previous; legend: Strong correlation / No correlation]
We were more curious about any association between the categorical variables because they
formed the majority of our dataset. We carried out a chi-square test of association between each
pair of categorical variables and looked at their Cramér's V statistics to identify any underlying
relationship. A Cramér's V estimate of 0.25 or higher suggests strong association between the
variables, whereas anything below that is acceptable. The table below tells us which of the
categorical variables have strong association:
[Table: pairwise association matrix of Age, Job, Marital, Education, Default, Housing, Loan, Contact, Month, Pdays and Poutcome; legend: Strong correlation]
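As a quick check of the 0.25 threshold, Cramér's V can be recomputed from a chi-square statistic, the sample size and the table dimensions. The sketch below is a minimal Python illustration using the standard formula V = sqrt(chi-square / (n * (min(rows, columns) - 1))); the function name is hypothetical, and the inputs are the chi-square values and the total of 45206 contacts reported in Exhibits 5-a and 5-b.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table with n observations."""
    return math.sqrt(chi_square / (n * (min(n_rows, n_cols) - 1)))


# age (4 groups) by marital status (3 levels), chi-square from Exhibit 5-b
v_age_marital = cramers_v(7552.3473, 45206, 4, 3)

# age (4 groups) by job (12 levels), chi-square from Exhibit 5-a
v_age_job = cramers_v(19298.9919, 45206, 4, 12)

print(round(v_age_marital, 3), round(v_age_job, 3))
```

Both values come out above 0.25, which is consistent with these pairs being flagged as strongly associated.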
Age and job
We found age and job to be correlated. This makes sense because as people grow older
they get promoted and move to better job positions. For example, a student fresh out of college
is more likely to be placed in professional services or a technical job, whereas someone who
is middle aged or around 60 is more likely to occupy a management position, be self-employed
or be retired (Exhibit 5-a).
Age and marital status
Similarly age was also correlated with marital status, which makes even more sense because
young people tend to be single more often than people in their 30s. And as they grow older they
either remain married or get divorced (Exhibit 5-b).
Job and education
The higher the educational qualification of a person, the higher the chances of finding a
sophisticated job. A person with a Masters or PhD is more likely to hold a management
position or do a sophisticated technical job that requires a high degree of technical
qualification, whereas someone with only basic school education is likely to end up in a blue
collar profession (Exhibit 5-c).
Job and housing
The kind of people who are prone to taking a housing loan can be explained by their job profile.
Let's take a case by case example. People who do not have a job or have very low income, such as
students, housewives or retired employees, are very likely not to take a housing loan because of
apprehensions about repayment. Moreover, their background history may not be suitable enough
for banks to extend such credit. Low salaried people on the other hand, such as blue collar
professionals, clerks or people in services, actively seek better living standards. Therefore they
are more likely to accept housing loans. Self-employed individuals and people working in
management jobs, who have a sufficient level of income, do not care much about housing loans
because they can afford quality living standards themselves (Exhibit 5-d).
Housing and month
Our data shows that the majority of the people were contacted during Q2 and Q3 of the year. Of
these, Q2 had a high focus on people who had a housing loan, whereas during Q3 people who
did not have a housing loan were targeted. Less focus was given to reaching out to customers
during Q1 and Q4, which are generally the festive seasons in Portugal. This explains the high
degree of association between the two variables. As discussed before, we found that the
subscription rate was greater during the festive seasons, which was also supported by our
decision tree analysis. We also notice that a fairly equal amount of focus was given to people
with and without housing loans during Q1 and Q4. This leads us to believe that housing loan
probably is not that important a factor (Exhibit 5-e).
Contact and month
Customers were contacted mostly through mail offers during Q2 and through cellphone during
Q3. As discussed before, less attention was paid to reaching out to customers during the
other two quarters (Q1 and Q4) by any means. This explains the correlation, but we have no
general explanation for why the communication type varied with the season.
Moreover, our decision tree analysis suggests that both contact and month are important
variables. Therefore we will keep them in our analysis (Exhibit 5-f).
Pdays and poutcome
There is very strong association between pdays and poutcome. New customers and people who
did not subscribe to a CD during a previous marketing campaign were the main focus of the
campaign. A large portion of them were contacted within 6 months, and even more within a
period of one year. Very few people were contacted after a year, with the least focus being given
to existing customers who had subscribed during the previous campaign. This is generally the case:
marketers always try to lure new customers into buying their products, but they often do not
focus on servicing existing customers, probably because they take them for granted (Exhibit 5-g).
Selection of relevant variables
We decided to keep job out of our analysis because of its high dependency on age, education
and housing loan. Moreover, as there are several categories in the job variable, keeping them in
the equation would risk overfitting our model. Since job profiles can be so diverse, we need to
keep room for new positions that may pop up in the future. Age and education, which follow a
more standard classification, will do a better job. We decided to drop housing loan as well
because we did not think it was a good enough predictor of subscription.
All the remaining categorical variables are kept in our model. Correlations among the numeric
variables were not noteworthy, so none of them were dropped. We ran general linear
models across all numerical and categorical variables to find out whether there is any
association between them. Most of them showed minor correlation, but we could not drop any of
them because all those variables seemed relevant to the bank's marketing strategy.
With respect to pdays and poutcome, because of their high degree of association, we want to run
two logistic models, one without pdays and the other without poutcome.
Subscription Model (with poutcome)
We got quite satisfactory results on running our logistic model using the set of independent
variables selected above (Exhibit 6). Model predictability was quite high at 88.94%, as
suggested by the c-statistic. The convergence criterion was satisfied, so the model is
interpretable, and the overall model was significant at the 0.01 level, indicating that our
model is a good fit. There were of course a few outliers, some of which had high leverage and
some of which were poorly accounted for by the model. We removed them to prevent them from
altering our coefficient estimates drastically. Influence diagnostics suggested that our
estimates were quite stable after the cleansing.
Non-significant factors
As suggested by our decision tree results, the variables default and previous were not
statistically significant predictors in our model. Default indicates whether the customer
defaulted in paying debts. Although not significant in our model, defaulting on loans, which
affects a customer's credit history, plays a crucial role in the financial services industry.
Banks routinely verify a person's payment history and run a background check before extending
mortgage loans, car loans, etc. Therefore we do not want to lose an important factor like credit
history in a banking model.
Previous, which accounts for the number of contacts performed before this campaign for a client,
missed significance by a close margin (p-value = 0.0335, α = 0.01). Moreover, it makes
sense to keep knowledge of the amount of effort spent on a particular customer, how familiar
they are with the bank's products, whether they are new clients, etc. Hence previous was kept in
the model as well.
Testing our Hypothesis
Note: All odds estimates between variables have been interpreted holding other variables constant
H1: Students or young professionals (18-27) and people on the verge of retiring (60+) have a
higher chance of subscription.
It turns out our hypothesis is correct. We hypothesized that young professionals, fresh college
graduates and people 60 years and above are more likely to subscribe to a CD. Age is a significant
factor in our model with a p-value less than 0.01. For our analysis we categorized customers
into the following age groups: 18-27, 28-45, 46-59 and 60+. Since people between 18-27 and
60+ were our main focus, we used 28-45 as our reference age group. We found that the odds that
people in the age group 18-27 will subscribe to a CD are 1.87 times the odds for people in the
age group 28-45. That means that students or young professionals (18-27) have 87% higher odds
of subscribing than people aged 28-45. And the odds of subscription for people 60
years and above are 4 times the odds for people in the age group 28-45. Thus students or young
professionals (18-27) and people on the verge of retiring (60+) should be the main target
customers for the bank.
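An odds ratio multiplies the odds, not the probability, so the change in subscription probability depends on the baseline rate. The sketch below is a minimal Python illustration of that conversion. It uses the overall success rate of 11.7% from the exhibits as a stand-in baseline, which is an assumption made purely for illustration: the actual rate of the 28-45 reference group is not restated here, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


BASELINE = 0.117  # assumption: overall success rate used as a stand-in baseline

p_young = apply_odds_ratio(BASELINE, 1.87)   # age 18-27 vs 28-45
p_senior = apply_odds_ratio(BASELINE, 4.0)   # age 60+ vs 28-45

print(round(p_young, 3), round(p_senior, 3))
```

Under this assumed baseline, an odds ratio of 1.87 corresponds to a subscription probability of roughly 20%, and an odds ratio of 4 to roughly 35%.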
H2: Customers who are single are more likely to subscribe than those who are married.
Marital status was a significant factor in our model. We compared the likelihood of subscription
taking singles as our reference group. The odds that singles will take a subscription offer
are 1.26 to 1.57 times the odds of married people taking the offer, meaning about 41% higher
odds of subscription. It is often the case that singles are less settled in their lives than
married couples and therefore look for other sources of income for stability.
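Because singles are the reference group, the model output reports "married vs single" rather than "single vs married"; the two are reciprocals of each other, and the confidence limits swap places when inverted. The minimal Python sketch below illustrates this with the "married vs single" row of Exhibit 7 (the second model, with pdays), so the result is close to, but not identical to, the first-model figures quoted above. The function name is hypothetical.

```python
def invert_odds_ratio(point, lower, upper):
    """Flip the comparison direction of an odds ratio and its confidence limits."""
    # The reciprocal of the upper limit becomes the new lower limit, and vice versa.
    return 1.0 / point, 1.0 / upper, 1.0 / lower


# Exhibit 7: marital married vs single 0.706 (99% Wald limits 0.634 to 0.787)
point, lower, upper = invert_odds_ratio(0.706, 0.634, 0.787)

print(round(point, 2), round(lower, 2), round(upper, 2))
```

The inverted estimate is about 1.42, with limits of about 1.27 and 1.58 for singles relative to married customers.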
H3: The chance of subscription increases with a higher degree of education.
We cannot say much about people who did not disclose their highest educational qualification,
but their odds of subscription are 1.5 times the odds of people with only primary education.
Primary education in Portugal is mandatory and free. Hence it is safe to assume that people who
did not disclose their educational status have primary education or more. When we compare the
chances of subscription among people with a known education level, people with
secondary education, meaning those who have an undergraduate degree or equivalent
qualification, have 31% higher odds of subscribing than people with basic primary education. And
people who have attained tertiary education such as a Masters or PhD have about 81% higher
odds than people with primary education. Therefore it is true that the chance of subscription
increases with a higher degree of education.
H4: People who have a good credit history are good targets.
As discussed, default was not a statistically significant factor in our model. However,
we included it because of its business significance. Model results suggest that the odds of
people with a good credit history subscribing to a CD are 1.16 times the odds of people who have
defaulted on their financial debts. However, the odds ratio could plausibly lie anywhere between
0.765 and 1.76, indicating that sometimes even people with a bad credit history could turn out to
be potential customers. This is plausible because banks do not always turn down
clients with a bad credit history. Some banks even extend offers to such clients, giving them a
chance to improve their credit score. So it varies from case to case. All we can say is that our
data does not contain sufficient evidence to validate our hypothesis about credit history.
H5: People with less financial liability such as personal loan are more likely to subscribe
There is statistically significant evidence that people who do not have a personal loan to pay
back are more likely to subscribe, with about 77% higher odds. Our model suggests that the odds
of subscription for people with no personal loan are 1.77 times the odds for people with a
personal loan. This is intuitive: people who have taken a personal loan are more likely to be
concerned about paying back their loans, which means they are less likely to have sufficient
funds to invest in a CD.
H6: People who are more reachable are more likely to subscribe
Customers were contacted via various methods such as cellphone, landline and mail offers. Of
these, the highest chance of subscription came from people contacted via cellphone, followed
by landline and then mail offers, as suggested by the odds ratios. People are more reachable
through cellphones or landlines than through mail offers. Customers usually prefer talking to a
human rather than responding to targeted offline advertisements when it comes to dealing with
financial matters. Therefore we have statistically significant evidence that people who are more
reachable are more likely to subscribe.
H7: Chances of subscription increase during the months of the festive season
Q1 and Q4 are the main festive seasons in Portugal. The odds of customers taking a subscription
offer during Q1 are greater than in any other quarter, with the second highest being Q4. The
odds for Q3 are close to those for Q4, indicating that these two quarters have the next highest
rates of subscription after Q1.
However, our estimate for Q4 was not statistically significant at α = 0.01 (p-value = 0.0118).
Moreover, our descriptive statistics suggested that the subscription rate was quite high during
September, which falls in Q3, and during October and December in Q4. The statistical
insignificance of Q4 could be due to the below-average subscription rate in November.
Therefore Q1 is a definite target for bank representatives, and with some confidence the last 4
months of the year as well. But further investigation will be needed to find the reason for the
low subscription rate in November.
H8: The more time spent on the call, the more likely customers are to subscribe. However,
repeated calls reduce that chance
Duration was measured in seconds spent talking to the customer. For ease of understanding we
interpret the increased chance of subscription per minute of additional time spent on the call.
The point estimate suggests that for every 10-minute increase in the time spent on the call, the
odds that a customer takes the offer increase about 10-fold. This makes sense because the longer
a bank representative talks to the customer, the more probable it is that the customer is
interested in knowing about the product, and therefore more likely to subscribe.
The variable campaign, which tells us the number of times a customer was called, has a negative
coefficient. This indicates that repeated calls reduce a customer's chance of
subscription. From the odds ratio we estimated that with every repeat call the odds of
subscription decrease by 8.5%.
Therefore bank representatives should ideally try to spend longer explaining the deal on
one call rather than calling customers repeatedly.
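The per-second odds ratio for duration can be rescaled to longer intervals by raising it to the number of units, since odds ratios multiply. The minimal Python sketch below uses the rounded duration estimate of 1.004 per second shown in Exhibit 7 and the 8.5% per-call decrease quoted above. Because 1.004 is a rounded figure taken from the second model, the 10-minute result is only indicative; the function name is hypothetical.

```python
def scale_odds_ratio(odds_ratio_per_unit, units):
    """Odds ratio for a change of `units` when the per-unit odds ratio is given."""
    return odds_ratio_per_unit ** units


# Duration: 1.004 per second (rounded, Exhibit 7), scaled to 10 minutes = 600 seconds
ten_minutes = scale_odds_ratio(1.004, 600)

# Campaign: an 8.5% decrease in odds per call, compounded over three repeat calls
three_calls = scale_odds_ratio(1 - 0.085, 3)

print(round(ten_minutes, 1), round(three_calls, 3))
```

With these inputs a 10-minute increase multiplies the odds by roughly 11, in line with the "about 10 times" interpretation, and three repeat calls reduce the odds to about 77% of their starting value.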
H9: Returning customers have higher chance of subscription
We included poutcome in our analysis to prove or disprove this hypothesis. When poutcome is a
failure, i.e. the customer did not subscribe during a previous marketing campaign, we cannot be
sure whether the customer is going to take the offer now, because the p-value for the failure
level of poutcome was not significant (p-value = 0.5371). However, when customers did
accept a previous offer, their odds of subscribing again to a new offer are 12 times higher. It is
usually the case that existing customers were satisfied with a previous deal, which is
why they subscribed in the first place. Hence chances are high that they will subscribe again.
Therefore targeting existing customers will be a good move and an essential customer retention
strategy from the perspective of the bank.
Subscription Model (with pdays)
H10: Subscription chances are higher if customers are contacted a year after a previous
campaign
To test our last hypothesis we replaced poutcome with pdays and ran another
logistic model, keeping all other variables intact (Exhibit 7). The predictability of our second
model was 87.6%, approximately the same as our previous model. The coefficient estimates did not
change much, which is a good sign: because pdays is highly correlated with poutcome,
replacing one with the other should not change our model drastically. We divided pdays into four
categories: 1) people who were being contacted for the first time (NC), 2) people who were
contacted within 6 months after the last campaign, 3) people who were contacted after 6 months
but before one year, and 4) people who were contacted after one year. From our odds ratios
we found that people who were contacted after a year were most likely to subscribe, followed by
people who were contacted within 6 months. For customers contacted between 6-12 months, the
chance of subscription fell drastically. So the bank should reach out to customers after a year, at
which point they would be 2.5 times more likely to subscribe compared to being contacted
between 6-12 months.
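The four pdays categories described above amount to a simple binning step before the variable enters the model. The following minimal Python sketch shows one way such a recoding could look. The exact day cutoffs (182 and 365 days), the use of a negative value to mark customers who were never contacted, and the function and label names are all assumptions; the recoding actually used for the model is not shown in the exhibits.

```python
def pdays_category(pdays):
    """Bin days since the last campaign contact into four categories.

    Assumptions: a negative value marks a customer never contacted before (NC),
    and 182 / 365 days are used as the 6-month and 1-year cutoffs.
    """
    if pdays < 0:
        return "NC"
    if pdays <= 182:
        return "within 6 months"
    if pdays <= 365:
        return "6-12 months"
    return "after 1 year"


for days in (-1, 90, 250, 400):
    print(days, pdays_category(days))
```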
Combining hypotheses H9 and H10, we can infer that repeated calls within a short
period of time can increase customer frustration. People who have recently accepted or declined
an offer are less likely to change their mind within a short span of time.
CONCLUSION
Recommended Marketing Strategy
Our goal was to analyze the historical data of a marketing campaign conducted by a Portuguese
bank in order to identify the important indicator variables that could help us predict subscription
chances. These indicator variables are going to be used to devise a directed marketing strategy
targeting only potential customers. The data collected over a two year period showed
statistically significant evidence in favor of a few key factors that are certainly crucial to
marketers in the financial services industry. Marketers should pay special attention to existing
customers who have accepted a bank offer previously. They should also focus on people aged
18-27, who are usually single, and on divorced individuals above the age of 60.
Because they lack the support of a spouse, they tend to look for other means of financial
stability and are hence more likely to subscribe. Highly educated people who have a good source
of income form a good target audience. People with high salaries are less likely to take a loan
because they have sufficient funds to afford their personal expenses and also enough disposable
income to invest in banking products like a CD. Marketers should focus on reaching out to these
customers during the festive season (Q1) through cellphones and spend as much time as
possible on the call. The more they keep customers engaged, the more likely customers are to take
the offer. However, they should refrain from calling them repeatedly and must ensure a gap of at
least a year before contacting them again. This matters because in recent times customer care
calls have become so frequent that a high volume of calls can lead to increased customer
frustration and thereby reduce the chances of subscription.
EXHIBIT
Exhibit 1-a
[Chart: Subscription (by Age category); age categories 18-27, 28-37, 38-47, 48-57, 58-67, 68-77, 78-87, 88-97]
Red line: the overall success rate 11.7%
Exhibit 1-b
[Chart: Subscription (by Job category)]
Red line: the overall success rate 11.7%
Exhibit 1-c
[Chart: Subscription (by Marital status); categories single, divorced, married]
Red line: the overall success rate 11.7%
Exhibit 1-d
[Chart: Subscription (by Education level); categories tertiary, unknown, secondary, primary]
Red line: the overall success rate 11.7%
Exhibit 1-e
Default        People contacted   People subscribed   Subscription rate
no             44391              5235                11.79%
yes            815                52                  6.38%
Grand Total    45206              5287                11.70%
Exhibit 1-f
[Chart: Subscription (by Balance amount)]
Exhibit 1-g
Housing loan   People contacted   People subscribed   Subscription rate
no             20078              3353                16.70%
yes            25128              1934                7.70%
Grand Total    45206              5287                11.70%
Exhibit 1-h
Personal loan  People contacted   People subscribed   Subscription rate
no             37964              4804                12.65%
yes            7242               483                 6.67%
Grand Total    45206              5287                11.70%
Exhibit 1-i
Contact        People contacted   People subscribed   Subscription rate
cellular       29280              4367                14.91%
telephone      2906               390                 13.42%
unknown        13020              530                 4.07%
Grand Total    45206              5287                11.70%
Exhibit 1-j
Day            People contacted   People subscribed   Subscription rate
1-10           13724              1733                12.63%
11-20          18387              2024                11.01%
21-31          13095              1530                11.68%
Grand Total    45206              5287                11.70%
Exhibit 1-k
[Chart: Subscription (by Month); months jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec]
Exhibit 1-l
[Chart: Subscription (by Duration); x-axis: Contact duration (secs)]
Exhibit 1-m
[Chart: Subscription (by No. of phone calls)]
Exhibit 1-n
[Chart: Subscription (by pdays)]
Exhibit 1-o
[Chart: Subscription (by No. of previous phone calls); bins 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 270-279]
Exhibit 1-p
[Chart: Subscription (by previous campaign outcome); categories success, other, failure, unknown]
Exhibit 2-a
Exhibit 2-b
Exhibit 2-c
Exhibit 3
Exhibit 4
Exhibit 5-a
descriptive analysis of successful cases
The FREQ Procedure
Table of age by job
age     statistic  admin.  blue-collar  entrepreneur  housemaid  management  retired  self-employed  services  student  technician  unemployed  unknown   Total
18-27   Frequency     410          627            53         25         351        3             91       366      593         436          85        9    3049
        Percent      0.91         1.39          0.12       0.06        0.78     0.01           0.20      0.81     1.31        0.96        0.19     0.02    6.74
        Row Pct     13.45        20.56          1.74       0.82       11.51     0.10           2.98     12.00    19.45       14.30        2.79     0.30
        Col Pct      7.93         6.44          3.56       2.02        3.71     0.13           5.76      8.81    63.22        5.74        6.52     3.13
28-45   Frequency    3366         6302           897        547        6303       76           1014      2747      342        5218         791      107   27710
        Percent      7.45        13.94          1.98       1.21       13.94     0.17           2.24      6.08     0.76       11.54        1.75     0.24   61.30
        Row Pct     12.15        22.74          3.24       1.97       22.75     0.27           3.66      9.91     1.23       18.83        2.85     0.39
        Col Pct     65.11        64.76         60.32      44.11       66.66     3.36          64.22     66.13    36.46       68.69       60.71    37.15
46-59   Frequency    1314         2705           511        567        2610     1095            433      1019        3        1854         409      144   12664
        Percent      2.91         5.98          1.13       1.25        5.77     2.42           0.96      2.25     0.01        4.10        0.90     0.32   28.01
        Row Pct     10.38        21.36          4.04       4.48       20.61     8.65           3.42      8.05     0.02       14.64        3.23     1.14
        Col Pct     25.42        27.79         34.36      45.73       27.60    48.39          27.42     24.53     0.32       24.41       31.39    50.00
60+     Frequency      80           98            26        101         192     1089             41        22        0          88          18       28    1783
        Percent      0.18         0.22          0.06       0.22        0.42     2.41           0.09      0.05     0.00        0.19        0.04     0.06    3.94
        Row Pct      4.49         5.50          1.46       5.66       10.77    61.08           2.30      1.23     0.00        4.94        1.01     1.57
        Col Pct      1.55         1.01          1.75       8.15        2.03    48.12           2.60      0.53     0.00        1.16        1.38     9.72
Total   Frequency    5170         9732          1487       1240        9456     2263           1579      4154      938        7596        1303      288   45206
        Percent     11.44        21.53          3.29       2.74       20.92     5.01           3.49      9.19     2.07       16.80        2.88     0.64  100.00
Statistics for Table of age by job
Statistic DF Value Prob
Chi-Square 33 19298.9919
Exhibit 5-b
Descriptive analysis of successful cases
The FREQ Procedure
Table of age by marital
age     statistic    divorced     married      single       Total
18-27   Frequency          45         598        2406        3049
        Percent          0.10        1.32        5.32        6.74
        Row Pct          1.48       19.61       78.91
        Col Pct          0.86        2.20       18.81
28-45   Frequency        2618       15770        9322       27710
        Percent          5.79       34.88       20.62       61.30
        Row Pct          9.45       56.91       33.64
        Col Pct         50.28       57.95       72.90
46-59   Frequency        2241        9424         999       12664
        Percent          4.96       20.85        2.21       28.01
        Row Pct         17.70       74.42        7.89
        Col Pct         43.04       34.63        7.81
60+     Frequency         303        1419          61        1783
        Percent          0.67        3.14        0.13        3.94
        Row Pct         16.99       79.58        3.42
        Col Pct          5.82        5.21        0.48
Total   Frequency        5207       27211       12788       45206
        Percent         11.52       60.19       28.29      100.00
Statistics for Table of age by marital
Statistic DF Value Prob
Chi-Square 6 7552.3473
Statistic DF Value Prob
Chi-Square 11 3590.2589
Exhibit 5-f
descriptive analysis of successful cases
The FREQ Procedure
Table of contact by month
contact    statistic        Q1        Q2        Q3        Q4     Total
cellular   Frequency      4044      8786     12182      4268     29280
           Percent        8.95     19.44     26.95      9.44     64.77
           Row Pct       13.81     30.01     41.61     14.58
           Col Pct       89.29     39.87     88.79     86.77
telephone  Frequency       456       739      1164       547      2906
           Percent        1.01      1.63      2.57      1.21      6.43
           Row Pct       15.69     25.43     40.06     18.82
           Col Pct       10.07      3.35      8.48     11.12
unknown    Frequency        29     12513       374       104     13020
           Percent        0.06     27.68      0.83      0.23     28.80
           Row Pct        0.22     96.11      2.87      0.80
           Col Pct        0.64     56.78      2.73      2.11
Total      Frequency      4529     22038     13720      4919     45206
           Percent       10.02     48.75     30.35     10.88    100.00
Statistics for Table of contact by month
Statistic DF Value Prob
Chi-Square 6 16487.9794
        Percent          6.18        2.25        1.33        0.00        9.77
        Row Pct         63.26       23.08       13.66        0.00
        Col Pct         56.99       55.38       39.91        0.00
NC      Frequency           0           0           0       36954       36954
        Percent          0.00        0.00        0.00       81.75       81.75
        Row Pct          0.00        0.00        0.00      100.00
        Col Pct          0.00        0.00        0.00      100.00
Total   Frequency        4901        1840        1511       36954       45206
        Percent         10.84        4.07        3.34       81.75      100.00
Statistics for Table of pdays by poutcome
Statistic DF Value Prob
Chi-Square 9 46386.7985
60+ 0 0 1
marital divorced 1 0
married 0 1
single 0 0
education primary 0 0 0
secondary 1 0 0
tertiary 0 1 0
unknown 0 0 1
default no 0
yes 1
loan no 0
yes 1
contact cellular 1 0
telephone 0 1
unknown 0 0
month Q1 0 0 0
Q2 1 0 0
Q3 0 1 0
Q4 0 0 1
poutcome failure 1 0 0
other 0 1 0
success 0 0 1
unknown 0 0 0
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         32614.294        22833.093
SC          32623.013        23033.627
-2 Log L    32612.294        22787.093
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 9825.2014 22
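The fit statistics above are linked by simple identities: the likelihood ratio chi-square is the drop in -2 Log L, AIC adds 2 per estimated parameter, and SC adds log(n) per parameter. The minimal Python sketch below reproduces them. It assumes 23 estimated parameters (22 covariate terms plus the intercept) and 45201 observations, the count reported for the second model in Exhibit 7; the variable names are hypothetical.

```python
import math

n_obs = 45201        # assumption: same observation count as reported in Exhibit 7
n_params = 22 + 1    # 22 covariate degrees of freedom plus the intercept

neg2_log_l_null = 32612.294   # -2 Log L, intercept only
neg2_log_l_full = 22787.093   # -2 Log L, intercept and covariates

likelihood_ratio = neg2_log_l_null - neg2_log_l_full
aic = neg2_log_l_full + 2 * n_params
sc = neg2_log_l_full + n_params * math.log(n_obs)

print(round(likelihood_ratio, 3), round(aic, 3), round(sc, 3))
```

Under these assumptions the three results agree with the reported likelihood ratio, AIC and SC values to within rounding.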
default 1 0.8578 0.3544
balance 1 17.6666
Partition for the Hosmer and Lemeshow Test
subscription = 1 subscription = 0
Group Total Observed Expected Observed Expected
1 4520 8 35.98 4512 4484.02
2 4520 21 68.61 4499 4451.39
3 4520 31 108.50 4489 4411.50
4 4520 61 156.42 4459 4363.58
5 4522 104 209.42 4418 4312.58
6 4520 212 271.99 4308 4248.01
7 4520 369 360.62 4151 4159.38
8 4520 672 521.26 3848 3998.74
9 4520 1246 917.92 3274 3602.08
10 4519 2561 2634.43 1958 1884.57
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square DF Pr > ChiSq
443.7519 8
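The Hosmer and Lemeshow chi-square can be recomputed from the partition table as the sum of (observed - expected)^2 / expected over both outcome columns of all ten groups. The following minimal Python sketch does this with the observed and expected counts listed above; the variable names are hypothetical, and small differences from the reported value are expected because the expected counts are rounded to two decimals.

```python
# (observed events, expected events, observed non-events, expected non-events) per group
partition = [
    (8, 35.98, 4512, 4484.02),
    (21, 68.61, 4499, 4451.39),
    (31, 108.50, 4489, 4411.50),
    (61, 156.42, 4459, 4363.58),
    (104, 209.42, 4418, 4312.58),
    (212, 271.99, 4308, 4248.01),
    (369, 360.62, 4151, 4159.38),
    (672, 521.26, 3848, 3998.74),
    (1246, 917.92, 3274, 3602.08),
    (2561, 2634.43, 1958, 1884.57),
]

hl_chi_square = sum(
    (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    for obs1, exp1, obs0, exp0 in partition
)
degrees_of_freedom = len(partition) - 2  # number of groups minus two

print(round(hl_chi_square, 2), degrees_of_freedom)
```

The recomputed statistic is close to the reported 443.7519 on 8 degrees of freedom.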
0.200 3233 36567 3349 2052 88.1 61.2 91.6 50.9 5.3
0.300 2564 37945 1971 2721 89.6 48.5 95.1 43.5 6.7
0.400 2113 38592 1324 3172 90.1 40.0 96.7 38.5 7.6
0.500 1716 38997 919 3569 90.1 32.5 97.7 34.9 8.4
0.600 1282 39274 642 4003 89.7 24.3 98.4 33.4 9.2
0.700 914 39494 422 4371 89.4 17.3 98.9 31.6 10.0
0.800 587 39648 268 4698 89.0 11.1 99.3 31.3 10.6
0.900 269 39785 131 5016 88.6 5.1 99.7 32.8 11.2
1.000 0 39916 0 5285 88.3 0.0 100.0 . 11.7
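The column headers of the classification table above did not survive in this transcript. The minimal Python sketch below assumes the columns are, in order: probability level, correct events, correct non-events, incorrect events, incorrect non-events, followed by the percentages correct, sensitivity, specificity, false positive and false negative. This ordering is an assumption, although the percentages recomputed from the counts in the 0.500 row match the printed ones. The function name is hypothetical.

```python
def classification_metrics(correct_event, correct_nonevent,
                           incorrect_event, incorrect_nonevent):
    """Percentages derived from the four counts of one classification table row."""
    total = correct_event + correct_nonevent + incorrect_event + incorrect_nonevent
    return {
        "correct": 100.0 * (correct_event + correct_nonevent) / total,
        # share of actual subscribers that were predicted as subscribers
        "sensitivity": 100.0 * correct_event / (correct_event + incorrect_nonevent),
        # share of actual non-subscribers that were predicted as non-subscribers
        "specificity": 100.0 * correct_nonevent / (correct_nonevent + incorrect_event),
        # share of predicted subscribers that did not subscribe
        "false_pos": 100.0 * incorrect_event / (correct_event + incorrect_event),
        # share of predicted non-subscribers that did subscribe
        "false_neg": 100.0 * incorrect_nonevent / (correct_nonevent + incorrect_nonevent),
    }


# Counts from the row with probability level 0.500
metrics = classification_metrics(1716, 38997, 919, 3569)
for name, value in metrics.items():
    print(name, round(value, 1))
```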
Exhibit 7
logistic regression
The LOGISTIC Procedure
Model Information
Data Set MYSAS.BANK_RECODED
Response Variable subscription subscription
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 45201
Number of Observations Used 45201
Response Profile
Ordered Value subscription Total Frequency
1 0 39916
2 1 5285
Probability modeled is subscription=1.
Class Level Information
Class Value Design Variables
age 18-27 1 0 0
28-45 0 0 0
46-59 0 1 0
60+ 0 0 1
marital divorced 1 0
married 0 1
single 0 0
education primary 0 0 0
secondary 1 0 0
tertiary 0 1 0
unknown 0 0 1
default no 0
yes 1
loan no 0
yes 1
contact cellular 1 0
telephone 0 1
unknown 0 0
month Q1 0 0 0
Q2 1 0 0
Q3 0 1 0
Q4 0 0 1
pdays 1 year 1 0 0
6 months 0 1 0
6-12 months 0 0 1
NC 0 0 0
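The design variables in the Class Level Information table follow reference cell coding: the reference level of each class variable is coded as all zeros, and every other level receives a single 1 in its own column. A minimal Python sketch of that coding scheme is given below; the helper `reference_code` is hypothetical and only mimics the pattern shown in the table, it is not how SAS builds the design matrix internally.

```python
def reference_code(levels, reference):
    """Map each level to 0/1 design variables; the reference level is all zeros."""
    non_reference = [level for level in levels if level != reference]
    return {
        level: tuple(int(level == column) for column in non_reference)
        for level in levels
    }


# Age groups with 28-45 as the reference level, as in the table
age_design = reference_code(["18-27", "28-45", "46-59", "60+"], reference="28-45")

# Contact type with "unknown" as the reference level
contact_design = reference_code(["cellular", "telephone", "unknown"], reference="unknown")

for level, design in age_design.items():
    print(level, *design)
```

With this coding, each estimated coefficient (and each odds ratio) compares one level against the reference level, for example age 60+ against age 28-45.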
Odds Ratio Estimates
Effect Point Estimate 99% Wald Confidence Limits
age 18-27 vs 28-45 2.006 1.719 2.341
age 46-59 vs 28-45 1.102 0.983 1.234
age 60+ vs 28-45 4.511 3.791 5.367
marital divorced vs single 0.830 0.708 0.974
marital married vs single 0.706 0.634 0.787
education secondary vs primary 1.353 1.165 1.571
education tertiary vs primary 1.893 1.619 2.213
education unknown vs primary 1.687 1.324 2.151
default yes vs no 0.828 0.546 1.254
balance 1.000 1.000 1.000
loan yes vs no 0.522 0.451 0.606
contact cellular vs unknown 3.685 3.131 4.336
contact telephone vs unknown 3.099 2.454 3.914
month Q2 vs Q1 0.763 0.663 0.878
month Q3 vs Q1 0.735 0.637 0.848
month Q4 vs Q1 0.808 0.686 0.951
duration 1.004 1.004 1.004
campaign 0.907 0.884 0.930
previous 1.018 0.995 1.042
pdays 1 year vs NC 3.937 3.012 5.146
pdays 6 months vs NC 3.112 2.672 3.624
pdays 6-12 months vs NC 1.586 1.362 1.848
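An odds ratio is the exponential of the corresponding logit coefficient, and a Wald confidence interval is symmetric on the log scale. The sketch below uses those two facts to recover an approximate coefficient and standard error from one row of the table, and to express the odds ratio as a percent change in odds. It assumes the limits are two-sided 99% Wald limits; because the table is rounded to three decimals, the recovered standard error is approximate. The function name `wald_from_odds_ratio` is illustrative.

```python
import math
from statistics import NormalDist


def wald_from_odds_ratio(point, lower, upper, confidence=0.99):
    """Recover (coefficient, approx. standard error, percent change in odds)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    coefficient = math.log(point)
    std_error = (math.log(upper) - math.log(lower)) / (2 * z)
    percent_change = 100 * (point - 1)
    return coefficient, std_error, percent_change


# Row "loan yes vs no": 0.522 (0.451, 0.606)
beta, se, pct = wald_from_odds_ratio(0.522, 0.451, 0.606)
print(f"coefficient = {beta:.4f}, SE = {se:.4f}, change in odds = {pct:.1f}%")

# An interval that contains 1 means the effect is not significant at the 1% level
default_is_significant = not (0.546 <= 1 <= 1.254)
print("default significant at 1%:", default_is_significant)
```

Read this way, customers with a personal loan have about 48 percent lower odds of subscribing than customers without one, while the intervals for default, previous, and age 46-59 vs 28-45 all contain 1.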
Association of Predicted Probabilities and Observed Responses
Percent Concordant 87.6 Somers' D 0.753
Percent Discordant 12.4 Gamma 0.753
Percent Tied 0.0 Tau-a 0.155
Pairs 210956060 c 0.876
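The association statistics are tied together by simple identities. The number of pairs is the product of the two response frequencies, c is the share of concordant pairs plus half the tied pairs, Somers' D is the concordant share minus the discordant share, and Tau-a divides the same difference in pair counts by the number of all possible pairs of observations. A minimal Python sketch follows, using the frequencies from the Response Profile; since the percentages are printed to one decimal, the results match the table only to rounding.

```python
# Response frequencies from the Response Profile
n_nonevents = 39916
n_events = 5285
n_total = n_events + n_nonevents

# Percentages from the association table
pct_concordant = 87.6
pct_discordant = 12.4
pct_tied = 0.0

pairs = n_events * n_nonevents  # every event paired with every non-event

c_statistic = (pct_concordant + 0.5 * pct_tied) / 100
somers_d = (pct_concordant - pct_discordant) / 100
gamma = (pct_concordant - pct_discordant) / (pct_concordant + pct_discordant)

# Tau-a uses all possible pairs of observations in the denominator
all_pairs = n_total * (n_total - 1) / 2
tau_a = somers_d * pairs / all_pairs

print(pairs)
print(f"c = {c_statistic:.3f}, D = {somers_d:.3f}, "
      f"Gamma = {gamma:.3f}, Tau-a = {tau_a:.3f}")
```

The c statistic equals the area under the ROC curve, so a value of 0.876 indicates that the model ranks a randomly chosen subscriber above a randomly chosen non-subscriber in about 88 percent of pairs.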
Partition for the Hosmer and Lemeshow Test
subscription = 1 subscription = 0
Group Total Observed Expected Observed Expected
1 4520 5 34.29 4515 4485.71
2 4520 22 67.24 4498 4452.76
3 4520 31 107.16 4489 4412.84
4 4520 66 158.74 4454 4361.26
5 4520 136 218.02 4384 4301.98
6 4520 232 293.10 4288 4226.90
7 4521 440 406.52 4081 4114.48
8 4521 716 605.77 3805 3915.23
9 4520 1264 997.32 3256 3522.68
10 4519 2373 2396.82 2146 2122.18
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square DF Pr > ChiSq
331.9394 8