Targeted Bank Marketing Campaign Research Paper (Predictive Analytics)

Upload: sandipan-sen

Post on 03-Jun-2018


  • 8/12/2019 Targeted Bank Marketing campaign Research Paper (predictive analytics)


    ST 635 Statistics Project

    Team members:

    Chih Ying, Lee

    Praveena Mani

    Sandipan Sen


    Table of Contents

    INTRODUCTION
        Objective
        Dataset description
        Data validation
    ANALYSIS
        Description of variables
            Age
            Job
            Marital
            Education
            Default
            Balance
            Housing
            Loan
            Contact
            Day
            Month
            Duration
            Campaign
            Pdays
            Previous
            Poutcome
    HYPOTHESIS
    METHODOLOGY
        Identification of important variables
        Decision rules
            Customers least likely to subscribe
            Customers most likely to subscribe
        Underlying patterns among variables


    INTRODUCTION

    Objective

    We want to predict the chance that a customer will subscribe to a Certificate of Deposit (CD). A Portuguese banking institution conducted a mass marketing campaign to sell CD subscriptions in 2008-2010. Our goal is to analyze the socio-economic profile of the customers contacted as part of this campaign and to build a statistical model that can predict the outcome of similar marketing campaigns by the bank in the future. The idea is to develop a targeted marketing strategy based on patterns observed in the historical data.

    Dataset description

    We have identified a dataset from a direct marketing campaign of a Portuguese banking institution. The dataset contains detailed information on potential customers who were contacted as part of the campaign. Data was randomly collected between May 2008 and November 2010 through phone calls to the clients. Often more than one contact with the same client was required to find out whether the client would subscribe to the CD. Our dataset consists of 17 attributes, 7 numeric and 10 categorical, and has a sample size of 45,211 records. Attributes such as age, job, marital status, credit default and education are included.

    Data validation

    During data validation we came across a few variables that had unknown values but were not classified as missing by our data source. Consider the variable poutcome, which records the outcome of a previous marketing campaign. The values "success" and "failure" are self-explanatory. A value of "other" means that a previously contacted customer could not decide whether to subscribe to a CD: the customer was probably unsure during the previous campaign but did not rule out subscribing at a later point in time. However, none of these scenarios explains the cases where poutcome is "unknown". Further investigation revealed a strong association between poutcome and the variable pdays: except for 5 records, every record with poutcome "unknown" has a pdays value of -1. Pdays is the number of days that passed after the client was last contacted in a previous campaign, and a value of -1 means that the client was not previously contacted. This near one-to-one correspondence leads us to believe that a poutcome of "unknown" simply means the client was never contacted in a previous marketing campaign, so it is not a missing value. The 5 records with poutcome "unknown" but a pdays value other than -1 were considered erroneous entries, and we excluded them from our analysis.
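    The poutcome/pdays consistency rule described above can be written as a small filter. This is a minimal sketch, assuming each record is a Python dict keyed by the column names used in this paper (poutcome, pdays); the function name clean_poutcome and the three toy records are hypothetical and only illustrate the logic, not the actual data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def clean_poutcome(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (kept, dropped) using the poutcome/pdays rule.

    poutcome == "unknown" is read as "never contacted before", which should
    coincide with the sentinel pdays == -1. Records that break this rule are
    treated as erroneous entries and dropped.
    """
    kept: List[Record] = []
    dropped: List[Record] = []
    for rec in records:
        if rec["poutcome"] == "unknown" and rec["pdays"] != -1:
            dropped.append(rec)  # "unknown" outcome but a real day count: inconsistent
        else:
            kept.append(rec)
    return kept, dropped


# Hypothetical toy records, for illustration only.
toy = [
    {"poutcome": "unknown", "pdays": -1},   # new customer: kept
    {"poutcome": "success", "pdays": 92},   # previously contacted: kept
    {"poutcome": "unknown", "pdays": 150},  # inconsistent: dropped
]
kept, dropped = clean_poutcome(toy)
print(len(kept), len(dropped))  # 2 1
```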

    An "unknown" value in the job variable indicates that the individual's occupation does not fall under any of the other 11 categories profiled by the bank.

    The variable contact, which denotes the mode of communication the bank used to reach the customer, has "unknown" as one of its possible values. It means that the customer did not share contact information. These customers were reached through other means, such as mail offers, emails or a personal visit by a bank sales representative.


    ANALYSIS

    Description of variables

    Our first descriptive analysis was conducted on the dependent variable. The objective of our research is to define a strategy for future targeted marketing campaigns, and to do so we first needed to understand how the current campaign performed. A descriptive analysis of the subscription variable shows that of the 45,206 customers contacted, only 5,287 subscribed to the bank's CD, a success rate of 11.7%. This is quite a low performance considering the time and money spent contacting these customers, not just once but repeatedly. Were repeated phone calls a good idea? Was the bank able to target the right customers based on their socio-economic profile? How many resources were wasted on customers who had no real potential to subscribe? To answer such questions we formulated several hypotheses, tested them, and finally combined the results to identify the right customer profile.
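    The success rate above, and the per-category rates used throughout this section, are simple ratios of subscribers to customers contacted. The sketch below assumes the data is available as (category, subscribed) pairs; the helper name rate_by_group and the toy pairs are hypothetical, while the 5,287 and 45,206 figures come from the text.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def rate_by_group(rows: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, Tuple[int, float]]:
    """Return {category: (customers contacted, subscription rate)}."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [contacted, subscribed]
    for category, subscribed in rows:
        counts[category][0] += 1
        counts[category][1] += int(subscribed)
    return {cat: (n, s / n) for cat, (n, s) in counts.items()}


# Overall success rate quoted in the text.
overall = 5287 / 45206
print(f"{overall:.1%}")  # 11.7%

# Hypothetical toy data: per-category rates for a marital-status style variable.
toy = [("single", True), ("single", False), ("married", False)]
print(rate_by_group(toy))  # {'single': (2, 0.5), 'married': (1, 0.0)}
```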

    Age

    The distribution of age is not normal. However, because our sample is large, the Central Limit Theorem implies that the sample mean is approximately normally distributed, and a 99% confidence interval places the average age of targeted customers at around 40 years (Exhibit 3-b). Repeating the descriptive analysis on successful subscriptions alone did not change the results much. It appears that the bank mostly targeted people around 40 years old, and hence most subscriptions came from this group. However, when we grouped customers into logical age bands, we found that people around the age of 40 are among the least likely to subscribe to a CD. People between 18 and 27 years of age, which includes undergraduate students, young professionals and people pursuing a master's degree, have a good chance of subscribing, and beyond 60 years of age the tendency to subscribe to a CD increases (Exhibit 1-a). This seems plausible because people tend to retire after 60, at which point a CD can become an important source of income for their family.
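    The Central Limit Theorem argument above corresponds to a large-sample confidence interval for the mean, mean +/- z * s / sqrt(n), with z of about 2.576 for 99% confidence. A minimal sketch using only the standard library; the helper name and the ten toy ages are hypothetical, and ten values are far too few for the approximation (they only keep the example short; the method is meant for samples of tens of thousands of records like ours).

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def mean_confidence_interval(sample: Sequence[float], level: float = 0.99) -> Tuple[float, float]:
    """Large-sample (CLT-based) confidence interval for a population mean.

    The data need not be normal: for large n the sample mean is approximately
    normal, so mean +/- z * s / sqrt(n) is a valid approximate interval.
    """
    n = len(sample)
    m = mean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for 99%
    half_width = z * stdev(sample) / math.sqrt(n)
    return m - half_width, m + half_width


# Hypothetical toy ages, for illustration only.
ages = [25, 33, 40, 41, 47, 58, 62, 39, 36, 44]
low, high = mean_confidence_interval(ages)
print(f"99% CI for mean age: {low:.1f} to {high:.1f}")
```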

    Job

    Analysis of successful subscriptions by job category revealed that students have the highest chance of subscribing, followed by retired and unemployed people. People working in management and administrative positions are also quite likely to subscribe. A cross-tabulation of job category against age group suggests that management and administrative positions are mostly held by people between 28 and 37 years of age (Exhibit 1-b). This is usually the time in life when people form families, have children and look for additional sources of income, so management and administrative workers can form a good target group. They are followed by people who are self-employed or have started their own business. Such individuals often look for extra sources of cash, possibly because of the volatility of their business, the need for extra funding in the future, or an incentive to save taxes.

    Marital

    Each customer was listed as single, married or divorced. Married customers were targeted most heavily, followed by singles. Comparing the success rates achieved in each category, we found that singles were the most likely to subscribe to a CD, followed by divorced customers. Interestingly, married customers, who were the bank's main target, ranked lowest (Exhibit 1-c).


    Education

    In Portugal, the education system is divided into three levels: primary, secondary and tertiary. Primary education is free and compulsory for 9 years. It is followed by secondary education, which covers three years (the 10th, 11th and 12th grades). Education beyond the 12th grade is classified as tertiary and includes undergraduate, master's and doctoral programs. Our dataset contains a further category, "unknown", for the highest level of education received by a customer. Our research indicates that such cases occur when the customer decides not to disclose this information. We found that people with tertiary education had the highest subscription rate of all categories. More generally, the graph suggests that the subscription rate increases with the level of education (Exhibit 1-d).

    Default

    The default variable records whether a customer has defaulted on his or her credit payments, which gives an overall indication of how well the customer manages credit. About 11.79% of the customers who had not defaulted subscribed to a CD, compared with 6.38% of those who had. Very few people with defaulted credit were contacted for the campaign: about 815, as opposed to 44,391 people who had not defaulted (Exhibit 1-e).
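    One way to check whether the gap between 11.79% and 6.38% is larger than chance would explain is a pooled two-proportion z-test. This is a sketch of the standard textbook test, not necessarily the procedure used in our analysis; the function name is hypothetical, and the inputs are the rates and group sizes quoted above. With about 52 subscribers among the 815 defaulters (6.38% of 815), the normal approximation is reasonable.

```python
import math
from statistics import NormalDist
from typing import Tuple


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # overall rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Rates and group sizes from the text: no default vs. default.
z, p = two_proportion_z_test(0.1179, 44391, 0.0638, 815)
print(f"z = {z:.2f}, p = {p:.2g}")
```

    A z statistic of this size (close to 5) indicates that the difference is very unlikely to be due to chance alone, although the small defaulter group still limits how precisely its rate is known.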

    Balance

    The distribution of average yearly balance is not normal. It is highly right-skewed, similar to the distribution of income we generally observe across a population. The bank's main targets were people with a lower yearly balance in their account. However, there does not appear to be any significant difference in subscription rates across the other yearly balance categories. We observed two cases in the 80,000 to 90,000 Euro range, both of which subscribed (a 100% subscription rate); however, we do not have enough data to rule out that this happened merely by chance (Exhibit 1-f).
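    The warning about the two high-balance customers can be made concrete with a confidence interval for a proportion. A Wilson score interval is one common choice that stays meaningful for very small groups; the sketch below is illustrative and the helper name is hypothetical. For 2 subscribers out of 2 customers, the 95% interval runs from about 0.34 to 1.00, so any true subscription rate in that range is consistent with the data. The same reasoning applies to other tiny groups, such as the 3 customers contacted 50-60 times discussed under Previous.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (well behaved even for tiny n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(2, 2)  # two customers, both subscribed
print(f"{low:.2f} to {high:.2f}")  # 0.34 to 1.00
```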

    Housing

    Almost 56% of the people approached during the campaign had a housing loan; however, only 7.7% of them subscribed. In contrast, 16.7% of the people who did not have a housing loan subscribed. It seems easier for the bank to convince customers who do not have a housing loan (Exhibit 1-g).

    Loan

    As with housing loans, customers who do not have personal liabilities or debts to pay off are more likely to subscribe to a CD (Exhibit 1-h).

    Contact

    The majority of customers were contacted by cellphone. The second most common way of reaching them was through mail offers, newsletters, or a personal visit by a bank sales representative (the "unknown" contact category). The least used method was calling them on their landline. Our analysis shows that people responded positively more often when contacted via cellphone than via landline, and were least responsive to the other modes of communication. Cellphone and landline calls allow the terms and conditions of a deal to be negotiated, whereas mail offers may be too generic and a sales representative's visit may seem too aggressive. This leads us to believe that the more reachable and interactive the communication with the customer, the more likely he or she is to subscribe to a CD (Exhibit 1-i).


    Day

    We found that the subscription rate remains fairly constant regardless of the day of the month on which customers were contacted, so day of the month is probably not a useful predictor in our analysis (Exhibit 1-j).

    Month

    Subscription is highest during the months of March, September, October and December. These months usually fall in the festive seasons in Portugal: the country celebrates Rio-style carnivals in March, and the year-end is filled with events such as Independence Day and Christmas. The high subscription rate during the festive season could be because banks offer attractive interest rates or flexible deposit plans during this period (Exhibit 1-k).

    Duration

    Duration measures the time spent on the call during the last contact with the customer. The data reveals that people who spent less than 10 minutes on the call were less likely to subscribe to a CD, whereas calls lasting more than 10 minutes show good subscription rates. In a typical call, a bank representative may take approximately 5-7 minutes to explain the initial terms and conditions of a plan; the rest of the time is mostly spent answering the customer's questions. Someone who is not interested in the proposal is unlikely to prolong the call, whereas longer calls suggest that the customer is willing to hear the details of a subscription plan and is probably interested. Most successes came from calls lasting between 20 and 50 minutes (Exhibit 1-l).


    Campaign

    The campaign variable counts the phone calls made to the customer during the current campaign, which lasted from May 2008 to November 2010. Our analysis shows that people who were contacted 1-5 times during the campaign had a subscription rate of 12.32%, slightly higher than the overall success rate; most customers fell into this range. However, when people were contacted more than 5 times, the subscription rate fell drastically. People who are interested in fixed deposits will generally subscribe to a CD without being urged, and for those who are not, repeated calls are unlikely to change their minds; instead, they may cause frustration and reduce the chance of subscription further (Exhibit 1-m).
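    Figures such as the 12.32% for 1-5 calls come from grouping a count variable into bands and computing a subscription rate per band. A minimal sketch follows; the band limit mirrors the 1-5 versus more-than-5 split in the text, while the function names and toy rows are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def band(value: int, upper_limits: Sequence[int], labels: Sequence[str]) -> str:
    """Return the label of the first band whose inclusive upper limit is >= value.

    labels must have one more entry than upper_limits; the last label catches
    everything above the final limit.
    """
    return labels[bisect_left(upper_limits, value)]


def rate_by_band(rows: Iterable[Tuple[int, bool]], upper_limits: Sequence[int],
                 labels: Sequence[str]) -> Dict[str, float]:
    """Subscription rate per band for (number of calls, subscribed) pairs."""
    contacted: Counter = Counter()
    subscribed: Counter = Counter()
    for calls, sub in rows:
        b = band(calls, upper_limits, labels)
        contacted[b] += 1
        subscribed[b] += int(sub)
    return {b: subscribed[b] / contacted[b] for b in contacted}


LIMITS, LABELS = [5], ["1-5 calls", "more than 5 calls"]
# Hypothetical toy rows: (calls in this campaign, subscribed?)
rows = [(1, True), (2, True), (3, False), (5, False), (6, False), (12, False)]
print(rate_by_band(rows, LIMITS, LABELS))  # {'1-5 calls': 0.5, 'more than 5 calls': 0.0}
```

    Using bisect on a sorted list of upper limits keeps the banding logic in one place, so finer bands (for example 1-5, 6-10, 11+) only require changing LIMITS and LABELS.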

    Pdays

    Pdays measures the number of days that passed after the client was last contacted in a previous campaign. We found that new customers were the bank's main target, making up almost 82% of all people contacted. However, the subscription rate among new customers was considerably lower (below the overall success rate) than among previously contacted customers. Among previously contacted customers, the subscription rate was quite high when more than a year had passed since the last contact. This may reflect a good relationship between the customer and the bank over the past year, or returning customers who were happy with previous subscriptions. Note that customers who were contacted within the previous 3 months also showed a good subscription rate of 43.25%, which possibly represents an ongoing campaign with customers already in the formal process of subscribing (Exhibit 1-n).
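    Because pdays uses -1 as a sentinel for customers who were never contacted before, any banding of pdays has to treat that value separately; otherwise new customers would be lumped in with the shortest gaps. A minimal sketch, with the 90-day (about 3 months) and 365-day cut-offs taken from the discussion above; the intermediate band and the function name are assumptions for illustration.

```python
def pdays_band(pdays: int) -> str:
    """Label the gap since the last contact in a previous campaign.

    -1 is a sentinel meaning "not previously contacted", so it is checked
    before any numeric comparison.
    """
    if pdays == -1:
        return "new customer"
    if pdays < 0:
        raise ValueError(f"unexpected negative pdays: {pdays}")
    if pdays <= 90:
        return "within 3 months"
    if pdays <= 365:
        return "3 to 12 months"  # hypothetical intermediate band
    return "more than a year"


for value in (-1, 30, 200, 400):
    print(value, pdays_band(value))
```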


    Previous

    The marketing campaign primarily focused on customers who had been contacted 0-9 times in previous campaigns. For example, 44,845 customers fell into this range, with a success rate of 11.61%. The subscription rate was, however, higher (23.83%) for customers who had been contacted 10-19 times before this campaign, and beyond that it fell again. To maintain a good reputation with customers, an optimal number of interactions is necessary: customers may perceive too few calls as a sign of disinterest from the bank, while too many may be seen as overselling. The bank should therefore carefully consider its customer retention strategy. Interestingly, we found a very high subscription rate of 66.67% among customers who had been contacted 50-60 times. However, this could well be due to chance, because only 3 customers fell into that range, of whom 2 subscribed (Exhibit 1-o).

    Poutcome

    Analysis of subscription data by the result of the previous marketing campaign clearly shows that if the previous campaign was successful for a particular customer, that customer is more likely to subscribe in the current campaign. This can be attributed to several factors, such as trust developed with the bank and higher satisfaction. The bank could therefore focus its marketing efforts on customers who subscribed during previous campaigns (Exhibit 1-p).


    HYPOTHESIS

    Our discussion of the variables has raised a few interesting questions that we would like to answer. To do so we have formulated the following hypotheses, which we will test going forward.

    H1: Students or young professionals (18-27) and people on the verge of retirement (around 60) have a higher chance of subscription.

    H2: Single customers are more likely to subscribe than married customers.

    H3: The chance of subscription increases with a higher level of education.

    H4: People who have a good credit history are good targets.

    H5: People with fewer financial liabilities, such as personal loans, are more likely to subscribe.

    H6: People who are more reachable are more likely to subscribe.

    H7: The chance of subscription increases during the festive-season months.

    H8: The more time spent on the call, the more likely customers are to subscribe. However, repeated calls reduce that chance.

    H9: Returning customers have a higher chance of subscription.

    H10: The chance of subscription is higher if customers are contacted more than a year after a previous campaign.


    METHODOLOGY

    Identification of important variables

    As a first step in evaluating our hypotheses, we wanted to understand the importance of each variable in the marketing campaign. To do so we conducted a decision tree analysis involving all the variables. The dataset was partitioned into two groups, with two-thirds used as training data and one-third as validation data. The probability (p-value) of the chi-square statistic was used as our splitting criterion. Because our dataset is large, we also required that a group contain at least 100 customers to be considered a meaningful categorization.
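    The two ingredients described above, a random two-thirds/one-third partition and a chi-square-based split criterion with a minimum group size of 100, can be sketched in plain Python. This is a simplified illustration, not the software we used: it computes an unadjusted Pearson chi-square p-value for a single candidate binary split, whereas tree software typically searches over many candidate splits and may apply adjustments to the p-values. The function names, the random seed and the toy counts are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(rows: Sequence[T], train_frac: float = 2 / 3,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the rows and cut them into a training part and a validation part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def split_pvalue(left: Tuple[int, int], right: Tuple[int, int],
                 min_leaf: int = 100) -> Optional[float]:
    """Pearson chi-square p-value (1 degree of freedom) for a binary split.

    left and right hold (subscribed, not subscribed) counts for the two child
    nodes. Returns None if either child has fewer than min_leaf customers,
    since such a group is not treated as a meaningful categorization.
    Smaller p-values indicate stronger splits.
    """
    if sum(left) < min_leaf or sum(right) < min_leaf:
        return None
    n = sum(left) + sum(right)
    col_totals = (left[0] + right[0], left[1] + right[1])
    if 0 in col_totals:
        return 1.0  # outcome does not vary, so no split can separate it
    chi2 = 0.0
    for row in (left, right):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


train, validation = train_validation_split(list(range(300)))
print(len(train), len(validation))  # 200 100
# Hypothetical child-node counts for one candidate split.
print(split_pvalue((10, 190), (60, 140)))
print(split_pvalue((5, 45), (60, 140)))  # None: left child has only 50 customers
```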

    The analysis shows that the misclassification rate is quite low: only about 9% of the observations were categorized incorrectly, and the validation dataset follows this figure very closely (Exhibit 2-a). This close agreement suggests that the decision tree model is reliable and does not overfit the training data. The variable importance chart revealed some interesting findings. Several variables that we expected to be relevant to the marketing campaign were deemed unimportant. For example, we believed that holding housing and personal loans would be a bad indicator of subscription, that contacting customers repeatedly would reduce their chance of subscribing, that educational qualification would be an important consideration, and that a person's credit history would be highly significant for subscription. Instead, the variables that contributed to the model, in decreasing order of importance, are duration, poutcome, month, age, marital status and contact (Exhibit 2-b).
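    When reading the 9% misclassification rate, a useful reference point is a trivial model that predicts "no subscription" for every customer: it would misclassify exactly the customers who subscribed, about 11.7% of them. The sketch below computes both quantities; the function names are hypothetical and the labels are toy data (1 = subscribed, 0 = not).

```python
from typing import Sequence


def misclassification_rate(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of customers whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


def majority_class_error(actual: Sequence[int]) -> float:
    """Error of always predicting the more common class."""
    positives = sum(actual)
    return min(positives, len(actual) - positives) / len(actual)


# Toy labels with the same 11.7% subscription rate as the campaign.
actual = [1] * 117 + [0] * 883
print(majority_class_error(actual))  # 0.117
print(misclassification_rate([1, 0, 1, 0], [1, 1, 1, 0]))  # 0.25
```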

    On closer inspection of the leaf nodes, we found that the decision tree focuses more on classifying customers who are unlikely to subscribe than on identifying those who probably will. This works like a process of elimination: instead of trying to find an ideal match, we rule out cases that are unlikely to subscribe. So even though we do not know exactly who the right customers are, we know with reasonable certainty whom we should not target. This makes sense, because other factors not captured in the campaign data could increase the probability of success.

    Decision rules

    The decision tree in Fig () displays the results of the marketing campaign to sell CD subscriptions. Customers who subscribed by the end of the campaign were coded 1, and those who did not were coded 0. The root node shows that of the 30,286 customers in the training dataset, 11.7% subscribed to a CD and 88.3% did not.

    The bank could use this decision tree at several points when deciding which groups of customers to focus its marketing campaign on (Exhibit 2-c).

    When the duration was less than 8.68 minutes VERSUS when the duration was greater

    than or equal to 8.68 minutes

    Under the root node, the first split was made on duration, the most important factor in predicting subscription; duration is also used for several further splits in the tree. It is generally assumed that when a person is interested in a bank product or any other service, he or she will spend more time on the call. Comparing calls shorter than 8.68 minutes with calls of 8.68 minutes or longer, people who spent more time on the call have a subscription rate of 44.1%, whereas people who spent less than 8.68 minutes have a very low subscription rate of just 7.7%. This confirms the perception we had earlier. The bank could use this rule to focus on improving customer service, by making attractive offers or training its representatives to keep customers engaged on the call for longer.

    When the duration was less than 8.68 minutes and poutcome was Successful VERSUS

    when poutcome was Failure or Unknown

    When the time spent on the call was less than 8.68 minutes, customers were further categorized by the outcome of the previous marketing campaign. One might expect customers who spend little time on the call to be uninterested in the bank's offer. However, the decision tree clearly shows that customers who subscribed to a CD in the previous campaign (coded "success") had a high subscription rate of 62.4%, whereas customers who did not subscribe previously or were never contacted before (coded "failure" or "unknown") had a very low subscription rate of 5.9%. This rule suggests that even when less time is spent on the call, the bank should single out customers who subscribed previously and direct its marketing efforts toward them.

    When poutcome was successful and duration was less than 2.21 minutes VERSUS when

    duration was between 2.21 minutes to 8.68 minutes

    When the previous outcome was successful, customers were again categorized by duration. The decision tree shows that customers with a successful previous campaign whose calls lasted between 2.21 and 8.68 minutes have a good subscription rate of 71.7%, versus a low subscription rate of 21.5% for those whose calls lasted less than 2.21 minutes. Therefore, even though customers who subscribed in the previous campaign are easier to target the second time, bank representatives should try to keep them engaged for at least 2.21 minutes to increase the chance of subscription.

    When poutcome was failure, unknown or other, duration was less than 8.68 minutes and

    month was October, March, September VERSUS when month was January, February,

    April, May, June, July, August, November and December

    When poutcome was failure, unknown or other (the customer was unsure), customers were further categorized by month. During March, September and October the subscription rate was 37.7%, which is not decisive enough on its own to tell whether customers contacted in these months will subscribe. For the remaining months, by contrast, the chance of subscription was clearly low, at 4.7%.

    March, September and October are festive months in Portugal, so to determine the chance of subscription during these months we need to look at other factors.

    When previous outcome is failure, unknown or other, month is October, March, September and duration is less than 2.9 minutes VERSUS when duration is greater than 2.9 minutes but less than 8.68 minutes

    As discussed above, other factors need to be considered for the months of March, September and October. Here the tree splits on duration: the less time spent on the call, the lower the subscription rate. People who spent less than 2.9 minutes had a subscription rate of 17.6%, so for them subscription is very unlikely. Those who spent more than 2.9 minutes had a subscription rate of 57.4%, but other factors should be weighed carefully when making a decision: even when more time was spent on the call in these months, we could not tell for sure whether a subscription would happen.


    When previous outcome is failure, unknown or other, month is January, February, April, May, June, July, August, November, December and duration is less than 4.31 minutes VERSUS when duration is greater than 4.31 minutes but less than 8.68 minutes

    People who were contacted during these months had a very low subscription rate of 4.1%, as observed before, and if they spent less than 4.31 minutes on the call, their subscription rate decreased further to 2.6%. People who spent more than 4.31 minutes had a subscription rate of 11.0%, which is still low. However, as the next split shows, age can be a deciding factor.

    When duration is greater than 4.31 minutes but less than 8.68 minutes, previous outcome is failure, unknown or other, month is January, February, April, May, June, July, August, November, December and age is less than or equal to 60.5 years VERSUS age is greater than 60.5 years

    As noted in the previous splitting rule, age can be a deciding factor. People who spent between 4.31 and 8.7 minutes on the call and were contacted during January, February, April, May, June, July, August, November or December followed the same pattern of low subscription if they were younger than 60.5 years old. For people older than 60.5 years, this pattern no longer holds: they had roughly equal chances of subscribing or not subscribing.

    When duration was greater than 8.70 minutes but less than 13.8 minutes and poutcome was success VERSUS when poutcome was unknown or failure

    Customers under the node where duration is between 8.70 and 13.8 minutes were further split by the outcome of the previous marketing campaign. We would again expect people who had previously subscribed with the bank and spent a good amount of time on the call to have a higher chance of subscribing again, and the decision tree confirms this. People with a successful previous campaign had a subscription rate of 83.3%, versus only 33.3% for people who did not respond positively in the previous campaign.

    When duration is greater than or equal to 13.8 minutes and marital status was single or divorced VERSUS when marital status was married

    Customers who spent a considerable amount of time on the call (more than 13.8 minutes) were further split by marital status. Irrespective of marital status, the people who spent the most time on the call had a decent subscription rate. Single or divorced people had a subscription rate of 63.8%, and married people had a subscription rate of 54.4%. Given the choice, the bank should focus more on customers who are single or divorced than on married customers.

    When duration was greater than or equal to 13.8 minutes, marital status was married and contact type was cellular VERSUS when contact type was unknown

    Married people who spent more than 13.8 minutes on the call were further split by contact method. People who were easily reachable had a higher chance of subscription. When the contact method was cellphone, the subscription rate was 58%, whereas people contacted through postal mail or e-mail (contact type unknown) had a lower subscription rate of 44.5%. The subscription rate for cellphone contacts was not especially high, but it was noticeably higher than for contacts via mail or e-mail. It is therefore better to reach customers through cellphone or landline than through any other method, because this increases the chance of engaging customers in a human interaction.

    Customers least likely to subscribe

    Therefore, the cases in which customers are least likely to subscribe are:

    1. Customers who have previously subscribed to a CD and spend less than 2.2 minutes on the call (predictability of 78.5%).

    2. Customers who are being contacted for the first time or failed to subscribe during a previous marketing campaign, spend less than 4.3 minutes on the call and are contacted during non-festive months (predictability of 97.4%).

    3. Customers who are contacted during non-festive months, either failed to subscribe during a previous marketing campaign or are first-timers, spend between 4.3 and 8.7 minutes on the call and are younger than 60.5 years old (predictability of 89.9%).

    4. Customers who are contacted during festive months, spend less than 2.9 minutes on the call, and either did not subscribe during a previous marketing campaign or were being contacted for the first time (predictability of 82.4%).

    Customers most likely to subscribe

    Similarly, the case in which customers are most likely to subscribe is:

    1. Customers who spend between 2.2 and 13.8 minutes on the call and had previously accepted a subscription offer (predictability of 73.3%).
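    The leaves listed above can be encoded as a small rule-based classifier for scoring new contacts. The sketch below is a hand-written approximation of those rules and not the fitted tree. The function name `tree_segment` is hypothetical. Month values are assumed to be the lowercase three-letter codes shown in the exhibits, and "festive" is assumed to mean the March, September and October branch of the tree's month split.

```python
# Hand-coded sketch of the decision-tree rules summarised above (not the fitted model).
# Assumptions: months are lowercase three-letter codes; the "festive" branch is the
# tree's {mar, sep, oct} split; duration is in minutes.
FESTIVE_MONTHS = {"mar", "sep", "oct"}


def tree_segment(poutcome: str, month: str, duration_min: float, age: float) -> str:
    """Return 'least likely', 'most likely' or 'undetermined' for one contact."""
    if poutcome == "success":  # previously accepted an offer
        if duration_min < 2.2:
            return "least likely"  # least-likely rule 1
        if duration_min < 13.8:
            return "most likely"   # most-likely rule 1
        return "undetermined"
    # poutcome is failure, unknown or other: previous refusal or first-time contact
    if month in FESTIVE_MONTHS:
        if duration_min < 2.9:
            return "least likely"  # least-likely rule 4
        return "undetermined"
    if duration_min < 4.3:
        return "least likely"      # least-likely rule 2
    if duration_min < 8.7 and age <= 60.5:
        return "least likely"      # least-likely rule 3
    return "undetermined"


print(tree_segment("unknown", "jun", 6.0, 45))  # least likely
```

    For contacts that land in "undetermined", the tree's other splits (marital status, contact type) or the logistic model have to make the call.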


    Underlying patterns among variables

    Now that we have identified the set of important variables and rules to consider for the bank marketing campaign, we want to validate our analysis by fitting a logistic regression model to the data. Before we proceed, we take a closer look at the correlation among the independent variables. This step is necessary to keep multicollinearity out of our model, because multicollinearity inflates the variance (standard errors) of the coefficient estimates and makes them unstable.

    Interestingly, we did not find any strong correlation among the numerical variables in our dataset (balance, duration, campaign and previous). The correlation and scatter plot matrix (Exhibit 4) suggested only very mild association, which is of no concern for our analysis. Below is a tabular representation of the association among these variables:

    Table: association among the numerical variables balance, duration, campaign and previous (cell shading legend: Strong correlation, No correlation)
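    To make the multicollinearity check concrete, here is a minimal sketch with two parts. The first is a Pearson correlation helper. The second is the variance inflation factor for the special case of exactly two predictors, VIF = 1 / (1 - r^2), which is the factor by which their correlation inflates each coefficient's variance. The helper names and toy values are hypothetical. With more predictors, each VIF comes from regressing one predictor on all the others.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two correlated predictors."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical toy values standing in for two numeric columns (e.g. campaign, previous)
calls = [1, 2, 3, 4, 5, 6]
prev = [0, 1, 0, 2, 1, 0]
r = pearson(calls, prev)
print(f"r = {r:.3f}, VIF = {vif_two_predictors(r):.3f}")
```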

    We were more curious about associations among the categorical variables, because they form the majority of our dataset. We carried out a chi-square test of association for each pair of categorical variables and looked at their Cramér's V statistics to identify any underlying relationship. We treat a Cramér's V of 0.25 or higher as a strong association between two variables, and anything below that as acceptable. The table below shows which of the categorical variables have a strong association:


    Table: association among the categorical variables Age, Job, Marital, Education, Default, Housing, Loan, Contact, Month, Pdays and Poutcome (cell shading legend: Strong correlation)
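    Cramér's V rescales the chi-square statistic to lie between 0 and 1: V = sqrt(chi2 / (n * (min(r, c) - 1))), where r and c are the numbers of rows and columns. The sketch below applies this formula to the age-by-marital-status counts from Exhibit 5-b. It should reproduce the chi-square value of about 7552.35 reported there and give a V of about 0.29, which is above the 0.25 cut-off used in this analysis. The function names are our own. This is a plain-Python illustration, not the SAS FREQ computation itself.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1))), between 0 and 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


# Age group x marital status counts from Exhibit 5-b (columns: divorced, married, single)
age_by_marital = [
    [45, 598, 2406],      # 18-27
    [2618, 15770, 9322],  # 28-45
    [2241, 9424, 999],    # 46-59
    [303, 1419, 61],      # 60+
]
chi2 = chi_square(age_by_marital)
v = cramers_v(age_by_marital)
print(f"chi-square = {chi2:.2f}, Cramer's V = {v:.3f}")
print("strong association" if v >= 0.25 else "acceptable")  # 0.25 cut-off used in the text
```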

    Age and job

    We found age and job to be associated. This makes sense because people get promotions and move to more senior positions as they grow older. For example, a recent college graduate is more likely to be placed in professional services or a technical job. Someone who is middle-aged or around 60 is more likely to hold a management position, be self-employed or be retired (Exhibit 5-a).

    Age and marital status

    Similarly, age was associated with marital status, which makes even more sense. Young people are single more often than people in their 30s, and as people grow older they either remain married or get divorced (Exhibit 5-b).

    Job and education

    The higher a person's educational qualification, the higher the chances of finding a sophisticated job. A person with a Masters or PhD is more likely to hold a management position or a technical job that requires a high degree of qualification. Someone with only basic school education is more likely to end up in a blue-collar profession (Exhibit 5-c).

    Job and housing

    A person's job profile helps explain whether they are likely to take a housing loan. Let's take it case by case. People who do not have a job or have a very low income, such as students, housewives or retired employees, are very unlikely to take a housing loan because of concerns about repayment. Moreover, their background may not be suitable for banks to extend such credit. Low-salaried people, on the other hand, actively seek better living standards; this group includes blue-collar workers, clerks and people in services. They are therefore more likely to accept housing loans. Self-employed individuals and people in management jobs have a sufficient level of income, so they care less about housing loans: they can afford a good standard of living themselves (Exhibit 5-d).

    Housing and month

    Our data show that the majority of people were contacted during Q2 and Q3 of the year. Of these, Q2 focused heavily on people who had a housing loan, whereas Q3 targeted people who did not have a housing loan. Less focus was given to reaching out to customers during Q1 and Q4, which are generally the festive seasons in Portugal. This explains the high degree of association between the two variables. As discussed before, the subscription rate was higher during the festive seasons, a finding our decision tree analysis also supported. We also notice that people with and without housing loans received a fairly equal amount of focus during Q1 and Q4. This leads us to believe that housing loan is probably not that important a factor (Exhibit 5-e).

    Contact and month

    Customers were contacted mostly through mail offers during Q2 and through cellphone during Q3. As noted before, less attention was paid to reaching out to customers during the other two seasons (Q1 and Q4) by any means. This explains the association, although there is no obvious reason why the communication type should vary with the season. Moreover, our decision tree analysis suggests that both contact and month are important variables, so we keep both in our analysis (Exhibit 5-f).

    Pdays and poutcome

    There is a very strong association between pdays and poutcome. The campaign focused mainly on new customers and on people who did not subscribe to a CD during a previous marketing campaign. A large portion of them were contacted within 6 months, and even more within one year. Very few people were contacted after a year, and existing customers who had subscribed during the previous campaign received the least focus. This is a common pattern. Marketers tend to concentrate on attracting new customers and often neglect servicing existing ones, probably because they take them for granted (Exhibit 5-g).

    Selection of relevant variables

    We decided to leave job out of our analysis because of its strong association with age, education and housing loan. Moreover, the job variable has many categories, so keeping them in the equation would risk overfitting our model. Job profiles are also very diverse, and we need to leave room for new positions that may appear in the future. Age and education follow a more standard classification and serve this role better. We decided to drop housing loan as well, because we did not think it was a good enough predictor of subscription.

    All the remaining categorical variables are kept in our model. The correlations among the numeric variables were not noteworthy, so none of them was dropped. We also ran general linear models across all numerical and categorical variables to check for any association between them. Most showed minor correlation, but we did not drop any of them because all those variables seemed relevant to the bank's marketing strategy.

    Because of the high degree of association between pdays and poutcome, we ran two logistic models, one without pdays and the other without poutcome.

    Subscription Model (with poutcome)

    Running our logistic model on the set of independent variables selected above gave satisfactory results (Exhibit 6). The c-statistic is the probability that a randomly chosen subscriber receives a higher predicted probability than a randomly chosen non-subscriber. For this model it was high at 0.8894 (88.94%), which indicates good discrimination. The convergence criterion was satisfied, so the model is interpretable. The overall model was significant at the 0.01 level, which indicates that the predictors jointly explain subscription better than an intercept-only model. There were, of course, a few outliers: some had high leverage and some were poorly accounted for by the model. We removed them to prevent them from drastically altering our coefficient estimates. Influence diagnostics suggested that the estimates were quite stable after this cleansing.
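    For reference, the c-statistic quoted above is a concordance measure, equivalent to the area under the ROC curve. The minimal sketch below uses hypothetical predicted probabilities rather than output from our model. It looks at every (subscriber, non-subscriber) pair and counts how often the subscriber has the higher predicted probability, with ties counting as half.

```python
def c_statistic(probs, labels):
    """Concordance (c) statistic: the share of (subscriber, non-subscriber) pairs in
    which the subscriber has the higher predicted probability; ties count as half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subscriber and one non-subscriber")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0   # concordant pair
            elif p == q:
                score += 0.5   # tied pair
    return score / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes (1 = subscribed)
probs = [0.10, 0.40, 0.35, 0.80]
subscribed = [0, 0, 1, 1]
print(c_statistic(probs, subscribed))  # 0.75
```

    This pairwise loop is quadratic in the number of rows. For the full dataset, a rank-based (Mann-Whitney) computation would be the practical choice.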

    Non-significant factors

    Consistent with our decision tree results, the variables default and previous were not statistically significant predictors in our model. Default indicates whether the customer has defaulted on paying debts. It is not significant in our model, but defaulting on loans affects a customer's credit history, which plays a crucial role in the financial services industry. Banks routinely verify a person's payment history and run background checks before extending mortgage loans, car loans and other credit. We therefore do not want to lose an important factor like credit history from a banking model. Previous counts the number of contacts performed for a client before this campaign. It was not significant at α = 0.01 (p-value = 0.0335), although it would be at the 0.05 level. It also makes sense to retain information on several related points: how much effort was spent on a particular customer, how familiar they are with the bank's products, and whether they are new clients. Hence previous was kept in the model as well.

    Testing our Hypothesis

    Note: All odds ratio estimates have been interpreted holding the other variables constant.

    H1: Students or young professionals (18-27) and people on the verge of retiring (60+) have a higher chance of subscription.

    Our data support this hypothesis. We hypothesized that young professionals, recent college graduates and people aged 60 and above are more likely to subscribe to a CD. Age is a significant factor in our model, with a p-value below 0.01. For our analysis, we grouped customers into the following age groups: 18-27, 28-45, 46-59 and 60+. Since people aged 18-27 and 60+ were our main focus, we used 28-45 as the reference group. We found that the odds that people aged 18-27 will subscribe to a CD are 1.87 times the odds for people aged 28-45. In other words, students or young professionals (18-27) have 87% higher odds of subscribing than people aged 28-45. The odds of subscription for people aged 60 and above are 4 times the odds for people aged 28-45. Thus, students or young professionals (18-27) and people on the verge of retiring (60+) should be the main target customers for the bank.
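    Both odds ratios above share the same 28-45 reference group, so they can also be compared with each other. The odds ratio of 60+ versus 18-27 is the ratio of the two reported odds ratios, which is equivalent to the difference of their log-odds coefficients. The minimal sketch below uses the rounded figures quoted in the text, so the result (about 2.1) is only approximate. A confidence interval for it would also need the covariance of the coefficients, which is not reported here. The names are hypothetical.

```python
# Odds ratios versus the 28-45 reference group, as quoted in the text (rounded).
OR_VS_REFERENCE = {"18-27": 1.87, "28-45": 1.0, "60+": 4.0}


def pct_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in the odds (1.87 -> +87%)."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_between(group_a, group_b, ors=OR_VS_REFERENCE):
    """Odds ratio of group_a versus group_b when both are measured against the same
    reference level: OR(a vs b) = OR(a vs ref) / OR(b vs ref)."""
    return ors[group_a] / ors[group_b]


print(round(pct_change_in_odds(OR_VS_REFERENCE["18-27"])))  # 87
print(round(odds_ratio_between("60+", "18-27"), 2))        # 2.14
```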

    H2: Single customers are more likely to subscribe than married customers.

    Marital status was a significant factor in our model. We compared the likelihood of subscription between single and married customers. The odds that singles will accept a subscription offer are between 1.26 and 1.57 times the odds for married people, with a point estimate of about 1.41. This corresponds to roughly 41% higher odds of subscription. One possible explanation is that singles are often less financially settled than married couples and therefore look for other sources of income for stability.
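    Assume the interval from 1.26 to 1.57 is a Wald confidence interval, which is symmetric on the log-odds scale. The point estimate is then the geometric mean of the two bounds, about 1.41, which matches the 41% quoted. The sketch below also backs out the standard error of the log odds ratio. The 95% level (z of about 1.96) is an assumption, since the confidence level is not stated here. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile (assumed confidence level)


def or_from_wald_ci(lower, upper, z=Z_95):
    """Recover the point odds ratio and the standard error of log(OR) from a Wald
    interval exp(b - z*se) .. exp(b + z*se), which is symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)  # geometric mean of the two bounds
    se = (log_hi - log_lo) / (2 * z)
    return point, se


point, se = or_from_wald_ci(1.26, 1.57)  # singles vs married, from the text
print(f"OR ~ {point:.2f} ({(point - 1) * 100:.0f}% higher odds), SE(log OR) ~ {se:.3f}")
```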

    H3: The chance of subscription increases with a higher degree of education.

    We cannot say much about people who did not disclose their highest educational qualification, but their odds of subscription are 1.5 times the odds for people with only primary education. Primary education in Portugal is mandatory and free, so it is reasonable to assume that people who did not disclose their educational status have at least primary education. Now consider people with a known education level. Those with secondary education (high school or equivalent) have 31% higher odds of subscribing than people with only primary education. People with tertiary education (a university degree, such as a bachelor's, Masters or PhD) have about 81% higher odds. Therefore, our data support the hypothesis that the chance of subscription increases with a higher degree of education.

    H4: People who have a good credit history are good targets.


    As discussed, default was not a statistically significant factor in our model; we included it because of its business significance. The model results suggest that the odds of people with a good credit history subscribing to a CD are 1.16 times the odds for people who have defaulted on their debts. However, the confidence interval for this odds ratio is wide, from 0.765 to 1.76, and includes 1. This indicates that people with a bad credit history could sometimes also turn out to be potential customers. This is plausible because banks do not always turn down clients with a bad credit history. Some banks even extend offers to such clients to give them a chance to improve their credit score. So it varies from case to case, and all we can say is that our data do not contain sufficient evidence to validate our hypothesis about credit history.
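    Backing out the Wald statistic from the reported interval shows why default is not significant. The minimal sketch below assumes that the 0.765 to 1.76 interval is a 95% Wald interval around the 1.16 point estimate. It recovers the standard error from the interval width, forms the Wald z statistic, and converts it to a two-sided p-value. The result (p of roughly 0.5) is an approximation and not the p-value from Exhibit 6. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile (assumed confidence level)


def wald_test_from_ci(odds_ratio, lower, upper, z_crit=Z_95):
    """Approximate Wald z statistic and two-sided p-value for an odds ratio,
    backed out from its point estimate and confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(OR)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


z, p = wald_test_from_ci(1.16, 0.765, 1.76)  # good vs bad credit history, from the text
print(f"z = {z:.2f}, two-sided p = {p:.2f}, interval contains 1: {0.765 < 1 < 1.76}")
```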

    H5: People with less financial liability, such as a personal loan, are more likely to subscribe

    There is statistically significant evidence that people who do not have a personal loan to pay back are more likely to subscribe. Our model suggests that the odds of subscription for people with no personal loan are 1.77 times the odds for people with a personal loan, or about 77% higher. This makes sense. People who have taken a personal loan are likely to be concerned about paying it back, so they are less likely to have spare funds to invest in a CD.

    H6: People who are more reachable are more likely to subscribe

    Customers were contacted through various methods, such as cellphone, landline and mail offers. The odds ratios show that the highest chance of subscription came from people contacted via cellphone, followed by landline and then mail offers. People are more reachable through cellphones or landlines than through mail offers. When it comes to financial matters, customers usually prefer talking to a human rather than responding to targeted offline advertisements. Therefore, we have statistically significant evidence that people who are more reachable are more likely to subscribe.

    H7: Chances of subscription increase during the months of the festive season

    Q1 and Q4 are the main festive seasons in Portugal. The odds of customers taking a subscription offer are greater in Q1 than in any other quarter, with Q4 the second highest. The odds for Q3 are close to those for Q4, indicating that both these seasons have a higher rate of subscription after Q1. However, our estimate for Q4 was not statistically significant at α = 0.01 (p-value = 0.0118). Moreover, our descriptive statistics suggested that the subscription rate was quite high in September, which falls in Q3, and in October and December, which fall in Q4. The lack of significance for Q4 could be due to the below-average subscription rate in November. Therefore, Q1 is a clear target for bank representatives, and with some confidence the last four months of the year are as well. Further investigation will be needed to find the reason for the low subscription rate in November.

    H8: The more time spent on the call, the more likely customers are to subscribe; however, repeated calls reduce that chance

    Duration was measured in seconds spent talking to the customer; for ease of understanding, we interpret its effect in terms of minutes spent on the call. The point estimate suggests that every additional 10 minutes on the call multiplies the odds of taking the offer by about 10. This makes a lot of sense. The longer a bank representative talks to the customer, the more likely it is that the customer is interested in the product, and therefore the more likely they are to subscribe.


    The variable campaign, which records the number of times a customer was called, has a negative coefficient. This indicates that repeated calls reduce a customer's chance of subscription. From the odds ratio, we estimate that every repeat call decreases the odds of subscription by 8.5%.

    Therefore, bank representatives should ideally spend more time explaining the deal in one call rather than calling customers repeatedly.
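    Odds ratios for continuous predictors compound multiplicatively: a change of k units multiplies the odds by OR^k. The minimal sketch below illustrates two cases. For duration, it back-calculates an illustrative per-second coefficient from the roughly 10-fold odds per 10 minutes quoted above; this is not the estimate in Exhibit 6. That coefficient implies about 26% higher odds per extra minute. For campaign, it compounds the 8.5% per-call decrease over five extra calls, which leaves about 64% of the original odds. The function name is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units` in a continuous predictor:
    exp(beta * units) == or_per_unit ** units."""
    return or_per_unit ** units


# Duration: illustrative per-second coefficient back-calculated from
# "about 10 times the odds per 10 extra minutes" (600 s); not the Exhibit 6 estimate.
beta_per_second = math.log(10) / 600
or_per_minute = math.exp(beta_per_second * 60)  # about 1.26 per extra minute

# Campaign: each repeat call lowers the odds by about 8.5%, i.e. OR of about 0.915 per call.
or_per_call = 1 - 0.085
odds_after_5_calls = rescale_odds_ratio(or_per_call, 5)  # about 0.64 of the original odds

print(f"per extra minute: x{or_per_minute:.3f}; after 5 extra calls: x{odds_after_5_calls:.2f}")
```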

    H9: Returning customers have a higher chance of subscription

    We included poutcome in our analysis to test this hypothesis. Poutcome is failure when the customer did not subscribe during a previous marketing campaign. For these customers, we cannot tell whether they are going to take the offer now, because the estimate for failure cases was not significant (p-value = 0.5371). However, when customers did accept a previous offer, their odds of subscribing again to a new offer are about 12 times higher. These existing customers were presumably satisfied with the previous deal, which is why they subscribed in the first place, so the chances are high that they will subscribe again. Targeting existing customers is therefore a good move for the bank and an essential customer retention strategy.
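    An odds ratio of 12 does not mean a 12-fold probability. The minimal sketch below converts an odds ratio into a probability. As an illustrative baseline, it uses the overall 11.7% success rate from Exhibit 1-a, which is not the model's actual reference group. Multiplying the baseline odds by 12 gives a probability of about 61%, whereas 12 × 11.7% would exceed 100%. The function name is hypothetical.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)


# Illustrative baseline only: the overall 11.7% success rate (Exhibit 1-a).
p_new = apply_odds_ratio(0.117, 12)
print(f"{p_new:.3f}")  # 0.614; a naive 12 * 0.117 = 1.404 is not a valid probability
```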

    Subscription Model (with pdays)

    H10: Subscription chances are higher if customers are contacted a year after a previous campaign

    To test our last hypothesis, we replaced poutcome with pdays and ran another logistic model, keeping all other variables intact (Exhibit 7). The c-statistic of our second model was 0.876 (87.6%), approximately the same as for our previous model. The coefficient estimates did not change much, which is a good sign. Pdays is highly associated with poutcome, so replacing one with the other should not change our model drastically. We divided pdays into four categories:

    1) people who were being contacted for the first time (NC)
    2) people who were contacted within 6 months after the last campaign
    3) people who were contacted after 6 months but within one year
    4) people who were contacted after more than one year

    The odds ratios show that people contacted after a year were the most likely to subscribe, followed by people contacted within 6 months. For customers contacted between 6 and 12 months, the chance of subscription fell drastically. So the bank should reach out to customers after a year. At that point, their odds of subscribing are about 2.5 times the odds for customers contacted between 6 and 12 months.
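    A small binning function can produce the four pdays categories. This is a minimal sketch. It assumes the UCI Bank Marketing coding, in which pdays = -1 means the client was not previously contacted. Since the exact boundaries are not stated, it uses 182 and 365 days as hypothetical cut-offs for 6 and 12 months. The function name is also hypothetical.

```python
def pdays_category(pdays: int) -> str:
    """Bin pdays (days since the last contact in a previous campaign) into four groups.

    Assumes the UCI coding where -1 means "not previously contacted", and uses
    182 and 365 days as hypothetical cut-offs for 6 and 12 months."""
    if pdays < 0:
        return "NC"           # contacted for the first time
    if pdays <= 182:
        return "0-6 months"
    if pdays <= 365:
        return "6-12 months"
    return "over 1 year"


print([pdays_category(d) for d in (-1, 90, 200, 400)])
# ['NC', '0-6 months', '6-12 months', 'over 1 year']
```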

    Combining hypotheses H9 and H10, we can infer that repeated calls within a short period of time may increase customer frustration. People who have recently accepted or declined an offer are unlikely to change their mind within a short span of time.


    CONCLUSION

    Recommended Marketing Strategy

    Our goal was to analyze the historical data of a marketing campaign conducted by a Portuguese bank in order to identify the important indicator variables that help predict subscription chances. These indicator variables can then be used to devise a targeted marketing strategy aimed only at potential customers. The data, collected over a two-year period, provided statistically significant evidence in favor of a few key factors that are crucial to marketers in the financial services industry.

    Marketers should pay special attention to existing customers who have previously accepted a bank offer. They should also focus on people aged 18-27, who are usually single, and on divorced individuals above the age of 60. Because these groups lack the support of a spouse, they tend to look for other means of financial stability and are hence more likely to subscribe. Highly educated people with a good source of income also form a good target audience. People with high salaries are less likely to take a loan because they have sufficient funds for their personal expenses, as well as enough disposable income to invest in banking products like CDs.

    Marketers should reach out to these customers through cellphones during the festive season (Q1) and spend as much time as possible on the call. The longer customers stay engaged, the more likely they are to take the offer. However, marketers should refrain from calling customers repeatedly and should leave a gap of at least a year before contacting them again. This matters because customer care calls have become so frequent in recent times that a high volume of calls can increase customer frustration and thereby reduce the chances of subscription.


    EXHIBIT

    Exhibit 1-a

    Subscription (by Age category); x-axis: age groups 18-27 to 88-97. Red line: the overall success rate 11.7%

    Exhibit 1-b

    Subscription (by Job category). Red line: the overall success rate 11.7%


    Exhibit 1-c

    Subscription (by Marital status); categories: single, divorced, married. Red line: the overall success rate 11.7%

    Exhibit 1-d

    Subscription (by Education level); categories: tertiary, unknown, secondary, primary. Red line: the overall success rate 11.7%


    Exhibit 1-e

    Default        People contacted   People subscribed   Subscription rate
    no             44391              5235                11.79%
    yes            815                52                  6.38%
    Grand Total    45206              5287                11.70%

    Exhibit 1-f

    Subscription (by Balance amount)

    Exhibit 1-g

    Housing loan   People contacted   People subscribed   Subscription rate
    no             20078              3353                16.70%
    yes            25128              1934                7.70%
    Grand Total    45206              5287                11.70%

    Exhibit 1-h

    Personal loan  People contacted   People subscribed   Subscription rate
    no             37964              4804                12.65%
    yes            7242               483                 6.67%
    Grand Total    45206              5287                11.70%


    Exhibit 1-i

    Contact        People contacted   People subscribed   Subscription rate
    cellular       29280              4367                14.91%
    telephone      2906               390                 13.42%
    unknown        13020              530                 4.07%
    Grand Total    45206              5287                11.70%

    Exhibit 1-j

    Day            People contacted   People subscribed   Subscription rate
    1-10           13724              1733                12.63%
    11-20          18387              2024                11.01%
    21-31          13095              1530                11.68%
    Grand Total    45206              5287                11.70%

    Exhibit 1-k

    Subscription (by Month); x-axis: jan to dec


    Exhibit 1-l

    Subscription (by Duration); x-axis: Contact duration (secs)

    Exhibit 1-m

    Subscription (by No. of phone calls)


    Exhibit 1-n

    Subscription (by pdays)

    Exhibit 1-o

    Subscription (by No. of previous phone calls); x-axis: number of previous calls, binned from 0-9 up to 270-279


    Exhibit 1-p

    Subscription (by previous campaign outcome); categories: success, other, failure, unknown

    Exhibit 2-a


    Exhibit 2-b


    Exhibit 2-c

    Exhibit 3


    Exhibit 4


    Exhibit 5-a

    descriptive analysis of successful cases

    The FREQ Procedure

    Table of age by job

    age     admin.  blue-collar  entrepreneur  housemaid  management  retired  self-employed  services  student  technician  unemployed  unknown  Total
    18-27   410     627          53            25         351         3        91             366       593      436         85          9        3049
    28-45   3366    6302         897           547        6303        76       1014           2747      342      5218        791         107      27710
    46-59   1314    2705         511           567        2610        1095     433            1019      3        1854        409         144      12664
    60+     80      98           26            101        192         1089     41             22        0        88          18          28       1783
    Total   5170    9732         1487          1240       9456        2263     1579           4154      938      7596        1303        288      45206

    Statistics for Table of age by job

    Statistic DF Value Prob

    Chi-Square 33 19298.9919


    Exhibit 5-b

    Descriptive analysis of successful cases

    The FREQ Procedure

    Table of age by marital

    age    Statistic  divorced  married  single   Total
    18-27  Frequency        45      598    2406    3049
           Percent        0.10     1.32    5.32    6.74
           Row Pct        1.48    19.61   78.91
           Col Pct        0.86     2.20   18.81
    28-45  Frequency      2618    15770    9322   27710
           Percent        5.79    34.88   20.62   61.30
           Row Pct        9.45    56.91   33.64
           Col Pct       50.28    57.95   72.90
    46-59  Frequency      2241     9424     999   12664
           Percent        4.96    20.85    2.21   28.01
           Row Pct       17.70    74.42    7.89
           Col Pct       43.04    34.63    7.81
    60+    Frequency       303     1419      61    1783
           Percent        0.67     3.14    0.13    3.94
           Row Pct       16.99    79.58    3.42
           Col Pct        5.82     5.21    0.48
    Total  Frequency      5207    27211   12788   45206
           Percent       11.52    60.19   28.29  100.00

    Statistics for Table of age by marital

    Statistic DF Value Prob

    Chi-Square 6 7552.3473
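    The Chi-Square row that PROC FREQ prints in these tables is Pearson's statistic: the sum over all cells of (observed - expected)^2 / expected, where expected = row total x column total / N, referred to a chi-square distribution with (r - 1)(c - 1) degrees of freedom. As a sketch in standard-library Python (the function name pearson_chi_square is illustrative, not part of the original SAS analysis), recomputing it from the Exhibit 5-b frequencies should land within rounding of the printed 7552.3473 on 6 DF:

```python
def pearson_chi_square(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if age group and marital status were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Frequencies from Exhibit 5-b: rows are age groups, columns are marital status.
age_by_marital = [
    [45, 598, 2406],      # 18-27: divorced, married, single
    [2618, 15770, 9322],  # 28-45
    [2241, 9424, 999],    # 46-59
    [303, 1419, 61],      # 60+
]

stat, df = pearson_chi_square(age_by_marital)
print(f"Chi-Square {df} {stat:.4f}")
```

    The same function applies unchanged to the 4 x 12 age-by-job table in Exhibit 5-a, where (4 - 1)(12 - 1) = 33 matches the printed DF.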


    Statistic DF Value Prob

    Chi-Square 11 3590.2589


    Exhibit 5-f

    descriptive analysis of successful cases

    The FREQ Procedure

    Table of contact by month

    contact    Statistic     Q1     Q2     Q3     Q4   Total
    cellular   Frequency   4044   8786  12182   4268   29280
               Percent     8.95  19.44  26.95   9.44   64.77
               Row Pct    13.81  30.01  41.61  14.58
               Col Pct    89.29  39.87  88.79  86.77
    telephone  Frequency    456    739   1164    547    2906
               Percent     1.01   1.63   2.57   1.21    6.43
               Row Pct    15.69  25.43  40.06  18.82
               Col Pct    10.07   3.35   8.48  11.12
    unknown    Frequency     29  12513    374    104   13020
               Percent     0.06  27.68   0.83   0.23   28.80
               Row Pct     0.22  96.11   2.87   0.80
               Col Pct     0.64  56.78   2.73   2.11
    Total      Frequency   4529  22038  13720   4919   45206
               Percent    10.02  48.75  30.35  10.88  100.00

    Statistics for Table of contact by month

    Statistic DF Value Prob

    Chi-Square 6 16487.9794
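    Each cell in these FREQ tables stacks four numbers: the count, and that count as a percentage of all 45,206 observations (Percent), of its row total (Row Pct) and of its column total (Col Pct). The small sketch below (the helper name cell_percentages is hypothetical) reproduces two cells of Exhibit 5-f from the counts alone. Read this way, 89.29% of Q1 contacts were made by cellular phone, while only 13.81% of cellular contacts fell in Q1.

```python
def cell_percentages(count, row_total, col_total, grand_total):
    """Return (Percent, Row Pct, Col Pct) for one cell, rounded to 2 decimals."""
    return (
        round(100 * count / grand_total, 2),
        round(100 * count / row_total, 2),
        round(100 * count / col_total, 2),
    )


N = 45206  # all observations in the contact-by-month table

# cellular / Q1: row total 29280 (cellular), column total 4529 (Q1)
print(cell_percentages(4044, 29280, 4529, N))

# unknown / Q2: row total 13020 (unknown), column total 22038 (Q2)
print(cell_percentages(12513, 13020, 22038, N))
```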


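    pdays  Statistic  failure  other  success  unknown   Total
           Percent       6.18   2.25     1.33     0.00    9.77
           Row Pct      63.26  23.08    13.66     0.00
           Col Pct      56.99  55.38    39.91     0.00
    NC     Frequency        0      0        0    36954   36954
           Percent       0.00   0.00     0.00    81.75   81.75
           Row Pct       0.00   0.00     0.00   100.00
           Col Pct       0.00   0.00     0.00   100.00
    Total  Frequency     4901   1840     1511    36954   45206
           Percent      10.84   4.07     3.34    81.75  100.00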

    Statistics for Table of pdays by poutcome

    Statistic DF Value Prob

    Chi-Square 9 46386.7985


               60+          0  0  1
    marital    divorced     1  0
               married      0  1
               single       0  0
    education  primary      0  0  0
               secondary    1  0  0
               tertiary     0  1  0
               unknown      0  0  1
    default    no           0
               yes          1
    loan       no           0
               yes          1
    contact    cellular     1  0
               telephone    0  1
               unknown      0  0
    month      Q1           0  0  0
               Q2           1  0  0
               Q3           0  1  0
               Q4           0  0  1
    poutcome   failure      1  0  0
               other        0  1  0
               success      0  0  1
               unknown      0  0  0

    Model Fit Statistics

               Intercept  Intercept and
    Criterion       Only     Covariates
    AIC        32614.294      22833.093
    SC         32623.013      23033.627
    -2 Log L   32612.294      22787.093

    Testing Global Null Hypothesis: BETA=0

    Test Chi-Square DF Pr > ChiSq

    Likelihood Ratio 9825.2014 22
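    The Model Fit Statistics and the likelihood-ratio test are tied together by simple formulas: AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k counts the estimated parameters (1 for the intercept-only model, 1 + 22 for the full model, since the global test has 22 DF) and n is the number of observations. The likelihood-ratio chi-square is just the drop in -2 Log L. The sketch below assumes n = 45,201, the count reported for the Exhibit 7 model; the printed SC values are consistent with that assumption.

```python
import math


def fit_statistics(neg2_log_l, k, n):
    """AIC and SC (Schwarz criterion) from -2 Log L, k parameters and n observations."""
    aic = neg2_log_l + 2 * k
    sc = neg2_log_l + k * math.log(n)
    return aic, sc


N_OBS = 45201  # assumption: same observation count as the Exhibit 7 model

# -2 Log L values from the Model Fit Statistics table.
intercept_only = 32612.294
with_covariates = 22787.093
n_slopes = 22  # DF of the global likelihood-ratio test

aic0, sc0 = fit_statistics(intercept_only, 1, N_OBS)
aic1, sc1 = fit_statistics(with_covariates, 1 + n_slopes, N_OBS)

# Likelihood-ratio chi-square for H0: all slope coefficients are zero.
lr_chi_square = intercept_only - with_covariates

print(f"AIC {aic0:.3f} {aic1:.3f}")
print(f"SC  {sc0:.3f} {sc1:.3f}")
print(f"Likelihood Ratio {lr_chi_square:.3f} on {n_slopes} DF")
```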


    default 1 0.8578 0.3544

    balance 1 17.6666


    Partition for the Hosmer and Lemeshow Test

    subscription = 1 subscription = 0

    Group Total Observed Expected Observed Expected

    1 4520 8 35.98 4512 4484.02

    2 4520 21 68.61 4499 4451.39

    3 4520 31 108.50 4489 4411.50

    4 4520 61 156.42 4459 4363.58

    5 4522 104 209.42 4418 4312.58

    6 4520 212 271.99 4308 4248.01

    7 4520 369 360.62 4151 4159.38

    8 4520 672 521.26 3848 3998.74

    9 4520 1246 917.92 3274 3602.08

    10 4519 2561 2634.43 1958 1884.57

    Hosmer and Lemeshow Goodness-of-Fit Test

    Chi-Square DF Pr > ChiSq

    443.7519 8
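    The Hosmer and Lemeshow statistic comes straight from the partition table: observations are grouped by predicted probability, and the statistic sums (Observed - Expected)^2 / Expected over both response levels in every group, with g - 2 = 8 degrees of freedom for the 10 groups. A sketch (the function name is illustrative) that recomputes it from the rounded table values lands within a few thousandths of the printed 443.7519:

```python
def hosmer_lemeshow(groups):
    """groups: list of (obs_event, exp_event, obs_nonevent, exp_nonevent).

    Returns (statistic, degrees_of_freedom).
    """
    stat = 0.0
    for o1, e1, o0, e0 in groups:
        stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
    return stat, len(groups) - 2


# Partition for the Hosmer and Lemeshow Test, groups 1-10:
# (subscription = 1 observed, expected, subscription = 0 observed, expected)
partition = [
    (8, 35.98, 4512, 4484.02),
    (21, 68.61, 4499, 4451.39),
    (31, 108.50, 4489, 4411.50),
    (61, 156.42, 4459, 4363.58),
    (104, 209.42, 4418, 4312.58),
    (212, 271.99, 4308, 4248.01),
    (369, 360.62, 4151, 4159.38),
    (672, 521.26, 3848, 3998.74),
    (1246, 917.92, 3274, 3602.08),
    (2561, 2634.43, 1958, 1884.57),
]

stat, df = hosmer_lemeshow(partition)
print(f"Chi-Square {stat:.4f}  DF {df}")
```

    Most of the statistic comes from groups 8 and 9, where the model under-predicts subscriptions (672 observed vs 521.26 expected, 1246 vs 917.92).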


    Prob   Correct    Correct  Incorrect  Incorrect           Sensi-  Speci-  False  False
    Level    Event  Non-Event      Event  Non-Event  Correct  tivity  ficity    POS    NEG
    0.200     3233      36567       3349       2052     88.1    61.2    91.6   50.9    5.3
    0.300     2564      37945       1971       2721     89.6    48.5    95.1   43.5    6.7
    0.400     2113      38592       1324       3172     90.1    40.0    96.7   38.5    7.6
    0.500     1716      38997        919       3569     90.1    32.5    97.7   34.9    8.4
    0.600     1282      39274        642       4003     89.7    24.3    98.4   33.4    9.2
    0.700      914      39494        422       4371     89.4    17.3    98.9   31.6   10.0
    0.800      587      39648        268       4698     89.0    11.1    99.3   31.3   10.6
    0.900      269      39785        131       5016     88.6     5.1    99.7   32.8   11.2
    1.000        0      39916          0       5285     88.3     0.0   100.0      .   11.7
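    The percentage columns of the classification table are ratios of its four counts: Correct is the overall share classified correctly, Sensitivity and Specificity are the shares of actual subscribers and non-subscribers classified correctly, and False POS / False NEG are the shares of predicted subscribers and predicted non-subscribers that are wrong. A sketch (hypothetical helper name; the counts are taken as printed) recomputes the 0.500 and 1.000 rows:

```python
def classification_rates(correct_event, correct_nonevent,
                         incorrect_event, incorrect_nonevent):
    """Percentages for one cutoff row of a classification table.

    incorrect_event    = non-subscribers classified as subscribers (false positives)
    incorrect_nonevent = subscribers classified as non-subscribers (false negatives)
    """
    def pct(num, den):
        # The table prints "." when the denominator is zero.
        return round(100 * num / den, 1) if den else None

    predicted_events = correct_event + incorrect_event
    predicted_nonevents = correct_nonevent + incorrect_nonevent
    total = predicted_events + predicted_nonevents
    return {
        "correct": pct(correct_event + correct_nonevent, total),
        "sensitivity": pct(correct_event, correct_event + incorrect_nonevent),
        "specificity": pct(correct_nonevent, correct_nonevent + incorrect_event),
        "false_pos": pct(incorrect_event, predicted_events),
        "false_neg": pct(incorrect_nonevent, predicted_nonevents),
    }


print(classification_rates(1716, 38997, 919, 3569))  # cutoff 0.500
print(classification_rates(0, 39916, 0, 5285))       # cutoff 1.000
```

    Raising the cutoff trades sensitivity for specificity: at 0.200 the model flags 61.2% of subscribers, at 0.500 only 32.5%, while overall accuracy barely moves because non-subscribers dominate the data.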


    Exhibit 7

    logistic regression

    The LOGISTIC Procedure

    Model Information

    Data Set MYSAS.BANK_RECODED

    Response Variable subscription subscription

    Number of Response Levels 2

    Model binary logit

    Optimization Technique Fisher's scoring

    Number of Observations Read 45201

    Number of Observations Used 45201

    Response Profile

    Ordered                    Total
      Value  subscription  Frequency
          1  0                 39916
          2  1                  5285

    Probability modeled is subscription=1.

    Class Level Information

    Class      Value        Design Variables
    age        18-27        1  0  0
               28-45        0  0  0
               46-59        0  1  0
               60+          0  0  1


    marital    divorced     1  0
               married      0  1
               single       0  0
    education  primary      0  0  0
               secondary    1  0  0
               tertiary     0  1  0
               unknown      0  0  1
    default    no           0
               yes          1
    loan       no           0
               yes          1
    contact    cellular     1  0
               telephone    0  1
               unknown      0  0
    month      Q1           0  0  0
               Q2           1  0  0
               Q3           0  1  0
               Q4           0  0  1
    pdays      1 year       1  0  0
               6 months     0  1  0
               6-12 months  0  0  1
               NC           0  0  0
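    The Class Level Information table shows reference-cell (dummy) coding. Each non-reference level gets its own 0/1 design variable, and the reference level is coded all zeros: 28-45 for age, single for marital, primary for education, Q1 for month, NC for pdays, and so on. This is why the odds ratios below are reported as "level vs reference". A small sketch of that coding (reference_coding is a hypothetical helper, not a SAS function):

```python
def reference_coding(levels, reference):
    """Map each level to its 0/1 design vector; the reference level is all zeros."""
    non_reference = [lvl for lvl in levels if lvl != reference]
    return {lvl: [1 if lvl == col else 0 for col in non_reference] for lvl in levels}


age_design = reference_coding(["18-27", "28-45", "46-59", "60+"], reference="28-45")
pdays_design = reference_coding(["1 year", "6 months", "6-12 months", "NC"], reference="NC")

for level, row in age_design.items():
    print(level, *row)
```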


    Odds Ratio Estimates

    Effect Point Estimate 99% Wald Confidence Limits

    age 18-27 vs 28-45 2.006 1.719 2.341

    age 46-59 vs 28-45 1.102 0.983 1.234

    age 60+ vs 28-45 4.511 3.791 5.367

    marital divorced vs single 0.830 0.708 0.974

    marital married vs single 0.706 0.634 0.787

    education secondary vs primary 1.353 1.165 1.571

    education tertiary vs primary 1.893 1.619 2.213

    education unknown vs primary 1.687 1.324 2.151

    default yes vs no 0.828 0.546 1.254

    balance 1.000 1.000 1.000

    loan yes vs no 0.522 0.451 0.606

    contact cellular vs unknown 3.685 3.131 4.336

    contact telephone vs unknown 3.099 2.454 3.914

    month Q2 vs Q1 0.763 0.663 0.878

    month Q3 vs Q1 0.735 0.637 0.848

    month Q4 vs Q1 0.808 0.686 0.951

    duration 1.004 1.004 1.004

    campaign 0.907 0.884 0.930

    previous 1.018 0.995 1.042

    pdays 1 year vs NC 3.937 3.012 5.146

    pdays 6 months vs NC 3.112 2.672 3.624

    pdays 6-12 months vs NC 1.586 1.362 1.848
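    Each point estimate is exp(beta) for the corresponding coefficient, and the 99% Wald limits are exp(beta - z * SE) and exp(beta + z * SE), with z of about 2.576 (the 0.995 standard normal quantile). For example, the 0.522 odds ratio for loan yes vs no means the odds of subscribing are roughly 48% lower for customers with loan = yes, other predictors held fixed. The parameter estimates page is not part of this transcript, so the sketch below backs out beta and its standard error from a printed row. It is an approximation: the printed values are rounded to three decimals.

```python
import math
from statistics import NormalDist

Z_99 = NormalDist().inv_cdf(0.995)  # two-sided 99% Wald interval


def coef_from_odds_ratio(odds_ratio, lower, upper, z=Z_99):
    """Recover (beta, standard error) from an odds ratio and its Wald limits."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def wald_interval(beta, se, z=Z_99):
    """Odds-ratio confidence limits exp(beta - z*se) and exp(beta + z*se)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


# loan yes vs no: point estimate 0.522, 99% limits 0.451 and 0.606
beta, se = coef_from_odds_ratio(0.522, 0.451, 0.606)
low, high = wald_interval(beta, se)
print(f"beta={beta:.3f}  se={se:.4f}  limits=({low:.3f}, {high:.3f})")
```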

    Association of Predicted Probabilities and Observed Responses

    Percent Concordant 87.6 Somers' D 0.753

    Percent Discordant 12.4 Gamma 0.753

    Percent Tied 0.0 Tau-a 0.155

    Pairs 210956060 c 0.876
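    The association table measures how well the predicted probabilities rank the 210,956,060 subscriber/non-subscriber pairs (39,916 x 5,285). A pair is concordant when the subscriber has the higher predicted probability. From the pair counts, c = (concordant + 0.5 x tied) / pairs is the area under the ROC curve and Somers' D = 2c - 1. Tau-a divides the same concordant-minus-discordant difference by all N(N - 1)/2 pairs of observations, which is why it is much smaller. The brute-force sketch below compares raw probabilities directly and is not a reproduction of SAS's own implementation, which may group probabilities before counting ties:

```python
from itertools import product


def association_stats(p_events, p_nonevents):
    """Rank-based association measures for a binary outcome.

    p_events: predicted probabilities of observations with response = 1
    p_nonevents: predicted probabilities of observations with response = 0
    """
    nc = nd = nt = 0
    for pe, pn in product(p_events, p_nonevents):
        if pe > pn:
            nc += 1  # concordant pair
        elif pe < pn:
            nd += 1  # discordant pair
        else:
            nt += 1  # tied pair
    pairs = nc + nd + nt
    n = len(p_events) + len(p_nonevents)
    return {
        "c": (nc + 0.5 * nt) / pairs,
        "somers_d": (nc - nd) / pairs,
        "gamma": (nc - nd) / (nc + nd),
        "tau_a": (nc - nd) / (0.5 * n * (n - 1)),
    }


# Toy example: three subscribers and three non-subscribers.
print(association_stats([0.9, 0.6, 0.4], [0.5, 0.4, 0.1]))

# Tau-a for Exhibit 7, rebuilt from the printed Somers' D and the pair count.
n_obs, pairs = 45201, 39916 * 5285
print(round(0.753 * pairs / (0.5 * n_obs * (n_obs - 1)), 3))
```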


    Partition for the Hosmer and Lemeshow Test

    subscription = 1 subscription = 0

    Group Total Observed Expected Observed Expected

    1 4520 5 34.29 4515 4485.71

    2 4520 22 67.24 4498 4452.76

    3 4520 31 107.16 4489 4412.84

    4 4520 66 158.74 4454 4361.26

    5 4520 136 218.02 4384 4301.98

    6 4520 232 293.10 4288 4226.90

    7 4521 440 406.52 4081 4114.48

    8 4521 716 605.77 3805 3915.23

    9 4520 1264 997.32 3256 3522.68

    10 4519 2373 2396.82 2146 2122.18

    Hosmer and Lemeshow Goodness-of-Fit Test

    Chi-Square DF Pr > ChiSq

    331.9394 8
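    The Pr > ChiSq value is blank in this transcript. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form, P(X > x) = exp(-x/2) x sum over i = 0 .. df/2 - 1 of (x/2)^i / i!. This makes it possible to recover the p-value for the 8-DF statistic with the standard library alone. The sketch below is a minimal one; a statistics library's chi-square survival function would give the same number. The result is many orders of magnitude below 0.05, so observed and expected counts differ by more than chance would explain. That said, with 45,201 observations the Hosmer-Lemeshow test tends to flag even modest discrepancies.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0  # (x/2)^0 / 0!
    total = term
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


p = chi2_sf_even_df(331.9394, 8)
print(f"Pr > ChiSq = {p:.3g}")
```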
