Banking Insurance Product –
Phase 2: IP – F1.H2
Purpose
By responding to this Request for Proposal (RFP), the Proposer agrees that s/he has read and understood all documents within this RFP package.
Background
The Commercial Banking Corporation (hereafter the “Bank”), acting by and through its department of Customer Services and New Products, is seeking proposals for banking services. The Bank ultimately wants to predict which customers will buy a variable rate annuity product.
A variable annuity is a contract between you and an insurance company / bank, under which the insurer agrees to make periodic payments to you, beginning either immediately or at some future date. You purchase a variable annuity contract by making either a single purchase payment or a series of purchase payments.
A variable annuity offers a range of investment options. The value of your investment as a variable annuity owner will vary depending on the performance of the investment options you choose. The investment options for a variable annuity are typically mutual funds that invest in stocks, bonds, money market instruments, or some combination of the three. If you are interested in more information, see:
http://www.sec.gov/investor/pubs/varannty.htm
The project will be broken down into 3 phases:
• Phase 1 – Variable Understanding and Assumptions
• Phase 2 – Variable Selection and Model Building
• Phase 3 – Model Assessment and Prediction
Objective – Phase 2
The scope of services in this phase includes the following:
• For this phase use only the binned training data set.
• Based on your first report, the Bank has strategically binned each of the continuous variables in the data set to help facilitate any further analysis.
o For any variable with missing values, change the data to include a missing category instead of a missing value for the categorical variable.
§ (HINT: Now all variables should be categorized (treated as categorical variables so no more continuous variable assumptions) and without missing values. Banks do this for more advanced modeling purposes that we will talk about in the spring.)
o Check each variable for separation concerns. Document in the report and adjust any variables with complete or quasi-separation concerns.
• Build a main effects only binary logistic regression model to predict the purchase of the insurance product.
o Use backward selection to do the variable selection – the Bank currently uses 𝛼 = 0.002 and p-values to perform backward selection, but is open to another technique and/or significance level if documented in your report.
o Report the final variables from this model ranked by p-value.
§ (HINT: Even if you choose to not use p-values to select your variables, you should still rank all final variables by their p-value in this report.)
• Interpret one variable’s odds ratio from your final model as an example.
o Report on any interesting findings from your odds ratios from your model.
§ (HINT: This is open-ended and has no correct answer. However, you should get used to keeping an eye out for what you might deem important or interesting when exploring data to report in an executive summary.)
• Investigate possible interactions using forward selection including only the main effects from your previous final model.
o Report the final interaction variables from this model ranked by p-value.
• Report your final logistic regression model’s variables by significance.
o (HINT: These steps are here to help you build your model, not to dictate the order in which you write your report. Consider the most important information when done with these questions and write your report accordingly.)
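The data preparation steps above (adding a missing category, then checking each variable for separation) can be sketched as follows. This is a minimal illustration, not the Bank's procedure: the label "M", the function names, and the toy values are all assumptions, and only the Python standard library is used.

```python
from collections import Counter

MISSING_LABEL = "M"  # hypothetical label for the added "missing" category


def fill_missing(values, label=MISSING_LABEL):
    """Replace None or empty strings with an explicit missing category."""
    return [label if v is None or v == "" else v for v in values]


def separation_report(levels, target):
    """Classify a categorical predictor against a 0/1 target.

    Returns ("complete", flagged) if every level contains only one outcome,
    ("quasi", flagged) if some but not all levels contain only one outcome,
    and ("none", []) if every level contains both outcomes.
    """
    counts = Counter(zip(levels, target))
    all_levels = sorted(set(levels), key=str)
    flagged = [
        lvl for lvl in all_levels
        if counts[(lvl, 0)] == 0 or counts[(lvl, 1)] == 0
    ]
    if not flagged:
        return "none", flagged
    if len(flagged) == len(all_levels):
        return "complete", flagged
    return "quasi", flagged


# Toy values for illustration only (not taken from insurance_t_bin).
hmown = fill_missing([1, 0, None, 1, 0, None])
ins = [1, 0, 0, 1, 1, 0]
kind, flagged_levels = separation_report(hmown, ins)
print(kind, flagged_levels)
```

A level is flagged when its cross-tabulation with INS has an empty cell. One common adjustment is to merge a flagged level into a neighbouring category before fitting the model; whichever adjustment is used should be documented in the report.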
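The backward selection step described above follows a simple loop: refit, find the least significant variable, drop it if its p-value exceeds the threshold, and repeat. The sketch below shows only that loop. The callback `p_value_of` is hypothetical and stands in for whatever test your modeling software provides (for a multi-level categorical variable, a test of the whole variable rather than of one dummy coefficient). The p-values in the example are made up.

```python
def backward_select(variables, p_value_of, alpha=0.002):
    """Drop the least significant variable until all remaining have p <= alpha.

    `p_value_of(var, kept)` is a hypothetical callback that refits the model
    on `kept` and returns the p-value for `var`.
    Returns the surviving variables as (name, p-value) pairs ranked by p-value.
    """
    kept = list(variables)
    while kept:
        p_values = {var: p_value_of(var, kept) for var in kept}
        worst = max(p_values, key=p_values.get)
        if p_values[worst] <= alpha:
            break  # every remaining variable meets the threshold
        kept.remove(worst)
    final_p = {var: p_value_of(var, kept) for var in kept}
    return sorted(final_p.items(), key=lambda item: item[1])


# Made-up p-values for illustration only; a real callback would refit the model.
fake_p = {"DDA": 1e-8, "SAVBAL": 1e-5, "HMOWN": 0.01, "LORES": 0.4}
ranked = backward_select(fake_p, lambda var, kept: fake_p[var])
print(ranked)
```

Because the function returns the survivors already sorted by p-value, the same output can feed the ranked list requested in the report.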
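As a reminder of what an odds ratio measures, the sketch below computes one from a 2x2 table of a binary indicator against INS. The counts are made up for illustration. In a logistic regression containing only that one indicator, the exponentiated coefficient equals this ratio; in a model with several variables, the exponentiated coefficient is the odds ratio with the other variables held fixed.

```python
import math


def odds_ratio_2x2(buy_with, nobuy_with, buy_without, nobuy_without):
    """Odds of buying for customers with an indicator, relative to those without."""
    odds_with = buy_with / nobuy_with
    odds_without = buy_without / nobuy_without
    return odds_with / odds_without


# Made-up counts for illustration only (not taken from insurance_t_bin).
or_example = odds_ratio_2x2(buy_with=60, nobuy_with=40, buy_without=30, nobuy_without=70)
beta = math.log(or_example)  # coefficient of a one-indicator logistic model
print(f"odds ratio = {or_example:.2f}, coefficient = {beta:.3f}")
```

With these counts, customers with the indicator have about 3.5 times the odds of buying compared with customers without it. Note that this is a statement about odds, not about probability.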
Data Provided
The following two sets of data are provided for the proposal:
• The training data set insurance_t_bin contains 8,495 observations and 47 variables.
o All of these customers have been offered the product. The outcome is recorded in the variable INS, which takes a value of 1 if they bought and 0 if they did not buy.
o There are 46 variables describing the customer’s attributes before they were offered the new insurance product.
o The Bank has strategically binned each of the continuous variables in the data set to help facilitate any further analysis.
§ (HINT: The original insurance_t and the new insurance_t_bin can be matched 1:1 by row in case you want to know where the bins were split.)
• The validation data set insurance_v_bin contains 2,124 observations and 47 variables.
• The table below lists the model role and description of each variable found in both data sets.
Name      Model Role  Description
ACCTAGE   Input       Age of oldest account
DDA       Input       Indicator for checking account
DDABAL    Input       Checking account balance
DEPAMT    Input       Total amount deposited
CASHBK    Input       Number of cash back requests
CHECKS    Input       Number of checks written
DIRDEP    Input       Indicator for direct deposit
NSF       Input       Number of insufficient fund issues
NSFAMT    Input       Amount of NSF
PHONE     Input       Number of telephone banking interactions
TELLER    Input       Number of teller visit interactions
SAV       Input       Indicator for savings account
SAVBAL    Input       Savings account balance
ATM       Input       Indicator for ATM interaction
ATMAMT    Input       Total ATM withdrawal amount
POS       Input       Number of point of sale interactions
POSAMT    Input       Total amount for point of sale interactions
CD        Input       Indicator for certificate of deposit account
CDBAL     Input       CD balance
IRA       Input       Indicator for retirement account
IRABAL    Input       IRA balance
LOC       Input       Indicator for line of credit
LOCBAL    Input       LOC balance
INV       Input       Indicator for investment account
INVBAL    Input       INV balance
ILS       Input       Indicator for installment loan
ILSBAL    Input       ILS balance
MM        Input       Indicator for money market account
MMBAL     Input       MM balance
MMCRED    Input       Number of money market credits
MTG       Input       Indicator for mortgage
MTGBAL    Input       MTG balance
CC        Input       Indicator for credit card
CCBAL     Input       CC balance
CCPURC    Input       Number of credit card purchases
SDB       Input       Indicator for safety deposit box
INCOME    Input       Income
HMOWN     Input       Indicator for home ownership
LORES     Input       Length of residence in years
HMVAL     Input       Value of home
AGE       Input       Age
CRSCORE   Input       Credit score
MOVED     Input       Recent address change
INAREA    Input       Indicator for local address
INS       Target      Indicator for purchase of insurance product
BRANCH    Input       Branch of bank
RES       Input       Area classification
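The hint in the Data Provided section notes that insurance_t and insurance_t_bin can be matched row for row. One way to use that is to pair each original value with its bin label and take the minimum and maximum per bin, which shows approximately where the cut points fall. The sketch below assumes the two columns have already been read into equal-length lists in the same row order. The function name and the toy values are illustrative only.

```python
from collections import defaultdict


def bin_ranges(original_values, bin_labels):
    """Return {bin label: (min, max)} of the original values falling in each bin."""
    if len(original_values) != len(bin_labels):
        raise ValueError("columns must have the same number of rows")
    grouped = defaultdict(list)
    for value, label in zip(original_values, bin_labels):
        if value is not None:  # missing originals carry no cut-point information
            grouped[label].append(value)
    return {label: (min(vals), max(vals)) for label, vals in grouped.items()}


# Toy values for illustration only (not taken from the provided data sets).
original = [120.0, 15.5, None, 980.0, 40.0]
binned = [2, 1, "M", 3, 1]
ranges = bin_ranges(original, binned)
print(ranges)
```

The observed ranges only bound the cut points: the true boundary between two bins lies somewhere between the maximum of one bin and the minimum of the next.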