The second project uses the same data as the first one. The project has two parts:
Part I
The first part aims to build a predictive model that classifies whether or not a client will buy the new product. You have to use at least 3 distinct methods and assess how good the predictions made by your models are.
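As a rough illustration of how predictions from each method might be assessed, the sketch below computes accuracy, precision, recall and F1 from a confusion matrix. The function names, the 'yes'/'no' coding of the target and the toy labels are assumptions made for the example; they are not taken from the data set.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="yes"):
    """Count true/false positives and negatives for a binary target."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def evaluate_predictions(y_true, y_pred, positive="yes"):
    """Return accuracy, precision, recall and F1 as a dict."""
    c = confusion_counts(y_true, y_pred, positive)
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (hypothetical): true outcome vs. the prediction of one model.
y_true = ["no", "no", "no", "yes", "yes", "no", "no", "yes"]
y_pred = ["no", "no", "yes", "yes", "no", "no", "no", "yes"]
scores = evaluate_predictions(y_true, y_pred)
```

Running the same function on the held-out predictions of each of the three methods gives directly comparable numbers. If buyers are rare in the data, precision and recall on the 'yes' class are likely to be more informative than accuracy alone.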
Part II
The second part concerns clustering. Ignore the subscribe variable and focus on the client-related data, namely the following variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
# other attributes:
8 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
9 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
10 - previous: number of contacts performed before this campaign and for this client (numeric)
11 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
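Note that pdays uses 999 as a code for "not previously contacted", so treating it as an ordinary number would place those clients very far from everyone else in any distance-based clustering. One possible way to handle this, shown here only as a sketch, is to split the variable into a contact flag and a days value. The helper name split_pdays is hypothetical.

```python
def split_pdays(pdays, sentinel=999):
    """Split pdays into (previously_contacted, days_since_contact).

    The sentinel value (999 in the variable description) means the client
    was not previously contacted, so no number of days is available.
    """
    if pdays == sentinel:
        return (False, None)
    return (True, pdays)

# Toy values (hypothetical): two contacted clients and one never contacted.
examples = [split_pdays(v) for v in (3, 999, 6)]
```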
Here we want to use these variables to cluster the clients and to characterize the clusters. Does your final clustering relate to the subscribe variable?
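One simple way to check whether the clusters relate to the subscribe variable is to cross-tabulate cluster membership against it and compare the share of subscribers per cluster. The sketch below assumes subscribe is coded 'yes'/'no' and uses made-up cluster labels; the function names are illustrative only.

```python
from collections import Counter, defaultdict

def crosstab(cluster_labels, subscribe):
    """Count subscribe outcomes within each cluster."""
    table = defaultdict(Counter)
    for cluster, outcome in zip(cluster_labels, subscribe):
        table[cluster][outcome] += 1
    return {cluster: dict(counts) for cluster, counts in table.items()}

def subscribe_rate(table, positive="yes"):
    """Share of clients in each cluster whose outcome equals `positive`."""
    return {cluster: counts.get(positive, 0) / sum(counts.values())
            for cluster, counts in table.items()}

# Toy data (hypothetical): cluster assignment and subscribe value per client.
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
subscribe = ["no", "no", "yes", "yes", "yes", "no", "yes", "no"]
table = crosstab(clusters, subscribe)
rates = subscribe_rate(table)
```

Clearly different rates across clusters would suggest the clustering carries some information about subscribe. On real data, a chi-square test of independence on the same table could back up that impression.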
You can use whatever method you like. You need to explain why you selected the variables you used from the list and, in general, describe your approach in sufficient detail.
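Because the listed variables mix numeric and categorical types, the choice of dissimilarity is part of the approach you need to justify. One option among many is a Gower-style distance, sketched below under simplifying assumptions: numeric variables are scaled by their range, categorical variables contribute 0 on a match and 1 otherwise, and all variables are weighted equally. The function name, the record layout and the toy ranges are hypothetical.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower-style dissimilarity between two client records.

    a, b: dicts mapping variable name -> value (same keys in both).
    numeric_ranges: dict mapping each numeric variable to its range
    (max - min over the data set); all other variables are treated
    as categorical.
    """
    total = 0.0
    for key in a:
        if key in numeric_ranges:
            rng = numeric_ranges[key]
            # Range-scaled absolute difference; 0 if the variable is constant.
            total += abs(a[key] - b[key]) / rng if rng else 0.0
        else:
            # Simple matching for categorical variables.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

# Toy records (hypothetical values) using a subset of the listed variables.
client_a = {"age": 30, "job": "admin.", "marital": "single", "campaign": 1}
client_b = {"age": 50, "job": "admin.", "marital": "married", "campaign": 3}
ranges = {"age": 40, "campaign": 4}  # assumed ranges, for illustration only
d = gower_distance(client_a, client_b, ranges)
```

A matrix of such pairwise distances can be passed to hierarchical clustering or a medoid-based method. Whether equal weighting of the variables is appropriate is itself a choice to explain.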