CS3120 Homework 2: Logistic Regression for Binary Classification

1.     Select a dataset with binary target values, e.g. the banknote authentication dataset or the Pima Indians diabetes dataset.
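A quick way to confirm that a candidate dataset qualifies is to check that its target column has exactly two distinct values. The sketch below assumes scikit-learn 0.23 or later and uses its bundled breast cancer dataset (load_breast_cancer) as a stand-in, because the Pima CSV does not ship with any library. With the Pima data, the same check would run on pima['label'].

```python
from sklearn.datasets import load_breast_cancer

# Stand-in binary dataset bundled with scikit-learn (no download needed).
# as_frame=True returns a pandas DataFrame with the label in a "target" column.
df = load_breast_cancer(as_frame=True).frame

# The dataset suits this assignment only if the target has exactly two classes.
classes = sorted(df["target"].unique().tolist())
is_binary = len(classes) == 2

print("classes:", classes)
print(df["target"].value_counts())
print("binary target:", is_binary)
```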


2.     Use pandas to read the CSV file into a DataFrame. For example, the following code imports the Pima Indians diabetes dataset:

import pandas as pd

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']

# load dataset: header=None treats the first line as data; names supplies the column labels
pima = pd.read_csv("pima-indians-diabetes-database.csv", header=None, names=col_names)
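The header=None and names arguments matter when the CSV file has no header row. The sketch below illustrates this without an external file: it writes scikit-learn's bundled breast cancer data (a stand-in for the Pima file, not part of the assignment) to a header-less CSV string, then reads it back with and without header=None.

```python
import io

import pandas as pd
from sklearn.datasets import load_breast_cancer

# Build a header-less CSV in memory from a bundled stand-in dataset.
source = load_breast_cancer(as_frame=True).frame
csv_text = source.to_csv(header=False, index=False)
col_names = list(source.columns)  # 30 feature names plus "target"

# Correct: header=None keeps the first line as data, names labels the columns.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=col_names)

# Pitfall: without header=None, pandas consumes the first data row as the header.
wrong = pd.read_csv(io.StringIO(csv_text))

print(df.shape)     # all rows kept
print(wrong.shape)  # one row short
print(df.head())
```

If the shape comes out one row short, the first record was most likely swallowed as column names.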


3.     Select 5 features (or 4, if the dataset does not have 5) from the chosen dataset. List all the features you selected in your report.

For example, the following code selects two features:

feature_cols = ['pregnant', 'age']

X = pima[feature_cols]  # feature matrix

y = pima['label']  # binary target variable
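Selecting five features works the same way: pass a list of five column names. The sketch below uses the bundled breast cancer dataset as a stand-in, and the five column names are simply illustrative choices from that dataset, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer

df = load_breast_cancer(as_frame=True).frame

# Five illustrative features from the stand-in dataset; list your own in the report.
feature_cols = ["mean radius", "mean texture", "mean perimeter",
                "mean area", "mean smoothness"]

X = df[feature_cols]  # feature matrix: DataFrame with 5 columns
y = df["target"]      # binary label: Series of 0/1

print("Selected features:", feature_cols)
print(X.shape, y.shape)
```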


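4.     Use “train_test_split” from “sklearn.model_selection” to split the data into 60% training and 40% testing (test_size=0.4). Older scikit-learn versions provided this function in “sklearn.cross_validation”, which was removed in version 0.20.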
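A minimal sketch of the 60/40 split, again using the bundled breast cancer data as a stand-in. The random_state value is arbitrary and only makes the split reproducible; stratify=y is an optional extra (not required by the assignment) that keeps the class ratio the same in both parts.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

df = load_breast_cancer(as_frame=True).frame
feature_cols = ["mean radius", "mean texture", "mean perimeter",
                "mean area", "mean smoothness"]
X, y = df[feature_cols], df["target"]

# test_size=0.4 -> 40% test, remaining 60% train.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

print("train:", len(X_train), "test:", len(X_test))
```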

5.     Fit a logistic regression model on the training data, then evaluate it on the test data.
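One way to fit and test the model is sketched below on the stand-in dataset. Wrapping LogisticRegression in a pipeline with StandardScaler is a design choice of this sketch, not part of the assignment: standardizing the features helps the default lbfgs solver converge, and the scaler is fit on the training data only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = load_breast_cancer(as_frame=True).frame
feature_cols = ["mean radius", "mean texture", "mean perimeter",
                "mean area", "mean smoothness"]
X, y = df[feature_cols], df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)

# Standardize, then fit logistic regression on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Test on data the model has not seen.
y_pred = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Without scaling, a plain LogisticRegression(max_iter=1000) is a common alternative if the solver warns about convergence.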


6.     Calculate and plot:

       - the confusion matrix
       - the precision score, recall score, and F score

       Copy your console output (these scores) into your report.
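The metrics in step 6 can be computed with sklearn.metrics, as in the sketch below (stand-in dataset and model setup as in the earlier steps). The sketch assumes "F score" means F1, the harmonic mean of precision and recall. By default these functions treat label 1 as the positive class; in the stand-in dataset, 1 means benign.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = load_breast_cancer(as_frame=True).frame
feature_cols = ["mean radius", "mean texture", "mean perimeter",
                "mean area", "mean smoothness"]
X, y = df[feature_cols], df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes, both ordered [0, 1].
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_test, y_pred)                # 2 * P * R / (P + R)

print("Confusion matrix:\n", cm)
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 score:  {f1:.3f}")
```

To plot the confusion matrix, scikit-learn 1.0 and later provide sklearn.metrics.ConfusionMatrixDisplay.from_predictions(y_test, y_pred), which requires matplotlib.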


7.     Plot the ROC curve and print the ROC AUC score (sklearn.metrics.roc_curve() and sklearn.metrics.roc_auc_score() can be used).
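The ROC curve needs continuous scores rather than hard 0/1 predictions, so the sketch below passes the predicted probability of class 1 from predict_proba to roc_curve() and roc_auc_score(). It reuses the stand-in dataset and model from the earlier steps, assumes matplotlib is installed, and saves the figure to a file name chosen only for illustration (roc_curve.png).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = load_breast_cancer(as_frame=True).frame
feature_cols = ["mean radius", "mean texture", "mean perimeter",
                "mean area", "mean smoothness"]
X, y = df[feature_cols], df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Probability of the positive class (column 1) for each test sample.
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print(f"ROC AUC: {auc:.3f}")

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"Logistic regression (AUC = {auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance level")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend()
fig.savefig("roc_curve.png")
plt.close(fig)
```

In a notebook, drop the matplotlib.use("Agg") line and call plt.show() instead of saving.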
