CSCI5260 - Lab 10: Scikit-Learn

Lab 10 – Scikit-Learn

Overview
The most common machine learning library in Python is scikit-learn. This lab walks you through using the sklearn library to perform machine learning tasks.

Step 1 – Tutorial

Review the tutorial at https://scikit-learn.org/stable/tutorial/basic/tutorial.html.

Step 2 – Explore

Download the lab10.py file, which has the following features:

Imports:
  o numpy – for Linear Algebra
  o pyplot – for Plotting results
  o Scikit-Learn Data Manipulation:
      o datasets
      o model_selection.train_test_split
      o model_selection.learning_curve
  o Scikit-Learn ML Classifiers:
      o linear_model.LinearRegression
      o svm.SVC
      o tree.DecisionTreeClassifier
      o sklearn.neighbors.KNeighborsClassifier
Functions:
  o linear_regression(X_train, Y_train, X_test, Y_test)
  o support_vector_machine(X_train, Y_train, X_test, Y_test)
  o decision_tree(X_train, Y_train, X_test, Y_test)
  o k_nearest_neighbors(X_train, Y_train, X_test, Y_test)
  o split_test_train(test_percent, X, Y)
  o plot_learning_curve(estimator, title, X, y, axes=None, ylim=None, cv=None, n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5))

Step 3 – Complete the Code
Setup Classifiers
Each of the four ML functions should have the following basic layout. Use this layout to complete the code for each.

Set an estimator variable equal to the appropriate classifier.
Set a model variable equal to the result of the estimator’s fit() method, passing in X_train and Y_train. This returns the model fitted by the classifier to the training set.
Set a score variable by calling the model’s score() method, passing in X_test and Y_test. For the three classifiers this returns the mean accuracy on the test set; for LinearRegression, score() returns the coefficient of determination (R²) instead.
Return estimator, model, score
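
As an illustration of the layout above, here is a minimal sketch of one of the four functions. It assumes SVC with default hyperparameters; lab10.py may expect different settings. The usage lines at the bottom (iris data, an 80/20 split, a fixed seed) are illustrative choices and not part of the lab file.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def support_vector_machine(X_train, Y_train, X_test, Y_test):
    # Choose the estimator (default hyperparameters are an assumption).
    estimator = SVC()
    # fit() trains on the training set and returns the fitted estimator.
    model = estimator.fit(X_train, Y_train)
    # score() reports mean accuracy on the held-out test set.
    score = model.score(X_test, Y_test)
    # Return the three values in the order the lab specifies.
    return estimator, model, score


# Illustrative usage: iris data with an 80/20 split and a fixed seed.
X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
estimator, model, score = support_vector_machine(X_train, Y_train, X_test, Y_test)
print(f"SVC test accuracy: {score:.3f}")
```

Because fit() returns the estimator itself, estimator and model refer to the same fitted object here. The other three functions would presumably differ only in the estimator they construct.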
Setup Data Sets
Load the iris data set using the sklearn.datasets.load_iris() function.
Store observed data from iris.data in a variable named X.
Store labels for the observed data from iris.target in a variable named y.

Note that X[0] is a vector whose label is y[0].

Explore the data. Determine how many classes exist, and how many observations exist within each class. Is the data balanced?
Create a test and training set. Note that split_test_train returns in this order: X_train, X_test, Y_train, Y_test. Note also that you should decide how large the test set should be. A typical train/test split is 70/30 or 80/20.
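
The data set steps above can be sketched as follows. The split_test_train shown here is a hypothetical stand-in for the helper in lab10.py: it assumes test_percent is a fraction such as 0.2 and that the helper simply wraps train_test_split, which returns values in the same order the lab describes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def split_test_train(test_percent, X, Y):
    # Hypothetical stand-in for the lab10.py helper: wraps train_test_split,
    # which returns X_train, X_test, Y_train, Y_test in that order.
    return train_test_split(X, Y, test_size=test_percent)


iris = load_iris()
X = iris.data    # observations, one row per flower
y = iris.target  # integer class label for each row of X

# Count observations per class to check whether the data is balanced.
classes, counts = np.unique(y, return_counts=True)
for label, count in zip(classes, counts):
    print(f"class {label} ({iris.target_names[label]}): {count} observations")

# Hold out 20% for testing; a 70/30 split would pass 0.3 instead.
X_train, X_test, Y_train, Y_test = split_test_train(0.2, X, y)
print("train:", X_train.shape, "test:", X_test.shape)
```

np.unique with return_counts=True is one convenient way to answer the balance question; comparing the printed counts shows whether any class dominates.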
Perform Machine Learning
Call each function to perform the machine learning and create models based on each classifier. Collect the output in variables, keeping in mind that each function returns estimator, model, and score.
Plot the learning curve for each classifier. Call the plot_learning_curve function, passing the appropriate estimator, a title, and the X_train and Y_train variables.
Print the test score for each model to the console.
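
The sketch below shows roughly what happens behind a learning curve, using sklearn.model_selection.learning_curve directly and printing the numbers instead of plotting them. It is an assumption that plot_learning_curve in lab10.py is built on this function. The two estimators, cv=5, the stratified split, and the fixed seeds are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Two of the lab's classifiers, with default settings (an assumption).
estimators = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for title, estimator in estimators.items():
    # Cross-validated scores at five training-set sizes (10% to 100%).
    train_sizes, train_scores, valid_scores = learning_curve(
        estimator, X_train, Y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
    )
    # Fit on the full training set, then score on the held-out test set.
    results[title] = estimator.fit(X_train, Y_train).score(X_test, Y_test)

    print(title)
    mean_train = train_scores.mean(axis=1)
    mean_valid = valid_scores.mean(axis=1)
    for n, tr, va in zip(train_sizes, mean_train, mean_valid):
        print(f"  n={int(n):3d}  train={tr:.3f}  cv={va:.3f}")
    print(f"  test accuracy: {results[title]:.3f}")
```

A widening gap between the training and cross-validation columns suggests overfitting, while two low, close values suggest underfitting; the plotted curves show the same information graphically.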
Analyze Results
Use the information above to analyze the results. Include screenshots of your learning curves and include the average accuracy scores.

Which classifier performed best based on your train/test split? Why do you think it outperformed the others? Use the screenshots to justify your answer.
Try a different train/test split. Did this affect the results? If so, how? Record the screenshots for the new train/test split.
As a note, the train/test split will be different each time you run the program, which can affect results. Most often, people run several iterations to determine an overall average accuracy. You only need to run once here.
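
Although a single run is enough for this lab, the idea of averaging over several iterations can be sketched as below. The choice of DecisionTreeClassifier, ten repetitions, and an 80/20 split are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(10):
    # A different seed gives a different random train/test split each time.
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed).fit(X_train, Y_train)
    scores.append(model.score(X_test, Y_test))

# The mean summarizes overall accuracy; the spread shows split sensitivity.
print(f"mean accuracy over {len(scores)} splits: "
      f"{np.mean(scores):.3f} (std {np.std(scores):.3f})")
```

Passing random_state makes each split reproducible; leaving it out gives the run-to-run variation described in the note above.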
