MDS: Manufacturing Data Science

1. Linear Regression Analysis for Alloy Grade
For the attached metal furnace dataset (MDS_Assignment1_furnace.csv), use multiple regression to find the potential linear pattern (i.e., the linear regression equation) for 621 observations with 28 input variables (f0-f27) and 1 output variable (grade). The label variable is treated as a continuous variable. Please …

Python software and packages:
(a) Show the results of the regression analysis as follows:

variable | estimate | std. error | t-value | p-value
f0       |          |            |         |
…
R-squared: 0.xxxx, Adjusted R-squared: 0.xxxx
(b) Is fitting a linear regression a good idea here? If yes, why? If not, why not? What are the possible reasons for a poor fit?

(c) Based on the results, rank the independent variables by p-value. Which variables are statistically significant, with p-values < 0.01? (i.e., important variable selection)
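Ranking for part (c) is a sort on the p-values. The helper below uses a hypothetical name and assumes the p-values come from a statsmodels OLS fit: a Series indexed by variable name, with the intercept labelled "const". The intercept is excluded because it is not an input variable.

```python
import pandas as pd


def rank_by_pvalue(pvalues: pd.Series, alpha: float = 0.01):
    """Rank regressors by p-value (smallest first) and list those below alpha.

    `pvalues` is expected to look like statsmodels' `results.pvalues`;
    the intercept row "const", if present, is dropped before ranking.
    """
    ranked = pvalues.drop(labels="const", errors="ignore").sort_values()
    significant = ranked[ranked < alpha].index.tolist()
    return ranked, significant
```

With an OLS results object `results`, `ranked, significant = rank_by_pvalue(results.pvalues)` returns the full ordering and the variables with p < 0.01.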

(d) Test the underlying assumptions of regression with respect to the residuals: (1) normality, (2) independence, and (3) homogeneity of variance.


Context 
Manufacturing an alloy is not a simple process. Many complicated factors are involved in making a perfect alloy, from the temperature at which the various metals are melted, to the presence of impurities, to the temperature used to cool the alloy. Even minor changes in any of these factors can affect the quality, or grade, of the alloy produced.

Content 


Given 28 distinguishing factors in the manufacturing of an alloy, the objective is to build a machine learning model that predicts the grade of the product from these factors.

You are provided with 28 anonymized factors (f0 to f27) that influence the making of a perfect alloy, which is used in various applications depending on the grade (quality) of the product obtained.

2. Data Preprocessing and Generalized Linear Model (GLM)/Logistic Regression

This dataset, extracted from the 1994 Census database, can be used to predict census income. The data set is MDS_Assignment1_census.csv, and the data source is https://archive.ics.uci.edu/ml/datasets/Census+Income. The dataset includes 48842 observations, 14 attributes, and 1 response variable; the last attribute is the “Class” label.

3. Association Rules: Market Basket Analysis

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction listing the items that were purchased, that is, the contents of a customer’s basket, hence the name ‘Market Basket Analysis’.

That is exactly what the Groceries data set (groceries.csv) contains: a collection of receipts, with each line representing one receipt and the items purchased. Each line is called a transaction, and each column in a row represents an item. Use association rules to find the potential patterns that satisfy the following criteria:

⚫   Set the minimum support to 0.001 

⚫   Set the minimum confidence to 0.15
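This is a self-contained Apriori sketch in plain Python for Section 3. It is illustrative only: in practice a library such as mlxtend or R's arules would typically be used, and the assignment does not name a tool. It assumes groceries.csv has no header and lists one transaction per line as comma-separated item names.

The sketch computes three measures:

- Support is the fraction of transactions that contain an itemset.
- Confidence of X -> Y is supp(X ∪ Y) / supp(X).
- Lift is that confidence divided by supp(Y).

The function names are hypothetical.

```python
from itertools import combinations


def load_transactions(path):
    """Read a basket file: one transaction per line, items separated by commas."""
    with open(path, encoding="utf-8") as fh:
        return [frozenset(i.strip() for i in line.split(",") if i.strip())
                for line in fh if line.strip()]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(supports, min_confidence):
    """Return (lhs, rhs, support, confidence, lift) tuples, sorted by lift (desc)."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / supports[lhs]   # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    rhs = itemset - lhs
                    rules.append((lhs, rhs, supp, conf, conf / supports[rhs]))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)
```

With the assignment's thresholds, the call is `association_rules(apriori(load_transactions("groceries.csv"), 0.001), 0.15)`. Support counting uses brute-force subset checks, which is slow but workable for about 10000 receipts.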