
MSDS6372- Project 1 Solved

Introduction 

We analyze the relationship between vehicle characteristics and MSRP, and the relevance of the popularity score that is calculated across platforms. We explore which vehicle characteristics influence MSRP and whether the popularity rating influences the price of a vehicle.

 

Data Clean Up 

Upon closer inspection, we determined that the data required clean-up and pre-processing before fitting the models. Appendix sections 1.2 and 1.3 illustrate how missing and placeholder values confound the true vehicle price.

 

Categorical values

First, we examined the categorical attributes. Make and Model were excluded from the model due to the excess number of unique values.  

 

Continuous values: 

Plotting MSRP by year revealed a data quality issue: vehicles manufactured prior to 1999 had a default value of $2000, which we believe would confound the true relationship between a vehicle’s price and its age. We limited the data to cars manufactured after the year 2000. A strong right skew was evident in the MSRP values, which is common for monetary values; as a result, we log-transformed MSRP for our analysis.

 

Figures: MSRP by Year; Distribution of MSRP Values

                                             

 

Missing Data: 

Engine HP had 69 missing values and Engine Cylinders had 30. We replaced them with the median for the corresponding vehicle size.
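The group-wise median imputation described above can be sketched as follows. This is a minimal illustration only: the records, the field names `size` and `engine_hp`, and the helper `impute_group_median` are assumptions made for the example, not the code or data used in the analysis.

```python
from statistics import median

# Hypothetical toy records; field names and values are illustrative only.
cars = [
    {"size": "Compact", "engine_hp": 130.0},
    {"size": "Compact", "engine_hp": None},
    {"size": "Compact", "engine_hp": 150.0},
    {"size": "Large", "engine_hp": 300.0},
    {"size": "Large", "engine_hp": None},
    {"size": "Large", "engine_hp": 280.0},
    {"size": "Large", "engine_hp": 320.0},
]


def impute_group_median(rows, group_key, value_key):
    """Return copies of rows with missing values replaced by their group's median."""
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[group_key], []).append(row[value_key])
    # A group with no observed values at all would raise a KeyError below.
    medians = {group: median(values) for group, values in observed.items()}
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            new_row[value_key] = medians[new_row[group_key]]
        filled.append(new_row)
    return filled


cars_clean = impute_group_median(cars, "size", "engine_hp")
```

Returning copies keeps the original values available, which mirrors the report's practice of writing updated attributes to a separate column.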

Engine Fuel Type and Transmission Type had blank and ‘unknown’ values. We removed the 5 blank records, kept the ‘unknown’ category, and excluded natural gas.

 

Outliers 

There were two sets of outliers in the MSRP data. As seen in Appendix 1.6, some vehicles are exceptionally expensive, so we limited the data set to cars valued under $100,000.



Data Types: 

There were some inconsistencies in the numerical data types, so we aligned them as doubles. For any attribute that was updated, we appended the word Clean to the end of the column name.

 

 

Test / Train / Validation Data: 

The data is divided into Train, Test and Validation sets (80%, 10% and 10% respectively) so that the final assessment of each model is not biased by data already used to fit or tune it.
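A three-way split of this kind can be sketched as below, assuming the 80% / 10% / 10% proportions stated for Objective 2. The function name `split_train_test_validation`, the seed, and the stand-in row identifiers are hypothetical choices for the illustration.

```python
import random


def split_train_test_validation(rows, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train, test and validation lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Whatever remains after the train and test cuts becomes the validation set.
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Row identifiers 0..999 stand in for vehicle records.
train, test, validation = split_train_test_validation(range(1000))
```

Fixing the seed makes the split repeatable, so all three models can be compared on identical test and validation rows.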

 

Popularity: 

The Popularity data in Appendix 1.7 appears strongly bimodal. We were not able to determine what drives the two distinct segments. Attributes not included in our dataset may influence this score, or the pattern may be an artifact of the collection method; the source of the score would be a good addition to this analysis. In the more complex models, Popularity was influential as an interaction feature, which merits further exploration.

 

Evaluation after Data Prep: 

Comparing the data before and after imputing and updating missing values, the summary statistics in Appendix 1.8 show that the mean remained about the same.

 

Objective 1, Interpretative Model 

Aiming for interpretability, we modeled the linear relationship between individual vehicle characteristics and MSRP. To preserve the interpretability of the model, we did not pursue interaction or quadratic terms. We assessed the linear relationships and multicollinearity, and verified the assumptions with MLR to determine which attributes were statistically significant. We then applied a lasso regression to finalize the variable selection (Appendix 2.5).

 

Collinearity 

After preparing the data for modeling, we evaluated the relationships among the attributes by generating a correlation plot. The correlations between Highway and City MPG, and between Engine HP and Number of Doors, show signs of collinearity.


 

We verified our observations with a first-pass linear model of MSRP against the cleaned subset of data, including the highly correlated attributes.

 

The pattern of the residuals in Appendix 2.1 appears random, and the QQ plot is consistent with normally distributed residuals. Having checked our assumptions, we moved on to the model results. The Variance Inflation Factors for the regressor variables (Appendix 2.2) confirm that the highly correlated attributes also carry a VIF greater than 2, so we removed City MPG from the first pair and Number of Doors from the second, and reassessed our model.
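For intuition on the VIF check, the factor for a regressor is 1 / (1 - R^2), where R^2 comes from regressing that regressor on the others. The sketch below covers only the special case of two predictors, where R^2 is the squared Pearson correlation. The MPG values are made up for the illustration, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); with exactly two predictors, R^2 is the squared correlation."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical highway and city MPG values, chosen to be strongly correlated.
highway_mpg = [25.0, 30.0, 28.0, 35.0, 22.0, 40.0]
city_mpg = [18.0, 22.0, 21.0, 27.0, 16.0, 30.0]

vif = vif_two_predictors(highway_mpg, city_mpg)
print(f"correlation = {pearson_r(highway_mpg, city_mpg):.3f}, VIF = {vif:.1f}")
```

With more than two regressors, each VIF requires a full auxiliary regression, which this two-variable shortcut does not perform.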

 

Lasso Regression 

We ran a lasso regression over a range of lambda values, selected the lambda that minimized the error, and identified the variables whose coefficients did not shrink to zero. The largest coefficients belonged to Cylinders, Fuel Type, Vehicle Style, Transmission and Driven Wheels, in that order.

 

Linear Model 

Once we ran the lasso regression, we selected the attributes with coefficients that were not reduced to zero and applied them to our linear regression model for final assessment and interpretation. The resulting model was: 

   

This model implies that the price of a vehicle depends on the number of Cylinders, the Engine Fuel Type, the Vehicle Style and the Year the car was made.

 

log(Median(MSRP | Engine Cylinders + Engine Fuel Type + Vehicle Style + Year)) =

β_0(Intercept)

+

β_1(Engine.Cylinders.Clean4) + β_2(Engine.Cylinders.Clean5) + β_3(Engine.Cylinders.Clean6) + β_4(Engine.Cylinders.Clean8) + β_5(Engine.Cylinders.Clean10) + β_6(Engine.Cylinders.Clean12)

+

β_7(Engine.Fuel.Typeflex-fuel (premium unleaded recommended)) + β_8(Engine.Fuel.Typeflex-fuel (premium unleaded required)) + β_9(Engine.Fuel.Typeflex-fuel (unleaded/E85)) + β_10(Engine.Fuel.Typeflex-fuel (unleaded/natural gas)) + β_11(Engine.Fuel.Typepremium unleaded (recommended)) + β_12(Engine.Fuel.Typepremium unleaded (required)) + β_13(Engine.Fuel.Typeregular unleaded)

+

β_14(Vehicle.Style2dr SUV) + β_15(Vehicle.Style4dr Hatchback) + β_16(Vehicle.Style4dr SUV) + β_17(Vehicle.StyleCargo Minivan) + β_18(Vehicle.StyleCargo Van) + β_19(Vehicle.StyleConvertible) + β_20(Vehicle.StyleConvertible SUV) + β_21(Vehicle.StyleCoupe) + β_22(Vehicle.StyleCrew Cab Pickup) + β_23(Vehicle.StyleExtended Cab Pickup) + β_24(Vehicle.StylePassenger Minivan) + β_25(Vehicle.StylePassenger Van) + β_26(Vehicle.StyleRegular Cab Pickup) + β_27(Vehicle.StyleSedan) + β_28(Vehicle.StyleWagon)

+

β_29(Year)

 

Interpretation  

 

Engine Cylinders = 12

With all other factors held constant, when a vehicle has 12 cylinders, relative to our reference of 3 cylinders, the multiplicative effect on the predicted median MSRP is e^1.73 (with a 95% confidence interval between e^1.50 and e^1.97). This translates to a predicted median vehicle price 5.64 times that of a comparable 3-cylinder vehicle, with a 95% confidence interval of 4.48 to 7.17 times.

 

Engine Cylinders = 6

With all other factors held constant, when a vehicle has 6 cylinders, relative to our reference of 3 cylinders, the multiplicative effect on the predicted median MSRP is e^.7598 (with a 95% confidence interval between e^.6582216 and e^0.86146459). This translates to a predicted median vehicle price 2.14 times that of a comparable 3-cylinder vehicle, with a 95% confidence interval of 1.93 to 2.37 times.
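The back-transformation in the two interpretations above can be checked with a few lines of arithmetic. The sketch uses only the estimates and interval bounds quoted in the text; the helper name `multiplicative_effect` is hypothetical.

```python
from math import exp


def multiplicative_effect(estimate, ci_low, ci_high):
    """Back-transform a log-scale coefficient and its CI into ratios of median MSRP."""
    return exp(estimate), exp(ci_low), exp(ci_high)


# Estimates and 95% confidence bounds as quoted in the interpretation text.
twelve_cyl = multiplicative_effect(1.73, 1.50, 1.97)
six_cyl = multiplicative_effect(0.7598, 0.6582216, 0.86146459)

for label, (ratio, low, high) in [("12 cylinders", twelve_cyl), ("6 cylinders", six_cyl)]:
    print(f"{label}: {ratio:.2f}x the reference median MSRP (95% CI {low:.2f}x to {high:.2f}x)")
```

Because the response is log(MSRP), the exponentiated coefficients are ratios relative to the reference level rather than dollar amounts.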

 

For the rest of the parameter interpretations, please refer to Appendix 2.6 for the list of estimates and their 95% confidence intervals.

 

An example of the predicted log(MSRP) for a car with a 12-cylinder engine, regular unleaded fuel and a convertible body style (a real car from our dataset) is shown below.

 

Predicted (log(MSRP) | 12 Cylinder Engine, Regular Unleaded, Convertible) =  

1.73(12 Cylinder Engine) - .36 (Regular Unleaded) + .27 (Vehicle Style=Convertible) - 53

 

 

Objective 2: Multiple Linear Regression (Complex Model) 

With the goal of prediction and not interpretability, we fit a more complex multiple linear regression model to the car data mentioned above.  For the complete list of variables included in the fully saturated model, please see Appendix section 3.2. 

 

Data Prep, Variable Selection and Validation Splits:

With regards to handling missing data and outliers (imputed or otherwise), we implemented the same approach as Objective 1 (described above). The same is true with how we split the training, test, and validation data (80%, 10%, 10% respectively).  

 

Variable Selection (LASSO): 

 

Log MSRP 

Based on the analysis of the MSRP data in objective one, we continue to use log(MSRP) to conduct our analysis.

 

Fully Saturated Model 

With the goal of prediction in mind, we implemented a constrained optimization technique called LASSO regression (L1 optimization). We did this against a fully saturated (order 2) multiple linear regression model. That is, we performed LASSO optimized regression on a model including: 

·         all squared terms, 2-way interactions, and independent variable estimates 

·         all levels of categorical explanatory variables, numerical explanatory variables 

 

A complete list of all these features is in our “cleaned up” dataset and is described in Appendix 1.4.

 

Prior to running LASSO regression for our fully saturated model, we began with n=995 unique combinations of features. Again, this includes all levels of all variables in our cleaned-up dataset. The fully saturated model ends up looking something like: 

 

Please see Appendix section 3.3 for the complete list of non-zero coefficients.

 

LASSO 

To parse out the relevant estimates (those whose coefficients are not shrunk to zero by the L1 penalty) from our fully saturated list of 995 features, we performed LASSO L1-regularization on our full model. First, we scaled and centered our data set. Then we picked an optimal Lambda value, as described below.

 

L1-Lambda  

In order to determine the best L1 weight to use for LASSO’s constraint multiplier (Lambda), we ran LASSO regression with a grid of candidate lambdas against our training split of data (see Appendix 3.5).

 

Note: “Lambda Min” is the value of Lambda with the lowest associated MSE, whereas “Lambda 1SE” is the highest value of Lambda that produces an MSE within 1 standard error of the minimum MSE. In our case, LASSO produces Lambda Min = .001 and Lambda 1SE = .0012589.
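The two selection rules in the note can be sketched as follows. The cross-validation summary is made up for the illustration and does not reproduce the report's results; the function name `select_lambdas` is hypothetical.

```python
def select_lambdas(cv_results):
    """Pick Lambda Min and Lambda 1SE from (lambda, mean_mse, standard_error) tuples.

    Lambda Min has the lowest mean MSE. Lambda 1SE is the largest lambda whose
    mean MSE is within one standard error of that minimum.
    """
    lambda_min, best_mse, best_se = min(cv_results, key=lambda row: row[1])
    threshold = best_mse + best_se
    lambda_1se = max(lam for lam, mse, _ in cv_results if mse <= threshold)
    return lambda_min, lambda_1se


# Hypothetical cross-validation summary: (lambda, mean MSE, standard error).
cv_results = [
    (0.01, 0.52, 0.04),
    (0.03, 0.50, 0.04),
    (0.10, 0.53, 0.05),
    (0.30, 0.61, 0.05),
    (1.00, 0.90, 0.06),
]

lambda_min, lambda_1se = select_lambdas(cv_results)
print(lambda_min, lambda_1se)
```

Lambda 1SE trades a small amount of error for a stronger penalty, and therefore usually a sparser model.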

The effect of increasing Lambda is that MSE increases for our training set. Conversely, decreasing Lambda allows the coefficients of more features to move away from zero and be retained by LASSO. The plots below illustrate these effects.

 

               

 

In order to check for overfitting, that is, whether it made more sense to use Lambda Min or Lambda 1SE with our LASSO model, we ran LASSO with a grid of lambdas against our test set. The results can be seen in the plot below.

  

 

When validated against our test data, LASSO with Lambda Min continues to produce the lowest mean squared error. This likely means that with Lambda Min we are not overfitting badly, that our test data is similar to our training data (due to large sample sizes and removal of outliers), or both. Further, the test MSE for Log(MSRP) with Lambda Min was 0.02430239, compared with a train MSE of 0.023330728. As such, we use Lambda Min for our LASSO regression.

 

With Lambda Min, our validation MSE for Log(MSRP) is 0.02381303. Because the validation MSE is roughly the same as the test and train MSEs, we can cautiously suggest that our complex model is not overfitting badly.

 

When using LASSO penalization with our fully saturated model and Lambda Min = .001, we got a reduced list of influential regressor variables that is 241 elements long (Appendix 3.2).
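Extracting the retained regressors amounts to keeping the coefficients that LASSO did not shrink to exactly zero; raising the threshold gives a shorter list. The sketch below uses a handful of estimates copied from Appendix 3.3 plus one clearly hypothetical zero coefficient. Comparing on absolute value is an assumption of this example, and the helper name `retained_regressors` is hypothetical.

```python
def retained_regressors(coefficients, threshold=0.0):
    """Return the regressors whose absolute estimate exceeds threshold."""
    return {name: est for name, est in coefficients.items() if abs(est) > threshold}


# A few estimates copied from Appendix 3.3, plus one hypothetical coefficient
# ("Hypothetical.Dropped") standing in for a term LASSO shrank to exactly zero.
coefficients = {
    "Year": 0.007034003,
    "Vehicle.StyleConvertible": 0.197868249,
    "Vehicle.StyleCoupe": 0.141130154,
    "highway.MPG": -2.20e-05,
    "Engine.HP.Clean": 1.22e-06,
    "Engine.Cylinders.Clean12": 0.589598683,
    "Driven_Wheelsfront wheel drive": -0.101740197,
    "Hypothetical.Dropped": 0.0,
}

non_zero = retained_regressors(coefficients)                  # every non-zero estimate
large = retained_regressors(coefficients, threshold=0.01)     # magnitude above 0.01
print(sorted(large))
```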

 

Although we won’t dive into an interpretation of our complex model’s residuals, for convenience they are visible in the plots below and in Appendix 3.1.

   

Although we won’t dive into an interpretation of each of the 241 “significant” regressors in our complex Multiple Linear Regression model, it is worth noting that if such an analysis were done, we would start with a reduced version of our 241-regressor model in which we only use variables with a LASSO estimate above 0.01. This filtering process yields a list of 31 regressors (Appendix 3.2). Such a model would look something like:

   

 

Objective 3 Non-Parametric Model: KNN 

 

Intuition 

K-Nearest Neighbors (K-NN) is an algorithm used for classification or regression. For regression, the intuition is that the responses of the nearest neighbors are used to predict a response variable such as Log(MSRP). The points deemed “nearest” are those with the smallest Euclidean distance from the input whose response we are predicting. If k = 3, these are the 3 nearest points, and the prediction is the average of their responses. Both K-NN and linear regression can be used to solve problems where the output is a continuous variable.
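A bare-bones K-NN regressor makes the intuition concrete. This is a sketch under stated assumptions, not the implementation used in the analysis: the feature vectors and log(MSRP) responses are invented, and `knn_regress` is a hypothetical helper name.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_regress(train_x, train_y, query, k=3):
    """Predict the response for query as the mean response of its k nearest training points."""
    neighbors = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    nearest = neighbors[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical, already standardized features (two columns) and log(MSRP) responses.
train_x = [(-1.0, -0.5), (-0.8, -0.4), (0.0, 0.1), (0.9, 0.7), (1.1, 1.0)]
train_y = [9.8, 9.9, 10.3, 10.9, 11.1]

# The two nearest neighbors of the query are the last two training points.
prediction = knn_regress(train_x, train_y, query=(1.0, 0.8), k=2)
print(round(prediction, 3))
```

Sorting every training point is fine for a small example; practical libraries use spatial indexes to find neighbors faster.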

  

Scaling and centering are important since K-NN uses Euclidean distance to establish the closest points. If the features are not on the same scale, features with larger numeric ranges dominate the distance, and the nearest neighbors could appear closer or further away than they should.
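Centering and scaling can be sketched as a z-score transformation. The values below are hypothetical, and the helper `standardize` uses the population standard deviation as a simplifying assumption; a sample standard deviation would differ slightly.

```python
from statistics import mean, pstdev


def standardize(column):
    """Center a numeric column at 0 and scale it to unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # a constant column would give sigma == 0 and fail below
    return [(value - mu) / sigma for value in column]


# Hypothetical raw values on very different scales.
engine_hp = [150.0, 200.0, 250.0, 300.0, 350.0]
year = [2001.0, 2005.0, 2009.0, 2013.0, 2017.0]

hp_scaled = standardize(engine_hp)
year_scaled = standardize(year)
print([round(v, 3) for v in hp_scaled])
print([round(v, 3) for v in year_scaled])
```

After the transformation both columns occupy the same range, so neither dominates a Euclidean distance.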

Evaluation of a K-NN model depends on the problem. K-NN models can easily overfit, particularly for small values of K.

 

 

 

Variable Selection 

We cleaned the data and removed outliers the same way we did in Objective 1 (see above).

Further, we only used continuous explanatory variables in our KNN model. That is, no categorical variables were considered as part of any distance metric when performing KNN.  

Next, all continuous explanatory variables that were used were scaled and centered.

To select the best subset of variables for use in our KNN model we ran a random forest variable importance selection. The output of that selection was:  

As can be seen, all continuous variables came back with relatively similar variable importance (besides Engine.HP.clean, which is more important than the others).

 

Model:  

For the same reasons as noted in Objective 1, we used Log(MSRP) for our response.

As such, the model we use in KNN consists of:

Response | Explanatory Variables
Log(MSRP) | Engine.HP.Clean, city.mpg, highway.MPG, Popularity, & Year
 

Selection of Best K 

In order to select the best value of K for our KNN model, we ran an iterative set of KNN fits, varying the value of K and scoring each fit on our test dataset. The associated MSE values for each K are visible below for the Log(MSRP).

  

We ended up using K=2, which had the lowest test MSE of all values of K. Using K=2, we got a test MSE of 0.005. The associated R-squared value for the test set was 0.97055. Validation MSE was 0.006.
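The metrics used for this selection can be sketched as below. The log(MSRP) values and the per-K errors are invented purely to exercise the functions and are not the report's results; `mse`, `r_squared` and `best_k` are hypothetical helper names.

```python
def mse(actual, predicted):
    """Mean squared error between observed and predicted responses."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def best_k(test_mse_by_k):
    """Return the K with the lowest test MSE."""
    return min(test_mse_by_k, key=test_mse_by_k.get)


# Hypothetical log(MSRP) values and predictions.
actual = [10.0, 10.5, 11.0, 11.5]
predicted = [10.1, 10.4, 11.1, 11.4]
print(round(mse(actual, predicted), 4), round(r_squared(actual, predicted), 4))

# Hypothetical test MSE per candidate K.
print(best_k({1: 0.9, 2: 0.5, 3: 0.7}))
```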

 

Interpreting the output of KNN 

Our KNN-2 model produced high R-squared values and small MSE for Log(MSRP) on both the test and validation sets. This suggests that we have a good fit with KNN-2, although we may be suffering from a small amount of overfitting, as our validation MSE was very slightly higher.

  

 

Comparison of Model Results 

 

Model | Test MSE | Validation MSE
1 - Simple Multiple Linear Regression | 0.0325 | 0.0329
2 - Complex Multiple Linear Regression - LASSO | 0.023 | 0.024
3 - Non-Parametric: KNN-2 | 0.005 | 0.006
 

Our goal for model 1 (simple multiple linear regression) was interpretability. To compare model 1 with model 2 (Complex MLR) and model 3 (Non-parametric KNN-2), which were implemented and evaluated strictly for the purpose of prediction, we look only at prediction metrics such as residuals and test and validation MSE.

When comparing the test and validation MSE values for Model 1, Model 2 and Model 3, we see that the non-parametric KNN-2 model performs the best on both the test and validation splits. It appears to outperform the parametric models, with a validation error at roughly 25% of that of our complex model (0.006 versus 0.024).

 

When comparing our complex model (model 2) to our Simple Multiple Linear Regression model (model 1) we see that the complex model performs with lower Test and Validation MSE.  

 

If we had to choose a model, for the purposes of prediction, based on the results we have thus far, the KNN-2 model would be our choice.  

It is possible that KNN-2 is mapping relationships between features in our data which are highly non-linear and that may be why it seems to perform the best.  

 

Summary  

 

Scope of Inference: 

This is an observational study whose findings can be applied only to the population of vehicles that were studied. We cannot extend this interpretation to the general population because we cannot verify the collection method or the population subset from which this data was collected. Because the study is observational, we also cannot make any causal inferences.

 

Objective 1:  

Our interpretable model revealed that it was possible to predict vehicle price with a simple combination of the data attributes. Our model’s results feel intuitive, suggesting that a vehicle’s style, the type of fuel it uses (for example, regular or premium unleaded) and its number of cylinders impact the price of a car.

 

Objective 2:  

Our complex linear regression model shows that there is a linear relationship between log-transformed MSRP and the explanatory variables in our model. It also shows that when we carefully include more variables from a fully saturated quadratic model in the final MLR model (by using LASSO regularization), we get lower prediction error for Log-MSRP.

Our non-parametric model, which uses a KNN-2 algorithm on Log-MSRP, produced the lowest test and validation MSE of the three models (0.005 and 0.006 respectively).

 

Problems / Concerns:  

There are some concerns with the quality of the pre-2001 data; these values, if included, would confound the true price of the vehicle.

 

We observed in the fully saturated model that Popularity seems to be a valuable attribute when considered as an interaction term; however, the lack of domain knowledge regarding its source and the scale produced by its collection method complicates this analysis. Ideally, this data would be collected along with the name of its source to better segment and interpret the values.

 

If we had more time, we would tidy the various marketing categories, which are concatenated in varying sequence in a single field, to further understand this feature and its impact on both vehicle price and popularity.

 

We would also take more care to assign a specific reference level in our models, since the system auto-selected the first level, which in some cases was an edge-case level.

 



Appendix

 

1.0  Data Dictionary 

Variable Name | Data Type | Description
MSRP | Numeric | The response variable
Car Make | Factor | The company that made the car. Ex: Honda, Toyota, etc.
Car Model | Factor | The model of the car. Ex: 4Runner, Accord, etc.
Year | Numeric | Year the car was produced
Engine Fuel Type | Factor | Type of fuel the car accepts. Ex: Regular unleaded, Premium unleaded, Diesel
Engine HP | Numeric | Horsepower of the car’s engine.
Engine Cylinders | Numeric | Number of cylinders in the car’s engine.
Transmission Type | Factor | Type of transmission in the car. Usually manual or automatic, but there are a few specialty transmission types in the data.
Driven_Wheels | Numeric | The wheels that are powered by the engine. Ex: Front Wheel, Rear Wheel, Four Wheel Drive
Number of Doors | Numeric | The number of doors that the car has. Usually 2 or 4
Market Category | Factor | Various special factors for each car. Ex: Exotic, Luxury, High-Performance, Flex Fuel. Note: we created a new feature using Exotic/Not Exotic for our analysis
Vehicle Size | Factor | The size of the vehicle. Ex: Midsize, Large, Compact
Vehicle Style | Factor | Body type of the vehicle. Ex: Coupe, Convertible, etc.
Highway MPG | Numeric | Fuel efficiency on the highway in MPG
City MPG | Numeric | Fuel efficiency in the city in MPG
Popularity | Numeric | A popularity score for each car. The dataset does not detail how the popularity score is calculated.
 

1.1  Unique values by attribute 

   

1.2  Count of N/As by column 

   

 

1.3  MSRP by Year 

   

1.4 DISTRIBUTION OF DATA BY CATEGORY 

    

 

1.5 EDA: MSRP evaluation 
 

 

1.6 Outlier analysis 

   

1.7 EDA: Popularity evaluation 

   

 

 

1.8 Summary of Data Before and After Imputing (Data Prep Continued) 

   

   

  

2.0 MSRP ~ ALL MODEL 

   

   

2.1 MSRP ~ ALL MODEL STATISTICALLY SIGNIFICANT 

   

 

 

 

 

2.2 Verify assumptions 

      

 

 

2.3 Correlation Plot 

   

 

2.4 MSRP ~ . MODEL- VIF DATA 

   

 

 

2.5 Lasso Regression All Data ~ Logged MSRP 

   

2.6 OBJECTIVE ONE ~ INTERPRETABLE MODEL 

   

   

 

     

2.7 INTERPRETABLE MODEL - VIF   

 



 

2.8 INTERPRETABLE MODEL – Diagnostic 

                              

 

3.0  Fully Saturated Lasso Model top regressors 

 

Engine.Cylinders.Clean12 
Vehicle.Size*Vehicle.Style 
Engine.Fuel.Type*Vehicle.Style 
Engine.Fuel.Type*Engine.Cylinders 
Vehicle.StyleConvertible 
Engine.Fuel.Type*Driven_Wheels 
Vehicle.Style*Engine.Cylinders 
Driven_Wheels*Vehicle.Style 
Vehicle.Size*Engine.Cylinders 
Vehicle.StyleCoupe 
Vehicle.Size*Engine.Fuel.Type 
Vehicle.Style4dr Hatchback 
Engine.Fuel.Type*Num.Doors.Clean 
Driven_Wheels*Engine.Cylinders 
Driven_Wheelsfront wheel drive 
 


 

3.1 Fully Saturated Lasso Model ~ coefficients .01 

   

3.2 Fully Saturated Lasso Model ~ coefficient plot   

  

3.3 Fully Saturated Lasso Model ~ All non-zero coefficients 

variable 
estimate 
(Intercept) 
-4.391955097 
Year 
0.007034003 
Engine.Fuel.Typeregular unleaded 
-0.017248917 
Transmission.TypeAUTOMATIC 
-0.03523848 
Transmission.TypeDIRECT_DRIVE 
-0.028531094 
Driven_Wheelsfront wheel drive 
-0.101740197 
Vehicle.Style4dr Hatchback 
-0.13230092 
Vehicle.StyleConvertible 
0.197868249 
Vehicle.StyleCoupe 
0.141130154 
Vehicle.StyleRegular Cab Pickup 
-0.074362395 
highway.MPG 
-2.20E-05 
Engine.HP.Clean 
1.22E-06 
Engine.Cylinders.Clean4 
-0.00058933 
Engine.Cylinders.Clean8 
0.019816785 
Engine.Cylinders.Clean10 
0.082140387 
Engine.Cylinders.Clean12 
0.589598683 
Vehicle.SizeLarge:Engine.Fuel.Typediesel 
0.031490198 
Vehicle.SizeLarge:Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85) 
0.028346107 
Vehicle.SizeLarge:Engine.Fuel.Typeflex-fuel (premium unleaded required/E85) 
0.1385937 
Vehicle.SizeMidsize:Engine.Fuel.Typeflex-fuel (unleaded/E85) 
-0.017922758 
Vehicle.SizeLarge:Engine.Fuel.Typeflex-fuel (unleaded/natural gas) 
-7.61E-05 
Vehicle.SizeLarge:Engine.Fuel.Typepremium unleaded (recommended) 
-0.003444155 
Vehicle.SizeMidsize:Engine.Fuel.Typepremium unleaded (recommended) 
-0.017937987 
Vehicle.SizeLarge:Engine.Fuel.Typepremium unleaded (required) 
0.048742274 
Vehicle.SizeMidsize:Engine.Fuel.Typepremium unleaded (required) 
0.003744238 
Vehicle.SizeMidsize:Engine.Fuel.Typeregular unleaded 
0.011715113 
Vehicle.SizeMidsize:Transmission.TypeAUTOMATIC 
0.052446078 
Vehicle.SizeMidsize:Transmission.TypeMANUAL 
-0.000967092 
Vehicle.SizeLarge:Driven_Wheelsfront wheel drive 
0.01137959 
Vehicle.SizeMidsize:Driven_Wheelsfront wheel drive 
0.013494697 
Vehicle.SizeLarge:Driven_Wheelsrear wheel drive 
-0.00152986 
Vehicle.SizeMidsize:Driven_Wheelsrear wheel drive 
-0.046803993 
Vehicle.SizeLarge:Vehicle.Style4dr Hatchback 
0.174908079 
Vehicle.SizeMidsize:Vehicle.Style4dr Hatchback 
0.109412639 
Vehicle.SizeMidsize:Vehicle.Style4dr SUV 
-0.05397705 
Vehicle.SizeMidsize:Vehicle.StyleCargo Van 
0.013103948 
Vehicle.SizeMidsize:Vehicle.StyleConvertible 
-0.004648629 
Vehicle.SizeLarge:Vehicle.StyleCoupe 
-0.332620984 
Vehicle.SizeLarge:Vehicle.StyleCrew Cab Pickup 
0.019977817 
Vehicle.SizeMidsize:Vehicle.StylePassenger Minivan 
0.044532795 
Vehicle.SizeMidsize:Vehicle.StylePassenger Van 
4.85E-05 
Vehicle.SizeLarge:Vehicle.StyleRegular Cab Pickup 
-0.069566466 
Vehicle.SizeLarge:Vehicle.StyleSedan 
0.111831489 
Vehicle.SizeMidsize:Vehicle.StyleSedan 
0.051343185 
Vehicle.SizeMidsize:Vehicle.StyleWagon 
0.191621609 
Vehicle.SizeMidsize:Popularity 
-2.60E-06 
Vehicle.SizeLarge:Engine.Cylinders.Clean4 
0.156015657 
Vehicle.SizeLarge:Engine.Cylinders.Clean6 
0.100430204 
Vehicle.SizeLarge:Engine.Cylinders.Clean8 
-0.015870005 
Vehicle.SizeMidsize:Engine.Cylinders.Clean8 
-0.15673966 
Vehicle.SizeLarge:Engine.Cylinders.Clean12 
0.002637239 
Vehicle.SizeMidsize:NumDoors.Clean3 
0.041972951 
Year:Engine.Fuel.Typeflex-fuel (unleaded/natural gas) 
-1.20E-05 
Year:Engine.Fuel.Typeregular unleaded 
-2.03E-05 
Year:Transmission.TypeAUTOMATIC 
-1.44E-07 
Year:Transmission.TypeDIRECT_DRIVE 
-9.35E-08 
Year:Vehicle.StyleConvertible 
7.10E-09 
Year:Vehicle.StyleRegular Cab Pickup 
-1.46E-07 
Year:Engine.HP.Clean 
7.19E-07 
Year:Engine.Cylinders.Clean4 
-4.55E-09 
Year:Engine.Cylinders.Clean8 
4.56E-05 
Year:Engine.Cylinders.Clean10 
2.60E-07 
Year:Engine.Cylinders.Clean12 
7.20E-07 
Engine.Fuel.Typediesel:Transmission.TypeAUTOMATIC 
0.039828587 
Engine.Fuel.Typepremium unleaded (required):Transmission.TypeAUTOMATIC 
-0.001259095 
Engine.Fuel.Typeregular unleaded:Transmission.TypeDIRECT_DRIVE 
-6.07E-05 
Engine.Fuel.Typediesel:Transmission.TypeMANUAL 
0.022437667 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Transmission.TypeMANUAL 
-0.054985434 
Engine.Fuel.Typeregular unleaded:Transmission.TypeMANUAL 
-0.038744363 
Engine.Fuel.Typediesel:Driven_Wheelsfour wheel drive 
0.189985124 
Engine.Fuel.Typepremium unleaded (recommended):Driven_Wheelsfour wheel drive 
0.105173795 
Engine.Fuel.Typepremium unleaded (required):Driven_Wheelsfour wheel drive 
0.127305494 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Driven_Wheelsfront wheel drive 
-0.096557854 
Engine.Fuel.Typepremium unleaded (required):Driven_Wheelsfront wheel drive 
-0.089658617 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Driven_Wheelsrear wheel drive 
0.024777979 
Engine.Fuel.Typepremium unleaded (recommended):Driven_Wheelsrear wheel drive 
-0.015403419 
Engine.Fuel.Typepremium unleaded (required):Driven_Wheelsrear wheel drive 
0.021555128 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.Style2dr SUV 
0.271463029 
Engine.Fuel.Typediesel:Vehicle.Style4dr Hatchback 
-0.059664164 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.Style4dr Hatchback 
-0.00854634 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.Style4dr Hatchback 
-0.024783232 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.Style4dr SUV 
0.005693584 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.Style4dr SUV 
0.006324302 
Engine.Fuel.Typepremium unleaded (required):Vehicle.Style4dr SUV 
-0.00407387 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.StyleCargo Minivan 
-0.080691453 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.StyleCargo Van 
0.01747816 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleCargo Van 
-0.078228445 
Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85):Vehicle.StyleConvertible 
0.056495771 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.StyleConvertible 
0.01410838 
Engine.Fuel.Typepremium unleaded (required):Vehicle.StyleConvertible 
0.04810462 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleConvertible 
0.001892967 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.StyleConvertible SUV 
0.260394756 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleConvertible SUV 
-0.017341322 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.StyleCoupe 
0.051392862 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleCoupe 
0.013035414 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.StyleExtended Cab Pickup 
-0.019611536 
Engine.Fuel.Typeregular unleaded:Vehicle.StylePassenger Minivan 
0.004378092 
Engine.Fuel.Typepremium unleaded (recommended):Vehicle.StylePassenger Van 
0.016606774 
Engine.Fuel.Typediesel:Vehicle.StyleSedan 
0.039838189 
Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85):Vehicle.StyleSedan 
-0.025273458 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Vehicle.StyleSedan 
-0.037999824 
Engine.Fuel.Typeflex-fuel (unleaded/natural gas):Vehicle.StyleSedan 
-6.79E-06 
Engine.Fuel.Typepremium unleaded (required):Vehicle.StyleSedan 
0.032229565 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleSedan 
-0.040765995 
Engine.Fuel.Typepremium unleaded (required):Vehicle.StyleWagon 
0.039258433 
Engine.Fuel.Typeregular unleaded:Vehicle.StyleWagon 
-0.00294583 
Engine.Fuel.Typeflex-fuel (premium unleaded required/E85):highway.MPG 
0.004704078 
Engine.Fuel.Typeregular unleaded:highway.MPG 
-0.00331805 
Engine.Fuel.Typeflex-fuel (premium unleaded required/E85):city.mpg 
9.93E-05 
Engine.Fuel.Typeflex-fuel (unleaded/natural gas):city.mpg 
-5.65E-07 
Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85):Popularity 
7.43E-06 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Popularity 
-1.93E-06 
Engine.Fuel.Typepremium unleaded (recommended):Popularity 
-2.75E-06 
Engine.Fuel.Typepremium unleaded (required):Popularity 
2.13E-05 
Engine.Fuel.Typediesel:Engine.HP.Clean 
0.000196872 
Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85):Engine.HP.Clean 
0.000702246 
Engine.Fuel.Typeregular unleaded:Engine.HP.Clean 
0.000412489 
Engine.Fuel.Typepremium unleaded (required):Engine.Cylinders.Clean4 
-0.001395946 
Engine.Fuel.Typeregular unleaded:Engine.Cylinders.Clean4 
-0.032267514 
Engine.Fuel.Typepremium unleaded (recommended):Engine.Cylinders.Clean5 
0.003588853 
Engine.Fuel.Typeregular unleaded:Engine.Cylinders.Clean5 
-0.021537774 
Engine.Fuel.Typediesel:Engine.Cylinders.Clean6 
0.178642073 
Engine.Fuel.Typeflex-fuel (premium unleaded required/E85):Engine.Cylinders.Clean6 
0.253027269 
Engine.Fuel.Typeflex-fuel (unleaded/E85):Engine.Cylinders.Clean6 
-0.014610055 
Engine.Fuel.Typepremium unleaded (recommended):Engine.Cylinders.Clean6 
6.52E-05 
Engine.Fuel.Typepremium unleaded (required):Engine.Cylinders.Clean6 
0.119436877 
Engine.Fuel.Typeflex-fuel (premium unleaded recommended/E85):Engine.Cylinders.Clean8 
0.002297051 
Engine.Fuel.Typepremium unleaded (recommended):Engine.Cylinders.Clean8 
-0.017290585 
Engine.Fuel.Typepremium unleaded (required):Engine.Cylinders.Clean8 
0.083114088 
Engine.Fuel.Typepremium unleaded (required):Engine.Cylinders.Clean10 
0.00292684 
Engine.Fuel.Typepremium unleaded (recommended):Engine.Cylinders.Clean12 
0.077320107 
Engine.Fuel.Typeregular unleaded:NumDoors.Clean3 
-0.002593543 
Engine.Fuel.Typediesel:NumDoors.Clean4 
0.114092134 
Engine.Fuel.Typepremium unleaded (recommended):NumDoors.Clean4 
0.042619385 
Engine.Fuel.Typeregular unleaded:NumDoors.Clean4 
-0.012719156 
Transmission.TypeDIRECT_DRIVE:Driven_Wheelsfront wheel drive 
-0.001666998 
Transmission.TypeAUTOMATIC:Driven_Wheelsrear wheel drive 
-0.002198988 
Transmission.TypeMANUAL:Driven_Wheelsrear wheel drive 
-0.021070564 
Transmission.TypeAUTOMATIC:Vehicle.Style2dr SUV 
0.039999161 
Transmission.TypeAUTOMATIC:Vehicle.Style4dr Hatchback 
-0.015881149 
Transmission.TypeMANUAL:Vehicle.Style4dr SUV 
-0.039465504 
Transmission.TypeAUTOMATIC:Vehicle.StyleConvertible 
0.026283666 
Transmission.TypeAUTOMATIC:Vehicle.StyleCoupe 
-0.024040832 
Transmission.TypeMANUAL:Vehicle.StyleSedan 
-0.011850305 
Transmission.TypeMANUAL:city.mpg 
-0.003901099 
Transmission.TypeAUTOMATIC:Popularity 
-1.91E-06 
Transmission.TypeMANUAL:Popularity 
-8.61E-06 
Transmission.TypeDIRECT_DRIVE:Engine.HP.Clean 
-7.66E-08 
Transmission.TypeDIRECT_DRIVE:Engine.Cylinders.Clean4 
-2.89E-14 
Transmission.TypeMANUAL:Engine.Cylinders.Clean6 
0.06456897 
Transmission.TypeMANUAL:Engine.Cylinders.Clean8 
-0.043923233 
Transmission.TypeAUTOMATIC:Engine.Cylinders.Clean12 
0.000236228 
Transmission.TypeDIRECT_DRIVE:NumDoors.Clean4 
-7.75E-05 
Transmission.TypeMANUAL:NumDoors.Clean4 
-0.008489677 
Driven_Wheelsrear wheel drive:Vehicle.Style4dr Hatchback 
0.040523716 
Driven_Wheelsfour wheel drive:Vehicle.Style4dr SUV 
-0.011677578 
Driven_Wheelsfront wheel drive:Vehicle.Style4dr SUV 
0.031662719 
Driven_Wheelsrear wheel drive:Vehicle.Style4dr SUV 
0.013880531 
Driven_Wheelsrear wheel drive:Vehicle.StyleCargo Minivan 
-0.01348054 
Driven_Wheelsrear wheel drive:Vehicle.StyleConvertible SUV 
-0.03497651 
Driven_Wheelsfront wheel drive:Vehicle.StyleCoupe 
-0.159816927 
Driven_Wheelsfour wheel drive:Vehicle.StyleCrew Cab Pickup 
0.003017614 
Driven_Wheelsrear wheel drive:Vehicle.StyleExtended Cab Pickup 
-0.040381992 
Driven_Wheelsfront wheel drive:Vehicle.StylePassenger Minivan 
-2.51E-05 
Driven_Wheelsrear wheel drive:Vehicle.StylePassenger Van 
0.023883448 
Driven_Wheelsrear wheel drive:Vehicle.StyleRegular Cab Pickup 
-0.075421619 
Driven_Wheelsfour wheel drive:Vehicle.StyleSedan 
-0.114895884 
Driven_Wheelsrear wheel drive:Vehicle.StyleSedan 
0.028627411 
Driven_Wheelsfront wheel drive:Vehicle.StyleWagon 
-0.011529504 
Driven_Wheelsfour wheel drive:Popularity 
1.17E-06 
Driven_Wheelsfront wheel drive:Popularity 
-4.63E-06 
Driven_Wheelsrear wheel drive:Engine.HP.Clean 
-0.000255614 
Driven_Wheelsrear wheel drive:Engine.Cylinders.Clean4 
0.003948212 
Driven_Wheelsrear wheel drive:Engine.Cylinders.Clean5 
-0.011922296 
Driven_Wheelsfour wheel drive:Engine.Cylinders.Clean6 
0.0092135 
Driven_Wheelsfour wheel drive:Engine.Cylinders.Clean8 
-0.003578737 
Driven_Wheelsfront wheel drive:Engine.Cylinders.Clean8 
0.109770077 
Driven_Wheelsrear wheel drive:Engine.Cylinders.Clean10 
0.000171279 
Vehicle.Style2dr SUV:city.mpg 
0.001501678 
Vehicle.Style4dr Hatchback:city.mpg 
0.003583087 
Vehicle.Style4dr SUV:Popularity 
-7.51E-06 
Vehicle.StyleCargo Van:Popularity 
7.33E-06 
Vehicle.StyleConvertible:Popularity 
-1.03E-05 
Vehicle.StyleCoupe:Popularity 
-2.04E-05 
Vehicle.StyleCrew Cab Pickup:Popularity 
3.79E-06 
Vehicle.StylePassenger Minivan:Popularity 
7.20E-06 
Vehicle.StylePassenger Van:Popularity 
2.63E-05 
Vehicle.StyleRegular Cab Pickup:Popularity 
8.86E-06 
Vehicle.StyleSedan:Popularity 
-5.59E-07 
Vehicle.Style4dr SUV:Engine.HP.Clean 
0.000515478 
Vehicle.StyleExtended Cab Pickup:Engine.HP.Clean 
-0.000117068 
Vehicle.Style2dr SUV:Engine.Cylinders.Clean4 
0.000941664 
Vehicle.StyleCargo Minivan:Engine.Cylinders.Clean4 
0.055933727 
Vehicle.StyleCrew Cab Pickup:Engine.Cylinders.Clean4 
0.006886366 
Vehicle.StyleExtended Cab Pickup:Engine.Cylinders.Clean4 
0.062608927 
Vehicle.StyleRegular Cab Pickup:Engine.Cylinders.Clean4 
-0.031348509 
Vehicle.StyleSedan:Engine.Cylinders.Clean4 
-0.02103876 
Vehicle.StyleWagon:Engine.Cylinders.Clean4 
-0.014954989 
Vehicle.Style4dr Hatchback:Engine.Cylinders.Clean5 
-0.038141278 
Vehicle.Style4dr SUV:Engine.Cylinders.Clean5 
0.090581719 
Vehicle.StyleConvertible:Engine.Cylinders.Clean5 
0.044086614 
Vehicle.StyleCoupe:Engine.Cylinders.Clean5 
0.131254035 
Vehicle.StyleSedan:Engine.Cylinders.Clean5 
0.027419564 
Vehicle.Style4dr SUV:Engine.Cylinders.Clean6 
-0.005421391 
Vehicle.StyleCargo Minivan:Engine.Cylinders.Clean6 
-0.04736555 
Vehicle.StyleConvertible:Engine.Cylinders.Clean6 
0.014303536 
Vehicle.StyleCoupe:Engine.Cylinders.Clean6 
0.012348008 
Vehicle.StyleCrew Cab Pickup:Engine.Cylinders.Clean6 
-0.001338878 
Vehicle.StyleWagon:Engine.Cylinders.Clean6 
0.000403872 
Vehicle.Style4dr SUV:Engine.Cylinders.Clean8 
0.125927924 
Vehicle.StyleConvertible:Engine.Cylinders.Clean8 
0.15236921 
Vehicle.StyleCoupe:Engine.Cylinders.Clean8 
0.180841382 
Vehicle.StyleExtended Cab Pickup:Engine.Cylinders.Clean8 
-0.068374916 
Vehicle.StylePassenger Van:Engine.Cylinders.Clean8 
0.074301007 
Vehicle.StyleRegular Cab Pickup:Engine.Cylinders.Clean8 
-0.074643689 
Vehicle.StyleSedan:Engine.Cylinders.Clean8 
0.096801315 
Vehicle.StyleWagon:Engine.Cylinders.Clean8 
-0.053073987 
Vehicle.StyleCoupe:Engine.Cylinders.Clean10 
0.000175093 
Vehicle.StyleSedan:Engine.Cylinders.Clean12 
0.007278907 
Vehicle.StyleExtended Cab Pickup:NumDoors.Clean3 
-0.028840371 
Vehicle.Style4dr Hatchback:NumDoors.Clean4 
-0.003934486 
Vehicle.StyleCoupe:NumDoors.Clean4 
-0.029064494 
Vehicle.StylePassenger Minivan:NumDoors.Clean4 
0.066284267 
highway.MPG:Engine.HP.Clean 
6.74E-06 
highway.MPG:Engine.Cylinders.Clean10 
2.14E-06 
highway.MPG:Engine.Cylinders.Clean12 
3.56E-05 
city.mpg:Engine.HP.Clean 
5.39E-05 
city.mpg:Engine.Cylinders.Clean8 
0.000196987 
city.mpg:Engine.Cylinders.Clean10 
0.015688277 
city.mpg:Engine.Cylinders.Clean12 
0.000405855 
Popularity:Engine.HP.Clean 
-9.12E-09 
Popularity:Engine.Cylinders.Clean4 
7.57E-06 
Popularity:Engine.Cylinders.Clean8 
-3.95E-06 
Popularity:Engine.Cylinders.Clean10 
2.65E-06 
Popularity:Engine.Cylinders.Clean12 
2.12E-06 
Engine.HP.Clean:Engine.Cylinders.Clean4 
0.000120421 
Engine.HP.Clean:Engine.Cylinders.Clean8 
-6.03E-06 
Engine.HP.Clean:Engine.Cylinders.Clean10 
4.34E-08 
Engine.Cylinders.Clean4:NumDoors.Clean3 
-0.033998496 
Engine.Cylinders.Clean5:NumDoors.Clean4 
0.078560395 
Engine.Cylinders.Clean12:NumDoors.Clean4 
0.000109866 
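Because the response is log(MSRP), a coefficient b in the list above can be read as an approximate multiplicative effect on MSRP of exp(b) - 1, holding the other terms fixed. The sketch below applies that conversion to three values copied from the list. It assumes the coefficients are reported on the original predictor scale, which this appendix does not state. Interaction terms are adjustments on top of the corresponding main effects, and lasso estimates are shrunken toward zero, so these percentages are indicative only.

```r
# Three interaction coefficients copied from the lasso output above.
coefs <- c(
  "Vehicle.StyleCoupe:Engine.Cylinders.Clean8"        = 0.180841382,
  "Driven_Wheelsfront wheel drive:Vehicle.StyleCoupe" = -0.159816927,
  "Transmission.TypeMANUAL:Popularity"                = -8.61e-06
)

# On a log(MSRP) scale, exp(b) - 1 is the relative change in MSRP
# associated with the term (per unit for a numeric predictor).
pct_change <- (exp(coefs) - 1) * 100

print(round(pct_change, 4))
```

Under these assumptions the first term corresponds to an adjustment of roughly +20% and the second to roughly -15%, while the Popularity interaction is negligible per unit of popularity score.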
 

 


 

3.4 Fully Saturated Lasso Model ~ model diagnostics 

   

3.5 Find best Lambda 
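
The lambda search can be sketched as a cross-validated lasso fit. This is a minimal illustration on simulated data, assuming the glmnet package; the appendix does not say which package or settings produced the original plot, and the objects x, y, and cv_fit are hypothetical names.

```r
library(glmnet)

set.seed(1)

# Simulated stand-in for the design matrix and log(MSRP) response.
n <- 200
p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

# alpha = 1 requests the lasso penalty; 10-fold cross-validation over
# glmnet's default lambda sequence.
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# lambda.min minimizes the CV error; lambda.1se is the largest lambda
# whose CV error is within one standard error of that minimum.
print(cv_fit$lambda.min)
print(cv_fit$lambda.1se)

# CV error curve against log(lambda).
plot(cv_fit)

# Coefficients at the selected lambda (zeros are dropped predictors).
print(coef(cv_fit, s = "lambda.min"))
```

Choosing lambda.1se rather than lambda.min gives a sparser model at a small cost in cross-validated error, which can be useful when the candidate set includes many interaction terms.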

   

 

3.6 Model Statistics for Fully Saturated Lasso Model 

     

 

 

3.7 Random Forest Variable Importance 

 

randomForest(log(MSRP)~., data=myDataKNN_Scaled) 
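
A call of this form can be extended to report variable importance. The sketch below uses a small simulated data frame in place of myDataKNN_Scaled, which is not available here; the columns hp, mpg, and noise and the settings importance = TRUE and ntree = 200 are assumptions for illustration, not the settings used in this report.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the scaled modeling data.
toy <- data.frame(
  hp    = runif(300, 80, 500),
  mpg   = runif(300, 10, 45),
  noise = rnorm(300)
)
toy$MSRP <- exp(9 + 0.004 * toy$hp - 0.01 * toy$mpg + rnorm(300, sd = 0.1))

# importance = TRUE adds the permutation-based measure (%IncMSE)
# alongside the node-purity measure (IncNodePurity).
rf <- randomForest(log(MSRP) ~ ., data = toy, importance = TRUE, ntree = 200)

# Rank predictors by permutation importance.
imp <- importance(rf)
print(imp[order(imp[, "%IncMSE"], decreasing = TRUE), ])

# Dot chart of both importance measures.
varImpPlot(rf)
```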

   

 

4.0 KNN Optimizer 
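
One way to search for k is to score a range of candidate values on held-out data and keep the value with the lowest error. The sketch below does this in base R on simulated, scaled predictors. The helper knn_predict, the 200/100 split, and the range 1 to 25 are hypothetical choices; the appendix does not record how the original optimizer was run.

```r
set.seed(1)

# Simulated, scaled predictors and a log-price style response.
n <- 300
x <- scale(cbind(hp = runif(n, 80, 500), mpg = runif(n, 10, 45)))
y <- 9 + 0.5 * x[, "hp"] - 0.1 * x[, "mpg"] + rnorm(n, sd = 0.1)

# Hold out 100 rows for validation.
train <- sample(n, 200)

# KNN regression: average the response of the k nearest training rows
# (Euclidean distance) for each new row.
knn_predict <- function(x_train, y_train, x_new, k) {
  apply(x_new, 1, function(row) {
    d <- sqrt(colSums((t(x_train) - row)^2))
    mean(y_train[order(d)[seq_len(k)]])
  })
}

# Holdout RMSE for each candidate k.
ks <- 1:25
rmse <- sapply(ks, function(k) {
  pred <- knn_predict(x[train, ], y[train], x[-train, ], k)
  sqrt(mean((y[-train] - pred)^2))
})

best_k <- ks[which.min(rmse)]
print(best_k)

# Error curve across k.
plot(ks, rmse, type = "b", xlab = "k", ylab = "Holdout RMSE")
```

Scaling the predictors first matters for KNN because the distance calculation would otherwise be dominated by the variables with the largest numeric range.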

 

   

4.1 KNN Plot 

   

 
 

5.0 R-Code

 
