CPSC375-Project 2 Solved

# install.packages("sparklyr")

# library(sparklyr)

# spark_install()

 

library(tidyverse)

library(sparklyr)

 

setwd("c:/temp/cpsc375proj2/")

 

# mylocaldata <- read_csv("http://staff.pubhealth.ku.dk/~tag/Teaching/share/data/Bodyfat.csv")

mylocaldata <- read_csv("Bodyfat.csv")

 

sc <- spark_connect(master = "local")

myremotedata <- copy_to(sc, mylocaldata, overwrite = TRUE)

# our group's formula for the best model that describes bodyfat:

# bodyfat ~ Wrist + log(Abdomen) + Weight^2

# unfortunately, ml_linear_regression has a problem with the log and exponent functions,

# so we have to simplify the formula

mymodel <- ml_linear_regression(x = myremotedata, formula = bodyfat ~ Wrist + Abdomen + Weight)

 

summary(mymodel)

 

spark_web(sc)
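The script's comments note that the log and exponent terms were dropped because ml_linear_regression had trouble with them inside the formula. One possible workaround, sketched here as an assumption rather than as the group's actual method, is to compute the transformed predictors as ordinary columns with dplyr's mutate() before fitting, so the formula only refers to plain column names. The column names log_abdomen and weight_sq and the object names transformed and mymodel2 are hypothetical, and the sketch assumes Bodyfat.csv is in the working directory and that a local Spark installation is available.

```r
library(sparklyr)
library(dplyr)

# Assumes Bodyfat.csv is in the working directory, as in the script above.
mylocaldata <- read.csv("Bodyfat.csv")

sc <- spark_connect(master = "local")
myremotedata <- copy_to(sc, mylocaldata, overwrite = TRUE)

# Precompute the transformed predictors as ordinary columns.
# log_abdomen and weight_sq are hypothetical names chosen for this sketch.
transformed <- myremotedata %>%
  mutate(
    log_abdomen = log(Abdomen),
    weight_sq   = Weight^2
  )

# The formula now references only plain column names.
mymodel2 <- ml_linear_regression(
  x = transformed,
  formula = bodyfat ~ Wrist + log_abdomen + weight_sq
)

summary(mymodel2)
```

Because mutate() on a Spark table is translated to SQL and evaluated by Spark, the transformation stays on the cluster side instead of being computed locally before copy_to(). Whether this model actually fits better than the simplified one would need to be checked against its own summary output.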

 

 

 

             

1b. Output of summary(mymodel):

summary(mymodel)

Deviance Residuals:

     Min       1Q   Median       3Q      Max 

-13.0803  -3.2463  -0.2175   3.2472   9.8018 

 

Coefficients:

(Intercept)       Wrist     Abdomen      Weight 

-27.9299169  -1.2448589   0.9751296  -0.1144609 

 

R-Squared: 0.7277

Root Mean Squared Error: 4.358
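To make the coefficients above concrete, the fitted model can be read as bodyfat ≈ -27.93 - 1.245 * Wrist + 0.975 * Abdomen - 0.114 * Weight. The following minimal sketch evaluates that equation by hand in plain R. The helper name predict_bodyfat and the input values are hypothetical and chosen only for illustration; they are not taken from the dataset.

```r
# Coefficients copied from the summary output above.
coefs <- c(
  intercept = -27.9299169,
  Wrist     = -1.2448589,
  Abdomen   =  0.9751296,
  Weight    = -0.1144609
)

# Hypothetical helper: evaluates the fitted linear equation for one observation.
predict_bodyfat <- function(wrist, abdomen, weight) {
  unname(
    coefs["intercept"] +
      coefs["Wrist"]   * wrist +
      coefs["Abdomen"] * abdomen +
      coefs["Weight"]  * weight
  )
}

# Hypothetical input values, in the same units as the dataset columns.
pred <- predict_bodyfat(wrist = 18, abdomen = 90, weight = 180)
print(pred)
```

With a root mean squared error of 4.358, a single prediction like this should be read as typically off by roughly four body-fat percentage points, not as an exact value.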

 

 

             


1c. A screen capture image of the running Apache Spark web UI
