CSC487 Data Mining Homework 1 (Solved)


1. Use Su_raw_matrix.txt for the following questions.
(a) Use the read.delim function to read Su_raw_matrix.txt into a variable called su. (Notice
that su has become a data frame.)
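A minimal sketch of part (a). Su_raw_matrix.txt is not reproduced here, so the block writes a small hypothetical tab-delimited stand-in to a temporary file and reads it back. The Brain_1.CEL column and all values are invented for illustration; with the real file you would call read.delim("Su_raw_matrix.txt") directly.

```r
# Hypothetical stand-in for Su_raw_matrix.txt: a header row plus
# tab-separated numeric values (column Brain_1.CEL is invented)
tmp <- tempfile(fileext = ".txt")
writeLines(c("Liver_2.CEL\tBrain_1.CEL",
             "10.5\t7.2",
             "12.1\t8.4",
             "9.8\t6.9"), tmp)

# read.delim defaults to header = TRUE and sep = "\t", so no extra
# arguments are needed; with the real file: su <- read.delim("Su_raw_matrix.txt")
su <- read.delim(tmp)

class(su)  # "data.frame"
str(su)    # one numeric column per sample
```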

(b) Use mean and sd functions to find mean and standard deviation of Liver_2.CEL column. 
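A sketch of part (b) on a hypothetical one-column su (the values are made up so the result is easy to verify by hand); on the real data, su from part (a) is used unchanged. Note that sd computes the sample standard deviation, dividing by n - 1.

```r
# Hypothetical stand-in for su; the real su comes from read.delim in part (a)
su <- data.frame(Liver_2.CEL = c(10, 12, 14, 16))

# Add na.rm = TRUE to either call if the column contains missing values
liver_mean <- mean(su$Liver_2.CEL)  # arithmetic mean
liver_sd <- sd(su$Liver_2.CEL)      # sample SD (divides by n - 1)

liver_mean  # 13
liver_sd    # sqrt(20 / 3), about 2.58
```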

(c) Use colMeans and colSums functions to get the average and total values of each column. 
cm_cs <- function(df) list(means = colMeans(df), sums = colSums(df))
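A self-contained sketch of cm_cs with a quick check on a hypothetical two-column data frame. Returning both results as a named list is an assumption; the assignment only names the function. colMeans and colSums require every column to be numeric and stop with an error otherwise.

```r
# Return the per-column averages and totals of a numeric data frame
# (named-list return value is an assumption)
cm_cs <- function(df) {
  list(means = colMeans(df), sums = colSums(df))
}

# Hypothetical toy data standing in for su
toy <- data.frame(Liver_2.CEL = c(1, 2, 3), Brain_1.CEL = c(4, 5, 6))
res <- cm_cs(toy)
res$means  # Liver_2.CEL = 2, Brain_1.CEL = 5
res$sums   # Liver_2.CEL = 6, Brain_1.CEL = 15
```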


2. Use the rnorm(n, mean = 0, sd = 1) function in R to generate 10000 numbers
for each of the following (mean, sigma) pairs and plot a histogram for each, changing
the function parameters accordingly. Then comment on how these
histograms differ from each other and state the reason. (20 points).
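A sketch for question 2, assuming the two (mean, sigma) pairs that appear in the answer, (0, 0.2) for p_2_a and (0, 0.5) for p_2_b; the assignment's full list of pairs is not reproduced in this copy. The seed is arbitrary and only makes the run reproducible.

```r
# Assumed (mean, sigma) pairs: (0, 0.2) and (0, 0.5)
set.seed(42)  # arbitrary seed for reproducibility
p_2_a <- rnorm(10000, mean = 0, sd = 0.2)
p_2_b <- rnorm(10000, mean = 0, sd = 0.5)

# Shared x-axis range so the two spreads can be compared by eye
lims <- range(p_2_a, p_2_b)
op <- par(mfrow = c(1, 2))
hist(p_2_a, breaks = 50, xlim = lims, main = "mean = 0, sd = 0.2", xlab = "value")
hist(p_2_b, breaks = 50, xlim = lims, main = "mean = 0, sd = 0.5", xlab = "value")
par(op)  # restore the previous plotting layout

# Sample statistics land close to the population parameters
round(c(mean = mean(p_2_a), sd = sd(p_2_a)), 3)
round(c(mean = mean(p_2_b), sd = sd(p_2_b)), 3)
```

Fixing xlim to the combined range matters: without it, hist rescales each panel to its own data and the two histograms can look deceptively similar.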
(*) Compare and Contrast
We can clearly see that p_2_a has a much tighter distribution than p_2_b.
Note: This is because p_2_a, with σ = 0.2, has a smaller standard deviation than p_2_b, with σ = 0.5.
We can also see that both samples have a sample mean of about 0, since they were drawn from normal
distributions with population mean 0.
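A quick worked check using the standard two-sigma rule for a normal distribution (with μ = 0 here): about 95% of each sample should fall within two standard deviations of 0, so the central band of p_2_b is 2.5 times as wide as that of p_2_a.

```latex
\begin{aligned}
X \sim \mathcal{N}(\mu, \sigma^2) \;&\Rightarrow\; P(|X - \mu| \le 2\sigma) \approx 0.954 \\
\text{p\_2\_a}:\ \sigma = 0.2 \;&\Rightarrow\; P(-0.4 \le X \le 0.4) \approx 0.954 \\
\text{p\_2\_b}:\ \sigma = 0.5 \;&\Rightarrow\; P(-1.0 \le X \le 1.0) \approx 0.954 \\
\frac{0.5}{0.2} &= 2.5
\end{aligned}
```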
3. Perform the steps below with the "dat" data frame, which is just sample data for
you to observe how each plot function (3b through 3e) works. Notice that you
need to have the ggplot2 library installed on your system. Please refer to the slides on how
to install and import a library. Installation is done only once, but you need to
import the library every time you need it by running library(ggplot2). Then run
the following commands and observe how the plots are generated. (40 points).
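The install-once, load-every-session pattern described above can be written as follows. The requireNamespace guard and the CRAN mirror URL are one common choice, not necessarily what the course slides use.

```r
# Install ggplot2 only if it is not already available (one-time step;
# needs internet access the first time)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load (attach) the package in every new R session before calling ggplot()
library(ggplot2)
```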
(a) Data generation 
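The data-generation commands themselves are not reproduced in this copy. A sketch of what such a dat might look like follows, assuming a two-level grouping factor cond and a numeric column rating. Both names, the group sizes, and the 0.8 shift are hypothetical.

```r
# Hypothetical sample data: 200 ratings in each of two conditions,
# with group B shifted upward by 0.8 (all names and values are assumptions)
set.seed(1234)
dat <- data.frame(
  cond = factor(rep(c("A", "B"), each = 200)),
  rating = c(rnorm(200), rnorm(200, mean = 0.8))
)
head(dat)
table(dat$cond)  # A: 200, B: 200
```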

(b) Overlaid histograms 
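A sketch of overlaid histograms, using a hypothetical dat with a grouping factor cond and a numeric rating (generated inline so the block runs alone). position = "identity" draws both groups' bars from zero on the same axis, and alpha = 0.5 keeps the overlap visible; binwidth = 0.5 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical dat: two groups of 200 ratings, group B shifted by 0.8
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 0.8)))

# Overlaid: bars of both groups share the same bins and overlap
p_overlaid <- ggplot(dat, aes(x = rating, fill = cond)) +
  geom_histogram(binwidth = 0.5, alpha = 0.5, position = "identity")
print(p_overlaid)
```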

(c) Interleaved histograms 
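A sketch of interleaved histograms on the same kind of hypothetical dat. position = "dodge" places the two groups' bars side by side within each bin instead of on top of each other, so no transparency is needed.

```r
library(ggplot2)

# Hypothetical dat: two groups of 200 ratings, group B shifted by 0.8
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 0.8)))

# Interleaved: within each 0.5-wide bin, group A and B bars sit side by side
p_dodge <- ggplot(dat, aes(x = rating, fill = cond)) +
  geom_histogram(binwidth = 0.5, position = "dodge")
print(p_dodge)
```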

(d) Density plots
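A sketch of density plots on a hypothetical dat. geom_density draws a kernel density estimate (a smoothed histogram scaled to area 1) for each group; mapping colour = cond gives one outline per condition.

```r
library(ggplot2)

# Hypothetical dat: two groups of 200 ratings, group B shifted by 0.8
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 0.8)))

# One density curve per condition, distinguished by line colour
p_density <- ggplot(dat, aes(x = rating, colour = cond)) +
  geom_density()
print(p_density)
```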

(e) Density plots w/ semi-transparent fill 
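A sketch of density plots with semi-transparent fill on a hypothetical dat. Mapping fill = cond shades the area under each curve, and alpha = 0.3 lets the overlapping region of the two groups show through.

```r
library(ggplot2)

# Hypothetical dat: two groups of 200 ratings, group B shifted by 0.8
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                  rating = c(rnorm(200), rnorm(200, mean = 0.8)))

# Filled density curves; alpha = 0.3 keeps the overlap visible
p_density_fill <- ggplot(dat, aes(x = rating, fill = cond)) +
  geom_density(alpha = 0.3)
print(p_density_fill)
```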

(f) Using diabetes_train.csv
