STAT240 Lab 1 Solved

Problem 1: Random variables review
A random variable $X$ is a real number with a value that depends on a random event. For example, $X$ may be the measurement from a sensor, the outcome of a die roll, or a property of an item chosen at random from a population of items. The probability that a random variable $X$ is less than or equal to a real number $x$ is denoted $\Pr(X \le x)$. The function that maps $x$ to $\Pr(X \le x)$ is known as the cumulative distribution function (cdf) of $X$ (this function may be denoted $f_X$). The cdf has many properties, for example $\lim_{x \to \infty} \Pr(X \le x) = 1$. In some situations, and under some assumptions, random variables have a cdf of a particularly well-studied form. These forms are called laws, and we may say for example ‘$X$ is distributed according to the law $\mathrm{Exp}(\lambda)$’ or ‘$X \sim \mathrm{Exp}(\lambda)$’ to specify that a random variable is exponentially distributed with rate $\lambda > 0$.

Given the cdf of a random variable, we may ask how the value of the random variable concentrates around a particular real number $x$. The expression $\Pr(X > x \text{ and } X \le x + dx)$ gives the probability that $X$ lies in the interval $(x, x + dx]$. By the laws of probability, this expression is equal to $f_X(x + dx) - f_X(x)$. By taking the limit as $dx$ goes to zero, and scaling by $1/dx$, we find the concentration of probability around $x$:

$$\lim_{dx \to 0} \frac{f_X(x + dx) - f_X(x)}{dx} = \frac{d}{dx} f_X(x).$$

This derivative (if it exists) is known as the probability density function (pdf) of the random variable $X$.
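The limit that defines the pdf can be checked numerically. The sketch below is an illustration only: it uses the standard normal distribution (an assumption chosen for the example, not the distribution from part a) and compares a forward difference of the cdf with R's built-in density. The point x = 0.5 and the step dx = 1e-6 are arbitrary choices.

```r
# Illustration only: standard normal, not the distribution from part a).
x  <- 0.5    # arbitrary evaluation point
dx <- 1e-6   # small step standing in for the limit dx -> 0

# Forward difference of the cdf: (F(x + dx) - F(x)) / dx
finite_diff <- (pnorm(x + dx) - pnorm(x)) / dx

# Built-in pdf of the standard normal at the same point
exact_pdf <- dnorm(x)

print(c(finite_diff = finite_diff, exact_pdf = exact_pdf))
```

The two printed values should agree to several decimal places; shrinking dx much further eventually makes the difference noisier because of floating-point rounding.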

a) Suppose $X$ is a random variable with pdf proportional to $e^{-\lambda x}$ if $x$ is positive and 0 otherwise. What is the pdf of $X$? (i.e., what is the constant of proportionality?) What is the cdf of $X$?

b) Suppose $X$ is a random variable with pdf $p$. The mean of $X$ is:

$$\mathrm{E}[X] = \int_{-\infty}^{\infty} x \, p(x) \, dx. \qquad (1)$$

What is the mean of the random variable given in part a)?
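A mean defined as the integral of x times the pdf over the real line can be sanity-checked numerically with R's integrate function. The sketch below is an illustration under an assumption: it uses a normal distribution with mean 2 and standard deviation 1 (a hypothetical choice, not the distribution from part a), so the numerical result can be compared with a known value.

```r
# Illustration only: normal with mean 2 and sd 1 (hypothetical choice),
# not the distribution from part a).
p <- function(x) dnorm(x, mean = 2, sd = 1)

# Numerically integrate x * p(x) over the whole real line.
mean_numeric <- integrate(function(x) x * p(x), lower = -Inf, upper = Inf)$value

print(mean_numeric)
```

The same pattern works for any pdf written as an R function, provided the integral converges.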

c) Suppose $x \in \mathbb{R}$. For the random variable given in part a), what is $\Pr(X = x)$?

d) Prove that the pdf of a random variable X is non-negative (provided that the pdf exists).

Problem 2: Review of R
Consider the following for loops in R. For each for loop, list the values (in order) that the variable i takes on in the body of the loop. Briefly (in no more than a few sentences) explain why.

a) for(i in 1+2:3.4*5) { }
b) for(i in dim(matrix(0, nr = 7, nc = 8))) { }
c) for(i in rnorm(3)) { }
d) for(i in iris[1:3,3]) { }
e) for(j in c(1, 2, 3, 4, 5)) { }
f) for(i in (function(x) x*x)(c(1, 2, 3))) { }
g) for(i in NULL) { }
h) for(i in strsplit(as.character(4*atan(1)),'')[[1]][1:10]) { }
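One way to check an answer is to record the loop variable inside the body, or to evaluate the expression after `in` on its own. The sketch below is an illustration only; the sequence seq(2, 10, by = 4) is a hypothetical example and is not one of the loops listed above.

```r
# Illustration only: this sequence is not one of the loops in the problem.
values <- numeric(0)
for (i in seq(2, 10, by = 4)) {
  values <- c(values, i)  # record each value that i takes
  print(i)
}

# A for loop evaluates the expression after `in` once, before the first
# iteration, so evaluating it directly shows the same values.
print(seq(2, 10, by = 4))
```

Evaluating sub-expressions one at a time in the console is also a quick way to see how operator precedence and vector recycling shape the result.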

Problem 3: Using knitr
There are several ways to interleave R code and the output of R code (including plots) into a pdf. The R package knitr is one such way, and you may find it useful for creating reports and doing subsequent assignments. knitr works with documents written in a plain-text formatting style called ‘markdown’, with R code embedded in them. In RStudio, install knitr using the command install.packages('knitr'). Then, create a new R markdown document by selecting ‘File > New File > R Markdown’ from the menus. You’ll be prompted to give a name to the new R markdown file and to select whether it should output to pdf or html. Choose pdf. The basic format of an R markdown file is as follows (your new document may be populated with example text such as this):

---
title: "Untitled"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Click ‘Knit’ on the toolbar of RStudio’s editor to render the markdown as a pdf. The document should pop up in a preview window. The pdf will also be saved in the same directory as your new R markdown file. The element ‘##’ specifies a section title, the element ‘title: "Untitled"’ specifies the document title, and the elements ‘```{r ...} ... ```’ specify R code that is to be executed. The option echo=FALSE indicates that the R code itself should not be shown in the output file (only the results of the code are shown). Markdown allows you to specify bolding, hyperlinks, bullets and other text aspects through annotations such as **Knit** for a bold ‘Knit’ (the asterisks indicate the bolding). An overview of the options for formatting and running code in R markdown is available here: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf.

a) The University of California at Irvine provides a repository of datasets that are popular for demonstrations of machine learning and statistical methodology. Choose one of their datasets from this site: https://archive.ics.uci.edu/ml/datasets.php, and download it. The downloads usually include two files: one ending in .data, which can be loaded by R using the command read.table, and one ending in .names, containing a detailed description of the dataset.
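Loading a .data file with read.table might look like the sketch below. It is an illustration under stated assumptions: the file is a small hypothetical stand-in written to a temporary location, and it is assumed to have no header row and to use commas as separators. The .names file that comes with a real dataset describes its actual layout.

```r
# Hypothetical stand-in for a downloaded .data file: three comma-separated rows.
data_path <- tempfile(fileext = ".data")
writeLines(c("5.1,3.5,a", "4.9,3.0,a", "6.2,2.9,b"), data_path)

# Assumption: no header row and comma separators. Adjust `header` and `sep`
# to match the layout described in the dataset's .names file.
df <- read.table(data_path, header = FALSE, sep = ",")

print(dim(df))  # number of items (rows) and number of variables (columns)
str(df)         # type of each column
```

For a real download, replace data_path with the path to the downloaded file.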

In an R markdown, provide a short summary of the dataset. What is the dataset about? When was it collected? How many items are in the dataset? How many variables are provided? Broadly, what types of variables are there, and broadly, what are their units? (For example, if there are thousands of variables all with the same units indicating measurements at different times, you can just say what the measurement is, what the units are and what the times are: you don’t have to list each individual variable.) This summary should be no more than half a page.

Choose one of the variables and plot a histogram of that variable. Ensure that the x-axis is labelled correctly, with units. Make the histogram so that its y-axis is ‘proportion’ and not ‘count’ (i.e., the sum of the areas of the histogram rectangles should equal 1). Superimpose on top of the histogram a plot of the pdf of a normal distribution (a.k.a. Gaussian distribution, or bell-curve) with mean and variance given by the empirical mean and variance of the variable. (For example, if you’ve chosen the 5th variable of the dataset and the dataset is loaded into R as the variable df, then the empirical mean is mean(df[, 5]) and the empirical variance is var(df[, 5]).) Provide a single pdf including the rendered R markdown followed by a listing of the text of the R markdown file.
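A density-scaled histogram with a normal curve on top can be sketched as follows. This is an illustration only: it uses R's built-in faithful dataset as a stand-in (an assumption, since it is not a dataset from the repository above), and the variable names waiting, m and v are chosen for the example.

```r
# Illustration only: built-in `faithful` data, not a dataset from the repository.
waiting <- faithful$waiting
m <- mean(waiting)  # empirical mean
v <- var(waiting)   # empirical variance

# freq = FALSE puts density on the y-axis, so the bar areas sum to 1.
h <- hist(waiting, freq = FALSE,
          xlab = "Waiting time between eruptions (minutes)",
          main = "Old Faithful waiting times")

# Overlay the normal pdf; dnorm takes a standard deviation, hence sqrt(v).
curve(dnorm(x, mean = m, sd = sqrt(v)), add = TRUE, lwd = 2)
```

If the curve is clipped at the top of the plot, passing a larger ylim to hist leaves room for it.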

