
GU4206/GR5206-Homework 2 Solved

Part 1 (Iris)
Background

The R data description follows:

This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

Task
1) Using ggplot, as opposed to base R, produce the same plot as the one constructed by the following code. That is, plot Petal Length versus Sepal Length split by Species, with the points colored according to Species. Also overlay three regression lines on the plot, one for each Species level. Make sure to include an appropriate legend and labels. Note: The function coef() extracts the intercept and the slope of an estimated line.

# Base plot
plot(iris$Sepal.Length, iris$Petal.Length, col = iris$Species,
     xlab = "Sepal", ylab = "Petal", main = "Gabriel's Plot")

# Loop to construct each line of best fit (LOBF), one per species
for (i in 1:length(levels(iris$Species))) {
  extract <- iris$Species == levels(iris$Species)[i]
  abline(lm(iris$Petal.Length[extract] ~ iris$Sepal.Length[extract]), col = i)
}

# Legend
legend("right", legend = levels(iris$Species),
       fill = 1:length(levels(iris$Species)), cex = .75)

# Add points and text
points(iris$Sepal.Length[15], iris$Petal.Length[15], pch = "*", col = "black")
text(iris$Sepal.Length[15] + .4, iris$Petal.Length[15], "(5.8,1.2)", col = "black")
points(iris$Sepal.Length[99], iris$Petal.Length[99], pch = "*", col = "red")
text(iris$Sepal.Length[99] + .35, iris$Petal.Length[99], "(5.1,3)", col = "red")
points(iris$Sepal.Length[107], iris$Petal.Length[107], pch = "*", col = "green")
text(iris$Sepal.Length[107], iris$Petal.Length[107] + .35, "(4.9,4.5)", col = "green")

[Figure: "Gabriel's Plot", x axis "Sepal"]

Solution goes below:

library(ggplot2) ## Plot.
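One possible ggplot version is sketched below, assuming ggplot2 is installed; it is not the only valid answer. Following the note in the task, the intercept and slope of each species' least-squares line are extracted with coef() and drawn with geom_abline(). The object names `fits`, `marks` and `p1` are illustrative, and the three starred observations (rows 15, 99 and 107) are labeled with a single rightward nudge rather than the exact offsets used in the base-R code.

```r
library(ggplot2)

# Intercept and slope of Petal.Length ~ Sepal.Length for each species, via coef().
fits <- do.call(rbind, lapply(levels(iris$Species), function(sp) {
  b <- coef(lm(Petal.Length ~ Sepal.Length, data = iris[iris$Species == sp, ]))
  data.frame(Species = sp, intercept = b[[1]], slope = b[[2]])
}))
fits$Species <- factor(fits$Species, levels = levels(iris$Species))

# The three highlighted observations from the base-R plot, with "(x,y)" labels.
marks <- iris[c(15, 99, 107), ]
marks$label <- sprintf("(%g,%g)", marks$Sepal.Length, marks$Petal.Length)

p1 <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +
  # One fitted line per species, coloured to match its points.
  geom_abline(data = fits,
              aes(intercept = intercept, slope = slope, colour = Species)) +
  # Star markers (shape 8) and text labels for rows 15, 99 and 107.
  geom_point(data = marks, shape = 8, size = 3, show.legend = FALSE) +
  geom_text(data = marks, aes(label = label), nudge_x = 0.35, show.legend = FALSE) +
  labs(title = "Gabriel's Plot", x = "Sepal", y = "Petal", colour = "Species")
p1
```

geom_smooth(method = "lm", se = FALSE) is a shorter alternative: with colour mapped to Species it fits the same three lines, but draws each one only over that species' range of sepal lengths. ggplot's default hue palette also differs from base R's black/red/green; scale_colour_manual() can reproduce those colours if an exact match is wanted.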

Part 2 (World’s Richest)

Background
We consider a data set containing information about the world’s richest people. The data set is taken from the World Top Incomes Database (WTID) hosted by the Paris School of Economics [http://top-incomes.gmond.parisschoolofeconomics.eu]. It is derived from income tax reports and compiles information about the very highest incomes in various countries over time, trying as hard as possible to produce numbers that are comparable across time and space.

Tasks
2) Open the file and make a new variable (data frame) containing only the year, “P99”, “P99.5” and “P99.9” variables; these are the income levels which put someone at the 99th, 99.5th, and 99.9th percentiles of income. What was P99 in 1993? What was P99.5 in 1942? You must identify these using your code rather than looking up the values manually. The code for this part is given below.

Solution goes below:

wtid <- read.csv("wtid-report.csv", as.is = TRUE)

# Keep only the year and the three percentile thresholds, then shorten the names
wtid <- wtid[, c("Year", "P99.income.threshold", "P99.5.income.threshold",
                 "P99.9.income.threshold")]
names(wtid) <- c("Year", "P99", "P99.5", "P99.9")
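Once `wtid` holds the four renamed columns, the two requested values can be pulled out by matching on Year. The helper `threshold_in()` below is a hypothetical convenience function, not part of the assignment's code; plain indexing such as `wtid$P99[wtid$Year == 1993]` does the same thing.

```r
# Hypothetical helper: value of column `col` in the row(s) whose Year equals `year`.
# Assumes `df` has a numeric Year column, as in the WTID subset built above.
threshold_in <- function(df, col, year) {
  df[df$Year == year, col]
}

# Usage with the WTID subset (requires wtid-report.csv to have been read in):
# threshold_in(wtid, "P99", 1993)    # P99 in 1993
# threshold_in(wtid, "P99.5", 1942)  # P99.5 in 1942
```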

3) Using ggplot, display three line plots on the same graph showing the income threshold amount against time for each group: P99, P99.5 and P99.9. Make sure the axes are labeled appropriately, and in particular that the horizontal axis is labeled with years between 1913 and 2012, not just numbers from 1 to 100. Also make sure a legend is displayed that describes the multiple time series plot. Write one or two sentences describing how income inequality has changed over time.

Solution goes below:

## Plot
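A minimal sketch for the line plot, assuming `wtid` has the columns Year, P99, P99.5 and P99.9 created in task 2. ggplot draws one line per group and builds the legend automatically when the data are in long format, so the function stacks the three threshold columns with base R before plotting; because Year itself is mapped to the x axis, the horizontal axis shows calendar years rather than row indices. The function name `plot_thresholds()` and the title and legend text are illustrative choices.

```r
library(ggplot2)

# Sketch: draw P99, P99.5 and P99.9 against Year as three lines on one graph.
# Assumes `wtid` has columns Year, P99, P99.5, P99.9 (as created in task 2).
plot_thresholds <- function(wtid) {
  groups <- c("P99", "P99.5", "P99.9")
  # Stack the three columns into long format: one row per (Year, group) pair.
  long <- data.frame(
    Year      = rep(wtid$Year, times = length(groups)),
    Group     = factor(rep(groups, each = nrow(wtid)), levels = groups),
    Threshold = unlist(wtid[, groups], use.names = FALSE)
  )
  ggplot(long, aes(x = Year, y = Threshold, colour = Group)) +
    geom_line() +
    labs(title = "Top income thresholds over time",
         x = "Year", y = "Income threshold", colour = "Percentile")
}

# Usage: plot_thresholds(wtid)
```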

Part 3 (Titanic)

Background
In this part we’ll be studying a data set which provides information on the survival rates of passengers on the fatal voyage of the ocean liner Titanic. The data set provides information on each passenger including, for example, economic status, sex, age, cabin, name, and survival status. This is a training data set taken from the Kaggle competition website; for more information on Kaggle competitions, please refer to https://www.kaggle.com. Students should download the data set from Canvas.

Tasks
4) Run the following code and describe what the two plots show.

# Read in data
titanic <- read.table("Titanic.txt", header = TRUE, as.is = TRUE)
head(titanic)

##   PassengerId Survived Pclass
## 1           1        0      3
## 2           2        1      1
## 3           3        1      3
## 4           4        1      1
## 5           5        0      3
## 6           6        0      3
##                                                  Name    Sex Age SibSp Parch
## 1                             Braund, Mr. Owen Harris   male  22     1     0
## 2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38     1     0
## 3                              Heikkinen, Miss. Laina female  26     0     0
## 4        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35     1     0
## 5                            Allen, Mr. William Henry   male  35     0     0
## 6                                    Moran, Mr. James   male  NA     0     0
##             Ticket    Fare Cabin Embarked
## 1        A/5 21171  7.2500              S
## 2         PC 17599 71.2833   C85        C
## 3 STON/O2. 3101282  7.9250              S
## 4           113803 53.1000  C123        S
## 5           373450  8.0500              S
## 6           330877  8.4583              Q

library(ggplot2)

# Plot 1
ggplot(data = titanic) +
  geom_bar(aes(x = Sex, fill = factor(Survived))) +
  labs(title = "Title", fill = "Survived")

[Figure: Plot 1, titled "Title"]

# Plot 2
ggplot(data = titanic) +
  geom_bar(aes(x = factor(Survived), fill = factor(Survived))) +
  facet_grid(~Sex) +
  labs(title = "Title", fill = "Survived", x = "")

[Figure: Plot 2, titled "Title"]

5) Create a similar plot with the variable Pclass. The easiest way to produce this plot is to facet by Pclass. Make sure to include appropriate labels and titles. Describe your

Solution goes below:

# Plots
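Mirroring plot 2 above, a sketch that facets the survival counts by passenger class, assuming `titanic` was read in as in task 4 (Survived coded 0/1, Pclass coded 1 to 3). `label_both` makes each panel title read "Pclass: 1" and so on; the function name and the label text are illustrative.

```r
library(ggplot2)

# Sketch for task 5: bar chart of survival counts, one panel per passenger class.
# Assumes columns Survived (0 = died, 1 = survived) and Pclass, as in Titanic.txt.
plot_survival_by_class <- function(titanic) {
  ggplot(data = titanic) +
    geom_bar(aes(x = factor(Survived), fill = factor(Survived))) +
    facet_grid(~ Pclass, labeller = label_both) +
    labs(title = "Survival by passenger class",
         x = "Survived (0 = no, 1 = yes)", y = "Number of passengers",
         fill = "Survived")
}

# Usage: plot_survival_by_class(titanic)
```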

6) Create one more plot of your choice related to the Titanic data set. Describe what information your plot conveys.

Solution goes below:

# Plots
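Any informative plot works here; one possible choice, sketched below, compares the age distributions of passengers who did and did not survive. It assumes the `titanic` data frame from task 4 and drops passengers whose Age is NA (such as row 6 in the head() output). The 5-year bin width and the function name are arbitrary, illustrative choices.

```r
library(ggplot2)

# Sketch for task 6: overlaid age histograms by survival status.
plot_age_by_survival <- function(titanic) {
  known <- titanic[!is.na(titanic$Age), ]  # Age is missing for some passengers
  ggplot(known, aes(x = Age, fill = factor(Survived))) +
    # position = "identity" overlays the two groups instead of stacking them;
    # alpha keeps the overlapping bars visible.
    geom_histogram(binwidth = 5, boundary = 0, position = "identity", alpha = 0.5) +
    labs(title = "Age distribution by survival status",
         x = "Age (years)", y = "Number of passengers", fill = "Survived")
}

# Usage: plot_age_by_survival(titanic)
```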

Part 4 (Simulating and Graphing Probability Density)
7) Simulate n = 1000 random draws from a beta distribution with parameters α = 3 and β = 1. Plot a histogram of the simulated cases using ggplot, and overlay the beta density on the histogram. Hint: look up the beta distribution using ?rbeta.

Solution goes below:

# Sim and plots
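A minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()). The histogram is drawn on the density scale so that it is comparable with the Beta(3, 1) density, f(x) = 3x^2 on [0, 1], which is overlaid with stat_function() and dbeta(). The seed, bin count and colours are arbitrary choices.

```r
library(ggplot2)

set.seed(5206)  # arbitrary seed, only for reproducibility
n <- 1000
draws <- data.frame(x = rbeta(n, shape1 = 3, shape2 = 1))

# Histogram rescaled to density (area 1) so the Beta(3, 1) pdf lines up with it.
p_beta <- ggplot(draws, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, boundary = 0,
                 fill = "grey80", colour = "grey40") +
  stat_function(fun = dbeta, args = list(shape1 = 3, shape2 = 1), colour = "red") +
  labs(title = "1000 draws from Beta(3, 1) with the true density",
       x = "x", y = "Density")
p_beta
```

The sample mean should land close to the theoretical mean α/(α + β) = 0.75, which is a quick sanity check on the simulation.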
