Assignment 1 Code Main Topics: Data Exploration & Data Visualization Solved

Tasks:
● Data Exploration & Visualization:
1. Download the file ‘parkingLA2017.csv’ from our class Blackboard site.
2. Read this file into your R environment (if it takes a while for the file to load, don’t worry – this is normal. Be patient). Show the step that you used to accomplish this.

a. What are the dimensions of this dataframe? Show the code that you used to determine this.
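A minimal sketch of the read-and-inspect step. So that it runs without the class file, it first writes a tiny stand-in CSV to a temporary file; the object name `parking` and the sample rows are made up for illustration.

```r
# Stand-in for 'parkingLA2017.csv': a tiny made-up file so the sketch runs anywhere.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Ticket.number,Make,Fine.amount",
  "1001,CHRY,25",
  "1002,TOYT,63",
  "1003,CHRY,93"
), tmp)

# With the real file this would be read.csv("parkingLA2017.csv").
parking <- read.csv(tmp, stringsAsFactors = TRUE)

# dim() returns the number of rows followed by the number of columns.
dim(parking)
```

`nrow(parking)` and `ncol(parking)` return the same two numbers one at a time.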

3. Filter the dataframe. Create a new object that only contains data for your assigned “Make” of car. Show the code that you used to accomplish this. For the next set of questions, use this new dataframe.
a. What are the dimensions of this new dataframe that only contains the rows for your assigned “Make” of car?
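One way the filtering step might look, sketched on made-up rows. The make "CHRY" and the object name `CHRY` are example choices; substitute the assigned make.

```r
# Made-up rows standing in for the full parking data.
parking <- data.frame(
  Make        = c("CHRY", "TOYT", "CHRY", "HOND"),
  Fine.amount = c(25, 63, 93, 68)
)

# Keep only the rows whose Make matches; subset() also drops rows where Make is NA.
CHRY <- subset(parking, Make == "CHRY")

# Rows and columns of the filtered data frame.
dim(CHRY)
```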


4. Dealing with missing values:
a. Are there any missing values in your dataset? How do you know this? Show the R code that you used to answer this question, along with the results that appeared in your console.

b. Does the variable RP.State.Plate contain any missing values? How do you know this? Again, show your steps and the results.

c. Find and display the standard deviation for the variable Fine.amount. Does the variable Fine.amount contain any missing values? How do you know this? Again, show your steps and the results. To deal with this issue, perform an imputation by replacing the NAs with a reasonable alternative. Now, find and display the standard deviation for this variable again. What happened? Why do you think this change occurred?


d. Replace any blank cells in the Location column with NA. In a sentence or two, what does this accomplish?


e. Now, remove all of the records that contain “NA” for AGENCY.SHORT.NAME from your dataset.
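The missing-value steps in item 4 could be sketched as follows. The rows are made up, and median imputation is only one reasonable choice (the mean is another).

```r
# Made-up rows standing in for the filtered data frame.
CHRY <- data.frame(
  RP.State.Plate    = c("CA", "CA", "NV", "CA", "AZ"),
  Fine.amount       = c(25, NA, 63, 93, NA),
  Location          = c("100 MAIN ST", "", "5 OAK AVE", "", "9 ELM ST"),
  AGENCY.SHORT.NAME = c("DOT - HLYW", NA, "DOT - HLYW", "BLDG & SAF", "DOT - HLYW"),
  stringsAsFactors  = FALSE
)

anyNA(CHRY)                      # TRUE if any cell is NA
colSums(is.na(CHRY))             # NA count per column
sum(is.na(CHRY$RP.State.Plate))  # NA count for a single column

# sd() returns NA when NAs are present unless na.rm = TRUE.
sd_before <- sd(CHRY$Fine.amount, na.rm = TRUE)

# Replace missing fines with the median of the observed fines.
fine_median <- median(CHRY$Fine.amount, na.rm = TRUE)
CHRY$Fine.amount[is.na(CHRY$Fine.amount)] <- fine_median
sd_after <- sd(CHRY$Fine.amount)

# Blank strings are not NA; convert them so is.na() can detect them.
CHRY$Location[CHRY$Location == ""] <- NA

# Drop rows where the agency short name is missing.
CHRY <- CHRY[!is.na(CHRY$AGENCY.SHORT.NAME), ]
```

In this toy data `sd_after` is smaller than `sd_before`: every imputed value sits at the center of the distribution, so the spread around the mean shrinks.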

5. Dealing with the Date data type:
A. Right now, if you call the str() function on your dataset, you’ll see that R does not recognize the Issue.Date variable as a date. Fix this by explicitly telling R to treat this variable as a date. Show the code that you used to accomplish this.
CHRY$Issue.Date <- as.Date(CHRY$Issue.Date, format = "%Y-%m-%d")  # convert to Date, using the format seen in View()
str(CHRY)  # recheck the structure
'data.frame': 11952 obs. of 21 variables:
$ Agency : int 1 1 1 1 1 1 1 1 1 1 ...
$ Ticket.number : Factor w/ 1048575 levels "1001197094","1001197105",..: 32198 53343 54395 55516 51736 53427 69070 53348 81380 16594 ...
$ Issue.Date : Date, format: "2017-10-11" "2017-03-13" ...
$ Issue.time : int 1540 1540 1630 1455 920 1335 217 1450 1538 900 ...
$ Meter.Id : Factor w/ 18141 levels "",".","#1","#2",..: 1 1 1 1 1 1 1 1 218 1 ...
$ Marked.Time : int NA NA NA NA NA NA NA NA NA NA ...
$ RP.State.Plate : Factor w/ 71 levels "AB","AK","AL",..: 8 8 8 8 8 8 8 8 8 8 ...
$ Plate.Expiry.Date : int 201707 201606 NA 201708 201706 201709 201711 NA 201802 NA ...
$ VIN : logi NA NA NA NA NA NA ...
$ Make : Factor w/ 1076 levels "","ABAR","ABRI",..: 148 148 148 148 148 148 148 148 148 148 ...
$ Body.Style : Factor w/ 96 levels "","2D","2H","4D",..: 51 51 51 51 51 51 51 51 51 51 ...
$ Color : Factor w/ 68 levels "","AM","AP","AQ",..: 53 49 25 64 10 64 49 7 10 25 ...
$ Location : Factor w/ 307500 levels "",",10989 ROCHESTER AVE",..: 185483 304477 201295 179143 130195 33756 95692 304475 301187 304296 ...
$ Route : Factor w/ 3942 levels "","016V5","01A4",..: 1730 1779 1732 1773 1 1079 3028 1779 2171 2103 ...
$ Violation.code : Factor w/ 220 levels "0","1","10","11",..: 82 44 78 75 163 146 6 44 201 219 ...
$ Violation.Description: Factor w/ 384 levels "","1504A","1564052",..: 264 271 259 285 295 358 30 271 280 120 ...
$ Fine.amount : num 25 363 25 50 93 93 68 363 63 25 ...
$ Latitude : num 99999 99999 99999 6473311 6488212 ...
$ Longitude : num 99999 99999 99999 1825895 1841816 ...
$ AGENCY.NAME : Factor w/ 20 levels "51 - DOT - WESTERN",..: 20 20 20 20 20 20 20 20 20 20 ...
$ AGENCY.SHORT.NAME : Factor w/ 10 levels "BLDG & SAF","DOT - HLYW",..: 10 10 10 10 10 10 10 10 10 10 ...
B. Are the rows currently displayed in chronological (i.e., date) order? How do you know this? Use the arrange() function from dplyr to put the dates in order. Show the code that you used to accomplish this.

C. What month were you born in? Using the subset() function, make a new object that only contains dates for your particular birth month. Show the code that you used to accomplish this. We will not use this object again in any of the following steps.
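A rough sketch of the ordering and month-subset steps, using made-up rows. It assumes the dplyr package is installed; the month "03" and the object name `march` are arbitrary examples.

```r
library(dplyr)

# Made-up rows standing in for the filtered data frame.
CHRY <- data.frame(
  Issue.Date       = c("2017-10-11", "2017-03-13", "2017-03-02", "2017-07-30"),
  Fine.amount      = c(25, 363, 25, 50),
  stringsAsFactors = FALSE
)
CHRY$Issue.Date <- as.Date(CHRY$Issue.Date, format = "%Y-%m-%d")

# TRUE only if every date is on or after the one before it.
already_sorted <- all(diff(CHRY$Issue.Date) >= 0)

# Sort ascending by date.
CHRY <- arrange(CHRY, Issue.Date)

# Keep a single calendar month by comparing the two-digit month string.
march <- subset(CHRY, format(Issue.Date, "%m") == "03")
```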

6. We won’t need to use the variable ‘ticket number’ in our analysis. Remove this column from your dataframe. Show the code that you used to accomplish this.
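Dropping a column can be done by assigning NULL to it. A small sketch on made-up rows, using the column name `Ticket.number` as it appears in the str() output:

```r
# Made-up rows standing in for the filtered data frame.
CHRY <- data.frame(
  Ticket.number = c("1001", "1002"),
  Fine.amount   = c(25, 63)
)

# Assigning NULL removes the column in place.
CHRY$Ticket.number <- NULL

names(CHRY)
```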

7. Using the summary() function, find out even more about the distribution of fine amounts. Show a screenshot that displays the Minimum, 1st Quartile, Median, 3rd Quartile, Maximum, and Mean values for parking fine amounts.
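As a small illustration, summary() on a numeric vector reports all six statistics at once. The vector below holds only the ten Fine.amount values visible in the str() output, not the full column; the name `fines` is made up.

```r
# The ten fine amounts shown in the str() output (a sample, not the full column).
fines <- c(25, 363, 25, 50, 93, 93, 68, 363, 63, 25)

# Min, 1st quartile, median, mean, 3rd quartile, and max in one call.
fine_summary <- summary(fines)
fine_summary
```

On the real data the call would be `summary(CHRY$Fine.amount)`.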

8. Identify the five most common types of violation descriptions in your dataset. Show the code that you used to accomplish this, and a screenshot that shows the names of the five most common violation descriptions.

Violation description             Count
NO PARK/STREET CLEAN               3717
METER EXP.                         1578
DISPLAY OF TABS                     910
RED ZONE                            707
NO PARKING                          613
PREFERENTIAL PARKING                609
NO EVIDENCE OF REG                  411
PARKED OVER TIME LIMIT              393
DISPLAY OF PLATES                   319
EXCEED 72HRS-ST                     238
WHITE ZONE                          181
BLOCKING DRIVEWAY                   163
NO STOPPING/ANTI-GRIDLOCK ZONE      160
STANDNG IN ALLEY                    141
NO STOP/STANDING                    133
18 IN. CURB/2 WAY                   110
PARKED ON SIDEWALK                  107
NO STOP/STAND                        90
DISABLED PARKING/NO DP ID            87
FIRE HYDRANT                         87
EXPIRED TAGS                         78
HANDICAP/NO DP ID                    72
STOP/STAND PROHIBIT                  70
YELLOW ZONE                          59
OUTSIDE LINES/METER                  55
WHITE CURB                           55
PK IN PROH AREA                      53
PRIVATE PROPERTY                     44
OFF STR/OVERTIME/MTR                 43
DOUBLE PARKING                       35
DSPLYPLATE A                         33
NO STOPPING/STANDING                 32
2251157A                             27
22514                                25
PREF PARKING                         25
NO STOP/STAND PM                     24
PARKED IN PARKWAY                    23
OVNIGHT PRK W/OUT PE                 22
STNDNG IN ALLEY                      18
22500H                               17
NO PARKING BETWEEN POSTED HOURS      17
PK OUTSD PK STL                      17
22500F                               15
EXCEED 72 HOURS                      15
METER EXPIRED                        14
PARK IN GRID LOCK ZN                 14
2251157B                             13
5204                                 13
DISABLED PARKING/CROSS HATCH         13
WITHIN INTERSECTION                  13
PARKED IN BUS ZONE                   12
22500E                               11
4000A                                10
RED CURB                             10
22502A                                9
BLK BIKE PATH OR LANE                 9
PARKING AREA                          9
TIME LIMIT/CITY LOT                   9
OFF STR MTR/OUT LINE                  8
PARKED IN CROSSWALK                   8
22500B                                7
18 IN. CURB/1 WAY                     6
DP-BLKNG ACCESS RAMP                  6
WRG SD/NOT PRL                        6
18 IN/CURB/COMM VEH                   5
5200                                  5
CITY PARK/PROHIB                      5
DP- RO NOT PRESENT                    5
EXCEED TIME LMT                       5
PARK-PSTD AREAS                       5
PARKED IN FIRE LANE                   5
PARKING/FRONT YARD                    5
RESTRICTED TAXI ZONE                  5
3 FT. SIDEWALK RAMP                   4
NO PK BET 1-3AM                       4
NO STOP/STAND AM                      4
PUBLIC GROUNDS                        4
22500I                                3
22502E                                3
CLEANING VEH/STREET                   3
COMM VEH OVER TIME LIMIT              3
LOAD/UNLOAD ONLY                      3
R/PRIV PARKING AREA                   3
22522                                 2
80581                                 2
DISABLED PARKING/BOUNDARIES           2
DISABLED PARKING/OBSTRUCT ACCESS      2
DP-REFUSE ID                          2
HANDICAP/CROSS HATCH                  2
PARKING OUTSIDE PARKING STALLS        2
PK OTSD PSTD AR                       2
SAFETY ZONE/CURB                      2
STORING VEH/ON STR                    2
(blank)                               1
22500A                                1
22502                                 1
225078                                1
22523AB                               1
22651C                                1
(Other)                              19
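One possible way to pull out the five most common descriptions is to sort a frequency table and keep its head. The sketch below rebuilds a stand-in vector from the six largest counts in the output above; the names `descr` and `top5` are made up.

```r
# Stand-in vector built from the six largest counts in the output above.
descr <- rep(
  c("NO PARK/STREET CLEAN", "METER EXP.", "DISPLAY OF TABS",
    "RED ZONE", "NO PARKING", "PREFERENTIAL PARKING"),
  times = c(3717, 1578, 910, 707, 613, 609)
)

# Count each description, sort from most to least frequent, keep the first five.
top5 <- head(sort(table(descr), decreasing = TRUE), 5)
names(top5)
```

On the real data `descr` would be `CHRY$Violation.Description`.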
9. Create a new dataframe that only contains data for the five most common violations. Show the code that you used to accomplish this. You will use this new dataframe for all the following steps in this assignment.
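A sketch of the top-five filter on made-up data, where the letters A to F stand in for violation descriptions. The object names `top5_names` and `top5_df` are illustrative.

```r
# Made-up data: six descriptions with distinct counts (6, 5, 4, 3, 2, 1).
CHRY <- data.frame(
  Violation.Description = factor(rep(c("A", "B", "C", "D", "E", "F"),
                                     times = c(6, 5, 4, 3, 2, 1)))
)

# Names of the five most frequent descriptions.
top5_names <- names(head(sort(table(CHRY$Violation.Description),
                              decreasing = TRUE), 5))

# Keep only rows with one of those descriptions.
top5_df <- CHRY[CHRY$Violation.Description %in% top5_names, , drop = FALSE]

# Drop the now-unused factor levels so they do not show up in plots.
top5_df$Violation.Description <- droplevels(top5_df$Violation.Description)
```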

10. Using ggplot, create a barplot that displays the number of occurrences for the five most common violations. Be sure to label your axes, to give the graph a title, and to color each of your bars. In a sentence or two, describe what your barplot is showing you.
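A minimal barplot sketch, assuming ggplot2 is installed. The stand-in data frame `top5_df` is rebuilt from the five largest counts in the output for item 8; the title and axis labels are placeholders.

```r
library(ggplot2)

# Stand-in data rebuilt from the five largest counts shown for item 8.
top5_df <- data.frame(
  Violation.Description = rep(
    c("NO PARK/STREET CLEAN", "METER EXP.", "DISPLAY OF TABS",
      "RED ZONE", "NO PARKING"),
    times = c(3717, 1578, 910, 707, 613)
  )
)

# geom_bar() counts rows per category; fill gives each bar its own color.
p <- ggplot(top5_df, aes(x = Violation.Description, fill = Violation.Description)) +
  geom_bar() +
  labs(title = "Five most common violations",
       x = "Violation description",
       y = "Number of tickets") +
  theme(legend.position = "none")

p
```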

11. How did the average size of a fine vary from agency to agency? (Use the AGENCY.SHORT.NAME variable to make this grouping.) Find the average fine size for each agency (show a screenshot of your code plus your results) and then display your results visually with a barplot built using ggplot. Give your barplot a title, and clearly label your x and y axes. Be sure to color your bars. In a sentence or two, describe what your barplot is showing you.
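The grouping step could be sketched with base R's aggregate(), then plotted with geom_col(). The rows are made up; the two agency names are taken from the str() output, and `avg_fine` is an illustrative object name. ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Made-up rows; agency names borrowed from the str() output.
top5_df <- data.frame(
  AGENCY.SHORT.NAME = c("DOT - HLYW", "DOT - HLYW",
                        "BLDG & SAF", "BLDG & SAF", "BLDG & SAF"),
  Fine.amount       = c(25, 63, 93, 68, 73)
)

# Mean fine per agency: one row per agency in the result.
avg_fine <- aggregate(Fine.amount ~ AGENCY.SHORT.NAME, data = top5_df, FUN = mean)

# geom_col() draws bar heights directly from the precomputed means.
p <- ggplot(avg_fine, aes(x = AGENCY.SHORT.NAME, y = Fine.amount,
                          fill = AGENCY.SHORT.NAME)) +
  geom_col() +
  labs(title = "Average fine by agency",
       x = "Agency",
       y = "Average fine amount ($)") +
  theme(legend.position = "none")

p
```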


12. Using ggplot, create a violin plot that shows the agencies on the x-axis, and the fine amounts on the y-axis. Give your violin plot a title, and clearly label your x and y axes. In a sentence or two, describe what your violin plot is showing you.
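A violin plot sketch on made-up fines, assuming ggplot2 is installed. The agency names come from the str() output; all amounts and labels are placeholders.

```r
library(ggplot2)

# Made-up fines for two agencies (five tickets each).
top5_df <- data.frame(
  AGENCY.SHORT.NAME = rep(c("DOT - HLYW", "BLDG & SAF"), each = 5),
  Fine.amount       = c(25, 25, 63, 68, 93, 50, 63, 63, 73, 93)
)

# Each violin shows the estimated density of fine amounts within one agency.
p <- ggplot(top5_df, aes(x = AGENCY.SHORT.NAME, y = Fine.amount)) +
  geom_violin(fill = "steelblue") +
  labs(title = "Distribution of fine amounts by agency",
       x = "Agency",
       y = "Fine amount ($)")

p
```

Wider sections of a violin mark fine amounts that occur more often for that agency.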


13. Using ggplot, create a histogram that shows the frequency of ticket issuances per hour of the day. Use different colors to depict different types of violation descriptions. In a couple of sentences, describe what your histogram is showing you. What meaning could someone take from this? Why (or why not) does this histogram make sense, in terms of what it says about parking violations and times of day?
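A sketch of the hourly histogram, assuming ggplot2 is installed. It assumes Issue.time is stored as an HHMM integer (for example 1540 for 3:40 pm), which the str() output suggests but does not confirm. The ten times are the ones visible in that output; the descriptions paired with them and the `Hour` column are made up.

```r
library(ggplot2)

# Issue.time values from the str() output; descriptions are made-up pairings.
top5_df <- data.frame(
  Issue.time            = c(1540, 1540, 1630, 1455, 920, 1335, 217, 1450, 1538, 900),
  Violation.Description = rep(c("METER EXP.", "RED ZONE"), times = 5)
)

# Assumption: HHMM encoding, so integer division by 100 gives the hour (0-23).
top5_df$Hour <- top5_df$Issue.time %/% 100

# One bin per hour; fill stacks the violation types within each bin.
p <- ggplot(top5_df, aes(x = Hour, fill = Violation.Description)) +
  geom_histogram(binwidth = 1, boundary = 0) +
  scale_x_continuous(breaks = seq(0, 24, by = 4)) +
  labs(title = "Tickets issued by hour of day",
       x = "Hour of day",
       y = "Number of tickets",
       fill = "Violation")

p
```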
