This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Please look at the first R Markdown file for the basic notes. This one summarizes other material we have covered since then (and does not necessarily include everything we covered before).
We will be working with the bird weight data from Canvas. The first step is to download that data and set your working directory to the folder that contains the file.
First, we will read in the data as an object called “bird”.
bird<-read.csv("birdweight.csv")
You should always look at your data to see how it is formatted. If you run the summary function, you will see that it does not show the breakdown of area and morph.
summary(bird)
##      area              morph               weight
##  Length:12000       Length:12000       Min.   :116.1
##  Class :character   Class :character   1st Qu.:268.2
##  Mode  :character   Mode  :character   Median :362.6
##                                        Mean   :392.4
##                                        3rd Qu.:503.9
##                                        Max.   :722.2
If we look at the structure of the data, we will see that area and morph are stored as characters. We want them as factors, so let’s change that.
str(bird)
## 'data.frame': 12000 obs. of 3 variables:
## $ area : chr "North" "South" "East" "West" ...
## $ morph : chr "Blue" "Red" "Red" "Blue" ...
## $ weight: num 310 421 129 657 345 ...
bird$area<-as.factor(bird$area)
bird$morph<-as.factor(bird$morph)
Let’s check that it changed.
str(bird)
## 'data.frame': 12000 obs. of 3 variables:
## $ area : Factor w/ 4 levels "East","North",..: 2 3 1 4 2 3 1 4 2 3 ...
## $ morph : Factor w/ 2 levels "Blue","Red": 1 2 2 1 2 1 1 2 1 2 ...
## $ weight: num 310 421 129 657 345 ...
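As a possible shortcut, the conversion to factors can also happen while the file is read. The sketch below assumes a tiny made-up data set (the object names csv_text and toy and all of the values are hypothetical stand-ins for birdweight.csv) and uses the stringsAsFactors argument of read.csv.

```r
# Toy stand-in for birdweight.csv (values are made up for illustration)
csv_text <- "area,morph,weight
North,Blue,310
South,Red,421
East,Red,129
West,Blue,657"

# stringsAsFactors = TRUE turns every text column into a factor on import,
# so no separate as.factor() step is needed afterwards
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(toy)
```

With the real file, the same idea would be read.csv("birdweight.csv", stringsAsFactors = TRUE). Converting one column at a time with as.factor() gives you more control when only some text columns should become factors.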
If you use the “levels” function, it will tell you what the levels of a factor are.
levels(bird$morph)
## [1] "Blue" "Red"
levels(bird$area)
## [1] "East" "North" "South" "West"
If we run summary again, we can see that our data is evenly distributed across the areas as well as the morphs.
summary(bird)
##     area       morph          weight
##  East :3000   Blue:6000   Min.   :116.1
##  North:3000   Red :6000   1st Qu.:268.2
##  South:3000               Median :362.6
##  West :3000               Mean   :392.4
##                           3rd Qu.:503.9
##                           Max.   :722.2
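The summary counts each factor separately, so it does not say how many birds fall into each combination of area and morph. One way to check that is the table function. This is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the real bird data).

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area  = c("North", "South", "East", "West", "North", "South", "East", "West"),
  morph = c("Blue", "Red", "Red", "Blue", "Red", "Blue", "Blue", "Red")
)

# Cross-tabulate: rows are areas, columns are morphs,
# and each cell is the number of birds with that combination
counts <- table(toy$area, toy$morph)
print(counts)
```

With the real data this would be table(bird$area, bird$morph).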
Let’s make a histogram of the weight.
hist(bird$weight, breaks=25, col="lightblue", main="Bird Weight",
xlab="Bird Weight (g)", xlim=c(0,730))
We can always add a box around our plot with the command below.
hist(bird$weight, breaks=25, col="lightblue", main="Bird Weight",
xlab="Bird Weight (g)", xlim=c(0,730))
box()
We can see that the histogram shows several distinct “populations”, and we know that our data can be broken down by both morph and area. Let’s make a boxplot of weight by morph.
boxplot(bird$weight~bird$morph, xlab="Color",
ylab="Weight (g)", main="Weight by Morph",
col=c("lightblue", "lightpink"))
We see there is no clear difference in weight between the morphs.
However, we do see a difference by area!
boxplot(bird$weight~bird$area, xlab="Area",
ylab="Weight (g)", main="Weight by Area")
Now what if I wanted to know the summary statistics of the weight for just the birds in the west? We can subset the data to do that.
To do this, we will make an object called “west.bird” that holds bird$weight only for the rows whose area is exactly “West”. It can be written as shown below. The double equals sign (==) means “exactly equal to”.
west.bird<-bird$weight[bird$area == "West"]
We can then get our summary statistics using “summary()”.
summary(west.bird)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 651.1 657.1 682.4 684.6 712.1 722.2
We can even make a histogram of this data.
hist(west.bird)
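It may help to see what the comparison inside the square brackets produces on its own. The sketch below uses a small made-up data frame (toy, is.west, and the values are hypothetical) to show that == returns one TRUE or FALSE per row, and that the brackets keep only the TRUE positions.

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "West"),
  weight = c(310, 421, 129, 657, 712)
)

# The comparison gives one TRUE/FALSE value per row
is.west <- toy$area == "West"
print(is.west)

# TRUE counts as 1, so sum() tells us how many rows matched
print(sum(is.west))

# Indexing with the logical vector keeps only the rows marked TRUE
print(toy$weight[is.west])
```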
We can do this subsetting for each population/area. Here’s an example for the north.
north.bird<-bird$weight[bird$area == "North"]
summary(north.bird)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 299.4 311.8 333.8 329.0 346.1 352.8
hist(north.bird)
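Subsetting one area at a time works, but it gets repetitive. As one possible alternative, tapply applies a function to the weights of every area in a single call. This is a sketch on a made-up data frame (toy, area.means, and the values are hypothetical).

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "North", "South", "East", "West"),
  weight = c(310, 421, 129, 657, 345, 408, 140, 712)
)

# Split weight by area and take the mean of each group
area.means <- tapply(toy$weight, toy$area, mean)
print(area.means)

# Any function works in the third slot, including summary()
print(tapply(toy$weight, toy$area, summary))
```

With the real data this would be tapply(bird$weight, bird$area, summary).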
We see that each of these populations has two peaks in its histogram, which suggests that weight may differ between the red and blue morphs within an area. We can easily check this with a boxplot, because we can put more than one variable in the “x” portion of our plot.
Notice the “equation” below. It says that weight is a function of morph AND area.
boxplot(bird$weight~ bird$morph + bird$area, ylab="Weight(g)",
xlab="location and morph", col=c("lightblue", "pink"))
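The same kind of “equation” can be written without repeating the data frame name by passing the data frame through the data argument. The sketch below uses a made-up data frame (toy, b, and the values are hypothetical) and sets plot = FALSE only so that it can print the group labels; leave that argument out to draw the plot.

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "North", "South", "East", "West"),
  morph  = c("Blue", "Red", "Red", "Blue", "Red", "Blue", "Blue", "Red"),
  weight = c(310, 421, 129, 657, 345, 408, 140, 712)
)

# data = toy lets the formula use bare column names.
# plot = FALSE returns the box statistics without drawing anything.
b <- boxplot(weight ~ morph + area, data = toy, plot = FALSE)

# One label per morph/area combination, in the order the boxes would appear
print(b$names)
```

With the real data this would be boxplot(weight ~ morph + area, data = bird).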
Now what if we want a boxplot of just the birds from the west, with morph information? We already subset our weight by area, so we should subset our morph by area as well. This line gives us the bird morph data for the rows with “West” as the area. It will have the same number of values, in the same order, as the west.bird object we made for bird weight.
west.bird.morph<-bird$morph[bird$area == "West"]
Now we can make a boxplot of this.
boxplot(west.bird~west.bird.morph, col=c("lightblue", "pink"),
ylab="Weight (g)", xlab="Morph", main="West Population")
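Instead of subsetting weight and morph separately, another option is to subset whole rows of the data frame, which keeps the columns lined up automatically. This is a sketch on a made-up data frame (toy, west, and the values are hypothetical). Note the comma inside the brackets: the condition before it picks rows, and leaving the part after it empty keeps every column.

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "North", "South", "East", "West"),
  morph  = c("Blue", "Red", "Red", "Blue", "Red", "Blue", "Blue", "Red"),
  weight = c(310, 421, 129, 657, 345, 408, 140, 712)
)

# Keep the rows where area is "West", and keep all columns
west <- toy[toy$area == "West", ]
print(west)

# weight and morph now come from the same rows of the same object
print(west$weight)
print(west$morph)
```

The subset could then be plotted with boxplot(weight ~ morph, data = west).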
What if we want values for blue birds from the west? We can subset with more than one qualifier using the ampersand (&, shift+7).
west.blue<-bird$weight[bird$morph == "Blue" & bird$area=="West"]
str(west.blue)
## num [1:1500] 657 658 662 658 655 ...
summary(west.blue)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 651.1 655.7 657.1 657.1 658.4 663.4
hist(west.blue)
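The ampersand keeps a row only when both conditions are true. For comparison, the vertical bar (|) keeps a row when either condition is true. The sketch below shows both on a made-up data frame (toy, blue.and.west, blue.or.west, and the values are hypothetical).

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "North", "South", "East", "West"),
  morph  = c("Blue", "Red", "Red", "Blue", "Red", "Blue", "Blue", "Red"),
  weight = c(310, 421, 129, 657, 345, 408, 140, 712)
)

# & : the bird must be blue AND from the west
blue.and.west <- toy$weight[toy$morph == "Blue" & toy$area == "West"]
print(blue.and.west)

# | : the bird must be blue OR from the west (or both)
blue.or.west <- toy$weight[toy$morph == "Blue" | toy$area == "West"]
print(blue.or.west)
```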
We can also subset by excluding an option. If we want the data for birds from every location except the west, we can do the below. The != operator means “not equal to”.
not.west<-bird$weight[bird$area != "West"]
str(not.west)
## num [1:9000] 310 421 129 345 408 ...
hist(not.west)
What if we want values for birds that are from neither the west nor the east? We can do that with subsetting as well.
not.east.notwest<-bird$weight[bird$area !="West" & bird$area !="East"]
str(not.east.notwest)
## num [1:6000] 310 421 345 408 319 ...
hist(not.east.notwest)
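When several values need to be kept or excluded, chaining conditions with & gets long. One alternative is the %in% operator, which checks each value against a list of options. This is a sketch on a made-up data frame (toy, keep, same, and the values are hypothetical).

```r
# Toy stand-in for the bird data (made-up rows for illustration)
toy <- data.frame(
  area   = c("North", "South", "East", "West", "North", "South", "East", "West"),
  weight = c(310, 421, 129, 657, 345, 408, 140, 712)
)

# TRUE for every row whose area is one of the listed values
keep <- toy$area %in% c("North", "South")

# The chained version from above, for comparison
same <- toy$area != "West" & toy$area != "East"

# Both approaches select exactly the same rows
print(identical(keep, same))
print(toy$weight[keep])
```

Putting ! in front, as in !(toy$area %in% c("West", "East")), selects the same rows here by excluding the listed areas instead.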