R Graphics for beginner

Hi this is a R markdown document, to show you some of the basics to develop graphics in R. If you want the data set or more information, look at my Githubpage.

##Packages

library(car)
library(knitr)
library(foreign)
library(gplots)
library(psych)

#Reading the data We use a data set from the Department of Psychology of the Univerity of Colorado about Interests of people. You can get it also here. It is a spss data, so we need the package foreign to read the data set and to convert it to a data frame.

interest <- as.data.frame(read.spss("/home/matti/R-for-Beginners/Datasets/interest.sav"))
#file.choose() a good help

We have to change the label of the gender, because the plots will look better with it:

interest$gender  <- recode(interest$gender, '1="female" ; 2="male" ')

#Graphs for categorial variables

In this chapter we look for nominally and ordinal scaled variables.

##Barplot with absolute frequency

We will create a barplot for the variable “education”. Befor that we use the table() function to build a vector:

educ<-  table(interest$educ)

barplot(educ)

##Barplot with relative frequencys Relative frequencys show which percentage have some specific education. Just the y axis ist changing.

barplot(prop.table(educ))

barplot(prop.table(educ), horiz = T ,las=1)

To connect two variables, here education and gender:

barplot(table(interest$educ, interest$gender), legend=T)

barplot(table(interest$educ, interest$gender), legend=T,beside=T)

##Pie chart What is about some delicious pie?

pie(table(interest$educ))

#Graphs for metric variables

We build a barplot with the geometry variable:

barplot(interest$geometry)

Now a histogram:

hist(interest$geometry)

Now a histogram with 10 breaks and a suggested normal distribution:

mean  <-  mean(interest$geometry, na.rm=T)
sd  <-  sd(interest$geometry, na.rm = T)
hist(interest$geometry, breaks=10)
curve(dnorm(x, mean, sd)*length(interest$geometry), add=T)

Sometimes it is difficult to interpretate a histogram, because it depends on the breaks. So we build a density plot with a continous line:

plot(density(interest$geometry, na.rm = T))
curve(dnorm(x, mean, sd), add=T, lty=2)

#Boxplots

Boxplots are important for explorative data analysis and show characteristics of the distribution. Here we use the age:

boxplot(interest$age)

You can also create some Boxplots together:

boxplot(data.frame(interest$vocab, interest$reading, interest$sentcomp))

To show some Boxplots with informations about some groups (e.g. for women and men) you can use the ~ sign. With the identify() you can click on the Boxplots, for example to get to know extreme values.

boxplot(interest$age ~  interest$gender)

#identify(interest$gender, interest$age)

#Barplot for means We want to visualize the mean of education seperated for men and women. To do that we integrate the tapply() function in barplot():

barplot(tapply(interest$educ, interest$gender, mean, na.rm=T))

To integrate more means of variables we also use the tapply function:

barplot(tapply(interest$educ, data.frame(interest$gender, interest$age), mean, na.rm=T, na.omit=T), legend=T, xlab="age", ylab="education")

#for easier interpretations add: beside=TRUE

#Error bars To build an error bar, you can use the package gplots and the plotmeans() function. This function build confidence intervals and with p=0.95 we set it for the 95% confidence interval:

plotmeans(interest$educ~ interest$gender, p=0.95)

##Cats-eye plot

If you want to compare the confidence intervalss of some variables use the error.bars() function of the psych package. Here we compare geometry and reading:

geo.read <- na.omit(data.frame(interest$geometry, interest$reading))
error.bars(geo.read, within=F,alpha=0.95, eyes=T)

To compare means of several groups use the error.bars.by() function. But you have to build data.frames:

educ <- as.data.frame(na.omit(interest$educ))
gender  <- as.data.frame(na.omit(interest$gender))
error.bars.by(educ, gender, by.var=T, ylim=c(11,13), legend=1, eyes=T, alpha=0.95)

##Create own Error bars To create your own error bars use the arrow() function and build arrows around the means of the barplot with a distance of the standard deviation. We do this for the first block in the barplot:

des.read  <-  describeBy(interest$reading,interest$gender, mat=T)
plot.read  <- barplot(des.read$mean, names.arg = des.read$group1)
arrows(plot.read[1], des.read$mean[1]-des.read$mean[1], plot.read[1], des.read$mean[1]+des.read$mean[1], code=3, angle=90)

To do this for the second or maybe more blocks in the barplot define the i in the following command. It allows you to get all columns in the table of des.read:

plot.read  <- barplot(des.read$mean, names.arg = des.read$group1)
for(i in 1:nrow(des.read)){
  arrows(plot.read[i], des.read$mean[i]-des.read$mean[i], 
         plot.read[i], des.read$mean[i]+des.read$mean[i], 
         code=3, angle=90)
}

#Q-Q-Plot When we are interested in a normal distribution histograms can`t help us much. Better are qqplots. The nearer the points are at the line the better is the normal distribution. qqnorm() is in the car package:

qqnorm(interest$reading)

qqPlot() is in the car package and shows you the confidence interval. If the point lays near the interval it is normal distributed:

qqPlot(interest$reading)

#Scatterplot To get the connection between two variables use scatterplots with two vectors:

plot(interest$reading, interest$geometry)

##Adding a Lowess-curve A Lowess curve is a nonparametric curve that rebuilds the connection between two variables, but we need varaibles without missing values:

read.geo <- na.omit(data.frame(interest$reading, interest$geometry))
plot(interest$reading, interest$geometry)
lines(lowess(read.geo[,1],read.geo[,2]))

##Adding a regression line To get a regression line use the abline() function combinated with a lm(). With identify() you can again search for extreme values:

plot(interest$reading, interest$geometry)
abline(lm(interest$reading~interest$geometry))

#Edit graphs To change the standard setting use the par() function to visit the possibilities for changing something:

par()
help(par)
  • pch=x - type of points in a scatterplot
  • ces=x - size of points or lettering
  • lty=x - type of line
  • lwd=x - size of the line
  • las=1 - numbers on the y axis are shown horizontal
  • xlab=”x”, ylab=”x”- the names for the axes
  • xlim=c(min, max), ylim=c(min,max)- the scale of the axes
  • main=”text” - heading
  • sub=”text” - subheading
  • col=x - the color (colors() gives a full list)

#Adding elements

To add something, write it directly under the plot() function, than it will be added to this graph:

  • abline() - straight line
  • arrows() - arrows
  • axis() - lettering axes
  • curve(….,add=TRUE) - functioncurves
  • legend() - legend
  • lines() - lines
  • points() - points
  • symbols() - symbols
  • text() - text

To get more informations use the help(abline) function.