R tutorial for a Unix environment | Statistics and Actuarial Science

The following tutorial provides a basic introduction to using R in a Unix environment.

Invoking/interrupting/quitting
Getting help
Mathematical calculations
Storing/manipulating data using vectors and matrices
Managing the workspace image
Importing/exporting data
Functions
Distributions
Graphics

Invoking/interrupting/quitting

Before you begin this tutorial...

If you are not familiar with working in a Unix environment, please take the following Unix tutorial.

Invoking R

You start R by typing "R" at the Unix prompt on a machine that has R installed on it. R is installed on the following undergrad machines: bacon, agnesi, blanch, fenchel, fitch, maddison, magnus, merrill and rees. To obtain a Unix prompt, you must log in to one of these machines, for example by typing "rlogin maddison". For those students with a graduate account, R is available on the following graduate machines: pythagoras and beta.

Once you have started R, you must also decide if you want the commands that you type to be saved. Here are your choices:

Type all of your commands directly into the R window. The disadvantage is that your commands are not saved. Also, when you write a command, R doesn't allow you to go back and correct a typo so you will have to retype your command.
Use a text editor (emacs, for example), write your commands in it and copy-paste them into your R window.
Using the script command, you can create a file, say Rsession1, that is an exact copy of every keystroke that you perform in a Unix window as well as everything that is echoed onto the screen after you invoke the command. For instance, at the Unix prompt, type:
```
$ script 'Rsession1'
$ R
```
to produce the file, Rsession1, which will record everything in your R session. To stop the recording, simply quit R and at the Unix prompt, type:
```
$ exit
```
If you are running an intensive program, you can run it using BATCH mode. Type:
```
help(BATCH)
```
in R for more information.

Interrupting R

To interrupt an R command, use the Ctrl-C key combination to stop execution of the command. This will return you to the R prompt.

Quitting R

To quit R, type:

> q()
Save workspace image? [y/n/c]: y

When you quit R, you will be asked whether you want to save the workspace image. Refer to the section "Managing the workspace image" for a more detailed description of the workspace image.

If you answer "y", all of the objects you have created are saved and are available the next time you start R. If you answer "n", all objects are removed. Answering "c" cancels quitting and you are returned to an R prompt.

Caution: it is not advisable to rely on saving the workspace image so that all objects are saved and ready for your next R session. For instance, you may overwrite a particular object and then be unable to remember how that object was obtained. In general, it is recommended that you write your commands in a text editor, as discussed above, or at the very least record them using the script command, so that you keep all of the R commands used during your R session. All objects can then be easily recreated in future R sessions.

Getting help

How to get help on a particular R command

You can type either of

> help(log)
> ?log

to get help on the log command.

How to obtain a list of the help files for all R commands

Type:

> help.start()

to start a help window in a web browser (Netscape, at the time this tutorial was written). This is also a way to list all of the R commands.

Other ways of getting help in R

You may also find additional information in R manuals or FAQ's. These can be accessed from the internet using the following links:

Note that if you do not know the command in R that you want help on, the above methods of searching the R help files for relevant information concerning a particular subject are (most of the time) not useful. Some other ways of searching for help on a particular subject are:

Search for the particular subject in R using the Google newsgroups. Note that the keyword, R, is not particularly helpful when specified in Google because it is generic. Because Splus and R are quite similar in most cases, searching Google using Splus instead of R will most likely yield some information.

Above all, if you don't know something about R, ask someone who might know!

Mathematical calculations

You can use R for many mathematical calculations. Using parentheses in the usual way to group terms is recommended, and often required, in large mathematical expressions.

Addition/subtraction

> 5 + 5
> 10 - 2

Multiplication/division

> 10*10
> 25/5

Power multiplication

> 3 ^ 2
> 2 ^ (-2)

Note that

> 100 ^ (1/2)

is equivalent to

> sqrt(100)

Logarithms

> # Compute a natural logarithm
> log(10)
> # Compute a logarithm with base 10
> log10(1000)
> # Compute a logarithm with base 2
> log2(8)
> # Compute a logarithm with base 4
> log(16,base=4)

Exponentials

> exp(1)

Trigonometric Functions

Type:

> help(sin)

to see a list of all trigonometric functions. Note that angles are specified in radians, so if you want the sine of 90 degrees, type:

> sin(pi/2)
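
The conversion from degrees to radians can be wrapped in a small helper. The sketch below is only an illustration; the function name deg2rad is hypothetical and is not part of R.

```r
# Hypothetical helper (not a built-in): convert degrees to radians
deg2rad <- function(deg) deg * pi / 180

s90 <- sin(deg2rad(90))   # sine of 90 degrees
c60 <- cos(deg2rad(60))   # cosine of 60 degrees
print(s90)
print(c60)
```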

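The value NA

NA is the value R uses for a missing value. A calculation whose result is undefined returns the related value NaN ("not a number") instead, usually with a warning. For instance, type:

> sqrt(-3)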
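
As a short sketch of how these special values behave in calculations: sqrt(-3) prints NaN ("not a number") together with a warning, and is.na() reports TRUE for both NA and NaN. The vector v below is made up for illustration.

```r
v <- c(4, NA, 9)                  # NA marks a missing observation
print(is.na(v))                   # flags the missing element

m_all <- mean(v)                  # the missing value propagates
m_obs <- mean(v, na.rm = TRUE)    # drop missing values first
print(m_all)
print(m_obs)

r <- suppressWarnings(sqrt(-3))   # undefined result; the warning is silenced here
print(is.nan(r))
print(is.na(r))
```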

Storing/manipulating data using vectors and matrices

Creating vectors and extracting their elements

> x <- c(1,3,5,7,8,9)
> x
[1] 1 3 5 7 8 9
> y <- 1:20
> y
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> z <- seq(1,2,0.1)
> z
[1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
> x[3]
[1] 5
> x[1:3]
[1] 1 3 5
> x[-2]
[1] 1 5 7 8 9

Mathematical operations on vectors

One of the quirks/strengths of R is its use of vectorized operations. Create two vectors x and y, having the same length, and see what happens when you do each of the following operations:

> x + y 
> x - y
> x / y  #division element by element
> x * y  #multiplication element by element
> sqrt(y)
> x^3
> x^y
> log(x)
> cos(x)
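
For concreteness, here is a small sketch with two vectors of length 6. The last line assumes you also want to see what R does when the lengths differ: the shorter vector is recycled.

```r
x <- c(1, 3, 5, 7, 8, 9)
y <- 1:6

s <- x + y            # 2 5 8 11 13 15
p <- x * y            # 1 6 15 28 40 54
print(s)
print(p)

# Recycling: the shorter vector is reused until it matches the longer one
rec <- x + c(10, 20)  # 11 23 15 27 18 29
print(rec)
```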

Creating matrices and extracting their elements

> X <- cbind(x,1:6)  # You can also use rbind to combine rows
> X
     x  
[1,] 1 1
[2,] 3 2
[3,] 5 3
[4,] 7 4
[5,] 8 5
[6,] 9 6
> Y <- matrix(0,2,3)
> Y
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
> Y <- matrix(x,2,3)
> Y
     [,1] [,2] [,3]
[1,]    1    5    8
[2,]    3    7    9
> Y[1,2]  # extract element on the first row and the second column
[1] 5
> Y[1,]   # extract the first row
[1] 1 5 8 
> Y[,1]   # extract the first column
[1] 1 3
> Y[2,c(1,3)] # extract elements (1,3) of the second row
[1] 3 9

Mathematical operations on matrices

To multiply two matrices A and B together, type:

> A %*% B

To find the transpose of a matrix, type:

> t(A)

To find the inverse of a matrix, type:

> solve(A)
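
A minimal sketch with two made-up 2 x 2 matrices, showing that %*% is the matrix product (unlike *, which multiplies element by element) and that solve(A) really is the inverse:

```r
A <- matrix(c(2, 1, 1, 3), 2, 2)   # filled column by column
B <- matrix(c(1, 0, 2, 1), 2, 2)

AB   <- A %*% B     # matrix product
elem <- A * B       # element-by-element product, a different result
At   <- t(A)        # transpose
Ainv <- solve(A)    # inverse (A must be square and non-singular)

print(AB)
print(elem)
print(A %*% Ainv)   # the identity matrix, up to rounding error
```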

Comparing objects

Suppose you create a vector, x, in the following way:

> x <- c(-1,3,2,-4,5,6)

Now create a new vector.

> i <- x<0

This returns a vector of logical values

[1]  TRUE FALSE FALSE  TRUE FALSE FALSE

The first element of x is less than 0, so the first element of i is TRUE. The second element of x is greater than 0, so the second element of i is FALSE, etc. Other comparison operators are >, <=, >=, == and !=. The logical operators are & (and), | (or) and ! (not).

> i <- x==3 | x==6
> i
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE
> i <- x<6 & x>3
> i
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

Obtaining a subset of an object

Suppose you want to extract a subset of a vector based on its values. For instance, if you want to extract just those elements of x that are less than 0, type

> i <- x<0

> x[i]

This returns the vector

[1] -1 -4

A short cut is

> x[x<0]
> x[x<6 & x>3]

Say you have two vectors that give the weight and height of 7 people

> weight <- c(110,120,115,136,205,156,175)

> height <- c(64, 67, 62,  60, 77, 70, 66)

To extract the weight of all people whose height is greater than 65,

> weight[height>65]

To extract the weight of all people whose height is equal to 60,

> weight[height==60]

To extract the weight of all people whose height is not equal to 60,

> weight[height!=60]
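
The same logical vector can be reused for counting, locating and summarising the selected elements. The following sketch uses the weight and height vectors from above; the choice of which(), sum() and mean() is only one way to do this.

```r
weight <- c(110, 120, 115, 136, 205, 156, 175)
height <- c(64, 67, 62, 60, 77, 70, 66)

tall <- height > 65                 # logical vector, one entry per person

tall_weights <- weight[tall]        # 120 205 156 175
tall_index   <- which(tall)         # positions of the TRUE entries: 2 5 6 7
n_tall       <- sum(tall)           # TRUE counts as 1, so this is 4
mean_tall    <- mean(weight[tall])  # 164

# Conditions can be combined with & and |
both <- weight[height > 65 & weight < 170]   # 120 156

print(tall_weights)
print(tall_index)
print(n_tall)
print(mean_tall)
print(both)
```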

Managing the workspace image

The workspace image includes all of the R objects that you have created in the current, and possibly other, R sessions. An R object is any vector, matrix, list, etc. that you have assigned a specific name. For instance, suppose you typed:

> x <- 1

Then x is now an object that is in the workspace image.

Listing objects from your workspace image

To view the objects in the workspace image, type:

> ls()

Removing objects from your workspace image

To remove an object, x, from the workspace image, type:

> rm(x)

To remove all objects from the workspace image, type:

> rm(list=ls())
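
If you only want to remove some objects, ls() accepts a pattern argument. The sketch below assumes temporary objects were given names starting with tmp_ (a naming convention invented for this example).

```r
tmp_a <- 1
tmp_b <- "two"
keep_me <- 3

before <- "tmp_a" %in% ls()        # TRUE: tmp_a is in the workspace

# Remove only the objects whose names start with "tmp_"
rm(list = ls(pattern = "^tmp_"))

after_a <- "tmp_a" %in% ls()       # FALSE
after_k <- "keep_me" %in% ls()     # TRUE: other objects are untouched
print(c(before, after_a, after_k))
```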

Importing/exporting data

The following information will provide a basic introduction to importing and exporting data into and out of R.

Importing data from an external ascii file

Several functions can be used to read data from an external ascii file (text file) into R and store it in an R object. However, if the data is stored in some other format (e.g. an Excel spreadsheet), you may want to use the application in which the data is stored (e.g. Excel) to export the data to a text file first.

The function, scan, will read data into a vector or list from a console or ascii file. To see how this is done, first create a file, "file1.txt", in the directory in which you run R. In Unix, this just means that "file1.txt" must be in the same directory in which you invoked R.

For instance, suppose the text file, "file1.txt", contains the following information on the weight of ten subjects.

195 143 210 105 177 154 123 233 118 166

To scan the data in "file1.txt" to the vector, x, type:

> x<-scan("file1.txt")
> x

Type

> help(scan)

for more information on the available options.

The function, read.table, will read an ascii file (text file) in table format and create a data frame from it. Table format means that the lines in the file correspond to cases and the columns in the file correspond to variables. A header may (or may not) be included in the file indicating the name of the variable in each column. A data frame is used with most R modeling software and is similar in nature to matrices and lists.

We can read the data from the file, "file1.txt" into a data frame, x, using the function, read.table. To see how this is done, first create a file, "file1.txt", in the directory in which you run R. For instance, suppose the text file, "file1.txt", contains the following information on the weight and age of ten subjects.

To read the data in "file1.txt" into the data frame, x, type:

> x <- read.table("file1.txt",header=TRUE)
> x

Type

> help(read.table)

for more information on the available options.
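
As a self-contained sketch of read.table, the code below first writes a small table to a temporary file and then reads it back. The three rows of weight and age values are invented for illustration and are not the tutorial's data.

```r
# Hypothetical data, written to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".txt")
writeLines(c("weight age",
             "195 34",
             "143 27",
             "210 45"), path)

subjects <- read.table(path, header = TRUE)   # first line holds the column names
str(subjects)

# Columns are accessed by name with $; rows can be selected with a condition
print(subjects$weight)
older <- subjects[subjects$age > 30, ]
print(older)
```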

Importing data from a text file on a webpage

It is likely that instead of having an ascii file saved in the directory in which you are using R, the file to be imported may exist on a webpage. Using any of the commands for importing data given previously (i.e. scan, read.table), the file can also be imported by specifying the URL as the filename. To import this dataset to R using the URL as the filename, just type:

> x <- read.table("http://www.math.uwaterloo.ca/~dbabinea/webpage/R.tut/R.tut.unix/file1",header=TRUE)

Exporting data

Several functions can be used to write an R object to an external ascii file (text file).

The function, write, writes out a matrix or vector in a specified number of columns. To see how this is done to the matrix, x, first put the matrix, x, into R by typing:

> x1 <- 1:10
> x2 <- 11:20
> x3 <- 21:30
> x <- cbind(x1,x2,x3)

To write the matrix, x, so that each column appears as a line in the text file, type:

> write(x,file="file1.txt",ncolumns=10)

Alternatively, if you want each row in the matrix, x, to appear as a line in the text file, type:

> write(t(x),file="file2.txt",ncolumns=3)

Using either of these commands will save the text files in the directory in which you run R. Type

> help(write)

for more information on the available options.

The function, write.table, will write a data frame (after converting the object to a data frame if it isn't already one) to an external ascii file (text file). Entries in each line (row) are separated by the value of 'sep'. The default value for 'sep' is a space. This is done to the object, x2, by typing:

> x1 <- 1:10
> x2 <- 11:20
> x3 <- 21:30
> x <- cbind(x1,x2,x3)
> x2 <- data.frame(x)
> write.table(x2,"file3.txt",quote=FALSE)

In this case, the object is written to a file, file3.txt, which should now be in the directory in which you run R.

Type

> help(write.table)

for more information on the available options. Note that the function, write.table, can be slow for data frames that have hundreds of columns. The function, write.matrix, in the package, 'MASS', is a more efficient way of dealing with this problem if the object being exported can be represented as a numeric matrix.
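
One way to check an export is to read the file back in. The sketch below assumes you want a comma-separated file without row names; the data frame and the temporary file name are made up for this example.

```r
df  <- data.frame(x1 = 1:3, x2 = 11:13)
out <- tempfile(fileext = ".csv")   # temporary file, chosen for illustration

# sep sets the separator; row.names = FALSE leaves out the row labels
write.table(df, out, sep = ",", quote = FALSE, row.names = FALSE)

file_lines <- readLines(out)
print(file_lines)

# Read the file back with matching settings and compare with the original
back <- read.table(out, header = TRUE, sep = ",")
print(all.equal(df, back))
```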

Functions

R functions provide users with programmed procedures for a wide range of statistical and non-statistical tasks. Many functions are available in R. All basic functions are part of the BASE package. Other functions are part of add-on packages that must be loaded before use. If you need to write your own function, this is also possible.

R functions that are available through add-on packages

Functions that are part of the BASE package are available upon invoking R. Some R functions may not be available when you first invoke R and are available by specifying different add-on packages. If you don't know if a function is part of the BASE package or not, simply type:

> help(function.name)

If help is not given, either the name is misspelled or you have found a function that is not part of the BASE package. Refer to the listing of packages given on the R homepage to see the different add-on packages that are available. Click on any package to see a listing of the functions that are part of that package. For more detailed information, refer to Section 5 of the FAQ under the R Homepage Documentation.

Suppose you want to use the function, survreg. This function is available in the package, survival. To load the package in Unix, type

> library(survival)

You will now be able to use the function, survreg.

You can find out which functions a package provides by typing

> library(help=survival)

> help(package=survival)

How to write your own functions

Here is a simple example of how to create your own function.

standardize<-function(x)
{
# Inputs: a vector x
# Outputs: the standardized version of x
#
        m<-mean(x)
        std<-sqrt(var(x))
        result<-(x - m)/std
        return(result)
}

The function takes one argument, a vector x, and returns a vector. The lines beginning with # are comments. The last line tells the function to return the value result, i.e. the original vector x, transformed by subtracting its mean, and then dividing by its standard deviation. To invoke the function on a vector x, type

> xstand <- standardize(x)

To see the commands which make up the function, just type

> standardize

Note that no parentheses are used. If you want to create a function which returns several outputs, here is a simple example.

sqacu<-function(x)
{
# Inputs: a vector x
# Outputs: the square of x and the cube of x
#
        res1<-x^2
        res2<-x^3
        return(list("square"=res1,"cube"=res2))
}

Now, if you type the following, you get:

> sqacu(2)
$square
[1] 4

$cube
[1] 8

> sqacu(2)$square
[1] 4
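
A quick way to convince yourself that a function does what you intend is to check its output. The sketch below repeats the standardize function in a compact form (sd(x) is equivalent to sqrt(var(x))) and applies it to a made-up vector; the result should have mean 0 and standard deviation 1.

```r
standardize <- function(x) {
  (x - mean(x)) / sd(x)    # the last expression evaluated is returned
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data
z <- standardize(x)

print(mean(z))   # 0, up to rounding error
print(sd(z))     # 1

# Elements of a returned list can be extracted by name with $ or [[ ]]
res <- list(square = x^2, cube = x^3)
print(res$square[1:3])
print(res[["cube"]][1:3])
```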

Iteration

When writing your own function, avoid iteration if you can; take advantage of R's vectorized math and of functions such as apply. The function apply successively applies the function of your choice to each row (or column) of a matrix. Let's create a simple matrix and use apply to find the mean of each row/column.

> x <- matrix(1:12,3,4)
> apply(x,2,mean) #returns the mean of each column.
> apply(x,1,mean) #returns the mean of each row

Sometimes, iteration cannot be avoided. The R commands for and while are useful in this situation. Here is an example of using for inside a function:

jsum<-function(x)
{
        total <- 0
        for(i in seq_along(x))   # seq_along(x) also works when x is empty
        {
                total <- total + x[i]
        }
        return(total)
}

Note that R has its own function that performs this task, called sum. It will work much faster than this one, especially on large vectors.
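
The text mentions while without showing it, so here is a minimal sketch. It first confirms that a loop-based sum agrees with sum, then uses while to count how many terms of 1 + 2 + 3 + ... are needed before the running total exceeds 50. The threshold of 50 is arbitrary.

```r
# Loop-based sum, as in the example above
jsum <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
  }
  total
}

v <- 1:100
same <- jsum(v) == sum(v)   # both give 5050
print(same)

# while repeats its body for as long as the condition holds
n <- 0
running <- 0
while (running <= 50) {
  n <- n + 1
  running <- running + n
}
print(n)         # 10, since 1 + ... + 9 = 45 and 1 + ... + 10 = 55
print(running)   # 55
```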

Distributions

Values from many distributions can be easily calculated or simulated using R. The functions are named so that the first letter states what the function calculates or simulates (d=density function, p=distribution function, q=quantile, r=random generation) and the rest of the name specifies the distribution (beta=beta, chisq=chi-squared, exp=exponential, f=F, gamma=gamma, logis=logistic, norm=normal, t=Student's t, unif=uniform, weibull=Weibull, binom=binomial, nbinom=negative binomial, pois=Poisson). For instance, the function, qnorm, returns the quantiles of the normal distribution.

Calculating the probability density function

To calculate the value of the p.d.f. for a N(2,25) distribution (mean 2 and variance 25, hence sd=5) at the point, x, type:

> dnorm(x,mean=2,sd=5)

Calculating the cumulative distribution function

To calculate the value of the c.d.f. for a N(2,25) using the quantile, x, type:

> pnorm(x,mean=2,sd=5)

Determining a quantile

To calculate the quantile associated with a N(2,25) using the probability, x, type:

> qnorm(x,mean=2,sd=5)

Generating a random value from a distribution

To generate 10 random values from a N(2,25), type:

> rnorm(10,mean=2,sd=5)
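
A short worked sketch for the N(2,25) distribution ties the four functions together. The probability 0.975 and the seed are arbitrary choices made for this illustration.

```r
d <- dnorm(2, mean = 2, sd = 5)       # density at the mean: 1/(5*sqrt(2*pi))
p <- pnorm(2, mean = 2, sd = 5)       # P(X <= 2) = 0.5, since 2 is the mean
q <- qnorm(0.975, mean = 2, sd = 5)   # 97.5% quantile, about 11.8

print(d)
print(p)
print(q)

# pnorm and qnorm are inverses of one another
print(pnorm(q, mean = 2, sd = 5))     # 0.975

# Setting the seed makes the random draws reproducible
set.seed(1)
r <- rnorm(10, mean = 2, sd = 5)
print(length(r))
```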

Graphics

Opening a graphics window

To open/close a graphics device:

> X11()      # open a graphics device in R
> dev.off()  # close a graphic device in R

Note that you do not need to open a graphics device to use a plotting function. An X11() graphics device is opened automatically when a plotting function is called.

Common types of graphs

There are several high level plot functions that can be used to plot graphs. For instance, a histogram can be made by typing:

> x <- rnorm(1000)
> hist(x)

To plot two R objects against one another (called a scatterplot), type:

> x <- 1:10
> y <- seq(from=1,to=20,by=2)
> plot(x,y)

Plotting functions, like hist or plot, have many features that can be changed to accommodate the type of plot that you want. For instance, suppose you wanted a line plot instead of a scatter plot with the x-axis labelled as "X Values". Type:

> plot(x,y,type="l",xlab="X Values")

See the individual help files for more information.

You may also want to add other plots to an existing plot (i.e. overlay a plot).

To add a straight line with y-intercept, 5, and slope, 1, to the above plot, type:
```
> plot(x,y,type="l",xlab="X Values")
> abline(5,1)
```

To add a line plot to the above plot, type:

> plot(x,y,type="l",xlab="X Values")
> x1<-1:10
> y1<-rep(c(5,15),5)
> lines(x1,y1)

To add points to the above plot, type:

> plot(x,y,type="l",xlab="X Values")
> x1<-1:10
> y1<-rep(c(5,15),5)
> points(x1,y1)

Note that when using the functions, lines or points, any graphical features on the original plot (i.e. title, axis labels, axis ranges) cannot be changed without re-running the original plot. Consequently, the x and y coordinates passed to lines or points must lie within the axis ranges of the original plot in order to be visible.
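
One way to work within this restriction is to choose the axis range in the original plot call so that it covers everything you intend to add. The sketch below assumes the overlay reaches y = 25, beyond the range of y, and sets ylim accordingly. It draws to a temporary PDF file so that it runs without a display; omit the pdf() and dev.off() lines to draw on screen.

```r
plot_file <- tempfile(fileext = ".pdf")   # temporary file, for illustration
pdf(plot_file)

x  <- 1:10
y  <- seq(from = 1, to = 20, by = 2)
x1 <- 1:10
y1 <- rep(c(5, 25), 5)                    # 25 lies outside range(y)

# ylim covers both data sets, so the overlay is fully visible
plot(x, y, type = "l", xlab = "X Values", ylim = range(y, y1))
lines(x1, y1, lty = 2)                    # dashed line overlay
points(x1, y1)                            # points overlay
abline(5, 1)                              # line with intercept 5 and slope 1

invisible(dev.off())
print(file.exists(plot_file))
```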

Changing graphical features

Graphical plots, like 'hist' or 'plot', have many additional graphical features that aren't given specifically in the help files. For instance, type:

> help(plot)

to give the following help file.

plot                  package:base                  R Documentation

Generic X-Y Plotting

Description:

     Generic function for plotting of R objects.  For more details
     about the graphical parameter arguments, see `par'.

Usage:

     plot(x, y, xlim=range(x), ylim=range(y), type="p",
          main, xlab, ylab, ...)

Arguments:

       x: the coordinates of points in the plot. Alternatively, a
          single plotting structure, function or any R object with a
          `plot' method can be provided.

       y: the y coordinates of points in the plot, optional if `x' is
          an appropriate structure.

xlim, ylim: the ranges to be encompassed by the x and y axes.

    type: what type of plot should be drawn.  Possible types are

             *  `"p"' for points,

             *  `"l"' for lines,

             *  `"b"' for both,

             *  `"c"' for the lines part alone of `"b"',

             *  `"o"' for both ``overplotted'',

             *  `"h"' for ``histogram'' like (or ``high-density'')
                vertical lines,

             *  `"s"' for stair steps,

             *  `"S"' for other steps, see Details below,

             *  `"n"' for no plotting.

          All other `type's give a warning or an error; using, e.g.,
          `type = "punkte"' being equivalent to `type = "p"' for S
          compatibility.

    main: an overall title for the plot.

    xlab: a title for the x axis.

    ylab: a title for the y axis.

     ...: graphical parameters can be given as arguments to `plot'. 

    .

    .
    
    .

    etc.

The final argument, '...', indicates that there are other graphical parameters that can be given as arguments. To see these, type:

> help(par)

There are two ways to set the graphical parameters given by 'par'.

To set graphical parameters that apply to all plots, set the features using arguments to 'par'. The following parameters can only be set this way:
```
*  `"ask"'

*  `"fig"', `"fin"'

*  `"mai"', `"mar"', `"mex"'

*  `"mfrow"', `"mfcol"', `"mfg"'

*  `"new"'

*  `"oma"', `"omd"', `"omi"'

*  `"pin"', `"plt"', `"ps"', `"pty"'

*  `"usr"'

*  `"xlog"', `"ylog"'
```
Here is an example. A common use of 'par' is to put several plots on one page. Suppose you wanted 6 plots on one page, with 2 plots per line (a total of 3 lines), and the plots are placed on the page by filling each row in order. Then you would type:
```
> par(mfrow=c(3,2))
> plot(1:10)
> title("Plot 1")
> plot(10:100)
> title("Plot 2")
> plot(5:7)
> title("Plot 3")
> plot(50:75)
> title("Plot 4")
> plot(22:33)
> title("Plot 5")
> plot(1000:10000)
> title("Plot 6")
```
If you wanted to have 6 plots on one page, like above, but instead place the graphs by filling each column in order, then you would use
```
> par(mfcol=c(3,2))
```
To place each of the 6 plots in locations that don't follow an ordering given by 'mfrow' or 'mfcol', first set the number of plots on each page using 'mfrow' or 'mfcol' and then use 'par(mfg=c(a,b))' preceding each individual plotting function to specify the position in which you want the plot graphed.

To set graphical parameters that apply to specific high level plots, set the features within the plotting function itself. For instance, suppose you wanted to change the plotting symbol in a scatterplot to be the character 's'. Then you would type:
```
> plot(1:10,pch="s")
```
Note that the graphical parameters that can only be set using 'par' (listed under the first method) cannot be set using this method.
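
Settings made with 'par' stay in effect for later plots on the same device. A common idiom, sketched below, is to keep the value that 'par' returns and pass it back to 'par' when you are done. The temporary PDF device is used only so that the example runs without a display.

```r
pdf(tempfile(fileext = ".pdf"))   # temporary output, for illustration only

# par() returns the previous values of the parameters it changes
old_par <- par(mfrow = c(1, 2))

plot(1:10, main = "Plot 1")
hist(rnorm(100), main = "Plot 2")

par(old_par)                      # restore the earlier layout
restored <- par("mfrow")
print(restored)                   # 1 1: back to one plot per page

invisible(dev.off())
```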

Saving/printing graphs

Simple methods for saving/printing graphs in R are given below. For more detailed information about saving/printing in R, refer to Section 5.2 in the R tip sheet.

Generate a plot using any of the plotting functions in R. For instance, type
```
> plot(1:10) 
```
To save this plot to a postscript file called, graph1.ps, type
```
> dev.print(postscript,file="graph1.ps") 
```
The file, graph1.ps, is saved in the directory in which R is invoked. Other graphics devices (e.g. x11, jpeg, etc.) can be specified. Type
```
> help(Devices)
```
for a list of the available devices. When the graph is saved to an appropriate file, it can then be printed.
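
An alternative to dev.print is to open a file device first, draw the plot, and then close the device, which writes the file. This is a sketch using the pdf device; the file name and the size in inches are arbitrary, and the file is placed in R's temporary directory.

```r
out_file <- file.path(tempdir(), "graph1.pdf")   # illustrative file name

pdf(out_file, width = 6, height = 4)   # open the file device
plot(1:10, main = "Saved plot")        # draw as usual
invisible(dev.off())                   # closing the device writes the file

print(file.exists(out_file))
```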