R tutorial for a Windows environment | Statistics and Actuarial Science

The following tutorial provides a basic introduction to using R in a Windows environment.

Starting/interrupting/quitting
Getting help
Mathematical calculations
Managing the workspace image
Importing/exporting data
Functions
Distributions
Graphics

Starting/interrupting/quitting

If you want to install R on your Personal Computer (PC), please follow the installation and usage instructions carefully.

Starting R

Once you have installed R on your PC, simply open the package to begin. The RGui should immediately open. Unless otherwise stated, all commands given in this tutorial are written in the command window (called the R Console in Windows). This window should be open in the RGui once you have started R in Windows. A picture is given below. Unlike R in Unix, command-line editing is available, making it easy to make changes to your commands as you enter them. You may also want to save your commands at the end of your session. To do this, just left click on the File menu and select Save to File.... This is shown in the picture below.

You can then save all of your commands in a file located in the directory of your choice. This is shown in the picture below.

If you are running an intensive program, you can run it using BATCH mode only if you have Perl installed (and in your path).

Interrupting R

To interrupt an R command, press the Esc key to stop execution of the command. This will return you to the R prompt.

Quitting R

To quit R, type: > q() or simply close the R package. When you quit R, you will be asked whether you want to save the workspace image. The workspace image includes all objects that have been created in that session. If you answer "yes", all of the objects you have created are saved and are available the next time you start R. If you answer "no", all objects are removed.

To change where your workspace image is stored, left click on Change dir... before you have quit R. This is shown in the image below.

You can then browse your directories to determine where you would like to save the workspace image. This allows for multiple workspace images and is useful if you are working on different projects. The next time you run R and you want the objects associated with a specific project, you must then change to the directory where the workspace image is stored in the same manner.

Caution: It is not advisable to rely on the saved workspace image to carry objects into your next R session. For instance, you may overwrite an object and later be unable to remember how it was obtained. In general, it is recommended that you save your commands by selecting Save to File... in the File menu, so that all objects can easily be recreated in future R sessions. For more information regarding problems that may be encountered when using R in a Windows environment see the FAQ.

Getting help

How to get help on a particular R command

You can type either of

> help(log)

> ?log

to get help on the log command.

In Windows, you may also click on the Help menu. By left clicking on the R functions (text)... option, you may then enter the command you want help on. This is shown below.

How to obtain a list of the help files for all R commands

Type:

> help.start()

to start a help window in your web browser.

This is also a way to list all of the R commands. In Windows, you may also click on the Help menu and then left click on the HTML help option, as shown below.

This will also start a help window using your browser.

Other ways of getting help in R

You may also find additional information in R manuals or FAQs. These can be accessed from the Internet using the following links:

To access R manuals or FAQs directly from R in Windows, you can simply click on the Help menu, left click on the FAQ or Manuals option and choose from a list of manuals.

Note that if you do not know the command in R that you want help on, the above methods of searching the R help files for relevant information concerning a particular subject are (most of the time) not useful.

Some other ways of searching for help on a particular subject are:

Search for the particular subject in R using the Google newsgroups. Note that the keyword, R, is not particularly helpful when specified in Google because it is generic. Because Splus and R are quite similar in most cases, searching Google using Splus instead of R will most likely yield some information.

Above all, if you don't know something about R, ask someone who might know!

Mathematical calculations

You can use R for many mathematical calculations. Using parentheses in the usual way to group terms is recommended, and often required, in larger expressions.

Addition/subtraction

> 5 + 5
> 10 - 2

Multiplication/division

> 10*10
> 25/5

Power multiplication

> 3 ^ 2
> 2 ^ (-2)

Note that

> 100 ^ (1/2)

is equivalent to

> sqrt(100)

Logarithms

> # Compute a natural logarithm
> log(10)
> # Compute a logarithm with base 10
> log10(1000)
> # Compute a logarithm with base 2
> log2(8)
> # Compute a logarithm with base 4
> log(16,base=4)

Exponentials

> exp(1)

Trigonometric functions

Type:

> help(sin)

to see a list of all functions. Note that angles are specified in radians so if you want sin of 90 degrees, type:

> sin(pi/2)
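
As a minimal sketch of the degrees-to-radians conversion described above, the helper below wraps the multiplication by pi/180. The name deg2rad is a hypothetical helper chosen for this illustration; it is not a built-in R function.

```r
# deg2rad is a hypothetical helper, not part of base R
deg2rad <- function(deg) {
  deg * pi / 180          # convert degrees to radians
}

sin(deg2rad(90))          # sine of 90 degrees
cos(deg2rad(180))         # cosine of 180 degrees
```

Because pi is stored to finite precision, results that should be exactly zero (for example sin(deg2rad(180))) may print as a very small number instead.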

The value NA

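NA is the value R uses for a missing value. A numerical result that is undefined is reported as NaN ("not a number"), which R also treats as missing. For instance, the following returns NaN with a warning: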

> sqrt(-3)
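
The short sketch below illustrates how missing and undefined values behave in calculations. The vector used here is made up for the example. Strictly speaking, sqrt(-3) returns NaN rather than NA, but is.na() reports TRUE for both.

```r
# A small made-up vector with one missing value
x <- c(1, NA, 3)

is.na(x)                  # flags which elements are missing
mean(x)                   # any calculation involving NA gives NA
mean(x, na.rm = TRUE)     # drop missing values before averaging

# sqrt(-3) is undefined for real numbers; warnings are suppressed here
r <- suppressWarnings(sqrt(-3))
is.nan(r)                 # TRUE: the result is NaN
is.na(r)                  # TRUE: NaN also counts as missing
```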

Managing the workspace image

The workspace image includes all of the R objects that you have created in the current, and possibly other, R sessions. An R object is any vector, matrix, list, etc. that you have assigned a specific name. For instance, suppose you typed:

> x <- 1

Then x is now an object that is in the workspace image.

Listing objects from your workspace image

To view the objects in the workspace image, type:

> ls()

Removing objects from your workspace image

To remove an object, x, from the workspace image, type:

> rm(x)

To remove all objects from the workspace image, type:

> rm(list=ls())
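
The commands above can be combined into a short session, sketched below. The object names x and y are only examples; exists() is a base R function that reports whether an object with a given name can be found.

```r
# Create two example objects
x <- 1
y <- c(2, 3)

ls()                              # the listing should include "x" and "y"
exists("x", inherits = FALSE)     # is x defined here?

rm(x)                             # remove only x
exists("x", inherits = FALSE)     # x is gone
exists("y")                       # y is still available
```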

Importing/exporting data

The following information will provide a basic introduction to importing and exporting data into and out of R. For more details, please refer to the "R Data Import/Export" manual that is found on the R Project website.

Importing data from an external ascii file

Several functions can be used to read data from an external ascii file (text file) into R and store it in an R object. However, if data is stored in some other type of format (e.g., an Excel spreadsheet), you may want to use the application in which the data is stored (e.g., Excel) to export the data to a text file.

The function, scan, will read data into a vector or list from the console or an ascii file. To see how this is done, first create a file, "file1.txt", in the directory in which you run R. In Windows, this means that you must change your working directory to the directory that contains "file1.txt". You can do this by left clicking on the File menu and selecting Change dir....

For instance, suppose the text file, "file1.txt", contains the following information on the weight of ten subjects.

195 143 210 105 177 154 123 233 118 166

To scan the data in "file1.txt" to the vector, x, type:

> x<-scan("file1.txt")

> x

Type

> help(scan)

for more information on the available options.
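
To try scan without creating the file by hand, the sketch below writes the ten weights to a file and reads them back. It assumes the file is placed in R's temporary directory (via tempdir()) rather than in your working directory, so nothing in your own folders is touched.

```r
# Write the ten weights to a temporary copy of file1.txt
tmp <- file.path(tempdir(), "file1.txt")
writeLines("195 143 210 105 177 154 123 233 118 166", tmp)

# Read the numbers back into a numeric vector
x <- scan(tmp)

x
length(x)     # number of values read
mean(x)       # average weight
```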

The function, read.table, will read an ascii file (text file) in table format and create a data frame from it. Table format means that the lines in the file correspond to cases and the columns in the file correspond to variables. A header may (or may not) be included in the file indicating the name of the variable in each column. A data frame is used with most R modeling software and is similar in nature to matrices and lists.

We can read the data from the file, "file1.txt" into a data frame, x, using the function, read.table. To see how this is done, first create a file, "file1.txt", in the directory in which you run R. For instance, suppose the text file, "file1.txt", contains the following information on the weight and age of ten subjects.

To read the data in "file1.txt" into the data frame, x, type:

> x <- read.table("file1.txt",header=TRUE)

> x

Type

> help(read.table)

for more information on the available options.
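
The following sketch shows read.table on a small table with a header line. The three rows of weight and age values are made up for illustration and are not the tutorial's original data set; the file is written to R's temporary directory.

```r
# Create a small table with a header line (values are illustrative only)
tmp <- file.path(tempdir(), "file1.txt")
writeLines(c("weight age",
             "195 34",
             "143 28",
             "210 45"), tmp)

# header=TRUE tells read.table that the first line holds variable names
x <- read.table(tmp, header = TRUE)

x
names(x)      # column (variable) names taken from the header
x$weight      # extract one variable from the data frame
```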

Importing data from a text file on a web page

It is likely that instead of having an ascii file saved in the directory in which you are using R, the file to be imported may exist on a webpage. Using any of the commands for importing data given previously (i.e. scan, read.table), the file can also be imported by specifying the URL as the filename.

To import this dataset to R using the URL as the filename, just type:

> x <

Exporting data

Several functions can be used to write an R object to an external ascii file (text file).

The function, write, writes out a matrix or vector in a specified number of columns. To see how this is done to the matrix, x, first put the matrix, x, into R by typing:

> x1 <- 1:10

> x2 <- 11:20

> x3 <- 21:30

> x <- cbind (x1,x2,x3)

To write the matrix, x, so that each column appears as a line in the text file, type:

> write(x,file="file1.txt",ncolumns=10)

Alternatively, if you want each row in the matrix, x, to appear as a line in the text file, type:

> write(t(x),file="file2.txt",ncolumns=3)

The function, write.table, will write a data frame (after converting the object to a data frame if it isn't already one) to an external ascii file (text file). Entries in each line (row) are separated by the value of 'sep'. The default value for 'sep' is a space. This is done to the object, x2, by typing:

> x1<-1:10

> x2<-11:20

> x3<-21:30

> x<-cbind(x1,x2,x3)

> x2<-data.frame(x)

> write.table(x2,"file3.txt",quote=FALSE)

In this case, the object is written to a file, file3.txt, which should now be in the directory in which you run R.

Type

> help(write.table)

for more information on the available options. Note that the function, write.table, can be slow for data frames that have hundreds of columns. The function, write.matrix, in the package, 'MASS', is a more efficient way of dealing with this problem if the object being exported can be represented as a numeric matrix.
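
One way to check an export is to read the file back in, as sketched below. This example assumes row.names=FALSE is wanted so that the file contains only a header line and the data; the small data frame and the temporary file location are choices made for this illustration.

```r
# A small data frame to export (values are illustrative)
x <- data.frame(x1 = 1:3, x2 = 11:13)

# Write it without quotes and without row names
out <- file.path(tempdir(), "file3.txt")
write.table(x, out, quote = FALSE, row.names = FALSE)

readLines(out)                       # inspect the raw text that was written

# Read the file back and compare with the original object
y <- read.table(out, header = TRUE)
all.equal(x, y)
```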

Functions

R functions are used to provide users with programmed procedures for a number of statistical and non-statistical procedures. Many functions are available in R. All basic functions are part of the BASE package. However, other functions are part of other packages that must be loaded. If you need to write your own function, this is also possible.

R functions that are available through add-on packages

Functions that are part of the BASE package are available upon invoking R. Some R functions may not be available when you first invoke R and are available by specifying different add-on packages. If you don't know if a function is part of the BASE package or not, simply type:

> help(function.name)

If no help is found, either the name is misspelled or the function is not part of the BASE package. Refer to the listing of packages given on the R home page to see the different add-on packages that are available. Click on any package to see a listing of the functions that are part of that package. For more detailed information, refer to Section 5 of the FAQ under the R Home page Documentation.

Suppose you want to use the function, survreg. This function is available in the package, survival. To load the package, type:

> library(survival)

You will now be able to use the function, survreg. Another way to load the package, survival, in Windows, is by left clicking on the Packages menu. The image shown below will appear.

Left click on Load package.... This will give you a list of the different packages that you can load. Select the package, survival, and then left click on the OK button. You will now be able to use the function, survreg.

You can find out which functions a package provides by typing either of

> library(help=survival)

> help(package=survival)
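
As a rough sketch, you can also ask R which package an ordinary function comes from, and load an add-on package only if it is installed. The use of environmentName() and requireNamespace() here is one possible approach, not the only one; note that for primitive functions such as sum, environment() returns NULL, so this check does not apply to them.

```r
# Which package defines these functions?
environmentName(environment(mean))   # mean lives in the base package
environmentName(environment(sd))     # sd lives in the stats package

# Load survival only if it is installed on this machine
if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)
  print(exists("survreg"))           # survreg should now be available
}
```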

How to write your own functions

Here is a simple example of how to create your own function.

standardize<-function(x)
{
# Inputs: a vector x
# Outputs: the standardized version of x
#
        m<-mean(x)
        std<-sqrt(var(x))
        result<-(x - m)/std
        return(result)
}

The function takes one argument, a vector x, and returns a vector. The lines beginning with # are comments. The last line tells the function to return the value result, i.e. the original vector x, transformed by subtracting its mean, and then dividing by its standard deviation. To invoke the function on a vector x, type

> xstand <- standardize(x)

To see the commands which make up the function, just type

> standardize

Note that there are no brackets used. If you want to create a function which returns several outputs, here is a simple example.

sqacu<-function(x)
{
# Inputs: a vector x
# Outputs: the square of x and the cube of x
#
        res1<-x^2
        res2<-x^3
        return(list("square"=res1,"cube"=res2))
}

Now, if you type:

> sqacu(2)
$square
[1] 4

$cube
[1] 8

> sqacu(2)$square
[1] 4
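
The sketch below defines compact versions of the two functions above and checks them on a small made-up vector. It uses sd(x), which is equivalent to sqrt(var(x)), and compares the result with R's built-in scale function as a sanity check.

```r
# Compact version of standardize; sd(x) equals sqrt(var(x))
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Function returning several outputs in a named list
sqacu <- function(x) {
  list(square = x^2, cube = x^3)
}

x <- c(195, 143, 210, 105, 177)      # illustrative data
z <- standardize(x)

mean(z)                              # approximately 0
sd(z)                                # approximately 1
all.equal(z, as.vector(scale(x)))    # agrees with the built-in scale()

res <- sqacu(1:3)
res$square                           # access one output by name
res$cube
```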

Iteration

When writing your own function, avoid iteration if you can; take advantage of R's vectorized math and functions such as apply. The function, apply, successively applies the function of your choice to each row (or column) of a matrix. Let's create a simple matrix and use apply to find the mean of each row/column.

> x <- matrix(1:12,3,4)
> apply(x,2,mean) #returns the mean of each column.
> apply(x,1,mean) #returns the mean of each row

Sometimes, iteration cannot be avoided. The R commands, for and while, are useful in this situation. Here is an example of using for inside a function.

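jsum<-function(x)
{
# Inputs: a numeric vector x
# Outputs: the sum of the elements of x
#
        total <- 0
        # seq_along(x) is safe even when x is empty, unlike 1:length(x)
        for(i in seq_along(x))
        {
                total <- total + x[i]
        }
        return(total)
}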

Note that R has its own function that performs this task, called sum. It will work much faster than this one, especially on large vectors.
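
To make the comparison concrete, the sketch below checks a loop-based sum against the built-in sum, and apply against the specialised functions colMeans and rowMeans. The loop function is redefined here so the example is self-contained; the timing line is included only as a rough illustration and its output will vary between machines.

```r
# Loop-based sum, written with seq_along so empty vectors work
jsum <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
  }
  total
}

x <- 1:1000
jsum(x) == sum(x)                 # both give the same answer

m <- matrix(1:12, 3, 4)
apply(m, 2, mean)                 # column means via apply
colMeans(m)                       # same result from a vectorized function
apply(m, 1, mean)                 # row means via apply
rowMeans(m)

# Rough timing of the loop on a larger vector
big <- as.numeric(1:100000)
system.time(jsum(big))
```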

Distributions

Values associated with many distributions can be easily calculated or simulated using R. The functions are named so that the first letter states what is calculated or simulated (d=density function, p=distribution function, q=quantile function, r=random generation) and the rest of the name specifies the distribution (beta=beta, chisq=chi-squared, exp=exponential, f=F, gamma=gamma, logis=logistic, norm=normal, t=Student's t, unif=uniform, weibull=Weibull, binom=binomial, nbinom=negative binomial, pois=Poisson). For instance, the function, qnorm, returns the quantiles of the normal distribution.

Calculating the probability density function

To calculate the value of the p.d.f. of a N(2,25) distribution (mean 2, variance 25, so sd=5) at the value, x, type:

> dnorm(x,mean=2,sd=5)

Calculating the cumulative distribution function

To calculate the value of the c.d.f. of a N(2,25) distribution at the value, x, type:

> pnorm(x,mean=2,sd=5)

Determining a quantile

To calculate the quantile associated with a N(2,25) using the probability, x, type:

> qnorm(x,mean=2,sd=5)

Generating a random value from a distribution

To generate 10 random values from a N(2,25), type:

> rnorm(10,mean=2,sd=5)
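
The sketch below ties the four function types together for the N(2,25) distribution. The evaluation point 7 (one standard deviation above the mean) and the seed value are arbitrary choices for this illustration; set.seed is used only so that the simulated values can be reproduced.

```r
# Density at the mean: equals 1 / (5 * sqrt(2 * pi))
dnorm(2, mean = 2, sd = 5)

# Probability of a value at or below 7 (one sd above the mean)
p <- pnorm(7, mean = 2, sd = 5)
p

# qnorm inverts pnorm, so this recovers 7
qnorm(p, mean = 2, sd = 5)

# Simulate 1000 values; the sample mean and sd should be near 2 and 5
set.seed(1)
y <- rnorm(1000, mean = 2, sd = 5)
mean(y)
sd(y)
```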

Graphics

Opening a graphics window

To open/close a graphics device :

> X11()      # open a graphics device in R
> dev.off()  # close a graphic device in R

Note that you do not need to open a graphics device to use a plotting function. An X11() graphics device is opened automatically when a plotting function is called.

Common types of graphs

There are several high level plot functions that can be used to plot graphs. For instance, a histogram can be made by typing:

> x <- rnorm(1000)
> hist(x)

To plot two R objects against one another (called a scatterplot), type:

> x <- 1:10
> y <- seq(from=1,to=20,by=2)
> plot(x,y)

Plotting functions, like hist or plot, have many features that can be changed to accommodate the type of plot that you want. For instance, suppose you wanted a line plot instead of a scatter plot with the x-axis labelled as "X Values". Type:

> plot(x,y,type="l",xlab="X Values")

See the individual help files for more information.

You may also want to add other plots to an existing plot (i.e. overlay a plot).

To add a straight line with y-intercept, 5, and slope, 1, to the above plot, type:
```
> plot(x,y,type="l",xlab="X Values")
> abline(5,1)
```

To add a line plot to the above plot, type:

> plot(x,y,type="l",xlab="X Values")
> x1<-1:10
> y1<-rep(c(5,15),5)
> lines(x1,y1)

To add points to the above plot, type:

> plot(x,y,type="l",xlab="X Values")
> x1<-1:10
> y1<-rep(c(5,15),5)
> points(x1,y1)

Note that when using the functions, lines or points, any graphical features on the original plot (i.e. title, axis labels, axis ranges) can not be changed without re-running the original plot. This forces the values specified for the x and y coordinates in the functions, lines or points, to lie within the range of the original plot.
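
Since the axis ranges are fixed by the first plot, one approach is to compute a range that covers every series before plotting. The sketch below does this with the ylim argument. It assumes a second series that goes up to 25 (changed from the example above so that it would otherwise be clipped) and draws to a PDF file in R's temporary directory so that it also runs without an open graphics window.

```r
x  <- 1:10
y  <- seq(from = 1, to = 20, by = 2)
x1 <- 1:10
y1 <- rep(c(5, 25), 5)            # extends beyond the range of y

# Draw to a PDF file instead of the screen
out <- file.path(tempdir(), "overlay.pdf")
pdf(out)

# Set the y-axis range up front so both series fit
plot(x, y, type = "l", xlab = "X Values", ylim = range(c(y, y1)))
lines(x1, y1, lty = 2)            # dashed line for the second series
points(x1, y1)                    # mark its points as well

dev.off()                         # close the file device
```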

Changing graphical features

Graphical plots, like 'hist' or 'plot', have many additional graphical features that aren't given specifically in the help files. For instance, type:

> help(plot)

to give the following help file.

plot                  package:base                  R Documentation

Generic X-Y Plotting

Description:

     Generic function for plotting of R objects.  For more details
     about the graphical parameter arguments, see `par'.

Usage:

     plot(x, y, xlim=range(x), ylim=range(y), type="p",
          main, xlab, ylab, ...)

Arguments:

       x: the coordinates of points in the plot. Alternatively, a
          single plotting structure, function or any R object with a
          `plot' method can be provided.

       y: the y coordinates of points in the plot, optional if `x' is
          an appropriate structure.

xlim, ylim: the ranges to be encompassed by the x and y axes.

    type: what type of plot should be drawn.  Possible types are

             *  `"p"' for points,

             *  `"l"' for lines,

             *  `"b"' for both,

             *  `"c"' for the lines part alone of `"b"',

             *  `"o"' for both ``overplotted'',

             *  `"h"' for ``histogram'' like (or ``high-density'')
                vertical lines,

             *  `"s"' for stair steps,

             *  `"S"' for other steps, see Details below,

             *  `"n"' for no plotting.

          All other `type's give a warning or an error; using, e.g.,
          `type = "punkte"' being equivalent to `type = "p"' for S
          compatibility.

    main: an overall title for the plot.

    xlab: a title for the x axis.

    ylab: a title for the y axis.

     ...: graphical parameters can be given as arguments to `plot'. 

    .

    .
    
    .

    etc.

The last entry in the argument list, '...', indicates that there are other graphical parameters that can be given as arguments. To see these, type:

> help(par)

There are two ways to set the graphical parameters given by 'par'.

To set graphical parameters that apply to all plots, set the features using arguments to 'par'. The following arguments can only be set this way:
```
*  `"ask"'

*  `"fig"', `"fin"'

*  `"mai"', `"mar"', `"mex"'

*  `"mfrow"', `"mfcol"', `"mfg"'

*  `"new"'

*  `"oma"', `"omd"', `"omi"'

*  `"pin"', `"plt"', `"ps"', `"pty"'

*  `"usr"'

*  `"xlog"', `"ylog"'
```
Here is an example. A common parameter that you may want to set is to graph several plots on one page. Suppose you wanted 6 plots on one page, with 2 plots per line (a total of 3 lines), and the plots are placed on the graph by filling each row in order. Then you would type:
```
> par(mfrow=c(3,2))
> plot(1:10)
> title("Plot 1")
> plot(10:100)
> title("Plot 2")
> plot(5:7)
> title("Plot 3")
> plot(50:75)
> title("Plot 4")
> plot(22:33)
> title("Plot 5")
> plot(1000:10000)
> title("Plot 6")
```
If you wanted to have 6 plots on one page, like above, but instead place the graphs by filling each column in order, then you would use
```
> par(mfcol=c(3,2))
```
To place each of the 6 plots in locations that don't follow an ordering given by 'mfrow' or 'mfcol', first set the number of plots on each page using 'mfrow' or 'mfcol' and then use 'par(mfg=c(a,b))' preceding each individual plotting function to specify the position that you want the plot graphed.

To set graphical parameters that apply to specific high level plots, set the features within the plotting function itself. For instance, suppose you wanted to change the plotting symbol in a scatterplot to be the character 's'. Then you would type:
```
> plot(1:10,pch="s")
```
Note that the graphical parameters that are set using 'par' (described in the previous point) can not be set using this method.
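
Because settings made with 'par' stay in effect for later plots, a common pattern is to save the old settings and restore them afterwards. The sketch below shows one way to do this; it assumes a 2 by 2 layout and a PDF file in R's temporary directory, both chosen only for this illustration.

```r
out <- file.path(tempdir(), "grid.pdf")
pdf(out)

# par() returns the previous values of the settings it changes
op <- par(mfrow = c(2, 2))

# pch is set inside plot(), so it applies to that plot only
for (k in 1:4) {
  plot(1:10, main = paste("Plot", k), pch = k)
}

par(op)                           # restore the earlier layout
after <- par("mfrow")             # back to one plot per page
after

dev.off()
```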

Saving/printing graphs

Simple methods for saving/printing graphs in R are given below. For more detailed information about saving/printing in R, refer to Section 5.2 in the R tip sheet.

Generate a plot using any of the plotting functions in R. For instance, type
```
> plot(1:10) 
```
To save this plot to a postscript file called, graph1.ps, type
```
> dev.print(postscript,file="graph1.ps") 
```
The file, graph1.ps, is saved in the directory in which R is invoked. Other graphics devices (e.g. x11, jpeg) can be specified. Type
```
> help(Devices) 
```
for a list of the available devices. When the graph is saved to an appropriate file, it can then be printed.
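
An alternative to copying an on-screen plot is to open a file device first, draw the plot, and then close the device. This is a sketch of that approach rather than a replacement for dev.print; it can be convenient in scripts where no graphics window is open. The file is written to R's temporary directory here, which is an assumption made for the example.

```r
# Open a postscript file device
out <- file.path(tempdir(), "graph1.ps")
postscript(file = out)

plot(1:10)                        # the plot is drawn into the file

dev.off()                         # closing the device finishes the file
file.exists(out)
```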