University of Washington, Statistics
STAT 425: Introduction to Nonparametric Statistics
Introduction/Review of R
Fritz Scholz
Spring Quarter 2009
March 22, 2009
Purpose

The following slides are intended as a review of, or a basic introduction to, the statistical analysis platform R. They are by no means exhaustive!

The class web page contains links to more extensive introductions, which should be consulted when these slides raise questions.

The internal documentation accompanying R, available via Help on the R toolbar, is another resource. More on Help later.

It is also always useful to experiment with commands to see and understand what happens. Start right away as you reread these slides to get a feel for R.

We will be using R extensively throughout this course.
Statistical Analysis Platform

R is freely available from http://cran.r-project.org/
See also http://en.wikipedia.org/wiki/R_(programming_language)

For the Windows version: at the CRAN site → Windows (95 and later) → base. Download R-2.8.1-win32.exe, or the latest version, to your desktop.

To install, double-click on the executable R-2.8.1-win32.exe and follow the instructions (use the defaults when prompted). This creates a blue R icon on your desktop. Double-clicking this R icon opens an R session.

You close the session by typing q() or quit(). This prompts you to save the workspace image (with all changes made in the current session), not to save it (leaving the workspace as it was when the session started), or to cancel (continue working in R).
Workspaces and Directories

By default the workspace image is saved in
    C:\Program Files\R\R-2.8.1\.RData

It is a good idea to keep separate workspace images for different projects (HW?), otherwise the clutter will become unmanageable. Keep each workspace in its own directory.

To save an open workspace in a specific directory, say R-practice, click on File on the toolbar in the R work session, choose Change dir, browse to the directory R-practice and choose OK.

When you quit with q() after that change of directories, the workspace .RData is saved in that new directory. A new R icon with the name .RData appears in that directory.
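The same steps can also be done from the command line instead of the File menu. The following is a minimal sketch, assuming a hypothetical project directory R-practice that is created under R's temporary directory so that nothing of yours is overwritten. The object name hw.data is also just an illustration. getwd(), setwd() and save.image() are standard R functions.

```r
# Hypothetical project directory, placed under tempdir() for safety.
project.dir <- file.path(tempdir(), "R-practice")
dir.create(project.dir, showWarnings = FALSE)

old.dir <- getwd()            # remember where the session started
setwd(project.dir)            # same effect as File -> Change dir

hw.data <- c(2.3, 4.1, 3.7)   # illustrative object worth keeping
save.image()                  # writes .RData into the current directory

saved <- file.exists(file.path(project.dir, ".RData"))
print(saved)

setwd(old.dir)                # return to the original directory
```

For real work you would replace the tempdir() location by a directory of your own, one per project.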
Starting a Session from a Directory

When you have a directory containing an R icon with the name .RData, you can open a session using that workspace by double-clicking on that icon.

To see the objects in that workspace, type ls() or objects(). When there are no objects, the response is character(0).

If you want to start with a clean (empty) workspace, you can remove all those objects by typing the command
    rm(list=ls())

If you want to keep every object except specific ones, say myobject and dataset.x, you would remove those by typing
    rm(myobject,dataset.x)
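To experiment with ls() and rm() without risking your real workspace, the commands can be run inside local(), which gives them a private scratch environment. This is only a sketch: the object names keep.me and dataset.y are made up for illustration, and wrapping the commands in local() is a safety choice for this example, not something the commands require. Typed directly at the prompt, the same commands act on the workspace itself.

```r
# Everything inside local() happens in a throwaway environment.
remaining <- local({
  myobject  <- 1:5                     # hypothetical objects
  dataset.x <- c(10.2, 11.5, 9.8)
  keep.me   <- "still needed"

  print(ls())                          # all three names

  rm(myobject, dataset.x)              # remove two named objects
  print(ls())                          # only keep.me is left

  dataset.y <- 1:3                     # create some clutter again
  rm(list = setdiff(ls(), "keep.me"))  # remove all but keep.me
  ls()
})
print(remaining)
```

The setdiff() idiom is handy when it is easier to name the few objects you want to keep than the many you want to remove.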
Help in R

If you know the name of a data or function object, you just type ?that.object.name. For example: ?rivers or ?median.

If you don't know such object names, you should open the web browser based help facility in R by typing
    help.start()

This web browser interface gives access to all R related information. It has a search engine and provides entry via keywords or by topic, such as Basics, Graphics, Mathematics, Programming, Statistics.

The R Reference Manual has over 1500 pages documenting available functions, operators and data sets. Resist the temptation to print it. If it was installed, access it via the R toolbar → Help → Manuals → R Reference Manual.
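When you remember only part of a name, the standard function apropos() lists all visible objects whose names contain a given string. A small sketch follows. The search string "median" is just an example, and the calls shown as comments open help pages, so they are best tried interactively.

```r
# Names of visible objects that contain the string "median".
hits <- apropos("median")
print(hits)

# Related calls to try at the prompt:
# help("median")          # function form of ?median
# help.search("median")   # search installed help pages; shortcut: ??median
# example(median)         # run the examples from the help page
```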
?rivers yields

rivers(datasets)                                R Documentation

Lengths of Major North American Rivers

Description
    This data set gives the lengths (in miles) of 141 major rivers in North America, as compiled by the US Geological Survey.

Usage
    rivers

Format
    A vector containing 141 observations.

Source
    World Almanac and Book of Facts, 1975, page 406.

References
    McNeil, D. R. (1977) Interactive Data Analysis. New York: Wiley.
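Since rivers comes with R, you can check the documentation against the data right away. The following sketch uses only standard functions. The variable name n.rivers is made up for this example.

```r
n.rivers <- length(rivers)   # number of observations in the vector
print(n.rivers)
print(class(rivers))         # storage type of the vector
print(head(rivers))          # first six lengths, in miles
print(summary(rivers))       # minimum, quartiles, mean and maximum
```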
?median yields

Median Value

Description
    Compute the sample median.

Usage
    median(x, na.rm = FALSE)

Arguments
    x       an object for which a method has been defined, or a numeric vector containing the values whose median is to be computed.
    na.rm   a logical value indicating whether NA values should be stripped before the computation proceeds.

Details
    This is a generic function for which methods can be written. However, the default method makes use of sort and mean, both of which are generic, and so the default method will work for most classes (e.g. "Date") for which a median is a reasonable concept.

References
    Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also
    quantile for general quantiles.

... and more
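The effect of the na.rm argument, and the remark about the "Date" class, are easiest to see in a small experiment. The numbers and dates below are made up for illustration.

```r
x <- c(4, 1, NA, 7, 3)

m.default <- median(x)                 # NA: the missing value propagates
m.clean   <- median(x, na.rm = TRUE)   # median of 1, 3, 4, 7
print(m.default)
print(m.clean)

# The default method also works for objects of class "Date".
d <- as.Date(c("2009-03-22", "2009-03-30", "2009-04-06"))
m.date <- median(d)                    # the middle one of the three dates
print(m.date)
```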
Naming Conventions in R

R is case sensitive. Object names (data or functions) should contain only alphanumeric characters (A-Z, a-z, 0-9) or a period, and cannot start with a digit.

Object names can start with a period, but such objects are hidden when you type ls(). This is useful when you want to define hidden or background objects.

Avoid object names that are already used by R, such as t, c, q, T, F, ls, pt, mean, var, pi, etc. Use descriptive names.

Any object that you create under such a system name masks the built-in R object. For example, pi=3 would create a new object pi in your workspace, with value 3 and not 3.141593. You get the old pi back by removing the masking pi from your workspace via rm(pi).
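Masking and hidden names can both be tried out in a few lines. This is a sketch only: the names masked, restored and .hidden.setting are made up for illustration, while ls(all.names = TRUE) is the standard way to list hidden objects as well.

```r
pi <- 3                      # creates a masking object in the workspace
masked <- pi                 # the masking copy is found first
print(masked)

rm(pi)                       # remove the masking copy
restored <- pi               # the built-in value is visible again
print(restored)

.hidden.setting <- 42        # hypothetical name starting with a period
listed.plain <- ".hidden.setting" %in% ls()                  # FALSE
listed.all   <- ".hidden.setting" %in% ls(all.names = TRUE)  # TRUE
print(c(listed.plain, listed.all))
```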
Basic Usage of R

We can use R as an oversized scientific calculator.

    > 5+6*3
    [1] 23
    > exp(log(10))
    [1] 10
    > pi
    [1] 3.141593
    > sin(pi)
    [1] 1.224606e-16       # practically zero
    > sin(2)^2+cos(2)^2    # text after # is treated as a comment
    [1] 1                  # and is not executed
    > 1/Inf
    [1] 0
    > Inf                  # Inf stands for infinity
    [1] Inf                # and operations with it will yield sensible results
    > exp(-Inf)
    [1] 0
    > exp(Inf)
    [1] Inf
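Two of the results above deserve a small follow-up experiment. Because sin(pi) is only practically zero, an exact comparison with == fails, and a comparison with a tolerance, for example via the standard function all.equal(), is the safer choice. Arithmetic with Inf follows the usual limit conventions, and an expression without a sensible limit gives NaN. The variable names are made up for this sketch.

```r
# Exact versus tolerant comparison of a floating-point result.
exact.zero <- sin(pi) == 0                    # FALSE
near.zero  <- isTRUE(all.equal(sin(pi), 0))   # TRUE
print(c(exact.zero, near.zero))

# Arithmetic with Inf.
print(1 / 0)               # Inf
print(-1 / 0)              # -Inf
print(Inf + 1)             # Inf
undefined <- Inf - Inf     # no sensible limit, so the result is NaN
print(is.nan(undefined))
```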