R (programming language)

Summary

R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics, and data analysis.[8]

R
R terminal
Paradigms: Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array[1]
Designed by: Ross Ihaka and Robert Gentleman
Developer: R Core Team
First appeared: August 1993
Stable release: 4.3.3[2] / 29 February 2024
Typing discipline: Dynamic
Platform: arm64 and x86-64
License: GNU GPL v2[3]
Filename extensions
  • .r[4]
  • .rdata
  • .rhistory
  • .rds
  • .rda[5]
Website: www.r-project.org
Influenced by
Influenced: Julia[7]

The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.

R is free and open-source software. It is part of the GNU Project and is available under the GNU General Public License.[3] It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.

As an interpreted language, R has a native command-line interface. Multiple third-party graphical user interfaces are also available, such as RStudio (an integrated development environment) and Jupyter (a notebook interface).

History

 
Ross Ihaka, co-originator of R

R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.[9] The language was inspired by the S programming language, with most S programs able to run unaltered in R.[6] The language was also inspired by Scheme's lexical scoping, allowing for local variables.[1]
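
As a brief illustration of lexical scoping, the following sketch shows a function that keeps access to a variable defined in the function that created it. The names make_counter, counter_a, and counter_b are hypothetical and chosen only for this example.

```r
# make_counter() is a hypothetical helper used only for illustration.
make_counter <- function() {
  count <- 0                 # local to this call of make_counter()
  function() {
    count <<- count + 1      # updates count in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()   # first call on counter_a
counter_a()   # second call on counter_a
counter_b()   # counter_b keeps its own, separate count
```

Each call to make_counter() creates a fresh environment, so the two counters do not share state.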

The name R reflects both the language's status as a successor to S and the shared first initial of its authors, Ross and Robert.[10] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website. At the same time, they announced the posting on the s-news mailing list.[11] On December 5, 1997, R became a GNU project when version 0.60 was released.[12] On February 29, 2000, the first official 1.0 version was released.[13]

Packages

 
Violin plot created from the R visualization package ggplot2

R packages are collections of functions, documentation, and data that expand R.[14] For example, packages such as RMarkdown, knitr, and Sweave add reporting features. Easy package installation and use have contributed to the language's adoption in data science.[15]

The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Fritz Leisch to host R's source code, executable files, documentation, and user-created packages.[16] Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.[16] CRAN originally had three mirrors and 12 contributed packages.[17] As of December 2022, it has 103 mirrors[18] and 18,976 contributed packages.[19] Packages are also available on repositories R-Forge, Omegahat, and GitHub.

The Task Views on the CRAN website list packages in fields such as finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.

The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.

Packages add the capability to implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering.

The tidyverse package provides a common interface for accessing and processing data held in data frames, which are two-dimensional tables of rows and columns. Each function in the package is designed to work together with the other functions in the package.[14]

A package needs to be installed only once. To install tidyverse:[14]

> install.packages( "tidyverse" )

To load the functions, data, and documentation of a package into the current session, call the library() function. To load tidyverse:[a]

> library( tidyverse )
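
The install-once, load-every-session pattern can be sketched as a small helper. The function name ensure_package is hypothetical and not part of R or tidyverse; the sketch also assumes a CRAN mirror is already configured if an installation is actually needed.

```r
# ensure_package() is a hypothetical helper, not part of R or tidyverse.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE, rather than raising an error,
  # when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)    # runs only when the package is missing
  }
  # Attach the package for the current session;
  # character.only = TRUE lets library() accept a character string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")   # "stats" ships with R, so nothing is downloaded here
```

The same call with "tidyverse" would install the package on first use and only attach it on later runs.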

Interfaces

R comes installed with a command-line console. Various integrated development environments (IDEs) can be installed separately. IDEs for R include R.app (OSX/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R.

General purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.

Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt, and Jupyter.

Scripting languages that support R include Python, Perl, Ruby, F#, and Julia.

General purpose programming languages that support R include Java via the Rserve socket server, and .NET C#.

Statistical frameworks which use R in the background include Jamovi and JASP.

Community

The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.

The R Journal is an open-access academic journal that features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.

The R community hosts many conferences and in-person meetups. These groups include:

  • UseR!: an annual international R user conference
  • Directions in Statistical Computing (DSC)
  • R-Ladies: an organization to promote gender diversity in the R community
  • SatRdays: R-focused conferences held on Saturdays
  • R Conference
  • posit::conf (formerly known as rstudio::conf)

Implementations

The main R implementation is written primarily in C, Fortran, and R itself. Other implementations have also been developed.

Microsoft R Open (MRO) was an R implementation. On 30 June 2021, Microsoft began phasing out MRO in favor of the CRAN distribution.[22]

Commercial support

Although R is an open-source project, some companies provide commercial support:

  • Revolution Analytics provides commercial support for Revolution R.
  • Oracle provides commercial support for the Big Data Appliance, which integrates R into its other products.
  • IBM provides commercial support for in-Hadoop execution of R.

Examples

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[23])

In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[24]
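
One case where the two operators differ is inside a function call. The sketch below wraps the comparison in a hypothetical function, demo_assignment, so that it does not touch the global environment: inside a call, <- creates a variable, whereas = only names an argument.

```r
# demo_assignment() is a hypothetical wrapper used only for illustration.
demo_assignment <- function() {
  m1 <- median(v <- c(1, 2, 3))   # `<-` inside a call creates v, then passes it to median()
  m2 <- median(x = c(4, 5, 6))    # `=` inside a call only names median()'s argument x
  # Report which variables now exist in this function's own environment
  c(v_created = exists("v", inherits = FALSE),
    x_created = exists("x", inherits = FALSE))
}

created <- demo_assignment()
created
```

Here v_created is TRUE and x_created is FALSE, because x was matched to the parameter of median() rather than assigned.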

> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Typing an object's name prints its contents to the console.
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow=3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix 
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2*t(z_matrix)-2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

> new_df <- data.frame(t(z_matrix), row.names=c('A','B')) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c('X','Y','Z') # Set the column names of new_df as X, Y, and Z.
> print(new_df)  # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

> all(new_df$Z == new_df['Z']) && all(new_df[3] == new_df$Z) # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same. all() reduces each comparison to a single value, which && requires.
[1] TRUE

> attributes(new_df) # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c('one','two') # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
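
The rownames() alternative mentioned above, and the difference between the column-access forms, can be sketched as follows. The data frame df is rebuilt here with the same values as new_df so that the example is self-contained; the variable names are chosen only for this illustration.

```r
# Rebuild a small data frame with the same values as new_df
df <- data.frame(X = c(2, 20), Y = c(6, 30), Z = c(12, 42),
                 row.names = c("A", "B"))

# Same effect as assigning to attributes(df)$row.names
rownames(df) <- c("one", "two")

col_dollar <- df$Z        # plain vector
col_double <- df[["Z"]]   # plain vector
col_single <- df["Z"]     # one-column data frame, not a vector

identical(col_dollar, col_double)
is.data.frame(col_single)
```

The values agree in all three forms, but only $ and [[ ]] return a bare vector; single brackets keep the data frame structure.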

Structure of a function

One of R's strengths is the ease of creating new functions.[25] Objects in the function body remain local to the function, and any data type may be returned.

Create a function:

# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # this return() statement is optional
  return(z)
}

Usage output:

> f(1, 2)
[1] 11

> f(c(1,2,3), c(5,3,4))
[1] 23 18 25

> f(1:3, 4)
[1] 19 22 25
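
As a small sketch of the two points made above (local objects and the optional return() statement), the hypothetical function g below omits return() and uses a variable named z that also exists outside the function:

```r
# g() is a hypothetical variant of f() used only for illustration.
g <- function(x, y) {
  z <- 3 * x + 4 * y   # this z is local to g()
  z                    # the value of the last expression is returned
}

z <- "unchanged"       # a separate z in the calling environment
result <- g(1, 2)

result                 # the linear combination 3 * 1 + 4 * 2
z                      # the outer z is not modified by calling g()
```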

Modeling and plotting

 
Diagnostic plots from plotting “model” (q.v. “plot.lm()” function). Notice the mathematical notation allowed in labels (lower left plot).

The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.

# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)

Output:

Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -9.3333     2.8441  -3.282 0.030453 * 
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
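
The quantities shown in the summary can also be extracted programmatically. The following sketch refits the same model and uses the standard accessor functions coef(), residuals(), and predict(); the variable names coefs, res, pred, and r2 are chosen only for this illustration.

```r
x <- 1:6
y <- x^2
model <- lm(y ~ x)

coefs <- coef(model)        # named vector with "(Intercept)" and "x"
res   <- residuals(model)   # one residual per observation
r2    <- summary(model)$r.squared

# Predicted value of y at a new point, x = 7
pred <- predict(model, newdata = data.frame(x = 7))

round(coefs, 4)
round(res, 4)
```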

Mandelbrot set

 
"Mandelbrot.gif" graphic created in R

This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.
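
For a single constant, the iteration can be sketched in base R as follows. The helper stays_bounded is hypothetical, and the escape bound of 2 is an assumption of this sketch rather than something used by the program below.

```r
# stays_bounded() is a hypothetical helper used only for illustration.
stays_bounded <- function(const, n_iter = 20, bound = 2) {
  z <- 0 + 0i
  for (k in seq_len(n_iter)) {
    z <- z^2 + const                    # one step of z = z^2 + c
    if (Mod(z) > bound) return(FALSE)   # modulus exceeded the bound
  }
  TRUE
}

stays_bounded(complex(real = -1, imaginary = 0))   # z cycles through -1, 0, -1, 0, ...
stays_bounded(complex(real = 1, imaginary = 0))    # z grows: 1, 2, 5, ...
```

The full example below applies the same update to a whole matrix of constants at once instead of looping over points.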

Install the package that provides the write.gif() function beforehand:

install.packages("caTools")

R Source code:

library(caTools)

jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

# grid of complex constants: real parts vary by column, imaginary parts by row
C <-
    complex(
        real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
        imaginary = rep(seq(-1.2, 1.2, length.out = dy), dx))

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {

  # apply one step of z = z^2 + c to every point of the grid
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(
    X,
    "Mandelbrot.gif",
    col = jet.colors,
    delay = 100)

Further reading

  • Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-9740-2.
  • Gagolewski, Marek (2024). Deep R Programming. doi:10.5281/ZENODO.7490464. ISBN 978-0-6455719-2-9.

External links

  • R Technical Papers
  • Free Software Foundation
  • R FAQ
  • Big Book of R, curated list of R-related programming books
  • Books Related to R - R Project, partially annotated list of books that are related to S or R and may be useful to the R user community

Notes

  1. This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display two messages reporting conflicts. These messages may be ignored.

References

  1. Morandat, Frances; Hill, Brandon; Osvald, Leo; Vitek, Jan (11 June 2012). "Evaluating the design of the R language: objects and functions for data analysis". European Conference on Object-Oriented Programming. 2012: 104–131. doi:10.1007/978-3-642-31057-7_6. Retrieved 17 May 2016 – via SpringerLink.
  2. Peter Dalgaard (29 February 2024). "R 4.3.3 is released". Retrieved 1 March 2024.
  3. "R - Free Software Directory". directory.fsf.org. Retrieved 26 January 2024.
  4. "R scripts". mercury.webster.edu. Retrieved 17 July 2021.
  5. "R Data Format Family (.rdata, .rda)". Loc.gov. 9 June 2017. Retrieved 17 July 2021.
  6. Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
  7. "Introduction". The Julia Manual. Archived from the original on 20 June 2018. Retrieved 5 August 2018.
  8. Giorgi, Federico M.; Ceraolo, Carmine; Mercatelli, Daniele (27 April 2022). "The R Language: An Engine for Bioinformatics and Data Science". Life. 12 (5): 648. Bibcode:2022Life...12..648G. doi:10.3390/life12050648. PMC 9148156. PMID 35629316.
  9. Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 12. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022. We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
  10. Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 2.13 What is the R Foundation?. Archived from the original on 28 December 2022. Retrieved 28 December 2022.
  11. Ihaka, Ross. "R: Past and Future History" (PDF). p. 4. Archived (PDF) from the original on 28 December 2022. Retrieved 28 December 2022.
  12. Ihaka, Ross (5 December 1997). "New R Version for Unix". stat.ethz.ch. Archived from the original on 12 February 2023. Retrieved 12 February 2023.
  13. Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 18. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
  14. Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 978-1-492-09740-2.
  15. Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
  16. Hornik, Kurt (2012). "The Comprehensive R Archive Network". WIREs Computational Statistics. 4 (4): 394–398. doi:10.1002/wics.1212. ISSN 1939-5108. S2CID 62231320.
  17. Kurt Hornik (23 April 1997). "Announce: CRAN". r-help. Wikidata Q101068595.
  18. "The Status of CRAN Mirrors". cran.r-project.org. Retrieved 30 December 2022.
  19. "CRAN - Contributed Packages". cran.r-project.org. Retrieved 29 December 2022.
  20. Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
  21. Jackson, Joab (16 May 2013). TIBCO offers free R to the enterprise. PC World. Retrieved 20 July 2015.
  22. "Looking to the future for R in Azure SQL and SQL Server". 30 June 2021. Retrieved 7 November 2021.
  23. "An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics" (PDF). Retrieved 3 January 2021.
  24. R Development Core Team. "Assignments with the = Operator". Retrieved 11 September 2018.
  25. Kabacoff, Robert (2012). "Quick-R: User-Defined Functions". statmethods.net. Retrieved 28 September 2018.