R (programming language)
R is a programming language and software environment for statistical data analysis and visualization. It was initially developed by Ross Ihaka and Robert Gentleman and is now developed by the R Core Team. The software is freely available and runs on major operating systems such as Windows, Linux, and macOS. ^{[1]} R has established a reputation as an important tool for statistical modelling, data visualization, data mining and machine learning. The language incorporates the standard statistical tests, models, graphics and analyses, and provides a comprehensive language for managing and manipulating data. It is widely used by leading data science researchers, both in academia and in software development. R is a GNU project and can be considered a different implementation of S.
History
1976 S was developed by John Chambers and colleagues at Bell Labs.
1993 Ross Ihaka and Robert Gentleman began initial development at the University of Auckland in New Zealand, as an implementation of the S programming language.
1995 Source code was released under the GNU General Public License (GPL).
1997 The R core development team was formed. ^{[2]}
Features
Average Programmer Salaries
Country | Average Salary | Years of Experience |
---|---|---|
USA | US$115,000 ^{[3]} | 5 |
UK | £57,500 ^{[4]} | 2-5 |
Strengths
- R is open source and freely available software.
- R implements a wide variety of statistical and graphical techniques, including classical statistical tests, linear and nonlinear modeling, time-series analysis, classification, clustering, and others.
- R provides a very wide variety of graphics for visualizing data. These capabilities are found in the base language and in specialized packages like ggplot2, vcd and scatterplot3d.
- R has a large number of packages that support virtually any statistical technique, and the R community is noted for its active contribution of packages.
- R can consume data from multiple systems such as Excel, SPSS, Stata, SAS and relational databases.
- R runs on the most widely used operating systems, such as Windows, Linux, and macOS. It is supported on both 32-bit and 64-bit systems.
- R has a vibrant community that offers support, and commercial support is also available.
- Many learning materials are available, either freely or at a cost. ^{[5]}
- R has stronger object-oriented programming facilities than most statistical computing languages, a feature inherited from S. Extending R is also eased by its lexical scoping rules. ^{[6]}
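The last point above can be illustrated with a minimal sketch in base R. It assumes the S3 object system as the object-oriented facility in question (R also has others), and the names `make_counter`, `new_temperature` and the class `"temperature"` are hypothetical, chosen only for this illustration.

```r
# Lexical scoping: a function keeps access to the environment in which
# it was defined. make_counter() is a hypothetical helper.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates 'count' in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()                 # the second call sees the updated 'count'

# S3 classes: an object is ordinary data plus a class attribute, and
# generic functions such as print() dispatch on that attribute.
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# Method for the existing generic print()
print.temperature <- function(x, ...) {
  cat("Temperature readings:", x$celsius, "\n")
  invisible(x)
}

# Method for the existing generic summary()
summary.temperature <- function(object, ...) {
  c(mean = mean(object$celsius), max = max(object$celsius))
}

readings <- new_temperature(c(18.5, 21.0, 19.5))   # made-up values
print(readings)     # dispatches to print.temperature()
summary(readings)   # dispatches to summary.temperature()
```

New behaviour is added by writing a function named `generic.class`, without modifying the generic itself, which is one reason extending R with user-defined types takes little code.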
Weaknesses
- R is difficult to learn for users without any computer programming background.
- The documentation of R may be difficult to understand for a person without good statistical training. ^{[7]}
- Managing large data sets can be problematic because R stores its objects in memory. However, some packages remedy this by storing data on disk.
- Some packages are of poor quality. However, if a package is useful to many people, it tends to evolve quickly into a robust product through collaborative efforts.
- R lacks speed and efficiency because some of its design principles are outdated.
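The memory and speed points above can be made concrete with a small sketch in base R. It is an illustration under stated assumptions rather than a benchmark: the vector length `n` and the helper `loop_sum` are hypothetical, and the timings depend on the machine and the R version.

```r
# Objects are held in memory: object.size() estimates the bytes used.
n <- 1e6
x <- runif(n)                         # one million random doubles
print(object.size(x), units = "Mb")

# An explicit R-level loop versus the vectorized built-in sum().
# loop_sum() is a hypothetical helper written only for this comparison.
loop_sum <- function(v) {
  total <- 0
  for (value in v) total <- total + value
  total
}

loop_time <- system.time(s1 <- loop_sum(x))
vec_time  <- system.time(s2 <- sum(x))

all.equal(s1, s2)                     # same result within tolerance
rbind(loop = loop_time, vectorized = vec_time)[, "elapsed", drop = FALSE]
```

Interpreted loops are usually the slow path in R, so idiomatic code tends to rely on vectorized functions such as `sum()` where they exist.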
Criticism
Although R has been described as the most comprehensive statistical analysis package available, ^{[8]} some people believe that R, as an accessible language, is not aimed at advanced programmers. Mat Adams says: "I wouldn't even say R is for programmers. It's best suited for people that have data-oriented problems they're trying to solve, regardless of their programming aptitude."
Syntax
The following examples illustrate the basic syntax of the language and use of the command-line interface.
Basic syntax
The following example illustrates the basic syntax of the language by plotting a 3D surface.
install.packages("rgl") # install the external package (needed only once)
library(rgl) # load the package, which provides the surface3d() function
data(volcano) # built-in elevation matrix
z <- 2 * volcano # Exaggerate the relief
x <- 10 * (1:nrow(z)) # 10 meter spacing (S to N)
y <- 10 * (1:ncol(z)) # 10 meter spacing (E to W)
zlim <- range(z)
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen) # height color lookup table
col <- colorlut[z - zlim[1] + 1] # assign colors to heights for each point
open3d()
# surface3d() is the interface recommended by current rgl releases in place of rgl.surface()
surface3d(x, y, z, color = col, alpha = 0.75, back = "lines")
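The surface example depends on an external package. As a complement, the following minimal sketch shows the same basic syntax elements (assignment, vectors, functions, data frames, control flow) using only base R. All values and the names `heights`, `bmi` and `people` are made up for illustration.

```r
# Assignment and vectors: arithmetic applies element-wise.
heights <- c(150, 160, 170)          # numeric vector (centimeters, made-up)
heights_m <- heights / 100           # vectorized division
heights_m[2]                         # indexing starts at 1

# Functions are ordinary objects created with function().
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Data frames hold tabular data; columns are accessed with $.
people <- data.frame(
  name   = c("A", "B", "C"),
  weight = c(55, 65, 80),
  height = heights_m,
  stringsAsFactors = FALSE           # keep names as character strings
)
people$bmi <- bmi(people$weight, people$height)

# Control flow: loop over rows and test a condition.
for (i in seq_len(nrow(people))) {
  if (people$bmi[i] > 25) {
    cat(people$name[i], "has a BMI above 25\n")
  }
}

# The same selection without a loop, using a logical condition.
people[people$bmi > 25, ]
```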
"Hello World" Example
print("Hello World!")
Examples of R in use
- Facebook used R to analyze and visualize its users' status updates. ^{[9]}
- Google uses R to analyze massive amounts of data to optimize advert placement. ^{[10]}
- ANZ Bank used R to model mortgage loss. ^{[11]}
- The FDA uses R internally and has approved its use for the clinical trials it regulates. ^{[12]}
- Merck uses R for clinical trial design and data analysis. ^{[13]}
- Zillow uses R for analytical purposes to provide detailed information. ^{[14]}
Feature Comparison Chart ^{[15]}
Feature | R | Python | SAS | SPSS | Stata |
---|---|---|---|---|---|
Outlier diagnostics | Available | Available | Available | Available | Available |
Generalized linear models | Available | Available | Available | Available | Available |
Univariate time series analysis | Available | Available | Available | Limited | Available |
Multivariate time series analysis | Available | Available | Available | | |
Cluster analysis | Available | Available | Available | Available | Available |
Discriminant analysis | Available | Available | Available | Available | Available |
Neural networks | Available | Available | Available | Limited | |
Classification and regression trees | Available | Available | Available | Limited | |
Random forests | Available | Available | Limited | | |
Support vector machines | Available | Available | Available | | |
Factor and principal component analysis | Available | Available | Available | Available | Available |
Boosting Classification & Regression Trees | Available | Available | Limited | | |
Nearest neighbor analysis | Available | Available | Available | Available |
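As an illustration of how a few of the features listed for R look in practice, the sketch below fits a generalized linear model, runs a cluster analysis and performs a principal component analysis with functions from base R's stats package. The choice of the built-in `mtcars` and `iris` data sets, the model formula and the number of clusters are assumptions made only for this example.

```r
# Generalized linear model: logistic regression of transmission type
# (am, coded 0/1) on car weight (wt) in the built-in mtcars data.
glm_fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(glm_fit)

# Cluster analysis: k-means with three clusters on the four numeric
# columns of the built-in iris data.
set.seed(42)                          # k-means starts from random centers
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
table(cluster = km$cluster, species = iris$Species)

# Principal component analysis on the same columns, scaled to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)
summary(pca)$importance["Proportion of Variance", ]
```

Other rows of the chart, such as random forests or support vector machines, are typically provided by add-on packages rather than by base R.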
Top Companies Providing R Solutions
Revolution Analytics, ^{[16]} a Microsoft company, provides commercial analytics solutions based on R.
Mango Solutions provides training, consultancy and support for R. ^{[17]}
MicroStrategy Data Mining Services ^{[18]} is a fully integrated component of the MicroStrategy BI platform that delivers the results of predictive models to users in formatted, interactive reports and documents. Its R Integration Pack allows R analytics to be deployed in MicroStrategy visualizations.
Quadbase ^{[19]} provides software and services for data visualization, BI dashboards, reporting, R programming and predictive analytics.
simMachines ^{[20]} provides the R-01 similarity search (k-nearest neighbor) engine, which it describes as high-speed and requiring no tuning. The company describes itself as the Berkeley DB of the Big Data era.
Text Analysis International ^{[21]} offers tools and services for natural language processing and information extraction, building on the VisualText(TM) IDE and NLP++(R) programming language.
The future of R
The popularity of R as an analytics platform continues to grow. The number of analytics jobs posted on indeed.com showed that demand for R skills was higher than demand for SPSS, Matlab, Minitab and Stata skills. Demand for SAS skills was higher than for R, but predictions suggest R will catch up within a few years. Data from Google Scholar shows that SPSS is the most widely used software, ahead of SAS and R; however, R and Stata are closing the gap. On the discussion forums of LinkedIn and Quora, followers of the R topic outnumbered those following SAS, SPSS and Stata. A 2015 survey of data scientists by Rexer Analytics showed that R was the most popular software. ^{[22]}
Top 5 Recent Tweets
Date | Author | Tweet |
---|---|---|
11 Dec 2015 | @Bbl_Astrophyscs | And the #Rangers strike again! Quantitative analyst position this time. STEM background, R programming. Not bad! |
11 Dec 2015 | @R_Programming | R Tip: Visualy asses clustering tendency of data with dissplot{seriation} #rstats #analytics http://rstatistics.net |
11 Dec 2015 | @cbinsa | Career Portals Ss r learning programming by designing their own digital game using Construct 2 software. #hgmsteach |
11 Dec 2015 | @analyticbridge | How to: Parallel Programming in R and Python [Video] http://ow.ly/VA2Vd |
11 Dec 2015 | @Rbloggers | New R job: R Programming for a Daily Fantasy Sports Application http://www.r-users.com/jobs/r-programming-for-a-daily-fantasy-sports-application/ |
Top 5 Lifetime Tweets
Date | Author | Tweet |
---|---|---|
6 Dec 2015 | @analyticbridge | R Programming: 35 Job Interview Questions and Answers #Rstats http://www.datasciencecentral.com/profiles/blogs/r-programming-job-interview-questions-and-answers … |
1 Feb 2015 | @opensourceway | As demand for data scientists grows, companies are turning to open source programming language R: http://red.ht/15s6Aqt |
24 January 2015 | @DrQz | #Microsoft to acquire Revolution Analytics, heavily embracing the R programming language & tools http://www.wired.com/2015/01/microsoft-acquires-open-source-data-science-company-revolution-analytics/ … #rstats #marketbuzz |
5 Feb 2014 | @kdnuggets | An alternative to R and #Python: Julia: A High-Performance Programming Language for #DataScience and more http://buff.ly/1c5bcPe |
23 January 20154 | @mrb_bk | R is an interesting program language that slightly changes my point of view about programming languages. |
References
- [1] https://www.r-project.org/about.html
- [2] https://cran.r-project.org/doc/html/interface98-paper/paper_2.html
- [3] http://marketing.dice.com/pdf/Dice_TechSalarySurvey_2015.pdf
- [4] http://www.itjobswatch.co.uk/jobs/uk/r.do
- [5] http://analyticstrainings.com/?p=101
- [6] http://web.archive.org/web/20060721143309/http://polmeth.wustl.edu/tpm/tpm_v11_n2.pdf
- [7] http://analyticstrainings.com/?p=101
- [8] http://analyticstrainings.com/?p=101
- [9] http://www.r-bloggers.com/analysis-of-facebook-status-updates/
- [10] http://blog.revolutionanalytics.com/2011/08/google-r-effective-ads.html
- [11] http://blog.revolutionanalytics.com/2011/08/how-anz-uses-r-for-credit-risk-analysis.html
- [12] http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html
- [13] http://www.revolutionanalytics.com/content/merck-optimizes-clinical-drug-development-revolution-analytics-gsdesign-explorer
- [14] http://conferences.oreilly.com/strata/stratany2012/public/schedule/detail/26345
- [15] http://stanfordphd.com/Statistical_Software.html
- [16] http://www.revolutionanalytics.com/
- [17] http://www.mango-solutions.com/wp/
- [18] http://www.microstrategy.com/us
- [19] http://www.quadbase.com/
- [20] http://simmachines.com/
- [21] http://www.revolutionanalytics.com/what-r
- [22] http://r4stats.com/articles/popularity/