R (programming language)

From Verify.Wiki

R is a software platform for statistical data analysis and visualization. It was initially developed by Ross Ihaka and Robert Gentleman and is now developed by the R Core Team. The software is freely available and runs on major operating systems, including Windows, Linux, and macOS. [1] R has established a reputation as an important tool for statistical modelling, data visualization, data mining and machine learning. The language incorporates a wide range of standard statistical tests, models, graphics and analyses, and provides a comprehensive language for managing and manipulating data. It is widely used by data science researchers in academia and in software development. R is a GNU project and can be considered a different implementation of S.

History

1976 S was developed by John Chambers and colleagues at Bell Labs.

1993 Ross Ihaka and Robert Gentleman began developing R at the University of Auckland, New Zealand, as an implementation of the S programming language.

1995 The source code was released under the GNU General Public License.

1997 The R core development team was formed. [2]

Features

Average Programmer Salaries

Country | Average Salary | Years of Experience
USA | US$115,000 [3] | 5
UK | £57,500 [4] | 2-5

Strengths

  • R is open source and freely available software.
  • R implements a wide variety of statistical and graphical techniques, including classical statistical tests, linear and nonlinear modeling, time-series analysis, classification, clustering, and others.
  • R provides a very wide variety of graphics for visualizing data. These capabilities are found in the base language and in specialized packages like ggplot2, vcd and scatterplot3d.
  • R has a large number of packages that support virtually any statistical technique, and the R community is noted for its active package contributions.
  • R can read data from multiple systems, such as Excel, SPSS, Stata, SAS and relational databases.
  • R runs on the most widely used operating systems, such as Windows, Linux, and macOS. It is supported on both 32-bit and 64-bit systems.
  • R has a vibrant community that offers support, and commercial support is also available.
  • There are many learning materials available freely or at a cost. [5]
  • R has stronger object-oriented programming facilities than most statistical computing languages, a trait inherited from S. Its lexical scoping rules also make R easier to extend. [6]
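Two of the points above, lexical scoping and S-style object orientation, can be illustrated with a short sketch in base R. The names `make_counter`, `area`, and the `circle` and `square` classes are hypothetical and chosen only for illustration.

```r
# Lexical scoping: the inner function keeps access to `count`
# from the environment in which it was created (a closure).
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update the variable in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()
counter()

# S3 object orientation: a generic function dispatches on the class attribute.
area <- function(shape, ...) UseMethod("area")
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

shapes <- list(
  structure(list(r = 1), class = "circle"),
  structure(list(side = 2), class = "square")
)
areas <- sapply(shapes, area)
print(areas)
```

Each call to `make_counter()` creates an independent counter, and adding a new shape only requires defining another `area.<class>` method.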

Weaknesses

  • R is difficult to learn for users without a computer programming background.
  • The documentation of R may be difficult to understand for a person without good statistical training. [7]
  • Managing large data sets can be problematic because R stores its objects in memory. However, some packages can remedy this by storing data on disk.
  • Some packages are of poor quality. However, if a package is useful to many people, it tends to evolve quickly into a robust product through collaborative effort.
  • R can lack speed and efficiency because some of its design principles are outdated.
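The memory and speed points above can be explored with a small experiment in base R. This is only a sketch: the helper names `sum_sq_loop` and `sum_sq_vec` are hypothetical, and the timings will vary by machine and R version.

```r
# Objects live in memory: a numeric vector needs about 8 bytes per element.
x <- runif(1e6)
size_bytes <- as.numeric(object.size(x))
print(size_bytes)

# An explicit loop and its vectorised equivalent compute the same result;
# the vectorised form is usually faster in R.
sum_sq_loop <- function(v) {
  total <- 0
  for (value in v) total <- total + value^2
  total
}
sum_sq_vec <- function(v) sum(v^2)

loop_time <- system.time(r1 <- sum_sq_loop(x))
vec_time  <- system.time(r2 <- sum_sq_vec(x))
print(rbind(loop = loop_time, vectorised = vec_time))
```

Writing computations in vectorised form is a common way to work around the interpreter overhead of explicit loops.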

Criticism

Although R has been described as the most comprehensive statistical analysis package available [8], some people believe that R, as an accessible language, is not aimed at advanced programmers. "I wouldn't even say R is for programmers. It's best suited for people that have data-oriented problems they're trying to solve, regardless of their programming aptitude," says Mat Adams.

Syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface.

Basic syntax

The following example illustrates the basic syntax of the language by plotting a 3D surface with the rgl package.

install.packages("rgl")			# install the external package (needed only once)
library(rgl)				# load the package, which provides open3d() and surface3d()
data(volcano)
z <- 2 * volcano 			# exaggerate the relief
x <- 10 * (1:nrow(z)) 			# 10 meter spacing (S to N)
y <- 10 * (1:ncol(z)) 			# 10 meter spacing (E to W)
zlim <- range(z)
zlen <- zlim[2] - zlim[1] + 1
colorlut <- terrain.colors(zlen)	# height color lookup table
col <- colorlut[z - zlim[1] + 1]	# assign a color to the height of each point
open3d()
# surface3d() is the current counterpart of the older rgl.surface()
surface3d(x, y, z, color = col, alpha = 0.75, back = "lines")

"Hello World" Example

print("Hello World!")
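Beyond printing a string, a few more basic constructs can be sketched as follows. The variable and function names (`heights`, `mean_centre`, `people`) are hypothetical and serve only as illustration.

```r
# Assignment uses `<-`; c() combines values into a vector.
heights <- c(1.62, 1.75, 1.80)
heights_cm <- heights * 100        # arithmetic is vectorised

# Functions are ordinary objects created with `function`.
mean_centre <- function(v) v - mean(v)
centred <- mean_centre(heights_cm)

# A data frame holds columns of equal length; `$` extracts a column.
people <- data.frame(name = c("A", "B", "C"), height_cm = heights_cm)
tall <- people[people$height_cm > 170, ]

print(tall)
```

The last line prints the rows of `people` whose height exceeds 170 cm.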

Examples of R in use

  • Facebook used R to analyze and visualize its users' status updates. [9]
  • Google uses R to analyze massive data sets in order to optimize advert placement. [10]
  • ANZ Bank used R to model mortgage loss. [11]
  • The FDA uses R internally and has approved its use for the clinical trials it regulates. [12]
  • Merck uses R for clinical trial design and data analysis. [13]
  • Zillow uses R for analytics that underpin the detailed information it provides. [14]

Feature Comparison Chart [15]

Feature R Python SAS SPSS STATA
Outlier diagnostics Available Available Available Available Available
Generalized linear models Available Available Available Available Available
Univariate time series analysis Available Available Available Limited Available
Multivariate time series analysis Available Available Available
Cluster analysis Available Available Available Available Available
Discriminant analysis Available Available Available Available Available
Neural networks Available Available Available Limited
Classification and regression trees Available Available Available Limited
Random forests Available Available Limited
Support vector machines Available Available Available

Factor and principal component analysis Available Available Available Available Available
Boosting Classification & Regression Trees Available Available Limited
Nearest neighbor analysis Available Available Available Available
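Several of the features listed for R in the chart are available in a standard installation without extra packages. The following sketch, which assumes only the `stats` package and the built-in `mtcars` and `iris` data sets, fits a generalized linear model, runs a cluster analysis and performs a principal component analysis. The choice of variables is arbitrary and for illustration only.

```r
# Generalized linear model: logistic regression of transmission type on weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Cluster analysis: k-means with three clusters on the iris measurements.
set.seed(42)                       # k-means starts from random centres
clusters <- kmeans(iris[, 1:4], centers = 3)

# Principal component analysis on the same measurements, scaled to unit variance.
pca <- prcomp(iris[, 1:4], scale. = TRUE)

print(coef(fit))
print(table(clusters$cluster, iris$Species))
print(summary(pca))
```

Other rows of the chart, such as random forests or support vector machines, are typically provided by contributed packages rather than by the base distribution.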

Top Companies Providing R Solutions

Revolution Analytics [16], a Microsoft company, provides commercial analytics solutions based on R.

Mango Solutions provides training, consultancy and support for R. [17]

MicroStrategy Data Mining Services [18] is a fully integrated component of the MicroStrategy BI platform that delivers the results of predictive models to users in familiar, highly formatted, interactive reports and documents. Its R Integration Pack allows R analytics to be deployed in MicroStrategy visualizations.

Quadbase [19] provides software and services for data visualization, BI dashboards, reporting, R programming and predictive analytics.

simMachines [20] provides the R-01 similarity search (k-nearest neighbor) engine, which it promotes as offering high speed and zero tuning. The company describes itself as "the Berkeley DB of the Big Data era."

Text Analysis International [21] offers tools and services for natural language processing and information extraction, building on the VisualText(TM) IDE and NLP++(R) programming language.

The future of R

The popularity of R as an analytics platform continues to grow. Analytics job postings on indeed.com showed higher demand for R skills than for SPSS, Matlab, Minitab and Stata. Demand for SAS skills was higher than for R, but R is predicted to catch up within a few years. Data from Google Scholar shows SPSS as the most widely used software, ahead of SAS and R; however, R and Stata are closing the gap. On the discussion forums LinkedIn and Quora, followers of R topics outnumbered those following SAS, SPSS and Stata. A 2015 survey of data scientists by Rexer Analytics showed R to be the most popular software. [22]

Top 5 Recent Tweets

Date Author Tweet
11 Dec 2015 @Bbl_Astrophyscs And the #Rangers strike again! Quantitative analyst position this time. STEM background, R programming. Not bad!
11 Dec 2015 @R_Programming R Tip: Visualy asses clustering tendency of data with dissplot{seriation} #rstats #analytics http://rstatistics.net
11 Dec 2015 @cbinsa Career Portals Ss r learning programming by designing their own digital game using Construct 2 software. #hgmsteach
11 Dec 2015 @analyticbridge How to: Parallel Programming in R and Python [Video] http://ow.ly/VA2Vd
11 Dec 2015 @Rbloggers New R job: R Programming for a Daily Fantasy Sports Application http://www.r-users.com/jobs/r-programming-for-a-daily-fantasy-sports-application/

Top 5 Lifetime Tweets

Date Author Tweet
6 Dec 2015 @analyticbridge R Programming: 35 Job Interview Questions and Answers #Rstats http://www.datasciencecentral.com/profiles/blogs/r-programming-job-interview-questions-and-answers
1 Feb 2015 @opensourceway As demand for data scientists grows, companies are turning to open source programming language R: http://red.ht/15s6Aqt
24 January 2015 @DrQz #Microsoft to acquire Revolution Analytics, heavily embracing the R programming language & tools http://www.wired.com/2015/01/microsoft-acquires-open-source-data-science-company-revolution-analytics/ … #rstats #marketbuzz
5 Feb 2014 @kdnuggets An alternative to R and #Python: Julia: A High-Performance Programming Language for #DataScience and more http://buff.ly/1c5bcPe
23 January 20154 @mrb_bk R is an interesting program language that slightly changes my point of view about programming languages.

References

  1. https://www.r-project.org/about.html
  2. https://cran.r-project.org/doc/html/interface98-paper/paper_2.html
  3. http://marketing.dice.com/pdf/Dice_TechSalarySurvey_2015.pdf
  4. http://www.itjobswatch.co.uk/jobs/uk/r.do
  5. http://analyticstrainings.com/?p=101
  6. http://web.archive.org/web/20060721143309/http://polmeth.wustl.edu/tpm/tpm_v11_n2.pdf
  7. http://analyticstrainings.com/?p=101
  8. http://analyticstrainings.com/?p=101
  9. http://www.r-bloggers.com/analysis-of-facebook-status-updates/
  10. http://blog.revolutionanalytics.com/2011/08/google-r-effective-ads.html
  11. http://blog.revolutionanalytics.com/2011/08/how-anz-uses-r-for-credit-risk-analysis.html
  12. http://blog.revolutionanalytics.com/2012/06/fda-r-ok.html
  13. http://www.revolutionanalytics.com/content/merck-optimizes-clinical-drug-development-revolution-analytics-gsdesign-explorer
  14. http://conferences.oreilly.com/strata/stratany2012/public/schedule/detail/26345
  15. http://stanfordphd.com/Statistical_Software.html
  16. http://www.revolutionanalytics.com/
  17. http://www.mango-solutions.com/wp/
  18. http://www.microstrategy.com/us
  19. http://www.quadbase.com/
  20. http://simmachines.com/
  21. http://www.revolutionanalytics.com/what-r
  22. http://r4stats.com/articles/popularity/

Verification history