Multiple Regression
Contents
- 1 About
- 2 The Multiple Linear Regression Model
- 3 Assumptions
- 4 Partitioning the total variability
- 5 Residuals
- 6 Parameter Estimation With Ordinary Least Squares
- 7 Degrees of Freedom
- 8 Mean Squares
- 9 Coefficient of Determination
- 10 Controversies
- 11 History
About
The extension of simple linear regression to multiple explanatory (or predictor) variables is known as multiple linear regression (or multivariable linear regression). Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Multiple linear regression is used for two main purposes:
- To describe the linear dependence of the response $Y$ on a collection of explanatory variables $X_1, X_2, \ldots, X_p$.
- To predict values of the response $Y$ from values of the explanatory variables, for which more data are available.
Any hyper-plane fitted through a cloud of data will deviate from each data point to a greater or lesser degree. The vertical distance between a data point and the fitted hyper-plane is termed a "residual". This distance is a measure of prediction error, in the sense that it is the discrepancy between the actual value of the response variable and the value predicted by the hyper-plane. Linear regression determines the best-fit hyper-plane through a scattering of data, such that the sum of squared residuals is minimized; equivalently, it minimizes the error variance. The fit is "best" in precisely that sense: the sum of squared errors is as small as possible. That is why it is also termed "Ordinary Least Squares" regression. [1]
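As an illustration, the following sketch (using NumPy, with simulated data invented purely for this example) fits a hyper-plane by least squares and inspects the residuals; the fitted coefficients minimize the sum of the squared residuals.

```python
import numpy as np

# Minimal sketch with simulated data (values invented for illustration):
# fit a hyper-plane by least squares and inspect the residuals.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)   # response = plane + random error

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # least squares fit
residuals = y - X @ beta_hat                        # vertical distances to the fitted hyper-plane
print(beta_hat, np.sum(residuals**2))               # estimates and the minimized sum of squared residuals
```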
The Multiple Linear Regression Model
Formally, consider a collection of $p$ explanatory variables $X_1, X_2, \ldots, X_p$ and a response variable $Y$, and suppose there are $n$ randomly selected subjects in an experiment. With $\epsilon_i$ as unknown random errors and $i = 1, 2, \ldots, n$, the multiple linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \epsilon_i.$$
We call $\beta_0, \beta_1, \ldots, \beta_p$ the unknown parameters and, additionally, we assume the $X_{ij}$ to be constants (and not random variables). Note that usually, multiple regression is written in terms of vectors and matrices. Specifically, it is usually written:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}.$$
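For concreteness, here is a small sketch of the matrix form with hypothetical numbers for $n = 4$ subjects and $p = 2$ explanatory variables; note the leading column of ones in $\mathbf{X}$, which carries the intercept $\beta_0$.

```python
import numpy as np

# A concrete (hypothetical) example of the matrix form Y = X*beta + epsilon
# with n = 4 subjects and p = 2 explanatory variables.
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 4.0],
              [1.0, 1.0, 6.0],
              [1.0, 4.0, 2.0]])            # first column of ones carries the intercept beta_0
beta = np.array([0.5, 1.2, -0.7])          # (beta_0, beta_1, beta_2)
epsilon = np.array([0.1, -0.2, 0.05, 0.0]) # unknown random errors (fixed here for illustration)
Y = X @ beta + epsilon                     # the n-vector of responses
print(Y)
```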
Assumptions
- The $X_{ij}$ are nonrandom and measured with negligible error. Note that the column vectors of the matrix $\mathbf{X}$ can also be transformations of explanatory variables.
- $\boldsymbol{\epsilon}$ is a random vector.
- For each $i$, $E(\epsilon_i) = 0$, where $E$ is the expected value. That is, the $\epsilon_i$ have mean equal to 0. This can also be written as $E(\boldsymbol{\epsilon}) = \mathbf{0}$.
- For each $i$, $\mathrm{Var}(\epsilon_i) = \sigma^2$, where $\mathrm{Var}$ is the variance. That is, the $\epsilon_i$ have homogeneous variance $\sigma^2$. This can also be written as $\mathrm{Var}(Y_i) = \sigma^2$, since the $X_{ij}$ are constants.
- For each $i \neq j$, $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$. That is, the $\epsilon_i$ are uncorrelated random variables.
- It is often assumed (although not necessary) that $\boldsymbol{\epsilon}$ follows a multivariate normal distribution with mean $\mathbf{0}$ and variance $\sigma^2 I_n$, with $\mathbf{0}$ a vector of zeros of length $n$ and $I_n$ the $n \times n$ identity matrix.
Notice assumptions 4 and 5 can be written compactly as $\mathrm{Cov}(\boldsymbol{\epsilon}) = \sigma^2 I_n$.
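A minimal sketch of the (optional) normality assumption, simulating $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I_n)$; the values of $n$ and $\sigma$ are arbitrary and chosen only for illustration.

```python
import numpy as np

# Sketch of the optional normality assumption: epsilon ~ N(0, sigma^2 * I_n).
# n and sigma are arbitrary values chosen for illustration.
rng = np.random.default_rng(1)
n, sigma = 500, 2.0
epsilon = rng.multivariate_normal(mean=np.zeros(n), cov=sigma**2 * np.eye(n))
print(epsilon.mean(), epsilon.var())  # approximately 0 and approximately sigma^2 = 4
```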
Partitioning the total variability
From the About section, the goal of multiple regression is to estimate $Y_i$ for all $i$. This is accomplished by using the estimates of $\beta_0, \beta_1, \ldots, \beta_p$, which are attained through a partitioning of the "total variability" of the observed response $Y$, where the total variability of $Y$ is the quantity $\sum_{i=1}^{n}(Y_i - \bar{Y})^2$. Denoting the least squares estimate of $\boldsymbol{\beta}$ as $\hat{\boldsymbol{\beta}}$, the process follows in two steps.
- Estimate $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)'$.
- Calculate $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \cdots + \hat{\beta}_p X_{ip}$. The $\hat{Y}_i$ are called the predicted values.
Note that the partitioning of the total variability of $Y$ is achieved by adding and subtracting $\hat{Y}_i$ inside the square in the following way:

$$\sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}\left[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\right]^2 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2,$$

where the cross-product term vanishes for the least squares fit. The quantities from the equation above are given special names:
- The sum of squares total is $SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$,
- The sum of squares regression is $SSR = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$, and
- The sum of squares error is $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$.
Note that some authors refer to the sum of squares regression as the sum of squares model.
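The partition can be checked numerically. The sketch below (simulated data with invented coefficients) computes $SST$, $SSR$, and $SSE$ and confirms that $SST = SSR + SSE$ for a least squares fit that includes an intercept.

```python
import numpy as np

# Numerical check of the partition SST = SSR + SSE on simulated data
# (coefficients and noise level invented for illustration).
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.8, -1.5]) + rng.normal(scale=0.5, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat                       # predicted values

sst = np.sum((y - y.mean()) ** 2)          # sum of squares total
ssr = np.sum((y_hat - y.mean()) ** 2)      # sum of squares regression (model)
sse = np.sum((y - y_hat) ** 2)             # sum of squares error
print(sst, ssr + sse)                      # agree up to floating-point error
```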
Residuals
The residual $e_i$ is the difference between the observed $Y_i$ and the predicted $\hat{Y}_i$. This is written as $e_i = Y_i - \hat{Y}_i$. Now, as $SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$, it follows that $SSE = \sum_{i=1}^{n} e_i^2$ is the sum of the squared residuals. Due to this, in order to attain quality predictions, the estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ are chosen to minimize $\sum_{i=1}^{n} e_i^2$.
Parameter Estimation With Ordinary Least Squares
One method for estimating the vector of unknown parameters $\boldsymbol{\beta}$ is through the use of Ordinary Least Squares (OLS). This is accomplished by finding values for $\hat{\boldsymbol{\beta}}$ that minimize the sum of the squared residuals:

$$Q(\boldsymbol{\beta}) = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}),$$

where $'$ denotes the matrix transpose.
OLS proceeds by taking the matrix calculus partial derivatives of $Q(\boldsymbol{\beta})$ with respect to $\boldsymbol{\beta}$ and setting them equal to zero. After some algebra, we have the OLS estimates via the normal equations:

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}.$$

Therefore, assuming that the inverse of $\mathbf{X}'\mathbf{X}$ exists, the OLS estimates are attained with the equation $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$. Note that $\hat{\boldsymbol{\beta}}$ is an unbiased estimator for $\boldsymbol{\beta}$, as $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$. Note also that this unbiased property of the parameter estimates follows from the assumption that $E(\boldsymbol{\epsilon}) = \mathbf{0}$.
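A minimal sketch of the normal equations in code, assuming $\mathbf{X}'\mathbf{X}$ is invertible; solving the linear system directly is generally preferred in practice over forming the inverse explicitly.

```python
import numpy as np

# Sketch of the normal equations (X'X) beta_hat = X'y on simulated data,
# assuming X'X is invertible; np.linalg.solve avoids forming the inverse explicitly.
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.4, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimates
y_hat = X @ beta_hat                          # fitted (predicted) values
print(beta_hat)                               # close to the true (2.0, -1.0, 0.5)
```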
Degrees of Freedom
For each of the sum of squares equations (from the partitioning of total variability section), there are related degrees of freedom. For the multiple linear regression model, we have:
- $df_{SST} = n - 1$,
- $df_{SSR} = p$, and
- $df_{SSE} = n - p - 1$.
Note that $df_{SSR}$ is the number of explanatory variables in the model and $df_{SSE}$ is $n$ minus the number of estimated parameters (the $p$ explanatory variables plus the intercept).
Mean Squares
The mean squares are the ratios of the sums of squares over their respective degrees of freedom. Therefore, for the multiple linear regression model:
- the mean square for the model is $MSR = \dfrac{SSR}{p}$, and
- the mean square error is $MSE = \dfrac{SSE}{n - p - 1}$.
Note also that $MSE$ is an unbiased estimate for the error variance $\sigma^2$. Specifically, $E(MSE) = \sigma^2$.
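The following sketch (simulated data with a known $\sigma$, all values invented for this example) computes the degrees of freedom and mean squares and shows that $MSE$ comes out close to the true $\sigma^2$.

```python
import numpy as np

# Degrees of freedom and mean squares on simulated data with a known sigma,
# illustrating that MSE approximates sigma^2.
rng = np.random.default_rng(4)
n, p, sigma = 200, 2, 1.5
X = np.column_stack([np.ones(n)] + [rng.normal(size=n) for _ in range(p)])
y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(scale=sigma, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

df_model, df_error = p, n - p - 1  # degrees of freedom for SSR and SSE
msr = ssr / df_model               # mean square for the model
mse = sse / df_error               # mean square error, estimates sigma^2
print(msr, mse, sigma ** 2)        # mse should be near 2.25
```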
Coefficient of Determination
The coefficient of determination, denoted by $R^2$, is a measure of fit for the estimated model. Specifically, $R^2$ is a measure of the amount of variance (of $Y$) explained by the explanatory variables $X_1, \ldots, X_p$. The equation is:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$

Note that $R^2$ is a number between 0 and 1. For example, $R^2 = 1$ implies that all points fall exactly on the fitted hyper-plane, while $R^2 = 0$ (or an $R^2$ close to 0) implies that the points are extremely scattered or that the points follow a non-linear pattern. In either case, the regression model is poor when $R^2$ is close to 0, while an $R^2$ close to 1 indicates that the model produces quality predictions (i.e. the model is a good fit).
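A short sketch computing $R^2$ for a simulated fit; the data and coefficients are invented for illustration.

```python
import numpy as np

# Sketch: R^2 = SSR / SST = 1 - SSE / SST for a simulated fit
# (data and coefficients invented for illustration).
rng = np.random.default_rng(5)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.0, 3.0]) + rng.normal(scale=1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
r_squared = 1.0 - sse / sst
print(r_squared)  # close to 1 here because the signal dominates the noise
```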
Controversies
The main criticisms of multiple linear regression involve the required linearity (in the coefficients) of the model as well as the (optional) assumption of normality of the response $Y$.