3.6. The explanatory power of a linear regression equation

In Figure 3.3 it is shown that the deviation of an individual y value from its mean can be partitioned into the deviation of the predicted value from the mean and the deviation of the observed value from the predicted value:

$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i).$$

We square each side of the equation (the raw deviations about the mean sum to zero, so they cannot measure variability directly) and sum the results over all n points.

Some of you may note that squaring the right-hand side should include the cross product of the two terms in addition to their squared quantities. It can be shown that this cross-product term goes to zero; a sketch of the argument follows below. The equation is expressed as

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2.$$

We see that the total variability, SST, consists of two components: SSR, the amount of variability explained by the regression equation, named the "Regression Sum of Squares", and SSE, the random or unexplained deviation of the points from the regression line, named the "Error Sum of Squares". Thus

$$SST = SSR + SSE.$$
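
To see why the cross-product term vanishes, here is a brief sketch of the standard argument. Write the residuals as $e_i = y_i - \hat{y}_i$ with $\hat{y}_i = a + b x_i$; the least-squares normal equations give $\sum e_i = 0$ and $\sum x_i e_i = 0$, hence

$$\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = \sum_{i=1}^{n}(a + b x_i - \bar{y})\,e_i = (a - \bar{y})\sum_{i=1}^{n} e_i + b\sum_{i=1}^{n} x_i e_i = 0.$$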

Total sum of squares: $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$

Regression Sum of Squares: $SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$

Error Sum of Squares: $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

For a given set of observed values of the dependent variable y, SST is fixed as the total variability of all observations from the mean. We see from the partitioning that a larger value of SSR, and hence a smaller value of SSE, indicates a regression equation that "fits", or comes closer to, the observed data. This partitioning is shown graphically in Figure 3.3.
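
The partition can also be checked numerically. The following is a minimal sketch (not from the text) using numpy and made-up data; any least-squares fit satisfies the identity, up to floating-point rounding.

```python
import numpy as np

# Made-up illustrative data; any data set would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, 1)               # least-squares slope b and intercept a
y_hat = a + b * x                        # predicted values

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
sse = np.sum((y - y_hat) ** 2)           # error sum of squares

# The partition SST = SSR + SSE holds up to floating-point rounding.
assert np.isclose(sst, ssr + sse)
print(sst, ssr, sse)
```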

Example:

Let us find SST, SSR and SSE for the data on incomes and food expenditure.

Using the calculations given in Table 3.3, we find the value of the total sum of squares as

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = 60.8571.$$

 

Table 3.4

x    y    ŷ        e = y - ŷ   x - x̄     (x - x̄)²   e²
35    9   10.3884   -1.3884      4.7143    22.2246   1.9277
49   15   14.0872    0.9128     18.7143   350.225    0.8332
21    7    6.6896    0.3104     -9.2857    86.2242   0.0963
39   11   11.4452   -0.4452      8.7143    75.9390   0.1982
15    5    5.1044   -0.1044    -15.286    233.653    0.0109
28    8    8.5390   -0.5390     -2.2857     5.2244   0.2905
25    9    7.7464    1.2536     -5.2857    27.9386   1.5715
                                Sum:      801.429    4.9283

 

The error sum of squares SSE is given by the sum of the final column (e²) of Table 3.4. Thus,

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = 4.9283.$$

The regression sum of squares can be found from $SSR = SST - SSE$. Thus

$$SSR = 60.8571 - 4.9283 = 55.9288.$$

The value of SSR can also be computed by using the formula $SSR = b^2 \sum_{i=1}^{n}(x_i - \bar{x})^2$. (Check!)

$$SSR = (0.2642)^2 (801.429) \approx 55.94,$$

which agrees with the previous result up to the rounding of b.

The total sum of squares SST is a measure of the total variation in food expenditures, SSR is the portion of total variation explained by the regression model (or by income), and the error sum of squares SSE is the portion of total variation not explained by the regression model.
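
As a cross-check, the example's figures can be reproduced in a few lines. This is an illustrative sketch, not part of the original text; it takes the (x, y) data from Table 3.4 and uses numpy's least-squares fit.

```python
import numpy as np

# Incomes x and food expenditures y of the seven households (Table 3.4).
x = np.array([35.0, 49.0, 21.0, 39.0, 15.0, 28.0, 25.0])
y = np.array([9.0, 15.0, 7.0, 11.0, 5.0, 8.0, 9.0])

b, a = np.polyfit(x, y, 1)                 # least-squares slope and intercept
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)          # approx. 60.8571
sse = np.sum((y - y_hat) ** 2)             # approx. 4.9283
ssr = sst - sse                            # approx. 55.9288
print(round(sst, 4), round(sse, 4), round(ssr, 4))

# Alternative SSR formula from the text: b^2 * sum((x - mean(x))^2).
print(round(b ** 2 * np.sum((x - x.mean()) ** 2), 4))
```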

 

3.6.1. Coefficient of determination

 

If we divide both sides of the equation

$$SST = SSR + SSE$$

by SST, we obtain

$$1 = \frac{SSR}{SST} + \frac{SSE}{SST}.$$

We have seen that the fit of the regression equation to the data improves as SSR increases and SSE decreases. The ratio $SSR/SST$ provides a descriptive measure of the proportion, or percent, of the total variability that is explained by the regression model. This measure is called the coefficient of determination, $r^2$, or more generally $R^2$:

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$

The coefficient of determination is often interpreted as the percent of variability in y that is explained by the regression equation. We see that $r^2$ increases directly with the spread of the independent variable.

$r^2$ can vary from 0 to 1, since SST is fixed and $0 \le SSE \le SST$. A larger $r^2$ implies a better regression, everything else being equal.

Interpretation of $r^2$: about $100(r^2)\%$ of the sample variation in y (measured by the total sum of squares $\sum_{i=1}^{n}(y_i - \bar{y})^2$ of deviations of the sample y values about their mean $\bar{y}$) can be explained by using x to predict y in the straight-line model.

Example:

Calculate the coefficient of determination for the data on monthly incomes and food expenditures of seven households.

Solution:

From earlier calculations,

$$SSR = 55.9288 \quad\text{and}\quad SST = 60.8571.$$

Hence,

$$r^2 = \frac{SSR}{SST} = \frac{55.9288}{60.8571} = 0.92.$$

We can state that 92% of the variability in y is explained by the linear regression, and the linear model seems very satisfactory in this respect. In other words, 92% of the total variation in the food expenditures of the households occurs because of the variation in their incomes, and the remaining 8% is due to other variables, such as differences in household size, preferences, tastes, and so on.
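
The same figure can be confirmed numerically; for a simple linear regression, $r^2$ also equals the squared sample correlation between x and y. A minimal sketch, again assuming the Table 3.4 data:

```python
import numpy as np

x = np.array([35.0, 49.0, 21.0, 39.0, 15.0, 28.0, 25.0])
y = np.array([9.0, 15.0, 7.0, 11.0, 5.0, 8.0, 9.0])

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - sse / sst
print(round(r_squared, 2))                    # about 0.92

# Cross-check: the squared Pearson correlation gives the same value.
print(round(np.corrcoef(x, y)[0, 1] ** 2, 2))
```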

 

