In the discussion of multiple regression so far, we have assumed that the independent variables, $x_j$, ranged over many different values. All the independent variables we have considered were quantitative. We may also include in a regression model a variable that is qualitative. Such a variable contains categories instead of numerical values. We will introduce an independent variable that takes only two values: 0 and 1. Such a variable is commonly called a “dummy variable”, and we will see that it provides a valuable tool for applying multiple regression to situations involving categorical variables.
Let us consider a simple regression equation
$$y = \beta_0 + \beta_1 x_1$$
Now suppose that we introduce a dummy variable, $x_2$, that has values 0 and 1, so that the resulting equation becomes
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
When $x_2 = 0$ in this equation the constant is $\beta_0$, but when $x_2 = 1$ the constant is $\beta_0 + \beta_2$. Thus we see that the dummy variable shifts the linear relationship between $y$ and $x_1$ by the value of the coefficient $\beta_2$.
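The intercept shift can be checked numerically. The sketch below is only an illustration: the function name `predict` and the coefficient values `B0`, `B1`, `B2` are hypothetical and do not come from any data in this text. It evaluates the equation with the dummy variable set to 0 and to 1 and shows that the gap between the two lines is the same at every value of the quantitative variable.

```python
def predict(x1, d, b0, b1, b2):
    """Value of b0 + b1*x1 + b2*d, where d is a 0/1 dummy variable."""
    if d not in (0, 1):
        raise ValueError("dummy variable must be 0 or 1")
    return b0 + b1 * x1 + b2 * d


# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2 = 10.0, 2.0, 5.0

# The gap between the two lines equals B2 at every x1, so the lines are parallel.
shifts = [predict(x, 1, B0, B1, B2) - predict(x, 0, B0, B1, B2) for x in (0, 5, 20)]
print(shifts)  # [5.0, 5.0, 5.0]
```

The slope is unchanged by the dummy variable; only the constant term moves.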
The number of dummy variables in a regression model is equal to the number of categories minus 1. For instance, if a variable contains two categories, then we introduce one dummy variable in the regression model for this variable. If a qualitative variable contains three categories, we will introduce two dummy variables and so on.
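The "categories minus 1" rule can be sketched as follows. The helper `make_dummies` and the three-category variable `region` are hypothetical, introduced only to illustrate the encoding. The category chosen as the baseline gets no column of its own, because it is identified by all dummies being 0.

```python
def make_dummies(values, baseline):
    """Encode a categorical column as (number of categories - 1) 0/1 columns.

    The baseline category gets no column; it is the case where all dummies are 0.
    """
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical three-category variable, used only for illustration.
region = ["north", "south", "west", "south", "north"]
dummies = make_dummies(region, baseline="north")
print(dummies)  # {'south': [0, 1, 0, 1, 0], 'west': [0, 0, 1, 0, 0]}
```

Three categories yield two dummy columns, and a two-category variable such as gender would yield one.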
The following example shows how a dummy variable is used in a regression model.
Example:
Refer to Example 1. The following table reproduces the data from that example with an additional column that gives the gender of each of the 10 drivers.
Yearly premium | Driving experience | Number of violations (past 5 years) | Gender |
 | | | Male |
 | | | Female |
 | | | Female |
 | | | Female |
 | | | Female |
 | | | Male |
 | | | Female |
 | | | Female |
 | | | Male |
 | | | Male |
Using MINITAB, find the regression of the yearly auto insurance premium on the years of driving experience, the number of driving violations, and the gender of the drivers. Answer the following questions.
a) Write the estimated regression equation.
b) Explain the meaning of the estimated regression coefficient of the independent variable gender.
c) What is the predicted auto insurance premium paid per year by a male driver with 14 years of driving experience and 3 driving violations?
d) What is the predicted auto insurance premium paid per year by a female driver with 14 years of driving experience and 3 driving violations?
e) Construct a 99% confidence interval for the coefficient of gender.
f) Using the 1% significance level, test the null hypothesis that the coefficient of gender is zero.
Solution:
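Gender is a qualitative variable, not a quantitative one, so we will use a dummy variable for it in the regression model. Let
$x_1$ = driving experience (in years)
$x_2$ = number of driving violations (during the past 5 years)
We can denote the dummy variable by $x_3$ or by the letter $D$.
Suppose
$D = 0$ if the driver is male and $D = 1$ if the driver is female.
In this case, our population regression model becomes
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 D + \varepsilon$$
where $y$ is the yearly auto insurance premium.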
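With the coding used in this solution (0 for male, 1 for female), the model splits into one equation per gender. The symbols $\beta_0, \ldots, \beta_3$ for the coefficients and $\varepsilon$ for the error term are a notational assumption:

```latex
\begin{aligned}
D = 0 \ (\text{male}):   \quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \\
D = 1 \ (\text{female}): \quad & y = (\beta_0 + \beta_3) + \beta_1 x_1 + \beta_2 x_2 + \varepsilon
\end{aligned}
```

So $\beta_3$ is the difference in expected yearly premium between a female and a male driver with the same driving experience and the same number of violations.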
Assigning the values 0 and 1 to male and female, respectively, we rewrite the data:
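Yearly premium | Driving experience | Number of violations (past 5 years) | Gender (D) |
 | | | 0 |
 | | | 1 |
 | | | 1 |
 | | | 1 |
 | | | 1 |
 | | | 0 |
 | | | 1 |
 | | | 1 |
 | | | 0 |
 | | | 0 |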
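The recoding of the gender column can be sketched in Python. The list below copies the gender values from the example's table in driver order; the names `gender`, `CODES`, and `D` are chosen here for illustration.

```python
# Gender column as listed in the example's table, in driver order.
gender = ["Male", "Female", "Female", "Female", "Female",
          "Male", "Female", "Female", "Male", "Male"]

# Coding used in the solution: male -> 0, female -> 1.
CODES = {"Male": 0, "Female": 1}
D = [CODES[g] for g in gender]
print(D)  # [0, 1, 1, 1, 1, 0, 1, 1, 0, 0]
```

The resulting column `D` is what enters the regression alongside the two quantitative variables.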
The following figure shows the MINITAB solution.
Regression Analysis: y versus X1, X2, D
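Parts (e) and (f) use the estimated coefficient of D and its standard error from the regression output. As a sketch in assumed notation ($b_3$ for the estimated coefficient of D, $s_{b_3}$ for its standard error), with $n = 10$ drivers, $k = 3$ independent variables, and a two-sided alternative assumed for the test:

```latex
\begin{aligned}
df &= n - k - 1 = 10 - 3 - 1 = 6 \\
\text{99\% confidence interval:} \quad & b_3 \pm t_{0.005,\,6}\, s_{b_3}, \qquad t_{0.005,\,6} = 3.707 \\
\text{Test of } H_0\colon \beta_3 = 0: \quad & t = \frac{b_3}{s_{b_3}}, \quad \text{reject } H_0 \text{ at the 1\% level if } |t| > 3.707
\end{aligned}
```

The two parts agree with each other: the 99% confidence interval contains zero exactly when the two-sided test at the 1% level fails to reject $H_0$.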
Date added: 2015-08-05