Sample Solution

1(a) State whether the following statements are true or false and also give the reason in support of your answer.
(i) We define three indicator variables for an explanatory variable with three categories.
Answer:
The statement is somewhat ambiguous, so I'll assume the question asks whether it is correct to define three indicator (dummy) variables for an explanatory variable that has three categories. Let's clarify this and provide a complete answer.

Statement:

"We define three indicator variables for an explanatory variable with three categories."

Explanation:

In the context of regression analysis, when we have a categorical explanatory variable with $k$ categories, we typically use $k-1$ indicator (dummy) variables. This approach prevents perfect multicollinearity (also known as the dummy variable trap), where the dummy variables are perfectly collinear with the intercept term.
Let's assume we have a categorical variable $X$ with three categories: A, B, and C. Here's how we typically define the dummy variables:
  1. Indicator variable 1 ($D_1$):
    • $D_1 = 1$ if the observation belongs to category A
    • $D_1 = 0$ otherwise
  2. Indicator variable 2 ($D_2$):
    • $D_2 = 1$ if the observation belongs to category B
    • $D_2 = 0$ otherwise
We do not need a third indicator variable for category C because its presence is already implied when $D_1$ and $D_2$ are both 0.
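For reference, here is a minimal sketch of this coding in Python (assuming pandas is available; the sample values of $X$ are hypothetical):

import pandas as pd

# Hypothetical observations of the categorical variable X.
X = pd.Series(["A", "B", "C", "A", "C", "B"], name="X")

# Full one-hot coding would produce three columns (A, B, C); keeping
# only A and B gives the two indicators D1 and D2 defined above, with
# C as the implied reference category (D1 = D2 = 0).
dummies = pd.get_dummies(X)[["A", "B"]].rename(columns={"A": "D1", "B": "D2"})
print(dummies)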

Justification:

If we create three dummy variables for a categorical variable with three categories, we will encounter perfect multicollinearity. Here’s why:
Suppose $X$ has categories A, B, and C, and we create three indicator variables $D_1$, $D_2$, and $D_3$:
  • $D_1$ for category A
  • $D_2$ for category B
  • $D_3$ for category C
In this case, there is a linear relationship among these dummy variables:
$D_1 + D_2 + D_3 = 1$
This relationship implies perfect multicollinearity, which makes the regression coefficients indeterminate because the design matrix becomes rank deficient, so $X^{\top}X$ is singular (not invertible).
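This rank deficiency is easy to verify numerically. The following sketch (assuming NumPy, with a small made-up design) shows that an intercept plus all three dummies is rank deficient, while an intercept plus two dummies has full column rank:

import numpy as np

# Four observations from categories A, B, C, A, one-hot coded as D1, D2, D3.
D = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
intercept = np.ones((4, 1))

X_full = np.hstack([intercept, D])            # intercept + D1 + D2 + D3
X_reduced = np.hstack([intercept, D[:, :2]])  # intercept + D1 + D2

print(np.linalg.matrix_rank(X_full))     # 3, but there are 4 columns -> X'X singular
print(np.linalg.matrix_rank(X_reduced))  # 3, equal to the number of columns -> full rank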

Correct Approach:

Define only $k-1$ dummy variables for $k$ categories to avoid multicollinearity. Thus, for three categories, we define only two dummy variables.

Conclusion:

The statement "We define three indicator variables for an explanatory variable with three categories" is false. We should define k 1 k 1 k-1k-1k1 indicator variables for k k kkk categories to avoid multicollinearity.

Example:

Let's create an example with three categories:
  • $X$ = A, B, C (categorical variable)
Define two indicator variables:
  • $D_1 = 1$ if $X$ = A, 0 otherwise
  • $D_2 = 1$ if $X$ = B, 0 otherwise
Category $X$ = C is implied when $D_1 = 0$ and $D_2 = 0$.
When we run a regression model with these two dummy variables, we avoid multicollinearity and can interpret the coefficients appropriately.
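As an illustration, here is a sketch of such a regression in Python (assuming statsmodels; the response values y are made up purely for demonstration):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "X": ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "y": [2.1, 3.4, 1.8, 2.3, 3.1, 2.0, 2.2, 3.3, 1.9],  # hypothetical response
})

# Treatment(reference="C") makes C the omitted baseline, matching the
# example above where D1 = D2 = 0 identifies category C. The formula
# interface applies k-1 dummy coding automatically.
model = smf.ols("y ~ C(X, Treatment(reference='C'))", data=df).fit()
print(model.params)  # intercept (mean of C) plus coefficients for A and B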
(ii) If the coefficient of determination is 0.833, the number of observations and explanatory variables are 12 and 3, respectively, then the Adjusted $R^2$ will be 0.84.
Answer:
To determine whether the statement is true, we need to calculate the Adjusted $R^2$ and compare it to 0.84.

Definitions and Formulas:

  1. Coefficient of determination ($R^2$):
    $R^2 = 0.833$
  2. Number of observations ($n$):
    $n = 12$
  3. Number of explanatory variables ($k$):
    $k = 3$
  4. Adjusted $R^2$ formula:
    $\text{Adjusted } R^2 = 1 - \dfrac{(1 - R^2)(n - 1)}{n - k - 1}$

Calculation:

  1. Calculate the numerator:
    $1 - R^2 = 1 - 0.833 = 0.167$
  2. Calculate the degrees of freedom adjustment:
    $n - 1 = 12 - 1 = 11$
    $n - k - 1 = 12 - 3 - 1 = 8$
  3. Calculate the fraction:
    $\dfrac{(1 - R^2)(n - 1)}{n - k - 1} = \dfrac{0.167 \times 11}{8} = \dfrac{1.837}{8} = 0.229625$
  4. Calculate the Adjusted $R^2$:
    $\text{Adjusted } R^2 = 1 - 0.229625 = 0.770375$
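The arithmetic can be verified with a few lines of plain Python:

# Adjusted R^2 from the formula above.
r2, n, k = 0.833, 12, 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 6))  # 0.770375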

Conclusion:

The calculated Adjusted $R^2$ is approximately 0.7704, not 0.84.
Thus, the statement "If the coefficient of determination is 0.833, the number of observations and explanatory variables are 12 and 3, respectively, then the Adjusted $R^2$ will be 0.84" is false.