1. State whether the following statements are True or False. Give reasons in support of your answers:
(a) If two variables are related in the form Y = X², then the variables are highly linearly related.
Answer:
The statement "If two variables are related in the form Y = X², then the variables are highly linearly related" is false.
Justification:
Definition of Linear Relationship:
A linear relationship between two variables X and Y implies that the relationship can be described by an equation of the form:
Y = aX + b
where a and b are constants. This means that for every unit change in X, Y changes by a constant amount a.
Nature of Y = X²:
The equation Y = X² describes a quadratic relationship, not a linear one: Y depends on the square of X rather than on a constant multiple of X.
Non-linearity:
Plot Analysis:
If you plot Y = X² with X on the x-axis and Y on the y-axis, you get a parabolic curve that opens upwards. This is not a straight line, which would be characteristic of a linear relationship.
Rate of Change:
In a linear relationship, the rate of change of Y with respect to X is constant. In Y = X², however, the rate of change (slope) is given by the derivative dY/dX = 2X, which is not constant but varies with X.
Example to Illustrate Non-linearity:
Consider two points:
When X = 1, Y = 1² = 1
When X = 2, Y = 2² = 4
When X = 3, Y = 3² = 9
The differences in Y for equal changes in X are not constant (from X = 1 to X = 2, Y changes by 3, but from X = 2 to X = 3, Y changes by 5), demonstrating a non-linear relationship.
Conclusion:
Since the relationship Y = X² does not meet the criteria for linearity (a constant rate of change and a straight-line plot), the statement is false.
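The point can be illustrated numerically: for X values placed symmetrically about zero, Y = X² and X have a Pearson correlation of exactly zero, even though Y is completely determined by X. A minimal sketch (the data values are illustrative, chosen only for this example):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

X = [-2, -1, 0, 1, 2]        # symmetric about zero
Y = [x ** 2 for x in X]      # exact functional relationship Y = X^2

r = pearson_r(X, Y)          # exactly 0 here, despite perfect dependence
```

So a perfect quadratic dependence can coexist with zero linear correlation, which is why Y = X² must not be called "highly linearly related".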
(b) In regression analysis, the two regression coefficients are −2 and −2/3.
Answer:
The statement "In regression analysis, the two regression coefficients are −2 and −2/3" is false.
Justification:
In regression analysis involving two variables X and Y, there are typically two regression equations:
The regression of Y on X:
Y = a + bX
where b is the regression coefficient of Y on X.
The regression of X on Y:
X = c + dY
where d is the regression coefficient of X on Y.
Relationship Between Regression Coefficients:
There is a specific relationship between the two regression coefficients b and d:
b × d = r²
where r is the correlation coefficient between X and Y.
Proof Using Given Coefficients:
Given b = −2 and d = −2/3, let's check the relationship:
b × d = (−2) × (−2/3) = 4/3
For the coefficients to be valid, this product must equal r², where r is the correlation coefficient.
Constraints on r²:
The correlation coefficient r must lie in the interval [−1, 1], so its square satisfies:
0 ≤ r² ≤ 1
However, in this case:
r² = 4/3
This is not possible, because r² cannot exceed 1.
Conclusion:
Since the product b × d must equal r², and r² cannot be greater than 1, the given regression coefficients b = −2 and d = −2/3 cannot occur together in a valid regression model. Hence, the statement is false.
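A pair of regression coefficients can be screened by checking that they have the same sign and that their product (which equals r²) lies in [0, 1]. A small sketch (the function name is my own):

```python
def regression_coefficients_valid(b, d):
    """Check whether b (Y on X) and d (X on Y) can coexist:
    their product equals r^2, so it must be non-negative and at most 1."""
    same_sign = b * d >= 0       # r^2 cannot be negative
    return same_sign and b * d <= 1

ok = regression_coefficients_valid(-2, -2 / 3)   # product is 4/3 > 1 -> invalid
```

For the given pair the product is 4/3, so the check fails, matching the argument above.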
(c) Sum of deviations of the observations from their mean is zero.
Answer:
The statement "Sum of deviations of the observations from their mean is zero" is true.
Justification:
Definition of Mean:
The mean (average) of a set of observations x₁, x₂, …, xₙ is given by:
x̄ = (x₁ + x₂ + … + xₙ)/n = (1/n) Σ xᵢ
Sum of Deviations:
The sum of the deviations of the observations from their mean is:
Σ (xᵢ − x̄) = Σ xᵢ − n x̄ = n x̄ − n x̄ = 0
Hence, the sum of the deviations of the observations from their mean is indeed zero, and the statement is true.
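The identity is easy to verify numerically; an illustrative check (the data values are arbitrary):

```python
data = [4, 8, 15, 16, 23, 42]          # arbitrary observations
mean = sum(data) / len(data)

# Sum of deviations from the mean; exactly zero up to floating-point error
deviation_sum = sum(x - mean for x in data)
```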
(d) If the value of β₂ < 3, then the curve is said to be leptokurtic.
Answer:
The statement "If the value of β₂ < 3, then the curve is said to be leptokurtic" is false.
Justification:
Kurtosis Overview:
Kurtosis is a statistical measure that describes how much of a distribution's mass lies in its tails relative to its overall shape; it indicates the "tailedness" of the distribution. Excess kurtosis (β₂ − 3) is often used to compare a distribution with the normal distribution.
Beta Coefficient (β₂):
β₂ is a measure of kurtosis. For a normal distribution, β₂ = 3.
Types of Kurtosis:
Leptokurtic: Distributions with kurtosis greater than 3 (β₂ > 3). These have fatter tails and a sharper peak than the normal distribution.
Mesokurtic: Distributions with kurtosis equal to 3 (β₂ = 3). The normal distribution is mesokurtic.
Platykurtic: Distributions with kurtosis less than 3 (β₂ < 3). These have thinner tails and a flatter peak than the normal distribution.
Analysis of the Given Statement:
The statement asserts that if β₂ < 3, the curve is leptokurtic. However, by the definitions above:
If β₂ < 3, the curve is actually platykurtic, not leptokurtic.
A leptokurtic curve requires β₂ > 3.
Conclusion:
Given the definitions, the correct classification for β₂ < 3 is platykurtic, not leptokurtic. Therefore, the statement is false.
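The classification rule, together with β₂ computed from sample central moments (β₂ = m₄/m₂²), can be sketched as follows (function names are my own):

```python
def beta2(data):
    """Kurtosis measure beta_2 = m4 / m2^2, using central sample moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2

def classify_kurtosis(b2):
    """Classify a curve by its beta_2 value."""
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"

# Two equally likely values give the flattest possible shape: beta_2 = 1
label = classify_kurtosis(beta2([0, 1, 0, 1]))
```

For the two-point data above β₂ works out to 1, well below 3, so the classifier reports "platykurtic", consistent with the table of cases.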
(e) In a company of 1000 persons, 750 were male, out of whom 530 were married. Among females, the number of married ones was 350; then the data is consistent.
Answer:
To determine whether the given data is consistent, we need to verify if all the numbers provided logically add up without any contradictions.
Given Data:
Total number of persons in the company: 1000
Number of males: 750
Number of married males: 530
Number of females: (Total persons – Number of males) = 1000 – 750 = 250
Number of married females: 350
Consistency Check:
The main point of inconsistency would be in the number of married persons. Specifically, if the number of married males and married females exceeds the total population, the data is inconsistent.
Number of Married Persons:
Married males: 530
Married females: 350
Total number of married persons = Married males + Married females = 530 + 350 = 880
Total Population:
Total persons in the company = 1000
The total number of married persons (880) is less than the total population (1000), which is logically possible. Thus, there is no immediate contradiction from this calculation alone. However, a closer look at the gender-wise distribution of the unmarried persons should be considered to fully confirm consistency.
Calculation of Unmarried Persons:
Unmarried Males:
Total males = 750
Married males = 530
Unmarried males = 750 – 530 = 220
Unmarried Females:
Total females = 250
Married females = 350
This presents an inconsistency because the number of married females (350) cannot exceed the total number of females (250).
Conclusion:
The data given is inconsistent because the number of married females (350) exceeds the total number of females (250). This contradiction indicates a clear error in the provided information. Therefore, the statement that the data is consistent is false.
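The arithmetic of the consistency check can be captured in a short sketch (the function name is my own):

```python
def check_consistency(total, males, married_males, married_females):
    """Return a list of contradictions found in the reported counts."""
    females = total - males
    problems = []
    if married_males > males:
        problems.append("married males exceed total males")
    if married_females > females:
        problems.append("married females exceed total females")
    if married_males + married_females > total:
        problems.append("married persons exceed total persons")
    return problems

issues = check_consistency(1000, 750, 530, 350)
# 350 married females against only 250 females flags the inconsistency
```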
Question:-02
2.(a) A candidate obtained the following percentage of marks in different courses of PGDAST programme :
MST-001-46%
MST-002-67%
MST-003-72%
MST-004-58%
MST-005-53%
It is agreed to give double weights to marks in MST-001 and MST-002 as compared to other courses. What is the simple mean and weighted mean?
Answer:
To calculate both the simple mean and the weighted mean of the percentages obtained by the candidate in the different courses, we need to follow these steps:
Step 1: List the percentages obtained
MST-001: 46%
MST-002: 67%
MST-003: 72%
MST-004: 58%
MST-005: 53%
Step 2: Calculate the Simple Mean
The simple mean is the average of all the percentages. It is calculated by summing up all the percentages and then dividing by the number of courses.
Simple Mean = (Σ Percentages)/(Number of Courses) = (46 + 67 + 72 + 58 + 53)/5 = 296/5 = 59.2%
Step 3: Calculate the Weighted Mean
Since MST-001 and MST-002 carry double weight compared to the other courses, we assign weight 2 to these two courses and weight 1 to each of the others:
Weight for MST-001: 2
Weight for MST-002: 2
Weight for MST-003: 1
Weight for MST-004: 1
Weight for MST-005: 1
Now, calculate the weighted mean using these weights:
Weighted Mean = (Σ wᵢxᵢ)/(Σ wᵢ) = (2×46 + 2×67 + 1×72 + 1×58 + 1×53)/(2 + 2 + 1 + 1 + 1) = 409/7 ≈ 58.43%
Thus, the simple mean is 59.2% and the weighted mean is approximately 58.43%.
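The two means can be checked with a short sketch:

```python
marks   = {"MST-001": 46, "MST-002": 67, "MST-003": 72, "MST-004": 58, "MST-005": 53}
weights = {"MST-001": 2, "MST-002": 2, "MST-003": 1, "MST-004": 1, "MST-005": 1}

# Simple mean: unweighted average of the five percentages
simple_mean = sum(marks.values()) / len(marks)                 # 296/5 = 59.2

# Weighted mean: double weight on MST-001 and MST-002
weighted_mean = (sum(weights[c] * marks[c] for c in marks)
                 / sum(weights.values()))                      # 409/7 ≈ 58.43
```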
(b) For two Firms A and B, the following details are available:

| | A | B |
| :--- | :---: | :---: |
| No. of employees | 100 | 200 |
| Average salary | 16000 | 18000 |
| SD of salary | 16 | 18 |
Compute the following :
(i) Which Firm pays larger package of salary?
(ii) Which Firm shows greater variability in the distribution of salary?
(iii) Compute the combined average salary and combined variance of both firms.
Answer:
Let’s address each of the questions step by step:
Given Data:
Firm A:
Number of employees: n_A = 100
Average salary: μ_A = 16000
Standard deviation of salary: σ_A = 16
Firm B:
Number of employees: n_B = 200
Average salary: μ_B = 18000
Standard deviation of salary: σ_B = 18
(i) Which Firm Pays a Larger Package of Salary?
The larger package of salary refers to the average salary. Comparing the average salaries of both firms:
Firm A: 16000
Firm B: 18000
Answer: Firm B pays a larger package of salary because the average salary at Firm B (18000) is higher than that at Firm A (16000).
(ii) Which Firm Shows Greater Variability in the Distribution of Salary?
The variability in the distribution of salary can be compared using the standard deviation.
Firm A: 16
Firm B: 18
Answer: Firm B shows greater variability in the distribution of salary because the standard deviation of salaries at Firm B (18) is higher than that at Firm A (16).
(iii) Compute the Combined Average Salary and Combined Variance of Both Firms
Combined Average Salary
The combined average salary (μ) for both firms is the weighted average of the two means:
μ = (n_A μ_A + n_B μ_B)/(n_A + n_B) = (100 × 16000 + 200 × 18000)/300 = 5200000/300 ≈ 17333.33
Combined Variance
With d_A = μ_A − μ = −1333.33 and d_B = μ_B − μ = 666.67, the combined variance is:
σ² = [n_A(σ_A² + d_A²) + n_B(σ_B² + d_B²)]/(n_A + n_B) = [100(256 + 1777777.78) + 200(324 + 444444.44)]/300 ≈ 889190.22
Summary:
Larger salary package: Firm B
Greater variability in the distribution of salary: Firm B
Combined average salary: ≈ 17333.33
Combined variance: ≈ 889190.22
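The combined statistics can be verified with a short sketch (the function name is my own):

```python
def combine(n_a, mean_a, sd_a, n_b, mean_b, sd_b):
    """Combined mean and variance of two groups from their summary statistics."""
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    d_a, d_b = mean_a - mean, mean_b - mean        # deviations of group means
    var = (n_a * (sd_a ** 2 + d_a ** 2) + n_b * (sd_b ** 2 + d_b ** 2)) / n
    return mean, var

mean, var = combine(100, 16000, 16, 200, 18000, 18)   # ≈ 17333.33, ≈ 889190.22
```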
Question:-03
3.(a) Define coefficient of determination and correlation ratio.
Answer:
Coefficient of Determination (R²)
The coefficient of determination, denoted R², is a statistical measure used in regression analysis to assess the goodness of fit of a model. It gives the proportion of the variance in the dependent variable that is predictable from the independent variable(s):
R² = 1 − SS_res/SS_tot
where:
SS_res (Residual Sum of Squares): the sum of the squares of the residuals (the differences between the observed and predicted values).
SS_tot (Total Sum of Squares): the sum of the squares of the differences between the observed values and their mean.
Interpretation:
R² ranges from 0 to 1.
R² = 0: The independent variable does not explain any of the variation in the dependent variable.
R² = 1: The independent variable perfectly explains all the variation in the dependent variable.
An R² value close to 1 indicates a strong relationship, whereas a value close to 0 indicates a weak relationship.
Example:
If R² = 0.85, then 85% of the variance in the dependent variable is explained by the independent variable(s).
Correlation Ratio (η or η²)
The correlation ratio, denoted by the Greek letter eta (η), or η² when referring to the squared correlation ratio, measures the strength of the relationship between a continuous dependent variable and a categorical independent variable. It is used in analysis of variance (ANOVA) and is particularly useful when the relationship is not strictly linear:
η² = SS_between/SS_total
where:
SS_between (Between-Group Sum of Squares): the sum of the squared deviations of the group means from the overall mean, weighted by the number of observations in each group.
SS_total (Total Sum of Squares): the sum of the squared deviations of each observation from the overall mean.
Interpretation:
η² ranges from 0 to 1.
η² = 0: There is no relationship between the dependent variable and the categorical independent variable.
η² = 1: There is a perfect relationship between the dependent variable and the categorical independent variable.
Higher values of η² indicate a stronger relationship.
Example:
If η² = 0.65, then 65% of the variance in the dependent variable can be attributed to differences between the groups defined by the categorical independent variable.
Summary:
Coefficient of Determination (R²): Measures the proportion of variance in the dependent variable explained by the independent variable(s) in a regression model. It ranges from 0 to 1.
Correlation Ratio (η or η²): Measures the strength of the relationship between a continuous dependent variable and a categorical independent variable; particularly useful for non-linear relationships. It also ranges from 0 to 1.
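A minimal sketch computing η² = SS_between/SS_total for grouped data (the groups and values are illustrative, and the function name is my own):

```python
def eta_squared(groups):
    """Squared correlation ratio eta^2 from a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Well-separated groups -> eta^2 close to 1
e2 = eta_squared([[1, 2, 3], [7, 8, 9]])
```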
(b) Calculate the correlation coefficient from the following data:
Let now each value of X be multiplied by 2 and then 6 be added to it.
Similarly, multiply each value of Y by 3 and subtract 2 from it. What will be the correlation coefficient between the new series of X and Y?
Answer:
To calculate the correlation coefficient (r) for the given data and the transformed data, we will follow these steps:
Step 1: Calculate the Correlation Coefficient for the Original Data
Step 2: Calculate the Correlation Coefficient for the Transformed Data
The transformations are:
X′ = 2X + 6
Y′ = 3Y − 2
Transformation Impact:
These are linear transformations of the original variables X and Y.
A linear transformation with a positive scale factor does not affect the correlation coefficient; since both scale factors (2 and 3) are positive here, the correlation coefficient remains unchanged.
Conclusion:
The correlation coefficient for the original data is approximately 0.92.
The correlation coefficient for the transformed data is the same as for the original data, approximately 0.92.
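The invariance under positive linear transformations can be demonstrated numerically (the data values below are illustrative, not the question's original table):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

X = [2, 4, 5, 7, 8]
Y = [3, 5, 4, 8, 9]

r_original = pearson_r(X, Y)
r_transformed = pearson_r([2 * x + 6 for x in X],   # X' = 2X + 6
                          [3 * y - 2 for y in Y])   # Y' = 3Y - 2
# r_original and r_transformed agree to floating-point precision
```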
Question:-04
4.(a) Differentiate between correlation and regression.
Answer:
Correlation and regression are both statistical tools used to examine relationships between variables, but they serve different purposes and provide different information. Here’s a detailed differentiation between the two:
Correlation:
Purpose:
Correlation measures the strength and direction of the linear relationship between two variables.
Nature of Analysis:
It quantifies the degree to which two variables are related, but does not imply causation.
Output:
The result is a correlation coefficient, typically denoted by r.
r ranges from −1 to 1:
r = 1 indicates a perfect positive linear relationship.
r = −1 indicates a perfect negative linear relationship.
r = 0 indicates no linear relationship.
Symmetry:
Correlation is symmetric: corr(X, Y) = corr(Y, X).
Units:
Correlation is a unitless measure, meaning it does not depend on the scale of the variables.
Types:
Pearson correlation (measures linear relationship).
Spearman correlation (measures monotonic relationship, suitable for non-parametric data).
Interpretation:
Correlation only indicates the degree of association, not the exact nature or causality.
Regression:
Purpose:
Regression assesses the relationship between a dependent variable and one or more independent variables, and models how the dependent variable changes when the independent variable(s) are varied.
Nature of Analysis:
It explains the nature of the relationship and provides a predictive model.
The focus is on predicting the value of the dependent variable based on the independent variable(s).
Output:
The result is an equation that describes the relationship between the variables.
In simple linear regression: Y = a + bX
Y is the dependent variable.
X is the independent variable.
a is the intercept.
b is the slope (regression coefficient).
Symmetry:
Regression is not symmetric: the regression of Y on X is not the same as the regression of X on Y.
Units:
The regression coefficients have units and are interpreted as the change in the dependent variable for a one-unit change in the independent variable.
Types:
Simple linear regression (one independent variable).
Multiple linear regression (more than one independent variable).
Non-linear regression (non-linear relationships).
Interpretation:
Regression provides insights into the nature of the relationship, including magnitude and direction of influence.
It also helps in making predictions and understanding causality, to an extent.
Summary:
Correlation: Measures strength and direction of linear relationship, symmetric, unitless, no causation.
Regression: Models relationship, provides predictive equation, not symmetric, coefficients have units, indicates causation to some extent.
Understanding these distinctions helps in choosing the appropriate method for analyzing data and interpreting the results accurately.
(b) In order to find the correlation between two variables X and Y from 12 pairs of observations, the following calculations were obtained:
On subsequent verification, it was discovered that the pair (X = 11, Y = 4) was copied wrongly, the correct values being (X = 10, Y = 14). After making the necessary correction, find:
(i) regression coefficients,
(ii) two regression equations, and
(iii) correlation coefficient.
Answer:
To address the given problem, we first correct the calculations by replacing the incorrect pair (X = 11, Y = 4) with the correct pair (X = 10, Y = 14). Then we find the regression coefficients, the regression equations, and the correlation coefficient.
Step 1: Correct the Calculations
Given:
n = 12
Original ΣX = 30
Original ΣY = 5
Original ΣX² = 670
Original ΣY² = 285
Original ΣXY = 344
Incorrect pair: (X = 11, Y = 4)
Correct pair: (X = 10, Y = 14)
First, remove the contributions of the incorrect pair and then add the contributions of the correct pair:
Corrected ΣX = 30 − 11 + 10 = 29
Corrected ΣY = 5 − 4 + 14 = 15
Corrected ΣX² = 670 − 11² + 10² = 649
Corrected ΣY² = 285 − 4² + 14² = 465
Corrected ΣXY = 344 − (11 × 4) + (10 × 14) = 440
Step 2: Regression Coefficients and Equations
b_YX = (nΣXY − ΣX ΣY)/(nΣX² − (ΣX)²) = (12 × 440 − 29 × 15)/(12 × 649 − 29²) = 4845/6947 ≈ 0.697
b_XY = (nΣXY − ΣX ΣY)/(nΣY² − (ΣY)²) = 4845/(12 × 465 − 15²) = 4845/5355 ≈ 0.905
Using X̄ = 29/12 ≈ 2.417 and Ȳ = 15/12 = 1.25:
Regression equation of Y on X: Y = 0.697X − 0.435
Regression equation of X on Y: X = 0.905Y + 1.286
Correlation Coefficient:
r = √(b_YX × b_XY) = √(0.697 × 0.905) ≈ 0.794 (taken positive, since both regression coefficients are positive)
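The corrected computation can be reproduced with a short sketch:

```python
import math

n = 12
# Corrected sums after replacing the wrongly copied pair (11, 4) with (10, 14)
sx  = 30 - 11 + 10             # 29
sy  = 5 - 4 + 14               # 15
sxx = 670 - 11**2 + 10**2      # 649
syy = 285 - 4**2 + 14**2       # 465
sxy = 344 - 11 * 4 + 10 * 14   # 440

num = n * sxy - sx * sy                         # common numerator, 4845
b_yx = num / (n * sxx - sx ** 2)                # regression coefficient of Y on X
b_xy = num / (n * syy - sy ** 2)                # regression coefficient of X on Y
r = math.copysign(math.sqrt(b_yx * b_xy), num)  # correlation, signed like the covariance
```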
Question:-05
5.(a) In a musical contest, 168 contestants participated. The competition comprised three different stages. It was found that 57 contestants cleared the first stage, 45 the second stage and 72 the third stage. The number of contestants who cleared all the stages, who did not clear any stage, who cleared only the first two stages and who cleared only the third stage were 17, 29, 11 and 20, respectively. With the given information, find how many contestants cleared at least two stages.
Answer:
To solve this problem, we will use the principle of inclusion-exclusion and the given data to find the number of contestants who cleared at least two stages. We denote the following:
A: Contestants who cleared the first stage
B: Contestants who cleared the second stage
C: Contestants who cleared the third stage
We are given:
|A| = 57
|B| = 45
|C| = 72
|A ∩ B ∩ C| = 17 (cleared all stages)
|Aᶜ ∩ Bᶜ ∩ Cᶜ| = 29 (did not clear any stage)
|A ∩ B ∩ Cᶜ| = 11 (cleared only the first two stages)
|Aᶜ ∩ Bᶜ ∩ C| = 20 (cleared only the third stage)
Finding the number of contestants who cleared at least two stages:
Contestants who cleared at least one stage:
Total number of contestants = 168
Contestants who did not clear any stage = 29
Contestants who cleared at least one stage = 168 − 29 = 139
Using Inclusion-Exclusion Principle:
We have the formula for the number of contestants who cleared at least one stage:
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
We know:
|A ∪ B ∪ C| = 139
Substituting the known values:
139 = 57 + 45 + 72 − (|A ∩ B| + |A ∩ C| + |B ∩ C|) + 17
Simplifying:
139 = 191 − (|A ∩ B| + |A ∩ C| + |B ∩ C|) + 17
139 = 208 − (|A ∩ B| + |A ∩ C| + |B ∩ C|)
|A ∩ B| + |A ∩ C| + |B ∩ C| = 208 − 139 = 69
Finding the individual intersections:
We know:
|A ∩ B ∩ Cᶜ| = 11 (cleared only the first two stages)
So:
|A ∩ B| = |A ∩ B ∩ C| + |A ∩ B ∩ Cᶜ| = 17 + 11 = 28
We also know:
|Aᶜ ∩ Bᶜ ∩ C| = 20 (cleared only the third stage)
Using the total of intersections:
|A ∩ B| + |A ∩ C| + |B ∩ C| = 69
We have:
|A ∩ B| = 28
|A ∩ C| + |B ∩ C| = 69 − 28 = 41
Contestants who cleared only the third stage:
|Aᶜ ∩ Bᶜ ∩ C| = 20
Contestants who cleared at least two stages:
Let’s break this into:
|A ∩ B| includes those who cleared both the first and second stages
|A ∩ C| includes those who cleared both the first and third stages
|B ∩ C| includes those who cleared both the second and third stages
|A ∩ B ∩ C| is counted once in each of the three pairwise intersections, i.e., three times in their sum
Using the intersection counts:
|A ∩ B ∩ C| = 17
Therefore, the number of contestants who cleared at least two stages is:
|A ∩ B| + |A ∩ C| + |B ∩ C| − 2 × |A ∩ B ∩ C|
We know:
|A ∩ B| = 28, |A ∩ C| = x, |B ∩ C| = y, with x + y = 41
So:
Contestants who cleared at least two stages = 28 + x + y − 2 × 17
Since x + y = 41:
Contestants who cleared at least two stages = 28 + 41 − 34 = 35
Therefore, the number of contestants who cleared at least two stages is 35.
(b) For a distribution, Bowley's coefficient of skewness is −0.56, Q₁ = 16.4 and median = 24.2. What is its coefficient of quartile deviation?
Answer:
Bowley's coefficient of skewness (S_k) is given by the formula:
S_k = (Q₃ + Q₁ − 2 × Median)/(Q₃ − Q₁)
Step 1: Finding Q₃
Substituting S_k = −0.56, Q₁ = 16.4 and Median = 24.2:
−0.56 = (Q₃ + 16.4 − 48.4)/(Q₃ − 16.4)
−0.56(Q₃ − 16.4) = Q₃ − 32
−0.56 Q₃ + 9.184 = Q₃ − 32
41.184 = 1.56 Q₃
Q₃ = 26.4
Step 2: Calculating the Coefficient of Quartile Deviation
The coefficient of quartile deviation is given by:
Coefficient of Quartile Deviation = (Q₃ − Q₁)/(Q₃ + Q₁)
Substituting the known values:
Coefficient of Quartile Deviation = (26.4 − 16.4)/(26.4 + 16.4) = 10/42.8 ≈ 0.2336
Summary:
The coefficient of quartile deviation is approximately 0.2336.
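Solving the skewness formula for Q₃ and then computing the coefficient can be sketched as follows (the function name is my own):

```python
def q3_from_bowley(sk, q1, median):
    """Solve Bowley's formula  sk = (q3 + q1 - 2*median)/(q3 - q1)  for q3."""
    return (q1 * (1 + sk) - 2 * median) / (sk - 1)

q3 = q3_from_bowley(-0.56, 16.4, 24.2)     # 26.4
coeff_qd = (q3 - 16.4) / (q3 + 16.4)       # 10/42.8 ≈ 0.2336
```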
Question:-06
6. A researcher wants to study the association between the temperaments of husbands and wives. She examined 5120 pairs and made the following contingency table:
[Contingency table: temperament of husband vs. temperament of wife; cell counts not reproduced.]
Determine and interpret the association between the temperament of husband and wife.
Answer:
To determine the association between the temperament of husband and wife, we can use the chi-square test for independence. This test will help us determine whether there is a significant association between the temperament categories of husbands and wives.
Step 6: Compare the Chi-Square Statistic with the Critical Value
Since the computed χ² = 24.22 is greater than the critical value of 9.488, we reject the null hypothesis.
Interpretation
There is a significant association between the temperament of husbands and wives. The observed frequencies differ significantly from the expected frequencies, suggesting that the temperament of one partner is related to the temperament of the other.
Question:-07
7.(a) Suppose a student of PGDAST calculated r₁₂ = 0.90, r₁₃ = 0.30 and r₂₃ = 0.70 from a data set. Examine whether these computations are error-free.
Answer:
To examine whether the computed correlation coefficients r₁₂ = 0.90, r₁₃ = 0.30 and r₂₃ = 0.70 are error-free, we use a basic property of correlation coefficients: a valid correlation matrix must be positive semidefinite, so its determinant must be non-negative.
The correlation matrix is:
R =
| 1.00 0.90 0.30 |
| 0.90 1.00 0.70 |
| 0.30 0.70 1.00 |
Its determinant is:
|R| = 1(1 − 0.70²) − 0.90(0.90 − 0.70 × 0.30) + 0.30(0.90 × 0.70 − 0.30)
= 0.51 − 0.621 + 0.099 = −0.012
The determinant of the correlation matrix R is −0.012, which is negative. For a valid correlation matrix, the determinant must be non-negative (≥ 0).
Conclusion
Since the determinant of the correlation matrix is negative, the given correlation coefficients r₁₂ = 0.90, r₁₃ = 0.30 and r₂₃ = 0.70 cannot all be correct simultaneously. Thus, the computations are not error-free: at least one of the given correlation coefficients must be incorrect.
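The determinant check can be reproduced with a short sketch (the function name is my own):

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

R = [[1.00, 0.90, 0.30],
     [0.90, 1.00, 0.70],
     [0.30, 0.70, 1.00]]

det_R = det3(R)   # ≈ -0.012: negative, so R is not a valid correlation matrix
```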
(b) (i) Explain the method of least squares.
(ii) Fit an equation of the form y = ab^X on the following data using the method of least squares.
Answer:
(i) The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns). It minimizes the sum of the squares of the residuals (the differences between observed and calculated values).
Steps in the Method of Least Squares:
Model Specification:
Define the mathematical form of the relationship between the dependent variable y and the independent variable x. Common forms include the linear form y = a + bx, the exponential form y = ab^x, etc.
Formulate the Objective Function:
For a given set of data points (xᵢ, yᵢ), the residual for each point is the difference between the observed value and the value predicted by the model.
The sum of the squares of these residuals is the objective function to be minimized:
S = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Derive the Normal Equations:
Calculate the partial derivatives of SS with respect to the model parameters and set them to zero. This yields a system of normal equations.
Solve the Normal Equations:
Solve these equations to obtain estimates of the model parameters.
(ii) Fitting an Equation of the Form y = ab^X
To fit the equation y = ab^X by least squares, we first transform it into a linear form by taking the natural logarithm of both sides:
ln(y) = ln(a) + X ln(b)
Let Y′ = ln(y), A = ln(a) and B = ln(b). Then the equation becomes:
Y′ = A + BX
We can now fit this linear equation to the transformed data by the method of least squares and recover the original parameters as a = e^A and b = e^B.
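Since the question's data table is not shown here, the procedure can be sketched on synthetic data generated from y = 2 × 1.5^x; the log-linear least-squares fit recovers a and b (the function name is my own):

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on the log-transformed data."""
    n = len(xs)
    lys = [math.log(y) for y in ys]            # Y' = ln(y)
    sx, sy = sum(xs), sum(lys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * ly for x, ly in zip(xs, lys))
    B = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope     = ln(b)
    A = (sy - B * sx) / n                           # intercept = ln(a)
    return math.exp(A), math.exp(B)

xs = [0, 1, 2, 3, 4]
ys = [2 * 1.5 ** x for x in xs]    # exact data from a = 2, b = 1.5
a, b = fit_exponential(xs, ys)     # recovers a ≈ 2, b ≈ 1.5
```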