Question:-1
Explain the assumptions of parametric and nonparametric statistics.
Answer:
1. Introduction to Statistical Methods
Statistical methods are essential for analyzing data and drawing valid conclusions. These methods are broadly classified into parametric and nonparametric statistics. Each category has specific assumptions and conditions that must be met for the results to be valid and reliable. Understanding the underlying assumptions is crucial for selecting the appropriate statistical test and interpreting the results accurately.
2. Assumptions of Parametric Statistics
Parametric statistics are widely used due to their powerful inferential capabilities. However, they rely on several key assumptions:
Normality
Parametric tests assume that the data follows a normal distribution. This assumption applies particularly to the residuals or errors in the model, not necessarily the raw data. Violations of normality can lead to misleading p-values and confidence intervals.
Homogeneity of Variance (Homoscedasticity)
This assumption states that the variance within each group being compared should be roughly equal. For instance, in ANOVA or t-tests, each group should show similar levels of variability. If this condition is violated, the test statistic and its p-value can be distorted, and the Type I error rate may no longer match the nominal level, particularly when group sizes are unequal.
Independence
Observations must be independent of each other, meaning the value of one observation should not influence another. This is a foundational assumption for most statistical tests. Violation of independence, such as in repeated measures without correction, can inflate error rates.
Scale of Measurement
Parametric tests generally require the data to be measured at an interval or ratio scale. This ensures that the arithmetic operations (like mean and standard deviation) used in parametric tests are meaningful.
Linearity (in regression models)
In parametric regression analyses, the relationship between the independent and dependent variables is assumed to be linear. Nonlinear relationships can distort the model’s predictive accuracy.
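As a practical illustration, the sketch below shows one common way to screen these assumptions in Python before running a parametric test. It is a minimal sketch, assuming SciPy is available; the sample data are illustrative (taken from the ANOVA question later in this document).

```python
# A minimal sketch of screening parametric assumptions before testing.
# Assumes SciPy is installed; the two samples are illustrative only.
from scipy import stats

group_a = [3, 3, 2, 2, 4, 3, 2, 1, 2, 4]
group_b = [4, 1, 1, 2, 3, 4, 2, 2, 3, 4]

# Shapiro-Wilk test of normality: H0 = sample comes from a normal distribution.
w_stat, p_norm = stats.shapiro(group_a)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_norm:.3f}")

# Levene's test of equal variances: H0 = the groups have equal variances.
l_stat, p_var = stats.levene(group_a, group_b)
print(f"Levene: stat = {l_stat:.3f}, p = {p_var:.3f}")

# Small p-values (e.g., < 0.05) cast doubt on the corresponding assumption,
# in which case a nonparametric alternative may be safer.
```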
3. Assumptions of Nonparametric Statistics
Nonparametric methods are often used when the assumptions of parametric methods are violated. Although they are more flexible, they are not assumption-free. The primary assumptions include:
Independence of Observations
Just like parametric tests, nonparametric methods assume that data points are independent of each other. For example, the Mann-Whitney U test assumes that the two groups being compared consist of independent samples.
Ordinal or Nominal Data
Nonparametric tests are suitable for ordinal or nominal scales where rankings or categories are used instead of precise numerical values. These tests do not assume a specific distribution, which makes them ideal for non-continuous data.
Similar Shape of Distributions (for some tests)
Some nonparametric tests, such as the Mann-Whitney U test, assume that the distributions of the two groups being compared have a similar shape. If the shapes differ significantly, the test may compare differences in distribution rather than medians.
Random Sampling
Nonparametric tests also require that the samples are randomly drawn from the population to ensure generalizability. Although this is a general assumption across all statistical tests, it is often emphasized in nonparametric analysis due to its broader applicability.
Robustness Against Outliers
While not an assumption, a key feature of nonparametric tests is their robustness to outliers and skewed data. This makes them particularly useful for small samples or data sets with extreme values.
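For instance, here is a minimal sketch of a Mann-Whitney U test in Python, assuming SciPy is available; the two samples are invented for illustration.

```python
# A minimal sketch of the Mann-Whitney U test on two independent samples.
# Assumes SciPy is installed; the data below are invented for illustration.
from scipy import stats

group_1 = [5, 7, 9, 6, 8]
group_2 = [3, 4, 6, 2, 5]

u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
# A small p-value suggests the two distributions differ; when the shapes
# are similar, this is commonly interpreted as a difference in medians.
```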
Conclusion
Understanding the assumptions underlying parametric and nonparametric statistics is critical for selecting the appropriate analytical method. Parametric tests, with their strong statistical power, require stringent assumptions such as normality and homogeneity of variance. In contrast, nonparametric methods offer flexibility and fewer assumptions, making them suitable for non-normal or ordinal data. However, they still rely on principles like independence and random sampling. A careful assessment of these assumptions ensures valid conclusions and minimizes the risk of erroneous interpretations in statistical analysis.
Question:-2
Compute One Way ANOVA (parametric statistics) for the following data:
Scores obtained on Emotional Intelligence Scale
Group A 3 3 2 2 4 3 2 1 2 4
Group B 4 1 1 2 3 4 2 2 3 4
Group C 3 4 5 3 2 4 4 2 2 2
Answer:
To compute a One-Way ANOVA for the given data, we’ll follow these steps: calculate the group means, the overall mean, the between-group sum of squares (SSB), the within-group sum of squares (SSW), the degrees of freedom, the mean squares, and finally the F-statistic. Let’s break it down.
Step 1: Organize the Data and Calculate Means
The data is already provided for three groups (A, B, and C), each with 10 observations.
- Group A: 3, 3, 2, 2, 4, 3, 2, 1, 2, 4
Sum = 3 + 3 + 2 + 2 + 4 + 3 + 2 + 1 + 2 + 4 = 26
Mean (A) = 26 / 10 = 2.6
- Group B: 4, 1, 1, 2, 3, 4, 2, 2, 3, 4
Sum = 4 + 1 + 1 + 2 + 3 + 4 + 2 + 2 + 3 + 4 = 26
Mean (B) = 26 / 10 = 2.6
- Group C: 3, 4, 5, 3, 2, 4, 4, 2, 2, 2
Sum = 3 + 4 + 5 + 3 + 2 + 4 + 4 + 2 + 2 + 2 = 31
Mean (C) = 31 / 10 = 3.1
- Overall Mean: Total sum = 26 + 26 + 31 = 83
Total observations = 10 + 10 + 10 = 30
Overall Mean = 83 / 30 ≈ 2.7667
Step 2: Compute the Sums of Squares
Between-Group Sum of Squares (SSB)
SSB = Σ(nᵢ * (meanᵢ – overall mean)²), where nᵢ is the number of observations in each group (10 here).
- Group A: 10 * (2.6 – 2.7667)² = 10 * (-0.1667)² = 10 * 0.02778 ≈ 0.2778
- Group B: 10 * (2.6 – 2.7667)² = 10 * (-0.1667)² ≈ 0.2778
- Group C: 10 * (3.1 – 2.7667)² = 10 * (0.3333)² = 10 * 0.11109 ≈ 1.1109
SSB = 0.2778 + 0.2778 + 1.1109 ≈ 1.6665
Within-Group Sum of Squares (SSW)
SSW = Σ(x – mean)² for each group, summed across all groups.
- Group A: Mean = 2.6
(3-2.6)² + (3-2.6)² + (2-2.6)² + (2-2.6)² + (4-2.6)² + (3-2.6)² + (2-2.6)² + (1-2.6)² + (2-2.6)² + (4-2.6)²
= 0.16 + 0.16 + 0.36 + 0.36 + 1.96 + 0.16 + 0.36 + 2.56 + 0.36 + 1.96 = 8.4
- Group B: Mean = 2.6
(4-2.6)² + (1-2.6)² + (1-2.6)² + (2-2.6)² + (3-2.6)² + (4-2.6)² + (2-2.6)² + (2-2.6)² + (3-2.6)² + (4-2.6)²
= 1.96 + 2.56 + 2.56 + 0.36 + 0.16 + 1.96 + 0.36 + 0.36 + 0.16 + 1.96 = 12.4
- Group C: Mean = 3.1
(3-3.1)² + (4-3.1)² + (5-3.1)² + (3-3.1)² + (2-3.1)² + (4-3.1)² + (4-3.1)² + (2-3.1)² + (2-3.1)² + (2-3.1)²
= 0.01 + 0.81 + 3.61 + 0.01 + 1.21 + 0.81 + 0.81 + 1.21 + 1.21 + 1.21 = 10.9
SSW = 8.4 + 12.4 + 10.9 = 31.7
Total Sum of Squares (SST)
SST = SSB + SSW = 1.6665 + 31.7 ≈ 33.3665
(Alternatively, SST can be computed directly as Σ(x – overall mean)², but the above confirms our calculations.)
Step 3: Degrees of Freedom
- Between groups (dfb): k – 1 = 3 – 1 = 2
- Within groups (dfw): N – k = 30 – 3 = 27
- Total (dft): N – 1 = 30 – 1 = 29
Step 4: Mean Squares
- Mean Square Between (MSB): SSB / dfb = 1.6665 / 2 ≈ 0.8333
- Mean Square Within (MSW): SSW / dfw = 31.7 / 27 ≈ 1.1741
Step 5: F-Statistic
F = MSB / MSW = 0.8333 / 1.1741 ≈ 0.7097
Step 6: Interpret the Result
The F-statistic is approximately 0.71. To determine significance, we’d compare this to the critical F-value for dfb = 2 and dfw = 27 at a chosen significance level (e.g., α = 0.05). The critical F-value for F(2,27) at α = 0.05 is approximately 3.35 (based on standard F-tables). Since 0.71 < 3.35, we fail to reject the null hypothesis, suggesting no significant difference between the group means on the Emotional Intelligence Scale.
ANOVA Table
Source | SS | df | MS | F |
---|---|---|---|---|
Between Groups | 1.6665 | 2 | 0.8333 | 0.71 |
Within Groups | 31.7 | 27 | 1.1741 | |
Total | 33.3665 | 29 |
Conclusion
The F-statistic is 0.71, indicating no significant difference between the groups (p > 0.05, assuming α = 0.05). The means of the groups (2.6, 2.6, 3.1) are not statistically different based on this ANOVA test.
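As a cross-check, the same result can be reproduced in Python. This is a minimal sketch, assuming SciPy is available, using the scores given in the question.

```python
# A minimal sketch verifying the hand computation with SciPy's one-way ANOVA.
# Assumes SciPy is installed; scores are taken from the question above.
from scipy import stats

group_a = [3, 3, 2, 2, 4, 3, 2, 1, 2, 4]
group_b = [4, 1, 1, 2, 3, 4, 2, 2, 3, 4]
group_c = [3, 4, 5, 3, 2, 4, 4, 2, 2, 2]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
# Should print F ≈ 0.7097 with p ≈ 0.50, i.e., not significant at α = 0.05.
```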
Question:-3
Write a short note within 200 words explaining the concept of inferential statistics.
Answer:
Inferential Statistics: A Short Note
Inferential statistics is a branch of statistics that allows researchers to draw conclusions or make generalizations about a population based on data collected from a sample. Unlike descriptive statistics, which summarize data, inferential statistics use probability theory to make predictions and test hypotheses.
The core idea is that it’s often impractical or impossible to collect data from an entire population. Instead, a smaller, representative sample is studied. From this sample, researchers use techniques like hypothesis testing, confidence intervals, t-tests, chi-square tests, and ANOVA to estimate population parameters and assess relationships or differences.
For example, rather than surveying every citizen’s opinion, a pollster might sample a few hundred people and use inferential methods to predict national opinion trends with a known level of confidence.
Inferential statistics rely on key assumptions, including random sampling and, for many tests, approximately normal distributions, to ensure accuracy and reliability. These methods are fundamental in scientific research, market analysis, policy studies, and more, enabling informed decision-making based on limited data.
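As a small illustration of inference from a sample, the sketch below computes a 95% confidence interval for a population mean in Python. It is a minimal sketch, assuming SciPy and NumPy are available; the data are invented.

```python
# A minimal sketch of sample-to-population inference: a 95% confidence
# interval for a mean. Assumes SciPy and NumPy are installed; data invented.
import numpy as np
from scipy import stats

sample = np.array([5, 7, 9, 6, 8, 10, 4])
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
# We infer that the population mean plausibly lies in this interval,
# even though only a small sample was observed.
```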
Question:-4
Compute Chi square for the following data:
| | Junior Managers | Senior Managers |
|---|---|---|
| Males | 10 | 5 |
| Females | 6 | 4 |
Answer:
To compute the Chi-Square test for the given 2×2 contingency table, we’ll test for independence between gender (Males, Females) and managerial level (Junior Managers, Senior Managers). Let’s go step-by-step.
Step 1: Set Up the Contingency Table
The observed frequencies are:
| | Junior Managers | Senior Managers | Row Totals |
|---|---|---|---|
| Males | 10 | 5 | 15 |
| Females | 6 | 4 | 10 |
| Column Totals | 16 | 9 | 25 |
- Grand total = 25
Step 2: State the Hypotheses
- Null Hypothesis (H₀): Gender and managerial level are independent (no association).
- Alternative Hypothesis (H₁): Gender and managerial level are not independent (there is an association).
Step 3: Calculate Expected Frequencies
The expected frequency for each cell is calculated as:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
- Males, Junior Managers: $E_{11} = \frac{15 \times 16}{25} = \frac{240}{25} = 9.6$
- Males, Senior Managers: $E_{12} = \frac{15 \times 9}{25} = \frac{135}{25} = 5.4$
- Females, Junior Managers: $E_{21} = \frac{10 \times 16}{25} = \frac{160}{25} = 6.4$
- Females, Senior Managers: $E_{22} = \frac{10 \times 9}{25} = \frac{90}{25} = 3.6$
Expected frequencies table:
| | Junior Managers | Senior Managers |
|---|---|---|
| Males | 9.6 | 5.4 |
| Females | 6.4 | 3.6 |
Step 4: Compute the Chi-Square Statistic
The Chi-Square statistic is calculated using the formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed frequency and $E$ is the expected frequency.
- Males, Junior Managers: $\frac{(10 - 9.6)^2}{9.6} = \frac{(0.4)^2}{9.6} = \frac{0.16}{9.6} \approx 0.0167$
- Males, Senior Managers: $\frac{(5 - 5.4)^2}{5.4} = \frac{(-0.4)^2}{5.4} = \frac{0.16}{5.4} \approx 0.0296$
- Females, Junior Managers: $\frac{(6 - 6.4)^2}{6.4} = \frac{(-0.4)^2}{6.4} = \frac{0.16}{6.4} = 0.025$
- Females, Senior Managers: $\frac{(4 - 3.6)^2}{3.6} = \frac{(0.4)^2}{3.6} = \frac{0.16}{3.6} \approx 0.0444$
Summing these:
$$\chi^2 \approx 0.0167 + 0.0296 + 0.0250 + 0.0444 \approx 0.1157$$
Step 5: Determine Degrees of Freedom
Degrees of freedom (df) for a contingency table is:
$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
where $r$ is the number of rows and $c$ is the number of columns.
Step 6: Compare with Critical Value or Find p-Value
The Chi-Square statistic is 0.1157 with df = 1. To determine significance, we compare this to the critical value for a chosen significance level (e.g., α = 0.05). The critical value for χ² with df = 1 at α = 0.05 is 3.841.
Since 0.1157 < 3.841, we fail to reject the null hypothesis. Alternatively, the p-value for χ² = 0.1157 with df = 1 is approximately 0.734 (based on standard Chi-Square distribution tables), which is much greater than 0.05, confirming we fail to reject H₀.
Conclusion
The Chi-Square statistic is 0.1157 (df = 1, p ≈ 0.734). There is no significant association between gender and managerial level at α = 0.05, meaning the distribution of males and females across junior and senior managers appears independent.
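As a cross-check, the sketch below reproduces this result in Python, assuming SciPy and NumPy are available; `correction=False` disables Yates' continuity correction so the output matches the hand calculation above.

```python
# A minimal sketch verifying the chi-square result with SciPy. Assumes SciPy
# and NumPy are installed. correction=False turns off Yates' continuity
# correction so the statistic matches the hand calculation.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[10, 5],
                     [6, 4]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.4f}, p = {p:.4f}, df = {dof}")
print("expected frequencies:\n", expected)
# Should print chi2 ≈ 0.1157, p ≈ 0.73, df = 1.
```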
Question:-5
Compute One Way ANOVA (parametric statistics) for the following data:
Scores obtained on Emotional Intelligence Scale
Group A 3 3 2 2 4 3 2 1 2 4
Group B 4 1 1 2 3 4 2 2 3 4
Group C 3 4 5 3 2 4 4 2 2 2
Answer:
Let’s compute a One-Way ANOVA for the given data. We’ll calculate the group means, sums of squares, degrees of freedom, mean squares, and F-statistic to determine if there’s a significant difference between the groups.
Step 1: Organize Data and Calculate Means
- Group A: 3, 3, 2, 2, 4, 3, 2, 1, 2, 4
Sum = 26
Mean = 26 / 10 = 2.6
- Group B: 4, 1, 1, 2, 3, 4, 2, 2, 3, 4
Sum = 26
Mean = 26 / 10 = 2.6
- Group C: 3, 4, 5, 3, 2, 4, 4, 2, 2, 2
Sum = 31
Mean = 31 / 10 = 3.1
- Overall Mean: Total sum = 26 + 26 + 31 = 83
Total observations = 30
Overall Mean = 83 / 30 ≈ 2.7667
Step 2: Compute Sums of Squares
Between-Group Sum of Squares (SSB)
SSB = Σ(nᵢ * (meanᵢ – overall mean)²), where nᵢ = 10.
- Group A: 10 * (2.6 – 2.7667)² = 10 * (-0.1667)² ≈ 10 * 0.02778 ≈ 0.2778
- Group B: 10 * (2.6 – 2.7667)² ≈ 0.2778
- Group C: 10 * (3.1 – 2.7667)² = 10 * (0.3333)² ≈ 10 * 0.11109 ≈ 1.1109
SSB ≈ 0.2778 + 0.2778 + 1.1109 ≈ 1.6665
Within-Group Sum of Squares (SSW)
- Group A: Σ(x – 2.6)² = 0.16 + 0.16 + 0.36 + 0.36 + 1.96 + 0.16 + 0.36 + 2.56 + 0.36 + 1.96 = 8.4
- Group B: Σ(x – 2.6)² = 1.96 + 2.56 + 2.56 + 0.36 + 0.16 + 1.96 + 0.36 + 0.36 + 0.16 + 1.96 = 12.4
- Group C: Σ(x – 3.1)² = 0.01 + 0.81 + 3.61 + 0.01 + 1.21 + 0.81 + 0.81 + 1.21 + 1.21 + 1.21 = 10.9
SSW = 8.4 + 12.4 + 10.9 = 31.7
Total Sum of Squares (SST)
SST = SSB + SSW = 1.6665 + 31.7 ≈ 33.3665
Step 3: Degrees of Freedom
- Between groups (dfb): 3 – 1 = 2
- Within groups (dfw): 30 – 3 = 27
- Total (dft): 30 – 1 = 29
Step 4: Mean Squares
- MSB = SSB / dfb = 1.6665 / 2 ≈ 0.8333
- MSW = SSW / dfw = 31.7 / 27 ≈ 1.1741
Step 5: F-Statistic
F = MSB / MSW = 0.8333 / 1.1741 ≈ 0.7097
Step 6: Interpret
The F-statistic is 0.71. The critical F-value for df(2,27) at α = 0.05 is approximately 3.35. Since 0.71 < 3.35, we fail to reject the null hypothesis.
ANOVA Table
Source | SS | df | MS | F |
---|---|---|---|---|
Between Groups | 1.6665 | 2 | 0.8333 | 0.71 |
Within Groups | 31.7 | 27 | 1.1741 | |
Total | 33.3665 | 29 |
Conclusion
The F-statistic is 0.71 (p > 0.05). There’s no significant difference between the group means (2.6, 2.6, 3.1) on the Emotional Intelligence Scale.
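Rather than reading the critical value from a printed table, the p-value and critical F can be obtained from the F-distribution directly. A minimal sketch in Python, assuming SciPy is available:

```python
# A minimal sketch: getting the p-value and critical value from the
# F-distribution instead of a printed table. Assumes SciPy is installed.
from scipy.stats import f

f_stat, df_between, df_within = 0.7097, 2, 27

p_value = f.sf(f_stat, df_between, df_within)  # P(F > f_stat)
print(f"p = {p_value:.4f}")  # ≈ 0.50, well above α = 0.05

critical = f.ppf(0.95, df_between, df_within)  # critical value at α = 0.05
print(f"critical F(2, 27) = {critical:.3f}")   # ≈ 3.354
```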
Question:-6
Explain the computation of one sample median test with the help of suitable example.
Answer:
One-Sample Median Test: A Short Note
The one-sample median test is a nonparametric statistical test used to determine whether the median of a single sample differs significantly from a hypothesized median. It is especially useful when the data are not normally distributed or when the sample size is small.
Steps in Computation:
1. State the Hypotheses:
- Null Hypothesis (H₀): The sample median equals the hypothesized median.
- Alternative Hypothesis (H₁): The sample median differs from the hypothesized median.
2. Choose a Hypothesized Median:
Suppose we have the data: 5, 7, 9, 6, 8, 10, 4
Hypothesized median = 7
3. Count Observations:
- Values greater than 7: 8, 9, 10 → 3 values
- Values less than 7: 5, 6, 4 → 3 values
(Ignore values equal to the hypothesized median for the test.)
4. Use a Binomial Test:
Apply a binomial test to determine if the number of observations above and below the hypothesized median differ significantly. Here, n = 6 and p = 0.5.
Since the counts above and below are equal, we fail to reject the null hypothesis, concluding that the sample median is not significantly different from 7. This method is simple, distribution-free, and effective for ordinal or skewed data.
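The binomial step can also be carried out in Python. A minimal sketch, assuming SciPy 1.7 or later is available (which provides `stats.binomtest`):

```python
# A minimal sketch of the binomial step of the one-sample median test.
# Assumes SciPy >= 1.7 (which provides stats.binomtest).
from scipy.stats import binomtest

above = 3  # observations greater than the hypothesized median of 7
n = 6      # observations not equal to the hypothesized median

result = binomtest(above, n=n, p=0.5, alternative="two-sided")
print(f"p = {result.pvalue:.3f}")  # p = 1.0 here: no evidence against H0
```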
Question:-7
Write a short note within 200 words on what the Kruskal-Wallis ANOVA test is, and compare the Kruskal-Wallis ANOVA test with the parametric one-way ANOVA test.
Answer:
Kruskal-Wallis ANOVA Test: A Short Note
The Kruskal-Wallis ANOVA is a nonparametric statistical test used to determine whether there are statistically significant differences between the medians of three or more independent groups. It is the nonparametric alternative to the one-way ANOVA and is used when the assumptions of normality or equal variances are not met.
Instead of comparing group means, the Kruskal-Wallis test ranks all the data from all groups together and then analyzes the distribution of ranks among the groups. It is particularly suitable for ordinal data or when sample sizes are small or unequal.
Comparison with One-Way ANOVA:
Feature | One-Way ANOVA | Kruskal-Wallis ANOVA |
---|---|---|
Type | Parametric | Nonparametric |
Assumptions | Normal distribution, equal variances | No assumption of normality or equal variances |
Data type | Interval/ratio scale | Ordinal or non-normal interval data |
Test statistic | F-value (based on variance) | H-value (based on ranks) |
Comparison | Means | Medians or rank distributions |
In summary, the Kruskal-Wallis test is more robust when data violate parametric assumptions, making it a valuable alternative to one-way ANOVA in real-world, non-ideal conditions. However, it is generally less powerful when the assumptions of parametric tests are satisfied.
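To make the comparison concrete, the sketch below runs both tests on the same data. It is a minimal sketch, assuming SciPy is available; the scores are reused from the ANOVA question earlier in this document.

```python
# A minimal sketch running both tests on the same data, to make the
# contrast concrete. Assumes SciPy is installed; scores reused from the
# ANOVA question earlier in this document.
from scipy import stats

group_a = [3, 3, 2, 2, 4, 3, 2, 1, 2, 4]
group_b = [4, 1, 1, 2, 3, 4, 2, 2, 3, 4]
group_c = [3, 4, 5, 3, 2, 4, 4, 2, 2, 2]

f_stat, p_f = stats.f_oneway(group_a, group_b, group_c)  # parametric: compares means
h_stat, p_h = stats.kruskal(group_a, group_b, group_c)   # nonparametric: compares ranks
print(f"One-way ANOVA:  F = {f_stat:.3f}, p = {p_f:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p_h:.3f}")
```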
Question:-8
Write a short note within 200 words explaining the procedure for computing the mean and standard deviation using Microsoft Excel.
Answer:
Computation of Mean and Standard Deviation Using Microsoft Excel: A Short Note
Microsoft Excel provides a quick and efficient way to compute the mean and standard deviation of a data set using built-in functions. These measures are essential in descriptive statistics for understanding the central tendency and dispersion of data.
Procedure:
1. Enter the Data:
Begin by entering your numerical data in a single column (e.g., A1 to A10).
2. Calculate the Mean:
Click on an empty cell where you want the result.
Type the formula: =AVERAGE(A1:A10)
Press Enter. Excel will return the mean (average) of the selected values.
3. Calculate the Standard Deviation:
Click on another empty cell.
For a sample, type: =STDEV.S(A1:A10)
For a population, type: =STDEV.P(A1:A10)
Press Enter. Excel will display the standard deviation.
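As a cross-check on the Excel results, the same quantities can be computed with Python's standard library. A minimal sketch; the values standing in for cells A1:A10 are invented.

```python
# A minimal cross-check of the Excel formulas using Python's standard
# library only. The values standing in for cells A1:A10 are invented.
import statistics

data = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

print(statistics.mean(data))    # equivalent to =AVERAGE(A1:A10)
print(statistics.stdev(data))   # sample SD, equivalent to =STDEV.S(A1:A10)
print(statistics.pstdev(data))  # population SD, equivalent to =STDEV.P(A1:A10)
```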