11
Comparing Multiple Means: Analysis of Variance
Introduction
Student's t-test can be used to compare two samples. Often, however, it is necessary to compare more than two. Analysis of variance (ANOVA) is a powerful statistical tool for comparing the means of three or more populations efficiently. Despite its name, analysis of variance compares means. It does so by partitioning the total variation, measured as a sum of squares, into components. The null hypothesis in ANOVA can be stated as

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

where k is the number of populations (treatments) being compared.
The alternative hypothesis is H_{1}: at least one population mean is different.
ANOVA is called one-way when only one factor is analyzed. Similarly, when two, three or more factors are analyzed, it is called two-way, three-way, and so on. Because it can evaluate several factors at once, ANOVA is an appropriate statistical tool for analyzing design of experiments (DOE).
One-Way ANOVA
The procedure for one-way ANOVA is explained with the help of an example. Suppose an auto magazine wants to compare the fuel economy of three models of motorcycles, measured in kilometers per liter (kmpl) of petrol. The evaluation team decides to test three samples of each model on a road that is generally level and carries little traffic. The team also decides that the riders should maintain a speed of 40 kilometers per hour. These precautions minimize the effect of other factors on fuel consumption. The data collected from all the runs are shown in Figure 11.1.
We want to assess whether the three motorcycle models have the same mean fuel economy. ANOVA assumes that the variance within each of the three populations is equal. The null hypothesis is H_{0}: μ_{A} = μ_{B} = μ_{C}. The sample size for each motorcycle model is n_{A} = n_{B} = n_{C} = 3.
There are three models of motorcycles. These are called the levels of the factor, or treatments; in general, the number of levels is k. In one-way ANOVA, there is only one factor, and the variation within each level is compared with the variation between levels. Variation is quantified as variance. The ANOVA ‘model’ can be stated as

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

where y_{ij} is the observed response, i is the treatment number and j is the observation number under the i^{th} treatment, μ is the overall mean, τ_{i} is the effect of the i^{th} treatment, and ε_{ij} is the random error component.

Figure 11.1 Data of fuel economy of three models of motorcycle
The equation y_{ij} = μ + τ_{i} + ε_{ij} is known as the linear statistical model of one-way ANOVA. Under the null hypothesis, τ_{1} = τ_{2} = τ_{3} = 0. The data in Figure 11.1 are arranged according to this convention. The overall mean is 43.111, and the means of the three motorcycle models are 46, 41.333 and 42.
It is good practice to review the results graphically. The individual value plot of the data in Figure 11.2 shows the variation within each model (treatment). Our objective in ANOVA is to compare the variation within each level of the factor with the variation between the levels. The variation between levels is represented by the magnitudes of τ_{1}, τ_{2} and τ_{3}. Under the null hypothesis, the mean at each level is randomly distributed around the overall mean with standard deviation

$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$$

where n is the number of observations at each level.
If the null hypothesis is true, the entire set of nine readings can be considered a single sample drawn from one population with variance σ^{2}. ANOVA compares two different estimates of this variance σ^{2}.
Figure 11.2 Variation within and between different models of motorcycle
(Courtesy Institute of Quality and Reliability)
- First, estimate the variance ‘among’ or ‘between’ the sample means 46, 41.333 and 42. Since the variance of the mean of n observations is σ^{2}/n, the population variance can be estimated as

$$\hat{\sigma}^2_{between} = n\,s^2_{\bar{y}} = n\,\frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{k-1}$$

- Second, estimate the variance ‘within’ the three samples by pooling them. Then compare the two estimates of σ^{2} using an F-test for equal variances.
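The two estimates can be checked numerically with a short Python sketch. It uses only the model means quoted in the text (41.333 is taken as 124/3) and the error sum of squares of 16.667 reported in Step 3; the variable names are illustrative.

```python
from statistics import variance

# Model means quoted in the text; 41.333 is taken as 124/3
group_means = [46.0, 124 / 3, 42.0]
n = 3        # observations per model (balanced design)
N, k = 9, 3  # total observations and number of models

# Estimate 1 ("between"): a mean of n readings has variance sigma^2 / n,
# so n times the sample variance of the model means estimates sigma^2.
between_estimate = n * variance(group_means)

# Estimate 2 ("within"): pooled within-model variance, SS_within / (N - k),
# using the error sum of squares reported in Step 3.
ss_within = 16.667
within_estimate = ss_within / (N - k)

f_ratio = between_estimate / within_estimate
print(f"between = {between_estimate:.3f}, within = {within_estimate:.3f}, F = {f_ratio:.2f}")
# prints: between = 19.111, within = 2.778, F = 6.88
```

The ratio of the two estimates is exactly the F-ratio that appears later in the ANOVA table.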
Steps in ANOVA
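Variance is calculated as a sum of squares divided by its degrees of freedom:

$$s^2 = \frac{\sum_{j=1}^{n} (y_j - \bar{y})^2}{n-1}$$

The fundamental technique of ANOVA is partitioning the total sum of squares into components. It involves the following steps: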
- Calculate the total sum of squares.
- Calculate the sum of squares due to the factor (model).
- Calculate the error or residual sum of squares.
- Find the degrees of freedom for the factor and for the residual error.
- Construct the ANOVA table. Divide each sum of squares by its degrees of freedom to estimate the variance, or mean square (MS). Compare the MS for the factor with the MS for error using an F-test to assess whether the MS due to the factor is significantly large. If it is, the factor has a statistically significant effect.
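The five steps can be sketched in Python as a single function. This is a minimal illustration, not the algorithm used by Minitab or SigmaXL; the function name one_way_anova and the OneWayAnova record are hypothetical, and because the individual motorcycle readings appear only in Figure 11.1, the usage line runs on small hypothetical data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OneWayAnova:
    ss_between: float
    ss_within: float
    ss_total: float
    df_between: int
    df_within: int
    ms_between: float
    ms_within: float
    f_ratio: float


def one_way_anova(groups: list[list[float]]) -> OneWayAnova:
    """One-way ANOVA following Steps 1-5; groups may have unequal sizes."""
    all_obs = [y for g in groups for y in g]
    N, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / N

    # Step 1: total sum of squares about the overall mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Step 2: between-treatment SS, each squared deviation weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Step 3: within (error) SS from deviations about each group's own mean
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    # Step 4: degrees of freedom
    df_between, df_within = k - 1, N - k
    # Step 5: mean squares and F-ratio
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return OneWayAnova(ss_between, ss_within, ss_total, df_between, df_within,
                       ms_between, ms_within, ms_between / ms_within)


# Hypothetical data (NOT the motorcycle readings, which appear only in Figure 11.1)
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```

Computing SS_within directly, rather than as SS_T minus SS_between, keeps the partition SS_T = SS_between + SS_within available as an independent check.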
Step 1: Calculate the total sum of squares (SS_{T})
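The total sum of squares SS_{T} is the numerator of the sample variance of all N observations:

$$SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$$

An equivalent formula can also be used:

$$SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{T^2}{N}$$

The term T^{2}/N is called the correction for the mean (CM). Note that ȳ = T/N, where T is the total of all N observations. In this example, the overall mean is 43.1111 and N = 9, so T = 388.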
The total sum of squares SS_{T} is 54.89. (Figure 11.3 shows the calculation.) This represents total variation.
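The definitional and correction-for-mean formulas for SS_T always agree. The sketch below checks this in Python on the eight hardness readings from Table 11.4 (used later in this chapter), for which both give 40.875; the function names are illustrative.

```python
from __future__ import annotations

# Eight hardness readings from Table 11.4
hardness = [76, 78, 77, 78, 73, 74, 79, 80]


def ss_total_definitional(data: list[float]) -> float:
    """Sum of squared deviations from the overall mean."""
    mean = sum(data) / len(data)
    return sum((y - mean) ** 2 for y in data)


def ss_total_shortcut(data: list[float]) -> float:
    """Sum of squared observations minus the correction for the mean, T^2 / N."""
    T = sum(data)
    correction_for_mean = T ** 2 / len(data)
    return sum(y ** 2 for y in data) - correction_for_mean


print(ss_total_definitional(hardness), ss_total_shortcut(hardness))  # 40.875 40.875
```

The shortcut suits hand calculation. In floating-point software with large, tightly clustered values, subtracting T²/N from Σy² can lose precision, so the definitional form is usually the safer choice in code.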
Figure 11.3 Calculation of total sum of squares
Step 2: Calculate the sum of squares ‘between’ or due to factor (SS_{between})
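This is the sum of the squared differences between each motorcycle model mean and the overall mean, weighted by the number of observations for that model:

$$SS_{between} = n_A(\bar{y}_A - \bar{y})^2 + n_B(\bar{y}_B - \bar{y})^2 + n_C(\bar{y}_C - \bar{y})^2$$

where n_{A}, n_{B} and n_{C} are the numbers of observations for motorcycle models A, B and C, respectively. An equivalent formula, using the model totals T_{A}, T_{B} and T_{C}, is

$$SS_{between} = \frac{T_A^2}{n_A} + \frac{T_B^2}{n_B} + \frac{T_C^2}{n_C} - \frac{T^2}{N}$$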
Figure 11.4 illustrates the calculation of SS_{between}. The square of the difference between ȳ_{A} = 46 and ȳ = 43.111 is 8.346. As there are three observations of model A, this term is counted three times. Following a similar procedure for models B and C, the sum of squares between, or due to, motorcycle models is 38.222. This is also called the sum of squares between treatments.
Figure 11.4 Calculation of the sum of squares ‘between treatments’
Step 3: Calculate the residual or error sum of squares (SS_{within})
Calculate the difference between each individual reading and the mean of its motorcycle model. Each difference is called a ‘residual’ or ‘random error’ and is denoted ε_{ij}. Refer to Figure 11.5 and Table 11.1. The error sum of squares is

$$SS_{within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = 16.667$$

In one-way ANOVA, the total sum of squares can therefore be partitioned as

$$SS_T = SS_{between} + SS_{within}, \qquad 54.889 = 38.222 + 16.667$$
Step 4: Calculate the degrees of freedom in one-way ANOVA
Total number of observations | = N |
Total degrees of freedom | = N – 1 |
Number of levels of the factor | = k |
Degrees of freedom for the factor | = k – 1 |
Degrees of freedom for error | = (N – 1) – (k – 1) = N – k |
In this example:
Total number of observations (N) | = 9 |
Total degrees of freedom (N – 1) | = 9 – 1 = 8 |
Number of levels of the factor (k) | = 3 |
Degrees of freedom for the factor (k – 1) | = 3 – 1 = 2 |
Degrees of freedom for error (N – k) | = 9 – 3 = 6 |
Figure 11.5 Calculation of residuals
Step 5: Construct the ANOVA table
It is convenient to summarize the calculations in an ANOVA table as shown in Table 11.1.
Table 11.1 ANOVA table for motorcycle fuel economy
Table 11.1 shows the calculation of the mean squares and the calculated F-ratio. If the means of the three motorcycle models do not differ, the two variance estimates should be similar and the F-ratio is expected to fall below the critical value. The critical value of the F-distribution for 2 numerator df and 6 denominator df at the 95 percent confidence level is 5.14 (from Table T3 in the CD). The calculated F-ratio of 6.88 is larger than this critical value. Thus we reject the null hypothesis and conclude that the mean fuel economy of the motorcycle models differs significantly at the 95 percent confidence level.
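Step 5 can be reproduced from the summary figures in the text. This Python sketch builds the ANOVA table from SS_between = 38.222 and SS_within = 16.667 and compares F with the tabulated critical value 5.14; the printed layout is an assumption, not Minitab's format.

```python
# Summary values from Steps 1-4 of the motorcycle example
ss = {"Model": 38.222, "Error": 16.667}
N, k = 9, 3
df = {"Model": k - 1, "Error": N - k}
F_CRIT = 5.14  # F(0.05; 2, 6), as quoted in the text

ms = {src: ss[src] / df[src] for src in ss}  # mean square = SS / DF
f_calc = ms["Model"] / ms["Error"]

print(f"{'Source':<8}{'DF':>4}{'SS':>10}{'MS':>10}")
for src in ("Model", "Error"):
    print(f"{src:<8}{df[src]:>4}{ss[src]:>10.3f}{ms[src]:>10.3f}")
print(f"{'Total':<8}{N - 1:>4}{ss['Model'] + ss['Error']:>10.3f}")
print(f"F = {f_calc:.2f}, critical F = {F_CRIT}, reject H0: {f_calc > F_CRIT}")
```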
P-Value: Most statistical software packages calculate the P-value for hypothesis tests and ANOVA. The P-value is the smallest α-risk at which the null hypothesis would be rejected. It is the probability, assuming H_{0} is true, of obtaining an F-ratio at least as large as the one calculated. The weaker the evidence supporting the null hypothesis, the smaller the P-value.
Figure 11.6 shows the critical value of 5.14, to the right of which the area under the F-distribution equals α = 0.05. The figure also shows the calculated F-value of 6.88. The area to the right of 6.88 is 0.028; this is the P-value. If F_{calc} > F_{crit}, the P-value will be lower than the α-risk, and when the P-value is lower than the α-risk, we reject the null hypothesis.
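For the special case of 2 numerator degrees of freedom, the upper tail of the F-distribution has the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), so both the critical value and the P-value of this example can be computed in plain Python. This sketch covers only numerator df = 2 and the function names are hypothetical; for general degrees of freedom a library routine such as scipy.stats.f.sf or scipy.stats.f.ppf would be needed.

```python
def f_upper_tail_df1_2(x: float, df2: int) -> float:
    """P(F > x) for F with 2 numerator df: (1 + 2x/df2) ** (-df2/2)."""
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Critical value c with P(F > c) = alpha, found by inverting the closed form."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)


f_crit = f_critical_df1_2(0.05, 6)
p_value = f_upper_tail_df1_2(6.88, 6)
print(f"F_crit = {f_crit:.2f}, P-value = {p_value:.3f}, reject H0: {p_value < 0.05}")
# prints: F_crit = 5.14, P-value = 0.028, reject H0: True
```

Both comparisons (F_calc against F_crit, and P-value against α) always lead to the same decision.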
Figure 11.6 α-risk and p-value
As 0.028 is lower than alpha risk of 0.05, we must reject H_{0} and conclude that at least one of the population means is significantly different. Thus, at least one of the motorcycle models has a significantly different mileage.
Figure 11.7 Pie chart of the sum of squares between and within treatments
Calculating and Interpreting R^{2}
In one-way ANOVA, the equation for the sum of squares can be written as

$$SS_T = SS_{between} + SS_{within}$$
The SS_{between} is the ‘explained’ portion of the variation. The SS_{within} is also called the error sum of squares (SS_{E}). The term error is used for the unexplained portion of variation.
The term R^{2} is used for the proportion of variation explained by the ‘model’. It can be written as

$$R^2 = \frac{SS_{between}}{SS_T} = \frac{38.222}{54.889} = 0.6964$$

A pie chart can easily be plotted in Excel, as shown in Figure 11.7. This is useful for visual presentations, especially when more factors and treatments are being analyzed. The motorcycle model explains 69.64 percent of the variation in mileage. The remaining 30.36 percent of the variation is not explained by the ‘ANOVA model’, so we should look for additional factors to include in our model and analysis. This requires revisiting the process map with the team members.
Assumptions of ANOVA
ANOVA is based on several assumptions, and their validity must be confirmed when performing ANOVA. The assumptions are as follows:
- Population variances of the output are equal across all levels of the given factor. This is also called homogeneity of variances or homoscedasticity. This assumption can be tested with the test for equal variances procedure.
- Residuals (or errors) are independently and normally distributed with mean = 0 and a constant variance.
- Response means are independent and normally distributed.
Model adequacy is validated by examining the residuals. They should be normally distributed and random and should not exhibit any patterns. If the ANOVA model is valid, the residuals should be ‘structureless’; that is, they should not be related to any other variable, including the predicted response. Recall the ANOVA model

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

where y_{ij} is the observed response for the j^{th} observation under the i^{th} treatment, μ is the overall mean, τ_{i} is the effect of the i^{th} treatment, and ε_{ij} is the random error component.
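The predicted response ŷ_{ij} is given by

$$\hat{y}_{ij} = \mu + \tau_i = \bar{y}_i$$

In this example, μ = 43.111, τ_{1} = 2.889, τ_{2} = –1.778 and τ_{3} = –1.111, so that

$$\hat{y}_{1j} = 46, \quad \hat{y}_{2j} = 41.333, \quad \hat{y}_{3j} = 42$$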
Table 11.2 shows the predicted response and residuals.
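The treatment effects, fitted values and residuals of Table 11.2 follow directly from the model means. This Python sketch uses only the means from the text (overall mean 388/9 = 43.111, with 41.333 taken as 124/3); the residual helper and the reading of 47 in the usage line are hypothetical.

```python
# Means from the text: overall 388/9 (= 43.111); model B's 41.333 is 124/3
overall_mean = 388 / 9
model_means = {"A": 46.0, "B": 124 / 3, "C": 42.0}

# Estimated treatment effects tau_i = ybar_i - ybar; they sum to zero here
tau = {m: ybar - overall_mean for m, ybar in model_means.items()}

# Fitted value for every reading of model i: yhat_ij = mu + tau_i (= ybar_i)
fitted = {m: overall_mean + t for m, t in tau.items()}


def residual(reading: float, model: str) -> float:
    """Residual e_ij = y_ij - yhat_ij for one reading (hypothetical helper)."""
    return reading - fitted[model]


print({m: round(t, 3) for m, t in tau.items()})  # {'A': 2.889, 'B': -1.778, 'C': -1.111}
print(round(residual(47, "A"), 3))  # hypothetical reading of 47 for model A -> 1.0
```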
Table 11.2 ANOVA model using motorcycles as an example
Equality of Variances
Refer to the graph of residuals versus fitted values in Figure 11.8. The fitted values are 46, 41.333 and 42. The residuals should be spread randomly and roughly equally above and below 0. This graph also shows visually whether the variances of all treatments are approximately equal.
Figure 11.8 Residuals vs fitted values
Normality of Residuals
The histogram, normal probability plot and run chart of the residuals should be examined. Refer to Figure 11.9 for the histogram and normal probability plot. Histograms of small samples are not meaningful, so in this example it is better to examine the normal probability plot. The P-value of the Anderson-Darling test of normality for the residuals is 0.116, which is greater than 0.05, so there is no significant evidence that the residuals depart from normality. Balanced ANOVA is robust to minor deviations from normality. In model validation of balanced ANOVA, normal probability plots are usually assessed visually, and the Anderson-Darling test is often not conducted.
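A normal probability plot pairs the ordered residuals with standard normal quantiles; points lying close to a straight line support normality. This Python sketch assumes Blom's plotting positions (i − 0.375)/(n + 0.25), which is one common choice; software packages differ slightly in the positions they use, and the residuals in the usage line are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist


def normal_plot_points(residuals: list[float]) -> list[tuple[float, float]]:
    """Pair each ordered residual with a standard normal quantile.

    Assumes Blom's plotting positions (i - 0.375) / (n + 0.25); other packages
    may use slightly different positions.
    """
    n = len(residuals)
    z = NormalDist()  # standard normal, mean 0 and sd 1
    return [
        (z.inv_cdf((i - 0.375) / (n + 0.25)), r)
        for i, r in enumerate(sorted(residuals), start=1)
    ]


# Hypothetical residuals, for illustration only
for q, r in normal_plot_points([0.5, -2.0, 1.5, 0.0, -0.5]):
    print(f"{q:6.3f}  {r:5.1f}")
```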
Figure 11.9 Histogram and normal probability plot of residuals
Residuals are also plotted in the sequence in which the observations were made. The run order is usually randomized so that the unknown portion of variation is spread evenly across treatments. This strategy is called randomization and is discussed in greater detail in Chapter 14 on design of experiments, the area in which ANOVA is most commonly used. Figure 11.10 shows the time series plot. In this case the observations were not made in random order; however, the time series plot does not show any obvious patterns.
Figure 11.10 Residuals vs order of data
Independence of Residuals
Residuals are plotted in the order of data collection (see Figure 11.10). This plot shows all residuals in the same order in which the data were collected and can reveal non-random errors, especially time-related effects. It helps you check the assumption that the residuals are not correlated with one another.
In addition to the graphical assessment, equality of variances can be analyzed by using tests of equal variances.
Minitab commands > Stat > Anova > test for equal variances
SigmaXL commands > SigmaXL > Statistical Tools > Equal Variance Tests
Output of tests using Minitab is shown in Figure 11.11.
Figure 11.11 Testing for equality of variances
Statisticians recommend Levene's test when the data come from continuous, but not necessarily normal, distributions. This method uses the distances of the observations from their sample median rather than their sample mean, which makes the test more robust for smaller samples. Bartlett's test is recommended only when the data come from normal distributions, as it is not robust to departures from normality (Montgomery 2004).
As the P-value of Levene's test is 0.842, i.e., > 0.05, there is no evidence that the variances differ, and they can be considered equal. For details of these test procedures, refer to Design and Analysis of Experiments by Montgomery.
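Levene's test as described above (deviations from the median, often called the Brown-Forsythe variant) is a one-way ANOVA on the absolute deviations |y_ij − median_i|. The Python sketch below computes that F statistic, to be compared with the critical value of F(k − 1, N − k); the function name and the data are hypothetical. If SciPy is available, scipy.stats.levene with center='median' computes the same statistic together with its P-value.

```python
from __future__ import annotations

from statistics import median


def levene_median_f(groups: list[list[float]]) -> float:
    """Levene's statistic with median centering (Brown-Forsythe variant).

    It is the one-way ANOVA F-ratio of the absolute deviations |y - median|.
    """
    devs = [[abs(y - median(g)) for y in g] for g in groups]
    all_devs = [d for g in devs for d in g]
    N, k = len(all_devs), len(devs)
    grand_mean = sum(all_devs) / N
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in devs)
    ss_within = sum((d - sum(g) / len(g)) ** 2 for g in devs for d in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))


# Hypothetical data: the second group is visibly more spread out
print(round(levene_median_f([[1, 2, 3], [1, 5, 9]]), 3))  # 2.118
```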
Minitab provides a ‘four-in-one’ graphical output of residual plots for checking model adequacy. Based on the checks above, we conclude that the ANOVA model is adequate and valid. Similar graphical output is available in most other software packages, including SigmaXL.
ANOVA Using Software
Try using Minitab for one-way ANOVA: > Stat > ANOVA > One-way
In SigmaXL use commands: > SigmaXL > Statistical Tools > One-Way ANOVA and Means Matrix
Exercise on One-Way ANOVA
Table 11.3 gives the data for the compressive strength of Portland cement with four mixing techniques: A, B, C and D. Do you see any significant difference at 95 percent confidence? First, perform ANOVA without using software and then try using the software.
Table 11.3 Compressive strength of cement
Fixed Effects and Random Effects Models
The conclusions of ANOVA in the motorcycle example apply to the three motorcycle models only and cannot be extended to other similar models that were not included in the study. Such an ANOVA model is called a fixed effects model. The other type of ANOVA model is the random effects model (REM). In a REM, the treatments studied are a random sample from a larger population of treatments. In this example, the auto magazine tested three models of motorcycle; if it wished to extend the conclusions to other similar models, it could use the REM, provided the three models had been selected randomly. We will, however, limit our scope to the fixed effects model. For the REM, refer to Design and Analysis of Experiments by Douglas Montgomery. The REM is used in repeatability and reproducibility studies, where we may select three appraisers but intend to extend our conclusions to all appraisers.
Two-Way ANOVA
If two factors, A and B, are to be studied together, two-way ANOVA can be used. The total sum of squares is partitioned as

$$SS_T = SS_A + SS_B + SS_{AB} + SS_E$$

where A and B are the two factors under study (sometimes called row and column factors), SS_{A} is the sum of squares due to A, SS_{B} is the sum of squares due to B, and SS_{E} is the error (residual) sum of squares.
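Note the addition of the interaction term SS_{AB} in the equation. SS_{AB} is the sum of squares due to the interaction of A and B. Interaction means that the effect of one factor on the response depends on the level of the other factor, so the combined effect of A and B differs from the sum of their separate effects. Interaction is discussed in detail in Chapter 14 on design of experiments.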
Degrees of Freedom for Two-Way ANOVA
• N is the total number of data points
• n_{a} is the number of levels of factor A
• n_{b} is the number of levels of factor B
• Total degrees of freedom = N – 1
• Degrees of freedom for factor A = n_{a} – 1
• Degrees of freedom for factor B = n_{b} – 1
• Degrees of freedom for interaction A*B = (n_{a} – 1)(n_{b} – 1)
• Degrees of freedom for error = (N – 1) – (n_{a} – 1) – (n_{b} – 1) – (n_{a} – 1)(n_{b} – 1) = N – n_{a}n_{b}
Application Example for Two-Way ANOVA
Consider an experiment on aluminum castings. The customer requires hardness to be controlled, so hardness is a critical-to-quality characteristic (CTQ). We therefore want to evaluate the effect of two factors on hardness (y): the percentage of copper and the percentage of magnesium. Copper was controlled at 3.5 and 4.5 percent and magnesium at 1.2 and 1.8 percent. Hardness was measured for two samples from each treatment combination. The data are shown in Table 11.4 and are also available in the Excel file ‘ANOVA Hardness.xls’.
Table 11.4 Data of two-way ANOVA
Magnesium (%) | Copper (%) | Hardness (Y) |
---|---|---|
1.2 | 3.5 | 76 |
1.2 | 3.5 | 78 |
1.8 | 3.5 | 77 |
1.8 | 3.5 | 78 |
1.2 | 4.5 | 73 |
1.2 | 4.5 | 74 |
1.8 | 4.5 | 79 |
1.8 | 4.5 | 80 |
Calculation of Two-Way ANOVA
Two-way ANOVA will require the calculation of
- Total sum of squares (SS_{T})
- Sum of squares due to copper (SS_{Cu})
- Sum of squares due to magnesium (SS_{Mg})
- Sum of squares due to interaction (SS_{CuxMg})
- Sum of squares due to error or residuals (SS_{E})
- Degrees of freedom for Mg, Cu, interaction and error
- Mean square values for each of these (SS/DF)
- F-ratio for Cu, Mg and interaction
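The total sum of squares SS_{T}, after the correction for the mean (CM), equals 40.875, as shown in Table 11.5, using the formula

$$SS_T = \sum y^2 - \frac{T^2}{N} = 47319 - \frac{615^2}{8} = 47319 - 47278.125 = 40.875$$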
Table 11.5 Total sum of squares
Now calculate the sum of squares due to magnesium SS_{Mg}. Refer to Table 11.6. Total for hardness is 301 for the four trials when Mg is at 1.2 and it is 314 for the other four trials when Mg is at 1.8. The sum of squares due to magnesium is 21.125 as shown in Table 11.6.
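Written out with the correction for the mean (T = 301 + 314 = 615, N = 8, and four observations at each level), the calculation is as follows. The same pattern applied to the copper totals from Table 11.4 (309 at 3.5 percent and 306 at 4.5 percent) gives SS_{Cu}.

$$SS_{Mg} = \frac{301^2 + 314^2}{4} - \frac{615^2}{8} = 47299.25 - 47278.125 = 21.125$$

$$SS_{Cu} = \frac{309^2 + 306^2}{4} - \frac{615^2}{8} = 47279.25 - 47278.125 = 1.125$$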
Table 11.6 Calculation table for the sum of squares due to magnesium
Similarly, the sum of squares due to copper SS_{Cu} can be calculated as 1.125 (Table 11.7). Note that the table is sorted by copper level for easy calculation.
Table 11.7 Calculation table for SS_{Cu}, the sum of squares due to copper
For the calculation of the sum of squares due to interaction SS_{CuxMg}, illustrated in Table 11.8, consider each treatment combination of copper and magnesium levels. Note that there are two observations for each treatment combination. The sum of squares between the four treatment combinations also contains the main effects, so SS_{Cu} and SS_{Mg} must be subtracted from it to obtain SS_{CuxMg}.
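As a worked version of this step, the four treatment-combination totals from Table 11.4 are 154 (Mg 1.2, Cu 3.5), 155 (Mg 1.8, Cu 3.5), 147 (Mg 1.2, Cu 4.5) and 159 (Mg 1.8, Cu 4.5), each from two observations. Here SS_{cells} is a label introduced for the sum of squares between treatment combinations.

$$SS_{cells} = \frac{154^2 + 155^2 + 147^2 + 159^2}{2} - \frac{615^2}{8} = 47315.5 - 47278.125 = 37.375$$

$$SS_{CuxMg} = SS_{cells} - SS_{Mg} - SS_{Cu} = 37.375 - 21.125 - 1.125 = 15.125$$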
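To calculate the error sum of squares SS_{error}, use the relation

$$SS_T = SS_{Mg} + SS_{Cu} + SS_{CuxMg} + SS_{error}$$

As all terms except SS_{error} are known (SS_{CuxMg} = 15.125), it can easily be calculated as SS_{error} = 40.875 – 21.125 – 1.125 – 15.125 = 3.5.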
The degrees of freedom are as follows:
N is the total number of data points | = 8 |
n_{a} is the number of levels of the factor A (Mg) | = 2 |
n_{b} is the number of levels of the factor B (Cu) | = 2 |
Total degrees of freedom = N – 1 = 8 – 1 | = 7 |
Degrees of freedom for factor A = n_{a} – 1 = 2 –1 | = 1 |
Degrees of freedom for factor B = n_{b} − 1 = 2 – 1 | = 1 |
Degrees of freedom for interaction A*B = (n_{a} –1)(n_{b} –1) = 1 × 1 | = 1 |
Degrees of freedom for error = N – n_{a}n_{b} = 8 – 4 | = 4 |
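The whole two-way calculation can be reproduced from the data in Table 11.4. This Python sketch follows the correction-for-mean steps above; the dictionary layout and the level_total helper are illustrative, and the critical value F(0.05; 1, 4) = 7.71 is the standard F-table value.

```python
# Table 11.4: (Mg %, Cu %) -> hardness readings, two replicates per combination
data = {
    (1.2, 3.5): [76, 78],
    (1.8, 3.5): [77, 78],
    (1.2, 4.5): [73, 74],
    (1.8, 4.5): [79, 80],
}
mg_levels, cu_levels, r = (1.2, 1.8), (3.5, 4.5), 2
F_CRIT = 7.71  # F(0.05; 1, 4) from a standard F table

all_y = [y for ys in data.values() for y in ys]
N = len(all_y)
cm = sum(all_y) ** 2 / N  # correction for the mean, T^2 / N


def level_total(position: int, level: float) -> int:
    """Total hardness over runs where factor `position` (0 = Mg, 1 = Cu) is at `level`."""
    return sum(sum(ys) for key, ys in data.items() if key[position] == level)


ss = {
    "Mg": sum(level_total(0, m) ** 2 for m in mg_levels) / (len(cu_levels) * r) - cm,
    "Cu": sum(level_total(1, c) ** 2 for c in cu_levels) / (len(mg_levels) * r) - cm,
}
ss_cells = sum(sum(ys) ** 2 for ys in data.values()) / r - cm  # between all 4 combinations
ss["Mg*Cu"] = ss_cells - ss["Mg"] - ss["Cu"]
ss_total = sum(y ** 2 for y in all_y) - cm
ss["Error"] = ss_total - ss_cells

df = {"Mg": len(mg_levels) - 1, "Cu": len(cu_levels) - 1,
      "Mg*Cu": (len(mg_levels) - 1) * (len(cu_levels) - 1),
      "Error": N - len(mg_levels) * len(cu_levels)}
ms = {src: ss[src] / df[src] for src in ss}  # mean square = SS / DF

for src in ("Mg", "Cu", "Mg*Cu"):
    f = ms[src] / ms["Error"]
    print(f"{src:<6} SS={ss[src]:7.3f} DF={df[src]} F={f:6.2f} significant={f > F_CRIT}")
print(f"Error  SS={ss['Error']:7.3f} DF={df['Error']}   Total SS={ss_total:.3f}")
```

The printed F-ratios (about 24.14 for Mg, 1.29 for Cu and 17.29 for the interaction) agree with the conclusion that magnesium and the interaction are significant at α = 0.05.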
ANOVA calculations are summarized in Table 11.9. In addition to the calculated and table values of F, P-value is also shown.
Table 11.8 Calculating SS_{CuxMg} sum of squares due to interaction
Table 11.9 ANOVA table for hardness experiment
Each MS value represents the variance due to the source named in its row. The MS for each factor and for the interaction is divided by the error MS to calculate the F-ratio. Assuming an α-risk of 0.05, we conclude that magnesium and the interaction between magnesium and copper have significant effects on hardness, as their F_{calc} values exceed the critical value F_{table}. The p-values in both these cases are < 0.05.
Each sum of squares divided by the total sum of squares gives the portion of the variation explained by that term in the ANOVA. These portions can also be shown in a pie chart, as in Figure 11.12.
Model Adequacy
Minitab output is reproduced in Figure 11.13 for checking model adequacy.
- Residuals appear normally distributed on the normal probability plot. Histograms for small sample sizes are not meaningful.
- The plot of residuals versus fits shows that the variance within treatments is comparable.
- The residuals by observation order do not show any non-random pattern or runs.
Thus, we can conclude that the ANOVA model is valid.
Figure 11.12 Contribution of factors in variation
Three-Way and Multi-Way ANOVA
Three-way ANOVA is similar to two-way ANOVA, but its calculations are longer and more tedious. With modern computing facilities, however, we need not worry about the calculations, and they are not very difficult even in Excel. The total sum of squares in three-way ANOVA is partitioned as

$$SS_T = SS_A + SS_B + SS_C + SS_{AB} + SS_{BC} + SS_{CA} + SS_{ABC} + SS_E$$
AB, BC and CA are two-factor interactions and ABC is a three-factor interaction. We will discuss more about interactions in Chapter 14.
Figure 11.13 Residual plots for hardness
Consider an experiment with three factors. The levels of the three factors A, B and C were n_{a} = 3, n_{b} = 3, n_{c} = 2, respectively. Degrees of freedom for three-way ANOVA will be as explained in Table 11.10.
Observe that the degrees of freedom for the error term are 0. This means that we cannot estimate the error variance, because its denominator (the error degrees of freedom) is 0, and so no F-ratio can be calculated. If we want to estimate all effects, including the interaction ABC, we must replicate the experiment a second time. This increases the total degrees of freedom from 17 to 35, making 18 degrees of freedom available for the error term. Replicating the complete experiment can be expensive and time-consuming. Another approach used by experimenters is to omit insignificant effects (those with p-value > α) so that the degrees of freedom of the omitted factors or interactions become available for estimating the error variance.
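The degrees-of-freedom bookkeeping for any full factorial with all interactions included can be sketched in Python. The function name factorial_df is hypothetical; the usage lines reproduce the unreplicated and replicated three-way cases described above.

```python
from __future__ import annotations

from itertools import combinations
from math import prod


def factorial_df(levels: dict[str, int], replicates: int = 1) -> dict[str, int]:
    """Degrees of freedom for a full factorial ANOVA that includes every interaction."""
    names = list(levels)
    n_total = prod(levels.values()) * replicates
    df: dict[str, int] = {}
    for order in range(1, len(names) + 1):  # main effects, 2-way, 3-way, ...
        for combo in combinations(names, order):
            df["*".join(combo)] = prod(levels[f] - 1 for f in combo)
    df["Error"] = (n_total - 1) - sum(df.values())  # whatever is left over
    df["Total"] = n_total - 1
    return df


print(factorial_df({"A": 3, "B": 3, "C": 2}))                # Error: 0, Total: 17
print(factorial_df({"A": 3, "B": 3, "C": 2}, replicates=2))  # Error: 18, Total: 35
```

The same function gives 4 error degrees of freedom for the two-way hardness experiment (two levels each, two replicates).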
Table 11.10 Degrees of freedom for three-way ANOVA
Calculations of three-way and multi-way ANOVA are tedious and usually performed using a software application.
Unbalanced ANOVA
In all the previous examples, each treatment had the same number of data points or runs. In practice, we may have to analyze data with unequal numbers of data points for various treatments. ANOVA can still be performed in such cases; in one-way ANOVA, the formulas above apply directly with unequal sample sizes n_{i}. Unbalanced ANOVA is, however, more sensitive to deviations from the ANOVA assumptions, and it has lower power than balanced ANOVA (Montgomery 2004).
Summary
ANOVA is an effective statistical technique to compare means of samples from more than two populations. Depending on the number of factors in analysis, ANOVA can be one-way, two-way or multi-way. ANOVA finds extensive application in the analysis of designed experiments. In ANOVA, the total sum of squares is partitioned. Using degrees of freedom for each factor, mean square (i.e., variance) is estimated and compared with residual error variance. The ratio of the two variances is used to evaluate significance of the factors.
References
Anderson, David R., Dennis J. Sweeney and Thomas A. Williams (2007). Statistics for Business and Economics. New Delhi: Thomson South-Western, a division of Thomson Learning Inc.
Breyfogle III, Forrest W. (1999). Implementing Six Sigma: Smarter Solutions Using Statistical Methods. New York, NY: John Wiley & Sons.
Johnson, Richard (2005). Miller and Freund's Probability and Statistics for Engineers. Upper Saddle River, NJ: Prentice Hall Inc.
Montgomery, Douglas C. (2001). Statistical Quality Control. New York, NY: John Wiley & Sons.
Montgomery, Douglas C. (2004). Design and Analysis of Experiments. New York, NY: John Wiley & Sons.