13
Nonparametric Tests
Introduction
Most of the hypothesis tests covered earlier assume that the data are normally distributed. This assumption does not always hold; typical examples are project cycle time, lead time, and time to repair machines. Nonparametric tests, also known as distribution-free tests, can be used when the data are clearly not normal and the shape of the distribution is unknown. Nonparametric tests are generally less powerful than their parametric counterparts, and the resulting confidence intervals are wider.
There are various kinds of nonparametric tests. Some of these are given below:
 Sign Test
 Wilcoxon Signed-Rank Test
 Mann-Whitney Rank Sum U Test
 Kruskal-Wallis Test
 Mood's Median Test
 Friedman Test
 Levene's Test
Unless otherwise mentioned, for application examples in this chapter, data are available in Excel file ‘nonparametrics’ or Minitab worksheet with the same file name.
Sign Test
This is the simplest nonparametric test and an alternative to the one-sample and paired t-tests. We count the number of data points above (or below) the hypothesized median and test whether these counts differ significantly. The objective is to check whether the median of a population is equal to a specified target. The procedure is illustrated by the following example.
Application Example
A company recently conducted a training program for call center staff. Table 13.1 shows the number of complaints received before and after the training in comparable time periods. Was the training effective? Assume a confidence level of 95 percent. Data are available in Excel file worksheet ‘nonparametrics.xls’.
Since there are two samples, find the difference (after minus before) for each pair and note its sign. If the training made no difference, we will find approximately equal numbers of pluses and minuses. Effective training should reduce the number of complaints, in which case we will find more minuses than pluses and the median difference will be lower than 0, i.e., fewer complaints than before. If the median of the differences is η, the null and alternate hypotheses are
H_{0}: η = 0
H_{1}: η < 0
Table 13.1 Complaints before and after training
Before  After  Improvement
8  6  –2
7  5  –2
6  8  2
9  6  –3
7  9  2
10  8  –2
8  10  2
6  7  1
5  5  0
8  6  –2
10  9  –1
8  5  –3
As we can see in the table, there are 7 minuses, 4 pluses and 1 tie. The sample size is 12.
If the training had no effect, the hypothesized proportion of minuses under the null hypothesis is p_{H0} = 0.5. The tie (zero difference) gives no information about the direction of change and is excluded, leaving n = 11 nonzero differences.
Under the null hypothesis, the probability of observing seven or more minuses among the 11 nonzero differences can be calculated using the binomial distribution: P(≥7) = 1 – P(≤6). Use the Excel function =BINOMDIST(number of successes, number of trials, probability, cumulative). Thus =BINOMDIST(6, 11, 0.5, 1) returns 0.7256, and P(≥7) = 1 – 0.7256 = 0.2744. This is the p-value. Had it been less than the risk of 0.05, we would have rejected H_{0}. Since 0.2744 is greater than 0.05, we cannot reject H_{0}, and we conclude that there is no significant improvement after the training.
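The binomial calculation can also be checked outside Excel. The following is a minimal Python sketch, assuming the Improvement column of Table 13.1; the function name `sign_test_lower_tail` is illustrative and not part of any package.

```python
from math import comb

def sign_test_lower_tail(differences, p0=0.5):
    """One-sided sign test for H1: median < 0.

    Returns the counts of minuses and pluses, the number of nonzero
    differences, and P(X >= minuses) for X ~ Binomial(n, p0).
    """
    minuses = sum(1 for d in differences if d < 0)
    pluses = sum(1 for d in differences if d > 0)
    n = minuses + pluses  # zeros (ties) are excluded
    p_value = sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(minuses, n + 1)
    )
    return minuses, pluses, n, p_value

# Improvement column of Table 13.1 (after minus before)
improvement = [-2, -2, 2, -3, 2, -2, 2, 1, 0, -2, -1, -3]
minuses, pluses, n, p_value = sign_test_lower_tail(improvement)
print(minuses, pluses, n, round(p_value, 4))  # 7 4 11 0.2744
```

The result agrees with the value obtained from the Excel function.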
We can use software to test the hypothesis.
Minitab commands: > Stat > Nonparametrics > 1-Sample Sign (Test for Median = 0 vs Median < 0)
SigmaXL commands: > Nonparametric Tests > 1-Sample Sign
SigmaXL output is shown in Table 13.2. The p-value is 0.274. We can, therefore, conclude that there is insufficient evidence of improvement after training.
Table 13.2 SigmaXL output for sign test
H_{0}: Median = 0  
Ha: Median Less Than 0  
Improvement  
Count (N)  12 
Median  –1.5 
Points Below 0  7 
Points Equal To 0  1 
Points Above 0  4 
p-value (1-sided)  0.2744 
Exercises for Practice
A random sample of 10 measurements of octane number for gasoline is taken. The following data shows the values:
Data: 98.9, 98.3, 97.8, 98.4, 98.1, 98.0, 97.9, 98.3, 98.1, 98.2.
Is the median >98? Assume confidence level of 95 percent. (Ans: Yes)
A company has implemented safety training at its 10 different plants. Data of lost worker-hours during the period of 3 months before and 3 months after the training are collected for the plants and are as follows:
Use sign test to assess whether safety training has reduced lost hours. Assume 95 percent confidence level. (Ans: Yes. Safety training has reduced lost hours)
Wilcoxon Signed Rank Test
This is a nonparametric alternative to the paired t-test. We first find the differences between the matched pairs and rank their absolute values. Each rank is then given the sign of the corresponding difference. The sum of the signed ranks is used to assess whether the differences are significant. The procedure is explained in the following example.
Application Example
Two possible methods for production are under consideration. The task is performed by 11 workers using both the methods. Data of production are shown in Table 13.3. Are the methods significantly different? Assume a confidence level of 95 percent. (Adapted from Statistics for Business and Economics by David R. Anderson, Dennis J. Sweeney and Thomas A. Williams.)
Find the signed differences between production using the two methods. Rank the absolute values of the differences, excluding zeroes. Ties are assigned average ranks. For example, the third and fourth smallest absolute differences are both 0.4, so each is given the average of ranks 3 and 4, i.e., 3.5. Similarly, the two differences that occupy ranks 5 and 6 (both 0.5) are each given rank 5.5. Then give each rank the sign of its original difference. Table 13.4 illustrates the procedure, with the data sorted by absolute value of difference.
Under H_{0}, the sum of signed ranks T is approximately normally distributed with a mean of 0 and a standard deviation of σ_{T} = \sqrt{n(n + 1)(2n + 1)/6} if n is 10 or more, where n is the number of nonzero differences. In this case n = 10 (worker 8 shows no difference), so σ_{T} = \sqrt{385} ≈ 19.62. The Z value is T divided by this standard deviation; here T = 44, so Z = 44/19.62 = 2.24. Table 13.4 shows the procedure and calculations. The calculated Z value of 2.24 is larger than the critical Z value of 1.96 at 95 percent confidence. The corresponding upper-tail area is 0.0125, which gives a two-sided p-value of about 0.025. Thus we reject H_{0} and conclude that there is a significant difference between the two methods.
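The ranking steps described above can be sketched in Python as follows. This is a minimal illustration using the data of Table 13.3; it assumes the normal approximation with standard deviation \sqrt{n(n + 1)(2n + 1)/6}, and the helper name `signed_ranks` is illustrative only.

```python
from math import erfc, sqrt

def signed_ranks(diffs):
    """Signed ranks of the nonzero differences, with tied ranks averaged."""
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    ranks = []
    i = 0
    while i < len(nonzero):
        j = i
        while j < len(nonzero) and abs(nonzero[j]) == abs(nonzero[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        ranks.extend(avg_rank if d > 0 else -avg_rank for d in nonzero[i:j])
        i = j
    return ranks

# Table 13.3: production of the 11 workers with each method
method_a = [10.2, 9.6, 9.2, 10.6, 9.9, 10.2, 10.6, 10.0, 11.2, 10.7, 10.6]
method_b = [9.5, 9.8, 8.8, 10.1, 10.3, 9.3, 10.5, 10.0, 10.6, 10.2, 9.8]

# Round to one decimal so that floating-point noise does not break ties
diffs = [round(a - b, 1) for a, b in zip(method_a, method_b)]

ranks = signed_ranks(diffs)
n = len(ranks)                       # number of nonzero differences
T = sum(ranks)                       # sum of signed ranks
sigma_T = sqrt(n * (n + 1) * (2 * n + 1) / 6)
z = T / sigma_T
p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
print(n, T, round(sigma_T, 2), round(z, 2), round(p_two_sided, 3))
```

With these data the sketch gives n = 10, T = 44 and Z ≈ 2.24, in line with the values quoted in the example.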
Table 13.3 Data of production with two methods
Worker  Method A  Method B
1  10.2  9.5
2  9.6  9.8
3  9.2  8.8
4  10.6  10.1
5  9.9  10.3
6  10.2  9.3
7  10.6  10.5
8  10  10
9  11.2  10.6
10  10.7  10.2
11  10.6  9.8
Table 13.4 Illustration of calculation for the Wilcoxon signed-rank test
Mann-Whitney Rank Sum U Test
The Mann-Whitney test is used to compare two population medians. Assumptions for the Mann-Whitney test are
 Data are independent random samples from two populations that have the same shape and equal variances, and
 the measurement scale is continuous, or ordinal (possessing a natural ordering) if the data are discrete.
In this test, the null hypothesis is
H_{0}: η_{1} = η_{2}
Data from both samples are pooled and ranked in ascending order. The test statistic U is calculated from the rank numbers rather than from the individual values. Equal values are each assigned the mean of the ranks they occupy. For example, if the value 50 occupied ranks 4 and 5, both observations would receive rank 4.5. The same rule applies to more than two equal values.
U = n_{1}n_{2} + n_{1}(n_{1} + 1)/2 – R_{1}
where n_{1} and n_{2} are the sample sizes and R_{1} and R_{2} are the sums of ranks in samples 1 and 2 respectively. (The statistic can equally be based on sample 2, as U = n_{1}n_{2} + n_{2}(n_{2} + 1)/2 – R_{2}.) The mean and standard error of the U statistic are given by
μ_{U} = n_{1}n_{2}/2
σ_{U} = \sqrt{n_{1}n_{2}(n_{1} + n_{2} + 1)/12}
For sample sizes n_{1} and n_{2} larger than 10, the U statistic is approximately normally distributed. Thus we can calculate the Z value, Z = (U – μ_{U})/σ_{U}, from this mean and standard error. By comparing this Z value with Z_{α/2} (or –Z_{α/2}) we can conclude whether there is a significant difference between the two populations.
Application Example
A committee wants to assess whether the scores of two schools, A and B, are comparable. Scores of 15 students from each school are shown in Table 13.5. What should the committee conclude at the 95 percent confidence level? (Adapted from Statistics for Management by Richard I. Levin and David S. Rubin.)
Pool the data, arrange them in ascending order, and then perform the calculations as shown in Table 13.6.
The calculated Z value is –0.6014. For 95 percent confidence, the critical lower limit is –1.96. As the calculated Z value is not in the critical zone, we cannot reject H_{0}. Thus there is no evidence that the scores of the two schools differ.
Try using R_{2} instead of R_{1}. What do you observe?
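The calculation for this example can be reproduced with a short Python sketch. It assumes the common textbook form U = n_{1}n_{2} + n_{1}(n_{1} + 1)/2 – R_{1}, with mean n_{1}n_{2}/2 and standard error \sqrt{n_{1}n_{2}(n_{1} + n_{2} + 1)/12}; the function names are illustrative only.

```python
from math import sqrt

def average_ranks(values):
    """Map each distinct value to its tie-averaged rank in the pooled data."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return rank_of

def mann_whitney_z(sample_1, sample_2):
    """Rank sum of sample_1, the U statistic based on it, and its Z value."""
    n1, n2 = len(sample_1), len(sample_2)
    rank_of = average_ranks(sample_1 + sample_2)
    r1 = sum(rank_of[x] for x in sample_1)
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, u, (u - mean_u) / sigma_u

# Table 13.5: scores of 15 students from each school
school_a = [1000, 1100, 800, 750, 1300, 950, 1050, 1250,
            1400, 850, 1150, 1200, 1500, 600, 775]
school_b = [920, 1120, 830, 1360, 650, 725, 890, 1600,
            900, 1140, 1550, 550, 1240, 925, 500]

r1, u, z = mann_whitney_z(school_a, school_b)
print(r1, u, round(z, 4))  # 247.0 98.0 -0.6014
```

Calling `mann_whitney_z(school_b, school_a)` bases the statistic on the rank sum of school B instead, which is a convenient way to explore the question above.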
Table 13.5 The scores of students from two schools
School A  School B
1000  920
1100  1120
800  830
750  1360
1300  650
950  725
1050  890
1250  1600
1400  900
850  1140
1150  1550
1200  550
1500  1240
600  925
775  500
Table 13.6 Illustration of Mann-Whitney test calculation
Kruskal-Wallis Test
The Kruskal-Wallis test is used to compare two or more population medians. The assumptions are the same as for the Mann-Whitney test:
 Data are independent random samples from populations that have the same shape and equal variances, and
 the measurement scale is continuous, or ordinal (possessing a natural ordering) if the data are discrete.
Here the null and alternate hypotheses are
H_{0}: η_{1} = η_{2} = η_{3}
H_{1}: In at least one pair, medians are not equal.
Data from all the samples are ranked and rank numbers are determined. Test statistic is calculated based on rank numbers rather than on individual values.
Application Example
Three operators work in a call center. Their call lengths are sampled; data is arranged in ascending order and shown in Table 13.7. Assess whether there is a significant difference in the call lengths. Historically, the call length data is not distributed normally. Assume confidence level of 95 percent.
Table 13.7 Illustration of Kruskal-Wallis test
Arrange the pooled data in ascending order and assign ranks. We then compute the K statistic. (Some literature refers to this as the H statistic.) The formula and calculations are shown in Table 13.7. Ranks of equal values are replaced by their average rank. For example, ranks 3 and 4 for time = 0.9 are each replaced by 3.5. Similarly, ranks 5, 6 and 7 for time = 1.3 are each replaced by 6.
The K statistic approximately follows a chi-square distribution with (p – 1) degrees of freedom, where p is the number of groups being compared. As p = 3, there are 2 degrees of freedom. The critical value of the chi-square distribution with 2 degrees of freedom at 95 percent confidence is 5.991. As the calculated statistic of 0.847 is less than 5.991, it is not in the critical region and we cannot reject H_{0}. We conclude that there is no significant difference between operators.
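For readers who wish to compute the statistic themselves, the following Python sketch assumes the usual form K = 12/(N(N + 1)) Σ R_{j}^{2}/n_{j} – 3(N + 1), where N is the total number of observations and R_{j} and n_{j} are the rank sum and size of group j. No tie correction is applied. The numbers in `toy_groups` are invented purely to exercise the function; they are not the call-length data of Table 13.7.

```python
def kruskal_wallis_k(groups):
    """K = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), without tie correction."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average rank for tied values
        i = j
    n_total = len(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical, fully separated groups (illustration only)
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k_stat = kruskal_wallis_k(toy_groups)

critical_value = 5.991  # chi-square, 2 degrees of freedom, 95 percent
print(round(k_stat, 3), k_stat > critical_value)  # 7.2 True
```

In this toy case the groups do not overlap at all, so the statistic exceeds the critical value, unlike the call center example.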
Mood's Median Test
Mood's median test can be used to test the equality of medians from two or more populations and, like the Kruskal-Wallis test, provides a nonparametric alternative to the one-way analysis of variance. Mood's median test is sometimes called a median test or sign scores test. It is more robust against outliers than the Kruskal-Wallis test, but is less powerful for data from many distributions, including the normal. Being robust against outliers and errors in data, Mood's median test is particularly appropriate in the preliminary stages of analysis.
For Mood's median test the null and alternate hypotheses are
H_{0}: the population medians are all equal
H_{1}: the medians are not all equal.
Assumptions of Mood's Median Test
 Data from each population are independent and random
 Samples and the population distributions have the same shape.
Procedure for Mood's Median Test
Find the overall median of all the data pooled together. Construct a table with one column showing the number of readings above the overall median and another showing the number of readings below it for each category. Half of the readings that are equal to the median should be counted in the ‘above’ column and half in the ‘below’ column. These are the observed frequencies. Under H_{0}, each category is expected to have equal numbers of readings above and below the overall median; these are the expected frequencies. We can then perform a chi-square test to assess whether there is a significant difference between the observed and expected frequencies.
Application Example of Mood's Median Test
Three teaching methods are being evaluated. The first two columns of Table 13.8 show marks scored by students who were taught using different methods. (Source: Levin & Rubin, 1991).
The overall median is 80.5. Each data point is classified as either below or equal to the overall median, or above it, and the corresponding count for its method is increased by 1. If all three methods were alike, we should expect half the data points of each method above and half below. Thus we get the observed and expected counts. A chi-square test is performed with (p – 1) degrees of freedom, where p is the number of groups (3 methods in this example) being compared. Table 13.8 illustrates the calculations.
As the calculated chi-square statistic is less than the critical value at the 95 percent confidence level, we cannot reject H_{0}. Thus we conclude that there is no significant difference between the teaching methods.
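The counting procedure can be sketched in Python as follows. The sketch follows the convention of this example (readings equal to the overall median are counted with those below it) and takes half of each group as the expected count in each column, as described above. The marks in `toy_methods` are invented for illustration and are not the data of Table 13.8.

```python
from statistics import median

def moods_median_chi_square(groups):
    """Overall median, (above, at-or-below) counts per group, and chi-square."""
    overall = median([x for g in groups for x in g])
    counts = []
    chi_square = 0.0
    for g in groups:
        above = sum(1 for x in g if x > overall)
        at_or_below = len(g) - above
        expected = len(g) / 2  # half above, half below under H0
        chi_square += (above - expected) ** 2 / expected
        chi_square += (at_or_below - expected) ** 2 / expected
        counts.append((above, at_or_below))
    return overall, counts, chi_square

# Hypothetical marks for three methods (illustration only)
toy_methods = [[1, 2, 3, 10], [4, 5, 11, 12], [6, 13, 14, 15]]
overall, counts, chi_square = moods_median_chi_square(toy_methods)

critical_value = 5.991  # chi-square, p - 1 = 2 degrees of freedom, 95 percent
print(overall, counts, chi_square, chi_square > critical_value)
# 8.0 [(1, 3), (2, 2), (3, 1)] 2.0 False
```

Statistical packages may compute the expected counts from the row and column totals of the table instead, so their output can differ slightly when many readings equal the overall median.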
Table 13.8 Illustration of Mood's Median test
Friedman Test
The Friedman test is a nonparametric analysis of a randomized block experiment, and thus provides an alternative to the two-way analysis of variance. The hypotheses are
H_{0}: all treatment effects are zero
H_{1}: not all treatment effects are zero.
Randomized block experiments are a generalization of paired experiments and the Friedman test is a generalization of the paired sign test. As the calculations are complex, we will use Minitab to perform the test. This test is not available in SigmaXL.
Application Example
A randomized block experiment is conducted to evaluate the effect of rake angle and tool type on tool life. Data is shown in Table 13.9. Perform the Friedman test to assess whether tool life depends on rake angle and tool type.
Minitab commands are > Stat > Nonparametrics > Friedman.
Minitab output is shown in Table 13.10.
As the p-value of 0.122 is greater than 0.05, we cannot reject the null hypothesis. We have to conclude that the tool life is not significantly affected by rake angle.
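Although the text relies on Minitab, the statistic for this small data set can be checked by hand or with a few lines of Python. The sketch below assumes the standard form S = 12/(bk(k + 1)) Σ R_{j}^{2} – 3b(k + 1), where b is the number of blocks (tools), k the number of treatments (rake angles) and R_{j} the sum of within-block ranks for treatment j. It assumes there are no ties within a block, which holds for the data of Table 13.9.

```python
from math import erfc, exp, pi, sqrt

# Table 13.9: one row per tool (block); columns are rake angles 5, 10, 15, 20
tool_life = {
    1: [31.5, 32.6, 32.3, 39.9],
    2: [35.5, 32.6, 27.8, 39.9],
    3: [35.5, 36.6, 37.7, 39.9],
}

def friedman_s(blocks):
    """Rank sums per treatment and the Friedman S (no ties within blocks)."""
    b = len(blocks)
    k = len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])  # treatments by value
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    s = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return rank_sums, s

rank_sums, s = friedman_s(list(tool_life.values()))

# Upper-tail chi-square probability; this closed form holds for 3 degrees
# of freedom only (k - 1 = 3 here)
p_value = erfc(sqrt(s / 2)) + sqrt(2 * s / pi) * exp(-s / 2)
print(rank_sums, round(s, 2), round(p_value, 3))  # [5, 7, 6, 12] 5.8 0.122
```

The values agree with the Minitab output in Table 13.10 (S = 5.80, DF = 3, P = 0.122).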
Table 13.9 Data for the tool life experiment
Rake angle  Tool  Tool life
5  1  31.5
10  1  32.6
15  1  32.3
20  1  39.9
5  2  35.5
10  2  32.6
15  2  27.8
20  2  39.9
5  3  35.5
10  3  36.6
15  3  37.7
20  3  39.9
Table 13.10 Minitab output for Friedman test
Friedman test: Tool life versus rake angle blocked by tool
S = 5.80 DF = 3 P = 0.122
Levene's Test
This test uses distances from the median instead of the mean. It can be used to test the equality of variances of two or more samples. We will use software to perform the calculations. Unlike the F-test, which assumes normally distributed data, this test is applicable to any continuous distribution.
Application Example
Suppose we want to compare the variation in runs scored by two famous cricket players, Sachin Tendulkar of India and Brian Lara of West Indies. Data is available in file ‘Sachin lara data.xls’ (or .mtw). Partial data is shown in Table 13.11 for reference. Compare the standard deviations of the two players. Whether the player was out or not-out is not considered in this example.
It can be easily verified with normal probability plots and histograms that the data is grossly non-normal. It is, therefore, necessary to use Levene's test.
Table 13.11 Data of runs scored by two players
Player  Runs 

Lara  11 
Lara  23 
Lara  5 
Lara  45 
Lara  0 
Lara  54 
Lara  18 
Lara  45 
Sachin  1 
Sachin  3 
Sachin  23 
Sachin  11 
Sachin  88 
Sachin  64 
Sachin  28 
Sachin  62 
Sachin  67 
Minitab commands: > Stat > Basic Statistics > 2 Variances. Specify Runs in samples and Player in subscripts.
SigmaXL commands: > Statistical Tools > Equal Variance Tests > Levene's.
Output from SigmaXL is shown in Table 13.12. The p-value is 0.0619, which means the null hypothesis of equal variances cannot be rejected at 95 percent confidence.
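The idea behind the test, a one-way analysis of variance on the absolute deviations from each group's median, can be sketched in Python as below. The function name is illustrative. Only the partial data of Table 13.11 are used here (17 of the 660 innings), so the result will not reproduce the statistic in Table 13.12.

```python
from statistics import mean, median

def levene_median(groups):
    """Levene statistic on absolute deviations from each group's median.

    Returns the statistic and its numerator and denominator degrees of freedom.
    """
    deviations = []
    for g in groups:
        centre = median(g)
        deviations.append([abs(x - centre) for x in g])
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = mean([z for d in deviations for z in d])
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    df_num, df_den = k - 1, n_total - k
    return (between / df_num) / (within / df_den), df_num, df_den

# Partial data from Table 13.11 only
lara = [11, 23, 5, 45, 0, 54, 18, 45]
sachin = [1, 3, 23, 11, 88, 64, 28, 62, 67]

w, df_num, df_den = levene_median([lara, sachin])
print(round(w, 3), df_num, df_den)
```

The statistic is compared with an F distribution with the returned degrees of freedom; for the full data set these are 1 and 658, as reported by SigmaXL.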
Table 13.12 SigmaXL output for Levene's test
Levene's test for equal variance: Runs
(Use with non-normal data)
Test information
H0: Variance 1 = Variance 2 = … = Variance k
Ha: At least one pair Variance i ≠ Variance j
Player  Lara  Sachin
Count  286  374
Mean  36.129  39.698
Median  26  28
StDev  34.447  38.916
AD Normality Test p-value  0.0000  0.0000
Levene's Test Statistic  3.498
DF Num  1
DF Den  658
p-value  0.0619
Table 13.13 Summary of nonparametric tests
Summary
Nonparametric methods do not assume any particular distribution.
The sensitivity (power) of nonparametric tests is lower than that of parametric tests.
We have seen the applications of
 Sign Test
 Wilcoxon Signed-Rank Test
 Mann-Whitney Rank Sum U Test
 Kruskal-Wallis Test to compare two or more medians
 Mood's Median Test to compare two or more medians
 Friedman Test
 Levene's Test to compare two or more variances
Checkpoints for Completion of Analyze Phase
FMEA is complete and actions are identified with responsibilities and dates
List of possible causes is generated and stratified based on Exploratory Data Analysis, Multi-vari studies
The relationship of Xs and Ys is validated using Hypothesis tests and/or Analysis of Variance (ANOVA)
Prioritized list of ‘vital few’ KPIVs (key sources of variation) is available
Waste Analysis is complete in case of lean Six Sigma projects
Strong clue(s) to the KPIVs that will be useful for developing solution strategy
Appropriate tools are used to analyze data efficiently with minimum efforts.
References
Anderson, David R., Dennis J. Sweeney and Thomas A. Williams (2007). Statistics for Business and Economics. New Delhi: Thomson South-Western, Division of Thomson Learning Inc.
Johnson, Richard (2005). Miller and Freund's Probability and Statistics for Engineers. Upper Saddle River, NJ: Prentice Hall.
Levin, Richard I. and David S. Rubin (1991). Statistics for Management. Upper Saddle River, NJ: Prentice Hall.
Montgomery, Douglas C. (2004). Design and Analysis of Experiments. New York, NY: John Wiley & Sons.