13. Nonparametric Tests – Six Sigma for Business Excellence: Approach, Tools and Applications

13

Nonparametric Tests

Introduction

Most of the hypothesis tests that we learned about earlier assume that the data are normally distributed. However, this assumption does not always hold. Examples include data on project cycle time, lead time, time to repair machines, etc. Nonparametric tests, also known as distribution-free tests, can be performed when the data are clearly not normal and we do not know the shape of the distribution. The power of nonparametric tests is lower than that of the corresponding parametric tests, and their confidence intervals are wider.

There are various kinds of nonparametric tests. Some of these are given below:

  1. Sign Test
  2. Wilcoxon Signed-Rank Test
  3. Ranked Sum Mann-Whitney U Test
  4. Kruskal-Wallis Test
  5. Mood's Median Test
  6. Friedman Test
  7. Levene's Test

Unless otherwise mentioned, for application examples in this chapter, data are available in Excel file ‘nonparametrics’ or Minitab worksheet with the same file name.

Sign Test

This is the simplest nonparametric test; an alternative to one-sample and paired t-test. We find out the number of data points above (or below) the hypothesized median and test whether there are significant differences in these. Our objective is to check if the median of a population is equal to the specified target. We will study the procedure by citing examples.

Application Example

A company recently conducted a training program for call center staff. Table 13.1 shows the number of complaints received both before and after the training in comparable time period. Was the training effective? Assume confidence level of 95 percent. Data are available in Excel file worksheet ‘nonparametrics.xls’.

Since there are two samples, find the difference (after minus before) for each pair and note its sign. If the training had no effect, we will find approximately equal numbers of pluses and minuses. Effective training should reduce the number of complaints, in which case we will find more minuses than pluses. The median of the differences will then be lower than 0, i.e., fewer complaints than before. If the median of the differences is η, the null and alternate hypotheses are given by

H0: η = 0

H1: η < 0

Table 13.1 Complaints before and after training

Before   After   Improvement
8        6       –2
7        5       –2
6        8       2
9        6       –3
7        9       2
10       8       –2
8        10      2
6        7       1
5        5       0
8        6       –2
10       9       –1
8        5       –3

 

 

As we can see in the table, there are 7 minuses, 4 pluses and 1 tie. The sample size is 12.

If the training had no effect, the hypothesized proportion under null hypothesis is pH0 = 0.5. Since we want to find improvement after training, we will exclude zeros.

Under the null hypothesis, the probability of seven or more minuses (employees performing better) out of the 11 non-zero differences can be calculated using the binomial distribution: P(X ≥ 7) = 1 – P(X ≤ 6). Use the Excel function =BINOMDIST(number of successes, number of trials, probability, cumulative). Thus =BINOMDIST(6, 11, 0.5, 1) gives a value of 0.7256, and P(X ≥ 7) = 1 – 0.7256 = 0.2744. This is the p-value. Had it been less than the risk of 0.05, we would have rejected H0. As it is greater than 0.05, we cannot reject H0 and have to conclude that there is no significant improvement after the training.
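The binomial calculation can also be reproduced outside Excel. The following is a minimal Python sketch, assuming the data of Table 13.1 and the one-sided alternative that the median difference is below 0; the variable names are illustrative only.

```python
from math import comb

# Complaints before and after training (Table 13.1)
before = [8, 7, 6, 9, 7, 10, 8, 6, 5, 8, 10, 8]
after = [6, 5, 8, 6, 9, 8, 10, 7, 5, 6, 9, 5]

# Improvement = after - before; a negative value means fewer complaints
diffs = [a - b for a, b in zip(after, before)]
minuses = sum(d < 0 for d in diffs)
pluses = sum(d > 0 for d in diffs)
n = minuses + pluses  # zero differences (ties) are excluded

# One-sided p-value: P(X >= minuses) for X ~ Binomial(n, 0.5)
p_value = sum(comb(n, k) for k in range(minuses, n + 1)) / 2 ** n

print(f"minuses={minuses}, pluses={pluses}, n={n}, p-value={p_value:.4f}")
# prints: minuses=7, pluses=4, n=11, p-value=0.2744
```

The sum runs over the upper tail directly, which is equivalent to 1 – BINOMDIST(6, 11, 0.5, 1).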

We can use software to test the hypothesis.

Minitab commands: > Stat > Nonparametrics > 1-sample sign test (Test for Median = 0 vs Median < 0)

SigmaXL commands: > Nonparametric tests > 1-sample sign

SigmaXL output is shown in Table 13.2. The p-value is 0.274. We can, therefore, conclude that there is insufficient evidence of improvement after training.

 

Table 13.2 SigmaXL output for sign test

H0: Median = 0  
Ha: Median Less Than 0  
  Improvement
Count (N) 12
Median –1.5
Points Below 0 7
Points Equal To 0 1
Points Above 0 4
P-value (1-sided) 0.2744

Exercises for Practice

  1. A random sample of 10 measurements of octane number for gasoline is taken. The following data shows the values:

    Data: 98.9, 98.3, 97.8, 98.4, 98.1, 98.0, 97.9, 98.3, 98.1, 98.2.

    Is the median >98? Assume confidence level of 95 percent. (Ans: Yes)

  2. A company has implemented safety training at its 10 different plants. Data of lost worker-hours during the period of 3 months before and 3 months after the training are collected for the plants and are as follows:

 

Use sign test to assess whether safety training has reduced lost hours. Assume 95 percent confidence level. (Ans: Yes. Safety training has reduced lost hours)

Wilcoxon Signed Rank Test

This is a nonparametric alternative to the paired t-test. We first find the differences between the matched pairs and rank their absolute values. Then we give each rank the sign of the corresponding difference. The sum of the signed ranks can be used to assess whether the differences are significant. The procedure is explained in the following example.

Application Example

Two possible methods for production are under consideration. The task is performed by 11 workers using both methods. Data of production are shown in Table 13.3. Are the methods significantly different? Assume confidence level of 95 percent. (Adapted from Statistics for Business and Economics by David R. Anderson, Dennis J. Sweeney and Thomas A. Williams.)

Find the signed differences between production using the two methods. Rank the absolute values of the differences, excluding zeroes. Ties are assigned average ranks. For example, the differences ranked 3 and 4 have the same absolute value of 0.4; hence both are assigned the average rank of 3.5. Similarly, ranks 5 and 6 (both with an absolute difference of 0.5) are averaged to 5.5. Then give each rank the sign of its original difference. Table 13.4 illustrates the procedure, with the data sorted by absolute value of the difference.

The sum of signed ranks T is approximately normally distributed with a mean of 0 and a standard deviation of

$$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{6}}$$

if n is 10 or more, where n is the number of non-zero differences. In this case n = 10. We can easily calculate the Z value using mean = 0 and the calculated value of the standard deviation. Table 13.4 shows the procedure and calculations. In this case, the calculated Z value of 2.24 is larger than the critical Z value of 1.96 at 95 percent confidence. The corresponding upper-tail area is 0.0125, which is less than α/2 = 0.025. Thus we reject H0 and conclude that there is a significant difference between the two methods.

 

Table 13.3 Data of production with two methods

Worker   Method A   Method B
1        10.2       9.5
2        9.6        9.8
3        9.2        8.8
4        10.6       10.1
5        9.9        10.3
6        10.2       9.3
7        10.6       10.5
8        10.0       10.0
9        11.2       10.6
10       10.7       10.2
11       10.6       9.8

 

Table 13.4 Illustration of calculation for Wilcoxon signed rank test
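The signed-rank calculation for the data of Table 13.3 can be sketched in Python as follows. This is a minimal illustration, not the book's worksheet: it assumes the usual large-sample standard deviation sqrt(n(n+1)(2n+1)/6), which reproduces the Z value of 2.24 quoted in the text, and the helper name average_rank is hypothetical.

```python
from math import erfc, sqrt

# Production by 11 workers using both methods (Table 13.3)
method_a = [10.2, 9.6, 9.2, 10.6, 9.9, 10.2, 10.6, 10.0, 11.2, 10.7, 10.6]
method_b = [9.5, 9.8, 8.8, 10.1, 10.3, 9.3, 10.5, 10.0, 10.6, 10.2, 9.8]

# Signed differences, rounded to one decimal so that floating-point noise
# does not hide ties; zero differences are dropped
diffs = [round(a - b, 1) for a, b in zip(method_a, method_b)]
diffs = [d for d in diffs if d != 0]
n = len(diffs)

# Sorted absolute differences, used to look up ranks
ordered = sorted(abs(d) for d in diffs)

def average_rank(value):
    """Rank of value in ordered; tied values share the average of their ranks."""
    first = ordered.index(value) + 1          # lowest rank held by this value
    last = first + ordered.count(value) - 1   # highest rank held by this value
    return (first + last) / 2

# Give each rank the sign of its original difference
signed_ranks = [average_rank(abs(d)) * (1 if d > 0 else -1) for d in diffs]

t_stat = sum(signed_ranks)                     # sum of signed ranks
sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 6)  # standard deviation under H0
z = t_stat / sigma_t
upper_tail = 0.5 * erfc(z / sqrt(2))           # P(Z > z) for a standard normal

print(f"n={n}, T={t_stat}, Z={z:.2f}, upper-tail area={upper_tail:.4f}")
```

Rounding the differences before ranking matters here: without it, 9.2 – 8.8 and 9.9 – 10.3 would not be recognized as tied in absolute value.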

Mann-Whitney Rank Sum U Test

The Mann-Whitney test is used to compare two population medians. Assumptions for the Mann-Whitney test are

  • Data are independent random samples from two populations that have the same shape and equal variances, and
  • Data are measured on a scale that is continuous, or ordinal (possessing a natural ordering) if the data are discrete.

In this test, the null and alternate hypotheses are

H0: η1 = η2

H1: η1 ≠ η2

Data from both samples are pooled and ranked in ascending order. The test statistic U is calculated from the rank numbers rather than from the individual values. Equal values are each assigned the mean of the ranks they occupy. For example, if the values at ranks 4 and 5 were both 50, each would be given a rank of 4.5. The same rule applies to more than two equal values. The U statistic can be computed from either sample as

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$$

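where n1 and n2 are the sample sizes and R1 and R2 are the sums of ranks in samples 1 and 2 respectively. The mean and standard error of the U statistic are given by

$$\mu_U = \frac{n_1 n_2}{2}$$

$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$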

For sample sizes n1 and n2 larger than 10, the U statistic is approximately normally distributed. Thus, we can calculate the Z value as (U – mean)/(standard error) using these values. By comparing this Z value with Zα/2 (or –Zα/2), we can conclude whether there is a significant difference between the two populations.

Application Example

A committee wants to assess whether the scores of two schools, A and B are comparable. Scores of 15 students from both the schools are shown in Table 13.5. What should the committee conclude at 95 percent confidence level? (Adapted from Statistics for Management by Richard I. Levin and David S. Rubin.)

Arrange the data in ascending order and then perform the calculations as shown in Table 13.6.

The calculated Z value is –0.6014. For 95 percent confidence, the critical lower limit is –1.96. As the calculated Z value is not in the critical zone, we cannot reject H0. Thus the two schools are comparable.

Try using R2 instead of R1. What do you observe?

 

Table 13.5 The scores of students from two schools

School A   School B
1000       920
1100       1120
800        830
750        1360
1300       650
950        725
1050       890
1250       1600
1400       900
850        1140
1150       1550
1200       550
1500       1240
600        925
775        500

 

Table 13.6 Illustration of Mann-Whitney test calculation
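The calculation summarized in Table 13.6 can be sketched in Python as follows. This is a minimal illustration, assuming School A is sample 1 and the textbook form U = n1·n2 + n1(n1 + 1)/2 – R1, which reproduces the Z value of –0.6014 quoted above; the helper name average_rank is hypothetical.

```python
from math import sqrt

# Scores of 15 students from each school (Table 13.5)
school_a = [1000, 1100, 800, 750, 1300, 950, 1050, 1250, 1400, 850,
            1150, 1200, 1500, 600, 775]
school_b = [920, 1120, 830, 1360, 650, 725, 890, 1600, 900, 1140,
            1550, 550, 1240, 925, 500]

# Pool both samples and sort them to obtain the ranks
pooled = sorted(school_a + school_b)

def average_rank(value):
    """Rank of value in pooled; tied values share the average of their ranks."""
    first = pooled.index(value) + 1
    last = first + pooled.count(value) - 1
    return (first + last) / 2

n1, n2 = len(school_a), len(school_b)
r1 = sum(average_rank(x) for x in school_a)   # rank sum of sample 1
r2 = sum(average_rank(x) for x in school_b)   # rank sum of sample 2

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2

mu_u = n1 * n2 / 2                             # mean of U under H0
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard error of U

z1 = (u1 - mu_u) / sigma_u
z2 = (u2 - mu_u) / sigma_u
print(f"R1={r1}, U1={u1}, Z={z1:.4f}")
print(f"R2={r2}, U2={u2}, Z={z2:.4f}")
```

Both Z values are computed so that the effect of using R2 instead of R1 can be compared directly.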

Kruskal-Wallis Test

The Kruskal-Wallis test is used to compare two or more population medians. Assumptions are the same as for Mann-Whitney test:

  • Data are independent random samples from populations that have the same shape and equal variances, and
  • Data are measured on a scale that is continuous, or ordinal (possessing a natural ordering) if the data are discrete.

Here the null and alternate hypotheses are

H0: η1 = η2 = η3

H1: In at least one pair, medians are not equal.

Data from all the samples are ranked and rank numbers are determined. Test statistic is calculated based on rank numbers rather than on individual values.

Application Example

Three operators work in a call center. Their call lengths are sampled; data is arranged in ascending order and shown in Table 13.7. Assess whether there is a significant difference in the call lengths. Historically, the call length data is not distributed normally. Assume confidence level of 95 percent.

 

Table 13.7 Illustration of Kruskal-Wallis test

 

Arrange the data in ascending order and assign ranks. Equal values are given the average of the ranks they occupy. For example, ranks 3 and 4 for time = 0.9 are both replaced by 3.5. Similarly, ranks 5, 6 and 7 for time = 1.3 are each replaced by 6. We then compute the K statistic (some literature refers to this as the H statistic). The formula and calculations are shown in Table 13.7.

The K statistic is approximated by a chi-square distribution with (p – 1) degrees of freedom, where p is the number of groups. As p = 3, the degrees of freedom are 2. The critical value of the chi-square distribution with 2 degrees of freedom at 95 percent confidence is 5.991. As the calculated statistic of 0.847 is less than 5.991, it is not in the critical region and we cannot reject H0. We conclude that there is no significant difference between the operators.
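The ranking and chi-square comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the standard form K = 12/(N(N + 1)) · Σ(Rj²/nj) – 3(N + 1) without a tie correction, the function name kruskal_wallis is hypothetical, and the call lengths are made-up values, not the data of Table 13.7.

```python
from math import exp

def kruskal_wallis(groups):
    """Return the K (H) statistic for a list of samples, without tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)

    def average_rank(value):
        first = pooled.index(value) + 1
        last = first + pooled.count(value) - 1
        return (first + last) / 2

    # Sum over groups of (rank sum squared / sample size)
    term = sum(sum(average_rank(x) for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

# Hypothetical call lengths in minutes for three operators (NOT the book's data)
operators = [
    [0.9, 1.3, 2.1, 3.4],
    [0.9, 1.3, 2.8, 4.0],
    [1.1, 1.3, 2.5, 3.0],
]
k_stat = kruskal_wallis(operators)

# With 3 groups the reference distribution is chi-square with 2 degrees of
# freedom, whose upper-tail probability has the closed form exp(-x / 2)
p_value = exp(-k_stat / 2)
p_book = exp(-0.847 / 2)   # p-value implied by the K = 0.847 quoted in the text
print(f"K={k_stat:.3f}, p-value={p_value:.3f}; book example p-value={p_book:.3f}")
```

The closed form exp(–x/2) applies only when there are exactly three groups; other group counts need a general chi-square routine.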

Mood's Median Test

Mood's median test can be used to test the equality of medians from two or more populations and, like the Kruskal-Wallis test, provides a nonparametric alternative to the one-way analysis of variance. Mood's median test is sometimes called a median test or sign scores test. Mood's median test is more robust than is the Kruskal-Wallis test against outliers, but is less powerful for data from many distributions, including the normal. Mood's median test, being robust against outliers and errors in data, is particularly appropriate in the preliminary stages of analysis.

For Mood's median test the null and alternate hypotheses are

H0: the population medians are all equal

H1: the medians are not all equal.

Assumptions of Mood's Median Test

  1. Data from each population are independent and random
  2. Samples and the population distributions have the same shape.

Procedure for Mood's Median Test

Find the overall median for all the data pooled together. Construct a table with one column showing the number of readings above the overall median and another showing the number of readings below the overall median for each category. Half of the readings that are equal to the median should be counted in the ‘above’ column and half in the ‘below’ column. These are the observed frequencies. Under H0, we expect an equal number of readings above and below the overall median in each category. These are the expected frequencies. Now we can perform a Chi-square test to assess whether there is a significant difference between the observed and expected frequencies.

Application Example of Mood's Median Test

Three teaching methods are being evaluated. The first two columns of Table 13.8 show marks scored by students who were taught using different methods. (Source: Levin & Rubin, 1991).

The overall median is 80.5. For each data point, we determine whether it is above the overall median or equal to/below it, and add 1 to the corresponding count for that method. If all three methods were alike, we should expect half the data points of each method above and half below. Thus we get the observed and expected counts. A chi-square test is performed with (p – 1) degrees of freedom, where p is the number of groups being compared (3 methods in this example). Table 13.8 illustrates the calculations.

As the calculated chi-square statistic is less than the critical value at 95 percent confidence level, we cannot reject H0. Thus we conclude that there is no significant difference between the teaching methods.
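The counting procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: values equal to the overall median are counted as ‘below’, the expected count in each cell is taken as half the group size as described in the text (statistical packages usually derive expected counts from the table margins instead), the function name moods_median is hypothetical, and the marks are made-up values, not the data of Table 13.8.

```python
from statistics import median

def moods_median(groups):
    """Return (overall median, chi-square statistic, degrees of freedom)."""
    overall = median([x for g in groups for x in g])
    chi_square = 0.0
    for g in groups:
        above = sum(x > overall for x in g)
        below_or_equal = len(g) - above   # values equal to the median count as 'below'
        expected = len(g) / 2             # half above, half below under H0
        chi_square += (above - expected) ** 2 / expected
        chi_square += (below_or_equal - expected) ** 2 / expected
    return overall, chi_square, len(groups) - 1

# Hypothetical marks for three teaching methods (NOT the book's data)
marks = [
    [72, 78, 81, 85, 90],
    [65, 70, 79, 83, 88],
    [75, 80, 84, 91, 94],
]
overall, chi_square, dof = moods_median(marks)

# 5.991 is the 95 percent critical value for 2 degrees of freedom
critical = 5.991
verdict = "reject H0" if chi_square > critical else "cannot reject H0"
print(f"median={overall}, chi-square={chi_square:.3f}, DF={dof}: {verdict}")
```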

 

Table 13.8 Illustration of Mood's Median test

Friedman Test

The Friedman test is a nonparametric analysis of a randomized block experiment, and thus provides an alternative to the two-way analysis of variance. The hypotheses are

H0: all treatment effects are zero

H1: not all treatment effects are zero.

Randomized block experiments are a generalization of paired experiments and the Friedman test is a generalization of the paired sign test. As the calculations are complex, we will use Minitab to perform the test. This test is not available in SigmaXL.

Application Example

A randomized block experiment is conducted to evaluate the effect of rake angle and tool type on tool life. Data is shown in Table 13.9. Perform the Friedman test to assess whether tool life depends on rake angle, with tool type as the blocking factor.

Minitab commands are > Stat > Nonparametrics > Friedman.

Minitab output is shown in Table 13.10.

As the p-value (0.122) is greater than 0.05, we cannot reject the null hypothesis. We have to conclude that tool life is not significantly affected by rake angle.
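Although the text relies on Minitab, the statistic for this small data set can be checked by hand. The following Python sketch is a minimal illustration, assuming the standard Friedman form S = 12/(b·k·(k + 1)) · ΣRj² – 3b(k + 1) without a tie correction, where b is the number of blocks (tools) and k the number of treatments (rake angles); the helper chi2_upper_tail_df3 is hypothetical and valid only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

# Tool life by rake angle (columns: 5, 10, 15, 20) for each tool (Table 13.9)
tool_life = {
    1: [31.5, 32.6, 32.3, 39.9],
    2: [35.5, 32.6, 27.8, 39.9],
    3: [35.5, 36.6, 37.7, 39.9],
}

b = len(tool_life)   # number of blocks (tools)
k = 4                # number of treatments (rake angles)

# Rank the treatments within each block and add up the ranks per treatment
rank_sums = [0.0] * k
for values in tool_life.values():
    ordered = sorted(values)
    for j, v in enumerate(values):
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        rank_sums[j] += (first + last) / 2   # average rank if tied

s = 12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (k + 1)

def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_upper_tail_df3(s)
print(f"rank sums={rank_sums}, S={s:.2f}, DF={k - 1}, P={p_value:.3f}")
```

The result agrees with the S = 5.80, DF = 3 and P = 0.122 reported by Minitab.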

 

Table 13.9 Data for the tool life experiment

Rake angle   Tool   Tool life
5            1      31.5
10           1      32.6
15           1      32.3
20           1      39.9
5            2      35.5
10           2      32.6
15           2      27.8
20           2      39.9
5            3      35.5
10           3      36.6
15           3      37.7
20           3      39.9

 

Table 13.10 Minitab output for Friedman test

Friedman test: Tool life versus rake angle blocked by tool

S = 5.80   DF = 3   P = 0.122

 

Levene's Test

Levene's test can be used to compare the variances of two or more samples. It works with the absolute distances of the observations from their group median instead of the group mean. Unlike the F-test, it is applicable to any continuous distribution. We will use software to perform the calculations.

 

 

Application Example

Suppose we want to compare the variation in runs scored by two famous cricket players, Sachin Tendulkar of India and Brian Lara of West Indies. Data is available in file ‘Sachin lara data.xls’ (or .mtw). Partial data is shown in Table 13.11 for reference. Compare the standard deviations of the two players. Consideration for out or not-out is not given in this example.

It can be easily verified with normal probability plots and histograms that the data is grossly non-normal. It is, therefore, necessary to use Levene's test.

 

Table 13.11 Data of runs scored by two players

Player   Runs
Lara     11
Lara     23
Lara     5
Lara     45
Lara     0
Lara     54
Lara     18
Lara     45
Sachin   1
Sachin   3
Sachin   23
Sachin   11
Sachin   88
Sachin   64
Sachin   28
Sachin   62
Sachin   67

Minitab commands: > Stat > Basic Statistics > 2-variances. Specify Runs in samples and Player in subscripts.

SigmaXL commands: > Statistical Tools > Equal Variance Tests > Levene's.

Output from SigmaXL is shown in Table 13.12. The p-value is 0.0619, which is greater than 0.05, so the null hypothesis of equal variances cannot be rejected at 95 percent confidence.
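The test statistic itself can be sketched in Python as follows. This is a minimal illustration of the median-based form of Levene's statistic, with a hypothetical function name levene_w. It is applied only to the partial data of Table 13.11, so it will not reproduce the statistic of 3.498 that SigmaXL obtains from the full data set, and converting the statistic to a p-value needs an F distribution routine that is not included here.

```python
from statistics import mean, median

def levene_w(groups):
    """Return (W, numerator DF, denominator DF) using distances from group medians."""
    # Absolute distance of every observation from its own group median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    z_bar = mean([v for zg in z for v in zg])   # grand mean of the distances
    # Variation of the group mean distances around the grand mean
    between = sum(len(zg) * (mean(zg) - z_bar) ** 2 for zg in z)
    # Variation of the distances around their own group mean
    within = sum((v - mean(zg)) ** 2 for zg in z for v in zg)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k

# Partial data only (Table 13.11)
lara = [11, 23, 5, 45, 0, 54, 18, 45]
sachin = [1, 3, 23, 11, 88, 64, 28, 62, 67]

w, df_num, df_den = levene_w([lara, sachin])
print(f"W={w:.3f}, DF Num={df_num}, DF Den={df_den}")
```

W is compared with the F distribution with (DF Num, DF Den) degrees of freedom, which matches the DF Num and DF Den rows of the SigmaXL output.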

 

Table 13.12 SigmaXL output for Levene's test

Levene's test for equal variance: Runs

(Use with non-normal data)

Test information
H0: Variance 1 = Variance 2 = … = Variance k
Ha: At least one pair Variance i ≠ Variance j

Player                       Lara     Sachin
Count                        286      374
Mean                         36.129   39.698
Median                       26       28
StDev                        34.447   38.916
AD Normality Test p-value    0.0000   0.0000

Levene's Test Statistic      3.498
DF Num                       1
DF Den                       658
p-value                      0.0619
 

 

Table 13.13 Summary of nonparametric tests

 

Summary

Nonparametric methods do not assume any particular distribution of the data.

The sensitivity (power) of nonparametric tests is lower than that of parametric tests.

We have seen the applications of

  • Sign Test
  • Wilcoxon Signed-Rank Test
  • Ranked Sum Mann-Whitney U Test
  • Kruskal-Wallis Test to compare more than two medians
  • Mood's Median Test to compare more than two medians
  • Friedman Test
  • Levene's Test to compare two or more variances

Checkpoints for Completion of Analyze Phase

  • FMEA is complete and actions are identified with responsibilities and dates

  • List of possible causes is generated and stratified based on Exploratory Data Analysis, Multivari studies

  • The relationship of Xs and Ys is validated using Hypothesis tests and/or Analysis of Variance (ANOVA)

  • Prioritized list of ‘vital few’ KPIVs (key sources of variation) is available

  • Waste Analysis is complete in case of lean Six Sigma projects

  • Strong clue(s) to the KPIVs that will be useful for developing solution strategy

  • Appropriate tools are used to analyze data efficiently with minimum efforts.

References

Anderson, David R., Dennis J. Sweeney and Thomas A. Williams (2007). Statistics for Business and Economics. New Delhi: Thomson South-Western Division of Thomson Learning Inc.

Johnson, Richard (2005). Miller and Freund's Probability and Statistics for Engineers. Upper Saddle River, NJ: Prentice Hall.

Levin, Richard I. and David S. Rubin (1991). Statistics for Management. Upper Saddle River, NJ: Prentice Hall.

Montgomery, Douglas C. (2004). Design and Analysis of Experiments. New York, NY: John Wiley & Sons.