Chapter 12
Theory of Sampling
Objectives
After completing this chapter, you can understand the following:
 The definition, meaning and significance of sampling and its distribution.
 The concept related to different methods of sampling with examples.
 The concept of large and small samples.
 The need for sampling in biological decision making situations.
 The standard error concept and its importance.
 The estimation of population parameters with the help of sample statistic.
12.1 INTRODUCTION
In this chapter we discuss the concepts of sampling and sampling distributions, which are the actual basis of statistical estimation and hypothesis testing. The main purpose of sampling is to allow us to make use of the information gathered from the sample to draw inferences about the entire population. One can define a population as a collection of objects having a certain well-defined set of attributes. A sample is any subset of a given population. It is possible to estimate the population parameters from the sample statistics with the help of statistical methods and concepts. This falls under the category of statistical inference [inductive statistics]. The inferential process is not error free, because the estimation or inference is based on the limited data obtained from samples.
We should evaluate such errors in order to have a measure of confidence in our inferences. If we take random samples, these errors occur randomly and thus the same can be computed probabilistically.
In this chapter, we will develop the concepts of sampling to describe sampling distributions for various sample statistics such as the sample mean and the sample proportion, and introduce well-known sampling distributions such as the chi-square, F, t and standard normal distributions. These distributions fit certain sample statistics very well and play a major role in estimation and hypothesis testing.
12.2 WHY SAMPLE?
In many situations, even though we are very much interested in some specific characteristic of a specific population, we cannot physically examine the entire population due to cost, time or other limitations. In such instances, we examine a part of the population by means of a sample, with the expectation that the sample will be representative of the population under study.
12.3 HOW TO CHOOSE IT?
One way is to use simple random sampling, which gives all samples of the specified size an equal chance of being selected. Based on the given random sample, one can find a sample statistic such as the mean or variance; the same can be used to estimate the corresponding population parameter. Every statistic is a random variable having its own probability distribution. The probability distribution of a sample statistic is known as its sampling distribution. It has defined properties like any probability model. Based on these properties one can evaluate the chance errors involved in drawing an inference from a sample.
12.4 SAMPLE DESIGN
It is a procedure or plan for obtaining a sample from a prescribed population prior to collecting any data.
12.5 KEY WORDS AND NOTATIONS
Population: Collection of objects having a certain well-defined set of attributes.
Example:
 The population of affiliated colleges in Tamil Nadu.
 The population of government hospitals in Tamil Nadu.
Sample: It is a portion of the population.
Example:
 Collection of affiliated colleges in Tamil Nadu with minority status.
 Collection of government hospitals only in Chennai.
Parameter: It refers to the characteristics of the population.
Example:
Population mean, population SD etc.
Statistic: It refers to the characteristics of the sample.
Example:
Sample mean, sample SD etc.
Degrees of freedom: It means the number of items that can be selected freely out of ‘n’ items when one condition is imposed on them. It is [n – 1]. It is denoted by df.
Example:
Select three integers in such a way that their sum is 100.
One can choose only two items freely; the third value cannot be chosen freely. If you select 40 and 10, the third value must be 50.
Degrees of freedom = df = 3 – 1 = 2.
Census: It refers to the complete enumeration of the population.
Notations:
N – population size
μ – population mean
σ – population SD
P – population proportion
n – sample size
x̄ – sample mean
s – sample SD
p – sample proportion
R – population correlation coefficient
r – sample correlation coefficient
Sample survey: The process of partial enumeration is called a sample survey.
12.6 ADVANTAGES AND DISADVANTAGES OF SAMPLING
Advantages
 Less time is needed to study the sample than the population.
 Lower cost of analysis; in most situations, sampling gives adequate information.
 Because fewer items are handled, data can be collected more carefully from a sample than from the whole population.
Disadvantages
 At times there is a possibility of the error factor.
 High degree of expertise is required while selecting the sample.
12.7 NON RANDOM ERRORS/NON SAMPLING ERRORS
This type of error can occur in two different situations:
 Sample is not selected from the corresponding population.
 Sample is taken from the predefined population, but there is response bias; that is, respondents do not give proper information.
12.8 RANDOM ERRORS/SAMPLING ERRORS
At times even a well-designed sample may not provide an actual representation of the population under study, because a sample is only a portion of the population. Inferences about the parent population based on such a sample may then be incorrect.
Such errors are referred to as random errors or sampling errors.
12.9 TYPES OF SAMPLE
A sample can be classified into two major categories.
 Probability sample and
 Nonprobability sample.
12.9.1 Probability Sample
If the probability of selection of each member of the population into the sample is known and nonzero, then the resulting sample is said to be a probability sample.
12.9.2 Nonprobability Sample
If a sample is not a probability sample, then it is said to be a nonprobability sample.
Normally the sampling is based on two specific principles.
Principles: 1 Law of statistical regularity
This law implies that if a reasonably large number of items is selected at random from the population, the characteristics of the sample will, on average, be close to those of the population.
Principles: 2 Law of inertia of large numbers
This law reveals that whenever the sample is quite large, the inference will be very close to the actual value.
Different methods of sampling
12.10 RANDOM SAMPLING
According to N.M. Harper, ‘it is a sample selected in such a way that every item in the population has an equal chance of being included’. In general, it is the process of selecting a sample from a population in such a way that every item of the population has an equal chance of being included in the sample.
Example:
 Selection of any five members out of a group containing 20 members will constitute a random sample.
 Selection of any 4 cards out of a well-shuffled pack of 52 cards will constitute a random sample.
Notations:
Population size: N
Sample size: n [n ≤ N]
Number of possible samples: m = ^{N}C_{n}
Different samples: S_{1}, S_{2}, …, S_{m}
P[Selecting a sample]: 1/m
In other words, simple random sampling refers to the process which ensures that each sample of size n [S_{1}, S_{2}, …, S_{m}] has an equal probability of being the chosen sample.
The simple random sampling method can be adopted with or without replacement of the items selected. In practice, sampling is usually done without replacement. While selecting a single random sample, we must use some specific method to ensure true randomness. One such method involves the use of random numbers. Use of random numbers ensures that every element in the population has an equal and independent chance of being selected.
Example: 1
Let us consider the production record, on a particular day, of the employees of the firm Bhavana Sree Ltd., along with their employee numbers.
E. No. – Employee Number; Prod. – Production
We can use the random number table for selecting a simple random sample of size 5, without replacement from the population of 50 employees.
Step 1:
Select 5 two digit random numbers using the random number table
Step 2:
Select the employees by considering the random number selected as their employee numbers.
If we proceed in the same way, we can create different samples of size 5.
Note:
Since we are sampling without replacement, we do not want to use the same random number twice.
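The two steps above can be sketched in code. This is only an illustration: a seeded pseudo-random generator stands in for the printed random number table, and the function name `sample_by_random_numbers` and the seed value are assumptions, not part of the example.

```python
import random

def sample_by_random_numbers(population_size, n, seed=None):
    """Pick n distinct employee numbers from 1..population_size
    by drawing two-digit random numbers (00 to 99)."""
    if not (n <= population_size <= 99):
        raise ValueError("this sketch needs n <= population_size <= 99")
    rng = random.Random(seed)  # stands in for the random number table
    chosen = []
    while len(chosen) < n:
        number = rng.randint(0, 99)  # one two-digit entry
        # Skip entries that match no employee and entries already used
        # (sampling without replacement).
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

# Step 1 and Step 2: five employee numbers out of 50
selected_employees = sample_by_random_numbers(population_size=50, n=5, seed=1)
print(selected_employees)
```

Changing the seed plays the role of starting at a different place in the table, which yields a different sample of size 5.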
12.10.1 Systematic Sampling
It is a procedure that starts with a random starting point in the population and then includes in the sample every k^{th} element encountered thereafter.
Example: 2
Population size [N]: 100 students
Sample size [n]: 10 students
Sampling ratio = n/N = 10/100 = 1/10
Form 10 different groups according to roll numbers as follows: G1 = [1 to 10], G2 = [11 to 20], …, G10 = [91 to 100].
Select any one number in G1
[1 2 3 4 5 6 7 8 9 10]
Suppose the selected item is 8. Then in each group select the 8^{th} item.
That is 8, 18, 28, …, 98. The collection of all these elements leads to a sample of size 10. This sample is referred to as a systematic sample.
It is different from simple random sampling: here only the first element is selected randomly. Bias may occur, for example when the list has a hidden periodic pattern. This method of selecting a sample is commonly used among the probability sampling designs.
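As a minimal sketch of the procedure in Example 2, the following code selects every k-th roll number after a starting position in the first group. The function name `systematic_sample` and its `start` parameter are illustrative assumptions.

```python
import random

def systematic_sample(population, n, start=None, seed=None):
    """Return a systematic sample of size n from an ordered population."""
    k = len(population) // n  # sampling interval = size of each group
    if start is None:
        # Only this first position is chosen at random.
        start = random.Random(seed).randint(1, k)
    return [population[(start - 1) + i * k] for i in range(n)]

roll_numbers = list(range(1, 101))  # 100 students
sample = systematic_sample(roll_numbers, n=10, start=8)
print(sample)  # [8, 18, 28, 38, 48, 58, 68, 78, 88, 98]
```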
12.10.2 Stratified Sampling
P: Population [Size N ]
P_{1}, P_{2}, P_{3}: SubPopulation [Size N_{1}, N_{2}, N_{3} and N = N_{1} + N_{2} + N_{3}]
S_{1}, S_{2}, S_{3}: Samples from each subpopulation of size n_{1}, n_{2} and n_{3}, respectively.
Divide the single population into many subpopulations called strata. Select a random sample from each stratum. The stratified sample is then obtained by combining the samples selected from all the strata into one sample. This sampling technique needs prior knowledge about the population, which helps to partition the single population into different strata based on some homogeneous characteristics.
In order to get the maximum information using stratified sampling, the strata must be different from each other but homogeneous within each stratum.
Example: 3
Problem: Determining the faculty preferences for a union in a college.
Population: 100
Specifically, the preferences will differ according to the grades of the teachers. If we take a sample out of this population directly, we may not get fruitful results. Instead, split this single population of college teachers into different subpopulations based on their grades, select a sample from each stratum, and form one big sample by merging all the subsamples collected from the different strata. There is then a better chance of obtaining fruitful results.
Population: 100
Stratified sample = S_{1} ∪ S_{2} ∪ S_{3} ∪ S_{4} ∪ S_{5} ∪ S_{6}
In stratified sampling, the number of items selected from each stratum is in proportion to its size. This method ensures that each stratum in the sample is weighted by the number of elements it contains. It is very much used in managerial applications, because it allows conclusions to be drawn for each stratum separately.
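The proportional allocation described above can be sketched as follows. The three grades and their sizes are hypothetical (the chapter does not list them); only the total of 100 teachers comes from Example 3, and the function names are assumptions.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Number of items to draw from each stratum, in proportion to its size.
    Rounding may make the total differ slightly from n for awkward sizes."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum and merge them."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample

# Hypothetical strata: 100 teachers split by grade
strata = {
    "Professor": [f"P{i}" for i in range(20)],
    "Associate Professor": [f"A{i}" for i in range(30)],
    "Assistant Professor": [f"T{i}" for i in range(50)],
}
allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 20)
sample = stratified_sample(strata, 20, seed=7)
print(allocation)
```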
12.10.3 Multistage Sampling
As the name indicates the selection process of this type of sample contains different stages.
Stage 1:
Population is divided into different groups called first stage units.
Stage 2:
The first stage units are then divided into smaller groups, called second stage units.
Stage 3:
The second stage units are divided into smaller groups, called third stage units.
This staging process will go on until a sample of required number is attained.
Example: 4
Population: Group of institutions
I_{1}, I_{2}, I_{3}, I_{4}, I_{5}, I_{6}
I: Each institution contains different department.
D: Each department contains different courses.
First stage units: [I_{1}, I_{2}, ..., I_{6}]
Second stage units: [I_{1}[D_{1}, D_{2}, ...D_{6}], ...]
Third stage units: [[I_{1}, D_{1}][C_{1}, C_{2}, C_{3}], ...]
Select a sample of first stage units using a proper method. Then select a sample of second stage units from within the selected first stage units, and repeat the same procedure from stage to stage until we reach the required sample size. This method of selecting a sample is very useful in the case of a very large population.
12.11 NONRANDOM SAMPLING METHODS
Probability sampling needs a list of all sampling units, which is not available in all cases. To overcome this, we seek the help of nonrandom sampling techniques.
12.11.1 Convenience Sampling
In this type of sampling, the selection of the sample is left entirely to the convenience of the researcher. The cost of selecting a convenience sample is very low in comparison with probability sampling. On the other hand, it suffers from excessive bias, which in turn leads to possible errors that cannot be quantified. It is very useful in public opinion surveys, samples for demand analysis, shopping centre surveys etc.
Convenience sampling is typically used in exploratory studies or when representing the population is not a critical factor.
12.11.2 Purposive Sampling
If we select elements from the population based on certain characteristics, then the resulting sample is known as a purposive or judgment sample.
Example:
Population of students
Among the 100 students of a class, the sample is selected only from those students who are members of an extracurricular group.
12.11.3 Quota Sampling
If a defined proportion of elements is to be selected from the population based on certain characteristics, the procedure is referred to as quota sampling.
Example: 5
Population: 1000 customers
 Top income group [TIG]: 20%
 Middle income group [MIG]: 30%
 Low income group [LIG]: 50%
Out of this population, select a sample of size 100 in such a way that
Sample: 100 customers
 Top income group [TIG]: 30%
 Middle income group [MIG]: 30%
 Low income group [LIG]: 40%
This type of sampling is often used in public opinion polls, such as those predicting consumer preferences in market research studies and public opinion regarding political issues and candidates. It offers a chance of reducing bias, is very easy to adopt and costs less.
12.11.4 Cluster Sampling
It requires prior knowledge about the population. The population is partitioned into different groups called clusters; the formation of clusters is based on some characteristic.
Step 1:
Form the clusters.
Step 2:
Select a few clusters at random.
Step 3:
Select the elements at random from the randomly selected clusters.
The resulting sample is referred to as a cluster sample.
Example: 6
Population: 1000 students
Clusters formed based on discipline.
Department of Mathematics 50 
Department of computer science 100 
Department of Management 500 
Department of Fashion 150 
Department of BioTech. 50 
Department of Interior Design 150 
Among the clusters, randomly select any two clusters.
Department of Fashion 150 
Department of Computer Science 100 
Select a few elements randomly out of these two randomly selected clusters.
Department of Fashion 5 
Computer Science 15 
The above-mentioned sample is said to be a cluster sample of size 20.
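A rough sketch of the three steps, using the department sizes of Example 6, is given below. The student labels, the function name `cluster_sample` and the choice of drawing a fixed 10% from each selected cluster are assumptions made only for illustration; the example itself draws 5 and 15 students.

```python
import random

def cluster_sample(clusters, n_clusters, fraction, seed=None):
    """Pick n_clusters clusters at random, then a random fraction of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # Step 2
    return {
        name: rng.sample(clusters[name], round(fraction * len(clusters[name])))  # Step 3
        for name in chosen
    }

department_sizes = {
    "Mathematics": 50, "Computer Science": 100, "Management": 500,
    "Fashion": 150, "BioTech": 50, "Interior Design": 150,
}
# Step 1: form the clusters (hypothetical student labels)
clusters = {
    dept: [f"{dept}-{i}" for i in range(1, size + 1)]
    for dept, size in department_sizes.items()
}
sample = cluster_sample(clusters, n_clusters=2, fraction=0.10, seed=3)
for dept, students in sample.items():
    print(dept, len(students))
```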
12.11.5 Sequential Sampling
Samples are selected one after another based on the outcome of the previous samples.
This type of sampling method is used in the statistical quality control department very often.
12.12 SAMPLING DISTRIBUTIONS
We can define a sampling distribution as follows. The distribution of all possible values that can be assumed by some statistic, evaluated from samples of the same size randomly drawn from some population, is called the sampling distribution of that statistic.
Population: Ν
From the population of size N, draw different samples of size n [n < N] randomly. Let the samples be [s_{1}, n], [s_{2}, n], …, [s_{k}, n].
With the sample data it is possible to evaluate the sample statistics such as sample mean, sample SD etc.
Sampling distribution based on the sample means:
Consider all the sample means x̄_{1}, x̄_{2}, …, x̄_{k}.
Construct a frequency distribution based on the means of the samples.
Means of sample  Frequency 

The resulting distribution based on the means of the samples is referred to as the sampling distribution of the sample mean. For the constructed distribution, it is possible to evaluate measures such as the mean and SD.
The mean is said to be the mean of the sample means. The standard deviation of this sampling distribution based on mean is known as the standard error [SE] of the distribution.
In the same way, one can construct a sampling distribution based on the SD of the samples.
SDs of sample  Frequency 

Likewise, for every statistic of the sample it is possible to construct a sampling distribution.
Example: 7
Population: Weekly expense of five families: 45, 40, 47, 35, 33
Collect all possible samples of size exactly 2. Also evaluate the sample means and SDs as well as the mean and SD of the population. Since Ν = 5 and n = 2, we can have ^{5}C_{2} samples. Overall we can have 10 samples of size 2.
Sample no. | Sample data | Sample mean
01 | 45, 40 | 42.5
02 | 45, 47 | 46.0
03 | 45, 35 | 40.0
04 | 45, 33 | 39.0
05 | 40, 47 | 43.5
06 | 40, 35 | 37.5
07 | 40, 33 | 36.5
08 | 47, 35 | 41.0
09 | 47, 33 | 40.0
10 | 35, 33 | 34.0
Total | | 400

Construction of a sampling distribution
Mean of the population = 40
SD of the population = 5.44
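Consider all the sample means. Since each of the 10 samples is equally likely, the associated sampling distribution of x̄ assigns probability 1/10 to each sample mean listed above [the value 40.0 occurs twice and so has probability 2/10].
We now evaluate Ε[x̄] and var[x̄]:
$$E[\bar{x}] = \frac{400}{10} = 40 = \mu$$
$$var[\bar{x}] = \frac{\sum(\bar{x}_i - 40)^{2}}{10} = \frac{111}{10} = 11.1$$
so that SE[x̄] = √11.1 = 3.33.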
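The enumeration in Example 7 can be checked with a few lines of code. The sketch below lists all ^{5}C_{2} samples, computes the mean and variance of the sample means, and compares the variance with the finite population expression σ^{2}/n × [N – n]/[N – 1]; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance

population = [45, 40, 47, 35, 33]  # weekly expenses of the five families
N, n = len(population), 2

# All 10 samples of size 2 (without replacement) and their means
sample_means = [(a + b) / 2 for a, b in combinations(population, n)]

expected_mean = mean(sample_means)           # E[x-bar]
variance_of_mean = pvariance(sample_means)   # var[x-bar]
standard_error = sqrt(variance_of_mean)

# Finite population formula for the variance of the sample mean
predicted_variance = pvariance(population) / n * (N - n) / (N - 1)

print(len(sample_means), expected_mean, variance_of_mean, predicted_variance)
```

The mean of the sample means equals the population mean, and the two variance values agree, which is the relationship the chapter relies on when it introduces the standard error.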
12.13 NEED FOR SAMPLING DISTRIBUTION
We can draw inferences about the population parameters based on the sample statistics only. In addition to the sample statistic, if we know the probability distribution of the sample statistic, it is possible for us to calculate the probability that the sample statistic assumes any specific value. This characteristic is very much needed in all statistical inferences.
Note:
The variance of the sampling distribution is equal to the variance of the population divided by the size of the sample used to get the sampling distribution.
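Case: 1 $\sigma_{\bar{x}}^{2} = \dfrac{\sigma^{2}}{n}$; when the population size is infinite.
Case: 2 $\sigma_{\bar{x}}^{2} = \dfrac{\sigma^{2}}{n}\cdot\dfrac{N-n}{N-1}$; when the population size is finite [sampling without replacement].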
Central limit theorem
For a population P: [μ, σ, Ν] and a sufficiently large value of n [n ≥ 30], the sampling distribution of the sample mean [x̄] is approximately a normal distribution with mean μ and standard deviation σ_{x̄} = σ/√n.
Note:
The same holds good for the sample proportion also.
Relationship between the sample statistics with the population parameter
 The mean of all possible sample means will be exactly equal to the universe mean.
 The SD of all possible sample means will be approximately equal to σ/√n; where n is the sample size.
Note:
While evaluating the sample variance, we use the relation
$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^{2}$$
Here we use [n – 1] in the division instead of [n].
This is for a technical reason, in order to have E[s^{2}] = σ^{2}.
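Show that the sample variance s^{2} is an unbiased estimator of the population variance σ^{2}.
Case: 1
For a sample from an infinite population having a normal distribution, we know that the expected value of the chi-square statistic with [n – 1] degrees of freedom is
$$E\left[\frac{(n-1)s^{2}}{\sigma^{2}}\right] = n - 1$$
This implies that, E[s^{2}] = σ^{2}.
The sample variance s^{2} is an unbiased estimator of σ^{2} for infinite populations having normal distributions.
Case: 2
For samples from infinite populations in general,
$$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^{2} = \frac{1}{n-1}\left[\sum_{i=1}^{n}(x_i - \mu)^{2} - n(\bar{x} - \mu)^{2}\right] \quad [1]$$
Taking expectation on both sides of [1], we have
$$E[s^{2}] = \frac{1}{n-1}\left[\sum_{i=1}^{n}E[(x_i - \mu)^{2}] - n\,E[(\bar{x} - \mu)^{2}]\right] = \frac{1}{n-1}\left[n\sigma^{2} - n\cdot\frac{\sigma^{2}}{n}\right]$$
so it is obvious that E[s^{2}] = σ^{2}.
And the sample variance is thus an unbiased estimator of σ^{2} for an infinite population in general.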
12.14 STANDARD ERROR FOR DIFFERENT SITUATIONS
12.14.1 When the Population Size Infinite
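 Standard error [SE] of the specified sample mean [x̄]: $SE[\bar{x}] = \dfrac{\sigma}{\sqrt{n}}$ [1]
 Standard error [SE] of the difference of two sample means [x̄_{1} – x̄_{2}]: $SE[\bar{x}_1 - \bar{x}_2] = \sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}$ [2]
 Standard error [SE] of the specified sample SD [s]: $SE[s] = \dfrac{\sigma}{\sqrt{2n}}$ [3]
 Standard error of the difference of two sample SDs [s_{1} – s_{2}]: $SE[s_1 - s_2] = \sqrt{\dfrac{\sigma_1^{2}}{2n_1} + \dfrac{\sigma_2^{2}}{2n_2}}$ [4]
 Standard error [SE] of the specified sample proportion [p]: $SE[p] = \sqrt{\dfrac{PQ}{n}}$, where Q = 1 – P [5]
 Standard error [SE] of the difference of two sample proportions [p_{1} – p_{2}]: $SE[p_1 - p_2] = \sqrt{\dfrac{P_1Q_1}{n_1} + \dfrac{P_2Q_2}{n_2}}$ [6]
 Standard error [SE] of the sample correlation coefficient [r]: $SE[r] = \dfrac{1 - R^{2}}{\sqrt{n}}$ [7]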
12.14.2 When the Population Size is Finite
Sample is drawn with replacement
 Standard error of the specified sample mean [x̄]: refer formula [1].
 Standard error of the specified sample proportion [p]: refer formula [5].
Sample is drawn without replacement
 Standard error [SE] of the specified sample mean [x̄]: $SE[\bar{x}] = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N-n}{N-1}}$
 Standard error of the specified sample proportion [p]: $SE[p] = \sqrt{\dfrac{PQ}{n}}\sqrt{\dfrac{N-n}{N-1}}$
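As a small sketch of how the finite population case differs from the infinite one, the helper functions below compute the standard error of a mean and of a proportion, applying the factor √[(N – n)/(N – 1)] only when a population size N is supplied. The function names and the optional argument are assumptions for illustration.

```python
from math import sqrt

def se_mean(sigma, n, N=None):
    """Standard error of the sample mean.
    N=None: infinite population (or sampling with replacement).
    N given: finite population, sampling without replacement."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # finite population correction
    return se

def se_proportion(P, n, N=None):
    """Standard error of the sample proportion, with the same convention for N."""
    se = sqrt(P * (1 - P) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# P = 0.2, sample of 10 drawn without replacement from 30 units
print(round(se_proportion(0.2, 10, N=30), 3))
# Same sample size from an infinite population
print(round(se_proportion(0.2, 10), 3))
```

The correction factor is always below 1 for n > 1, so the finite population standard error is the smaller of the two.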
12.14.3 Sampling Distribution Based on Sample Means
Consider a random sample of size n out of a population with actual mean μ and variance σ^{2}; then we know that the sample observations x_{1}, x_{2}, …, x_{n} are independent and identically distributed random variables. Then the sample mean is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$$
Clearly x̄ is also a random variable, with expected value
$$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^{n}E[x_i] = \mu$$
Note: 1
It indicates that the expected value of the sample mean and the actual population mean are one and the same.
Note: 2
$$Var[\bar{x}] = \frac{1}{n^{2}}\sum_{i=1}^{n}Var[x_i] = \frac{\sigma^{2}}{n}$$
This shows that the variability in sample means, σ^{2}/n, is less than the population variance σ^{2}.
Whenever the sample size is large, the fluctuation will be less from one sample to the other.
Population parameters are estimated from sample data because it is not possible to examine the entire populations practically in order to make a perfect evaluation.
Statistical estimation procedures provide the process by which estimates of the population parameters can be evaluated with the degree of confidence needed. This degree of confidence is controllable with respect to the size of the sample and by the type of estimate made.
12.15 POINT AND INTERVAL ESTIMATION
Type of organization | Estimation of interest
Manufacturing industry | Quality of raw materials used for production
Bank | Mean number of arrivals of customers at the teller’s window
The estimate can be of two types, they are
 Point estimates and
 Interval estimates.
12.15.1 Point Estimate
It refers to a specific value which is used to estimate the value of the unknown population parameter.
Example:
 The mean salary of a sample of toplevel executives in many firms may be used as a point estimate of the corresponding population mean for toplevel executives in all firms.
 The percentage of employed women who prefer Cinthol brand soap over all other brands may be used as an estimate of the corresponding population percentage of all employed women.
Similarly, when we use the sample mean to estimate the population mean, the sample SD to estimate the population SD and so on, in each case we use a point estimate of the parameter.
Estimate and estimator
An estimator is a random variable, and its numerical value is an estimate.
Population parameter | Estimator [sample statistic] | Estimate [value of estimator]
Mean – μ | x̄ | x̄ = 100
Variance – σ^{2} | s^{2} | s^{2} = 50
12.15.2 Properties of Good Point Estimators
The criteria for good point estimators are
 Unbiasedness
 Relative efficiency
 Consistency and
 Sufficiency
Unbiasedness
An estimator is unbiased, if its expected value is equal to the population parameter being estimated.
Relative efficiency
It refers to the sampling variability of an estimator.
If two estimators of a given population parameter are both unbiased, the one with the smaller variance for a given sample size is defined as being relatively more efficient. If e_{1} and e_{2} are two unbiased estimators of the parameter e, then the relative efficiency of e_{1} with respect to e_{2} is defined as the ratio of their variances [assume that Var[e_{1}] < Var[e_{2}]].
Consistency
An estimator is said to be consistent if the probability that it lies arbitrarily close to the parameter being estimated approaches 1 as n approaches infinity; that is, for every ε > 0, P[|e_{1} – e| < ε] → 1 as n → ∞.
e_{1} – Sample estimator
e – Population parameter
Sufficiency
An estimator e_{1} is said to be a sufficient estimator, if it uses all the information contained in the sample, to estimate the population parameter.
12.16 INTERVAL ESTIMATE
An interval estimate of a population parameter is the specification of two values between which we have a certain degree of confidence that the actual population parameter lies. It is otherwise called confidence interval estimation. To evaluate it, we require the value of the confidence level or the level of significance.
Population parameter: μ
Sample statistics: x̄, s, n
Level of significance: 5%
Test statistic: Ζ
Table value of the test statistic: Z_{t}
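Then the interval estimation of the population parameter μ can be defined as $\bar{x} \pm Z_t \cdot SE[\bar{x}]$, where $SE[\bar{x}] = \sigma/\sqrt{n}$ if σ is known; if not, $SE[\bar{x}] = s/\sqrt{n}$.
Then $\mu: \bar{x} \pm 1.96\, s/\sqrt{n}$; [since σ is not known]
There is a 95% confidence level for the population parameter μ to lie in the interval $[\bar{x} - 1.96\, s/\sqrt{n},\ \bar{x} + 1.96\, s/\sqrt{n}]$.
This clearly indicates that there is a 5% chance for the population mean μ not to lie in the defined interval estimate.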
12.17 CONFIDENCE INTERVAL ESTIMATION FOR LARGE SAMPLES
For business applications it is not sufficient merely to consider the single point estimate of the population parameter. Instead we require an estimation procedure that permits some error in the estimate with the given level of accuracy. In classical inference such a method incorporates the use of what is known as confidence interval estimation. We can discuss the same with respect to the population mean as the parameter of interest.
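Consider the sampling distribution of x̄ [mean] of random samples of size n from a normal population with mean μ and known variance σ^{2}, that is, Ν[μ, σ^{2}]. The same can be expressed in standard form with respect to the Z-statistic as
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
If we permit the error percentage as α, we say the level of significance is α.
We can assert with probability [1 – α] that the standard normal random variable Z will lie between –Z_{α} and +Z_{α} [here Z_{α} denotes the two-tailed table value, for example Z_{0.05} = 1.96].
The same can be written symbolically,
$$P\left[-Z_{\alpha} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha}\right] = 1 - \alpha, \quad \text{that is,} \quad P\left[\bar{x} - Z_{\alpha}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha}\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha \quad [1]$$
Equation [1] reveals that μ is contained in the interval between $\bar{x} - Z_{\alpha}\sigma/\sqrt{n}$ and $\bar{x} + Z_{\alpha}\sigma/\sqrt{n}$ with probability equal to [1 – α]. The interval is referred to as the confidence interval for μ, and [1 – α] is called the degree of confidence.
Hence, the probability that μ lies in the interval $[\bar{x} - Z_{\alpha}\sigma/\sqrt{n},\ \bar{x} + Z_{\alpha}\sigma/\sqrt{n}]$ is [1 – α].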
Note:
If the sample size is large enough, say n ≥ 30, then the sample is said to be a large sample. If not, it is referred to as a small sample [n < 30].
Example: 8
As a part of the National Health and Nutrition Examination Survey [NHANES], haemoglobin levels were checked for a sample of 1139 men aged 70 and over. The sample mean was 145.3 g/L and the standard deviation was 12.87 g/L. Use these data to construct a 95% confidence interval for μ.
Step 1:
Given α = 0.05 [since 1 – 0.95 = 0.05]
s = 12.87 g/L; n = 1139; x̄ = 145.3 g/L
Since n = 1139 > 30, it is a large sample.
According to the standard normal table when α = 0.05, the value of Z_{α} = Z_{0.05} = 1.96.
Step 2:
The interval estimation can be given as x̄ ± Z_{t} * SE[x̄].
Step 3:
$$SE[\bar{x}] = \frac{s}{\sqrt{n}} = \frac{12.87}{\sqrt{1139}} = 0.3813$$
Step 4:
Using the values of x̄, Z_{α} and SE[x̄], we have
$$\mu: 145.3 \pm 1.96 \times 0.3813 = 145.3 \pm 0.7474$$
The required confidence interval of estimation with 95% confidence level for the average haemoglobin level is μ: [144.55, 146.05].
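The steps of Example 8 can be sketched as a small function. It takes the table value of Z from the standard library's `statistics.NormalDist` rather than from a printed table; the function name `z_interval_mean` is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval for the mean: xbar +/- Z * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed value, about 1.96 for 95%
    se = s / sqrt(n)                         # Step 3: standard error
    return xbar - z * se, xbar + z * se      # Step 4: interval limits

lower, upper = z_interval_mean(xbar=145.3, s=12.87, n=1139, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Raising `confidence` widens the interval, since a smaller α gives a larger table value of Z.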
Note:
There is a very close association between the length of the interval in which μ lies and the level of significance α. Whenever α decreases, the length of the interval increases.
If we want to increase the chance that μ lies in the estimated interval, choose α as small as possible.
Suppose for the above problem we assume a value of α very close to 0, namely α = 0.0027.
We have Z_{α} = Z_{0.0027} = 3.
Hence the interval estimation becomes,
$$\mu: 145.3 \pm 3 \times \frac{12.87}{\sqrt{1139}} = 145.3 \pm 1.144$$
Since α = 0.0027, there is a 99.73% chance for the population mean μ to lie in the interval [144.156, 146.444].
Note: 1
It is obvious that in the above problem the interval estimation when α = 0.05 lies well within the interval estimation when α = 0.0027.
Note: 2
When σ is not known, we can make use of the sample SD [s]. Then the interval estimation formula reduces to $\bar{x} \pm Z_{\alpha}\, s/\sqrt{n}$.
Confidence limits for μ, [μ_{1} – μ_{2}], Ρ and [Ρ_{1} – Ρ_{2}] for large random sample
SE, Standard Error; CL, Confidence Limits; α = 10%; Z_{0.1} = 1.645.
Example: 9
Researchers measured the bone mineral density of the spines of 94 women who had taken the drug CEE. The mean was 1.016 g/cm^{2} and the standard deviation was 0.155 g/cm^{2}. A 95% confidence interval for the mean is [0.984, 1.048]. True or false?
Step 1:
Given α = 0.05
s = 0.155; n = 94; x̄ = 1.016
Since n = 94 > 30, it is a large sample.
According to the standard normal table when α = 0.05, the value of Z_{α} = Z_{0.05} = 1.96.
Step 2:
The interval estimation can be given as x̄ ± Z_{α} * SE[x̄].
Step 3:
$$SE[\bar{x}] = \frac{s}{\sqrt{n}} = \frac{0.155}{\sqrt{94}} = 0.01599$$
Step 4:
Using the values of x̄, Z_{α} and SE[x̄], we have
$$\mu: 1.016 \pm 1.96 \times 0.01599 = 1.016 \pm 0.0313$$
Step 5:
The required confidence interval of estimation with 95% confidence level is μ: [0.9847, 1.0473]
The given interval coincides with the evaluated one up to rounding, so the statement is true. There is a 95% chance for the population mean to lie in the interval [0.9847, 1.0473].
12.18 CONFIDENCE INTERVALS FOR DIFFERENCE BETWEEN MEANS
Example: 10
The following table summarizes the sucrose consumption [mg in 30 minutes] of black blowflies injected with Pargyline or saline [control].
Statistic | Saline | Pargyline
n | 900 | 905
x̄ | 14.9 | 46.5
s | 5.4 | 11.7
Construct [a] a 95% confidence interval and [b] a 90% confidence interval for the difference in population means.
Step 1:
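Given α = 0.05,
Since both the samples are large, the table value of Z_{0.05} = 1.96
Sample 1 [blowflies injected with saline]: n_{1} = 900; x̄_{1} = 14.9; s_{1} = 5.4; population 1 mean = μ_{1}
Sample 2 [blowflies injected with Pargyline]: n_{2} = 905; x̄_{2} = 46.5; s_{2} = 11.7; population 2 mean = μ_{2}
Step 2:
The interval estimation can be given as [x̄_{2} – x̄_{1}] ± Z_{α} * SE[x̄_{2} – x̄_{1}], where
$$SE[\bar{x}_2 - \bar{x}_1] = \sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}} = \sqrt{\frac{5.4^{2}}{900} + \frac{11.7^{2}}{905}} = 0.4286$$
Step 3:
Using the values of x̄_{1}, x̄_{2}, Z_{α} and SE, we have
$$[\mu_2 - \mu_1]: [46.5 - 14.9] \pm 1.96 \times 0.4286 = 31.6 \pm 0.84$$
Step 4:
Thus, 30.76 and 32.44 are the lower and upper bounds, respectively, of the 95% confidence interval for [μ_{2} – μ_{1}].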
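A brief sketch of the same calculation, which also covers part [b] of Example 10, is shown below. It assumes two large independent samples and uses `statistics.NormalDist` for the table value of Z; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_mean_difference(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Large-sample confidence interval for mu2 - mu1."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # SE of the difference of two means
    diff = x2 - x1
    return diff - z * se, diff + z * se

saline = dict(x1=14.9, s1=5.4, n1=900)
pargyline = dict(x2=46.5, s2=11.7, n2=905)

ci95 = z_interval_mean_difference(**saline, **pargyline, confidence=0.95)
ci90 = z_interval_mean_difference(**saline, **pargyline, confidence=0.90)
print([round(v, 2) for v in ci95])
print([round(v, 2) for v in ci90])
```

The 90% interval is narrower than the 95% interval because its table value of Z is smaller.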
12.19 ESTIMATING A POPULATION PROPORTION
Example: 11
In a sample of 400 people from a village, 230 are found to eat vegetarian food and the rest non-vegetarian food. Estimate the population proportion at the 5% level of significance.
Step 1:
Given α = 0.05
Since the sample is large, the table value of Z_{0.05} = 1.96
Sample proportion p = 230/400 = 0.575; q = 1 – p = 0.425; n = 400
Step 2:
The interval estimation for the population proportion can be given as p ± Z_{α} * SE[p]
Step 3:
$$SE[p] = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.575 \times 0.425}{400}} = 0.0247$$
Step 4:
Using the values of p, Z_{α} and SE[p], we have
$$P: 0.575 \pm 1.96 \times 0.0247 = 0.575 \pm 0.0484$$
Step 5:
There is a 95% chance for the population proportion to lie in the interval [0.527, 0.623].
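The five steps above reduce to a short function. In this sketch the table value of Z is passed in directly so that the result matches the hand calculation; the function name `proportion_interval` is an assumption.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n            # sample proportion
    q = 1 - p
    se = sqrt(p * q / n)         # SE[p]
    return p - z * se, p + z * se

lower, upper = proportion_interval(successes=230, n=400, z=1.96)
print(round(lower, 3), round(upper, 3))  # 0.527 0.623
```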
Example: 12
A banana cultivator claims that a random sample of 700 bananas contained 45 defective bananas. Estimate the population proportion at the 1% level of significance.
Step 1:
Given α = 0.01
Since the sample is large, the table value of Z_{0.01} = 2.58.
Sample proportion p = 45/700 = 0.0643; q = 1 – p = 0.9357; n = 700
Step 2:
The interval estimation for the population proportion can be given as p ± Z_{α} * SE[p]
Step 3:
$$SE[p] = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.0643 \times 0.9357}{700}} = 0.00927$$
Step 4:
Using the values of p, Z_{α} and SE[p], we have
$$P: 0.0643 \pm 2.58 \times 0.00927 = 0.0643 \pm 0.0239$$
Step 5:
There is a 99% chance for the population proportion to lie in the interval [0.0404, 0.0882].
Finite population
Example: 13
The central government is interested in evaluating the number of Fortune 500 manufacturing firms that plan to ‘fight inflation’ by following certain voluntary wage–price guidelines. A sample of 100 of the firms is taken, and 20 said they do not follow any of these guidelines.
Determine a 90% confidence interval for the percentage of Fortune 500 firms that do not follow the guidelines.
Step 1:
Given α = 0.1
Since the sample is large and the population is finite, the table value of Z_{0.1} = 1.645
Sample proportion p = 20/100 = 0.2; q = 1 – p = 0.8; n = 100; N = 500
Step 2:
The interval estimation for the population proportion can be given as p ± Z_{α} * SE[p]
Step 3:
$$SE[p] = \sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{0.2 \times 0.8}{100}}\sqrt{\frac{400}{499}} = 0.0358$$
Step 4:
Using the values of p, Z_{α} and SE[p], we have
$$P: 0.2 \pm 1.645 \times 0.0358 = 0.2 \pm 0.0589$$
Step 5:
Thus, 14.11% and 25.89% are the lower and upper bounds, respectively, of the 90% confidence interval.
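For a finite population sampled without replacement, the only change to the calculation is the correction factor √[(N – n)/(N – 1)]. A minimal sketch with the numbers of Example 13 follows; the function name is hypothetical.

```python
from math import sqrt

def finite_population_proportion_interval(p, n, N, z):
    """Confidence interval for a proportion, sampling without replacement
    from a finite population of size N."""
    se = sqrt(p * (1 - p) / n) * sqrt((N - n) / (N - 1))
    return p - z * se, p + z * se

lower, upper = finite_population_proportion_interval(p=0.2, n=100, N=500, z=1.645)
print(f"{lower:.2%} to {upper:.2%}")
```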
Example: 14
A random sample of size 10 is drawn without replacement from a finite population of 30 units. If the number of defective units in the population be 6, find the SE[p].
Step 1:
Given: n = 10
Ν = 30 [finite population]
Ρ = 6/30 = 1/5 = 0.2
Q = 1 – P = 0.8
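Step 2:
$$SE[p] = \sqrt{\frac{PQ}{n}}\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{0.2 \times 0.8}{10}}\sqrt{\frac{30-10}{30-1}} = 0.1265 \times 0.8305$$
Step 3:
The value of SE[p] is 0.105.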
12.20 ESTIMATING THE INTERVAL BASED ON DIFFERENCE BETWEEN TWO PROPORTIONS
Example: 15
A sample survey of citizens in a VillageA gives that out of 1000 members interviewed, 420 members were found to be vegetarians. In another survey, conducted VillageB, 370 out of 1000 members were vegetarians. Construct a 99% confidence interval for the true difference in the proportion of favourable responses in the two villages.
Step 1:
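Given,
Sample 1: p_{1} = 420/1000 = 0.42; q_{1} = 0.58; n_{1} = 1000
Sample 2: p_{2} = 370/1000 = 0.37; q_{2} = 0.63; n_{2} = 1000
α = 0.01; since the samples are large, the table value of Z_{0.01} = 2.58.
Step 2:
The interval estimation for the difference between two proportions can be given as [p_{1} – p_{2}] ± Z_{α}* SE[p_{1} – p_{2}]
Step 3:
SE[p_{1} – p_{2}] = √[p_{1}q_{1}/n_{1} + p_{2}q_{2}/n_{2}] = √[0.0002436 + 0.0002331] = 0.0218
Step 4:
Using the values of p_{1}, p_{2}, Z_{α} and SE[p_{1} – p_{2}], we have
0.05 ± 2.58 × 0.0218 = 0.05 ± 0.0562
A difference between two proportions can take negative values, so the negative lower limit is retained.
Hence, [p_{1} – p_{2}]: [–0.0062, 0.1062].
Step 5:
Thus, –0.0062 and 0.1062 are the lower and upper bounds, respectively, of the 99% confidence interval for [p_{1} – p_{2}].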
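As a numerical check on this example, the interval [p_{1} – p_{2}] ± Z_{α}* SE[p_{1} – p_{2}] can be sketched in Python. The function name two_proportion_ci and its arguments are assumptions made for this illustration, and the unpooled form SE = √[p_{1}q_{1}/n_{1} + p_{2}q_{2}/n_{2}] is assumed. Because the sketch keeps the unrounded standard error and does not truncate the interval, its bounds can differ from the figures quoted above in the last decimal place, and the lower limit comes out slightly negative.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z):
    """Return (lower, upper) for (p1 - p2) ± z * SE[p1 - p2].

    x1, x2 are the counts with the attribute; n1, n2 are the sample sizes.
    The unpooled standard error is used.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Inputs of Example 15: 420 of 1000 and 370 of 1000, Z = 2.58
lower, upper = two_proportion_ci(420, 1000, 370, 1000, z=2.58)
print(f"[{lower:.4f}, {upper:.4f}]")
```

Since the interval contains 0, the data do not rule out equal proportions in the two villages at the 1% level.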
12.21 CONFIDENCE INTERVAL ESTIMATION FOR SMALL SAMPLE
Example: 16
To study the conversion of nitrite to nitrate in the blood, researchers injected four rabbits with a solution of radioactively labeled nitrite molecules. Ten minutes after injection, they measured for each rabbit the percentage of the nitrite that had been converted to nitrate. The results were as follows.
 For these data, calculate the mean, the standard deviation and the standard error of the mean.
 Construct a 95% confidence interval for the population mean percentage.
Step 1:
Based on the given data evaluate the sample mean and the SD.
[Refer the sections Sec. 4.3; Sec. 5.6]
Mean = x̄ = 51
SD = s = 3.1948
n = 4
∵ n = 4 [< 30], it is a small sample. α = 0.05, df = ν = n – 1 = 4 – 1 = 3. The table value of t_{t}[0.05, 3 df] = 3.1825.
Note:
Since the table value of t is given based on a one-tail test, while taking the table value based on a two-tail test, consider the value of α as [α/2]. Here α = 0.05, but consider α = 0.025.
Step 2:
The interval estimation can be given as x̄ ± t_{α}[ν]* SE[x̄]
Step 3:
Find SE[x̄]
SE[x̄] = s/√[n – 1] = 3.1948/√3 = 1.8445
Step 4:
Using the values of x̄, t_{α}[ν] and SE[x̄], we have
51 ± 3.1825 × 1.8445 = 51 ± 5.87
Step 5:
The required confidence interval of estimation with 95% confidence level is μ : [45.13, 56.87]
Example: 17
A sample of 20 fruit fly [Drosophila melanogaster] larvae was incubated at 37°C for 30 minutes. It is theorized that such exposure to heat causes polytene chromosomes located in the salivary glands of the fly to unwind, creating puffs on the chromosome arms that are visible under a microscope. The following normal probability plot supports the use of a normal curve to model the distribution of puffs. The average number of puffs for the 20 observations was 4.30, with a standard deviation of 2.03. Construct a 95% confidence interval for μ.
Step 1:
Given the data
Sample: n = 20; s = 2.03
Mean = x̄ = The average number of puffs = 4.3
Since n < 30, it is a small sample. α = 0.05, df = 20 – 1 = 19. The table value of t_{t}[0.05, 19 df] = 2.093.
Step 2:
The interval estimation can be given as x̄ ± t_{α}[ν]* SE[x̄]
Step 3:
Find SE[x̄]
SE[x̄] = s/√[n – 1] = 2.03/√19 = 0.4657
Step 4:
Using the values of x̄, t_{α}[ν] and SE[x̄], we have
4.3 ± 2.093 × 0.4657 = 4.3 ± 0.9747
Step 5:
The required confidence interval of estimation with 95% confidence level is μ : [3.3253, 5.2747].
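The small-sample interval of this example can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function name small_sample_mean_ci is hypothetical, the t value is taken from the table rather than computed, and the standard error is taken as s/√[n – 1], the form that reproduces the interval quoted in this example (many texts use s/√n instead, which gives a slightly narrower interval).

```python
import math

def small_sample_mean_ci(mean, sd, n, t_value):
    """Return (lower, upper) for mean ± t * SE, with SE = sd / sqrt(n - 1).

    The divisor n - 1 is an assumption that matches the worked example;
    replace it with n for the more common sd / sqrt(n) convention.
    """
    se = sd / math.sqrt(n - 1)
    margin = t_value * se
    return mean - margin, mean + margin

# Inputs of Example 17: mean 4.30, SD 2.03, n = 20, table t = 2.093 (19 df)
lower, upper = small_sample_mean_ci(mean=4.30, sd=2.03, n=20, t_value=2.093)
print(f"[{lower:.4f}, {upper:.4f}]")  # [3.3253, 5.2747]
```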
Example: 18
Experimenters test two types of fertilizer for possible use in the cultivation of cabbages. They grow cabbages in two different fields. One of the two fertilizers is applied in each field. At harvest time, they select a random sample of 25 cabbages from the crop grown with fertilizer 1 and a random sample of 12 cabbages from the crop grown with fertilizer 2. The sample mean and variance of weights of cabbages grown with fertilizer 1 are 44.1 g and 36 g^{2}. The mean weight computed from the second sample is 31.7 g and the variance is 44 g^{2}. The experimenters assume that the two population weights are normally distributed. They also assume that the two population variances are equal. Compute a 95% confidence interval for [μ_{1} – μ_{2}].
Step 1:
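Given,
Sample 1: x̄_{1} = 44.1; s_{1}^{2} = 36; n_{1} = 25
Sample 2: x̄_{2} = 31.7; s_{2}^{2} = 44; n_{2} = 12
Sample 1 and Sample 2 are small samples. α = 0.05, df = ν = n_{1} + n_{2} – 2 = 35. The table value of t_{t}[0.05, 35 df] = 2.03.
Step 2:
The interval estimation can be given as [x̄_{1} – x̄_{2}] ± t_{α}[ν]* SE
where SE = √[((n_{1}s_{1}^{2} + n_{2}s_{2}^{2})/(n_{1} + n_{2} – 2)) × (1/n_{1} + 1/n_{2})] = √[40.8 × 0.1233] = 2.2432
Step 3:
Using the values of x̄_{1}, x̄_{2}, t_{α} and SE, we have
12.4 ± 2.03 × 2.2432 = 12.4 ± 4.554
Hence, the required confidence interval of estimation with 95% confidence level based on difference of two means can be given as [7.8459, 16.9541].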
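The two-sample calculation can be sketched in Python using the tabulated values of this example. The function name pooled_mean_diff_ci is hypothetical, the t value 2.0301 (two-tail 5%, 35 df) is an assumed table value, and the pooled variance is assumed to be [n_{1}s_{1}^{2} + n_{2}s_{2}^{2}]/[n_{1} + n_{2} – 2], the form that reproduces the interval quoted above. Texts that define s^{2} with divisor n – 1 pool with [n_{1} – 1] and [n_{2} – 1] instead.

```python
import math

def pooled_mean_diff_ci(mean1, var1, n1, mean2, var2, n2, t_value):
    """Return (lower, upper) for (mean1 - mean2) ± t * SE, equal variances assumed.

    Pooled variance here is (n1*var1 + n2*var2) / (n1 + n2 - 2); this is an
    assumption chosen to match the worked example.
    """
    df = n1 + n2 - 2
    pooled_var = (n1 * var1 + n2 * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_value * se, diff + t_value * se

# Inputs of Example 18 (tabulated values), t = 2.0301 for 35 df
lower, upper = pooled_mean_diff_ci(44.1, 36, 25, 31.7, 44, 12, t_value=2.0301)
print(f"[{lower:.3f}, {upper:.3f}]")
```

The printed bounds agree with the quoted interval to about three decimal places; the small remaining difference comes from rounding of the table value of t.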
Example: 19
Ferulic acid is a compound that may play a role in disease resistance in corn. A botanist measured the concentration of soluble ferulic acid in corn seedlings grown in the dark or in a light/dark photoperiod. The results [nmol acid per g tissue] were as shown in the table.
        Dark    Photoperiod
n       4       4
x̄       92      115
S       13      13
Construct a 90% confidence interval for the difference in ferulic acid concentration under the two lighting conditions. [Assume that the two populations from which the data came are normally distributed.]
Step 1:
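Given,
Sample 1 [dark]: x̄_{1} = 92; s_{1} = 13; n_{1} = 4
Sample 2 [photoperiod]: x̄_{2} = 115; s_{2} = 13; n_{2} = 4
Sample 1 and Sample 2 are small samples. α = 0.1, df = ν = n_{1} + n_{2} – 2 = 6. The table value of t_{t}[0.1, 6 df] = 1.943.
Step 2:
The interval estimation can be given as [x̄_{2} – x̄_{1}] ± t_{α}[ν]* SE
where SE = √[((n_{1}s_{1}^{2} + n_{2}s_{2}^{2})/(n_{1} + n_{2} – 2)) × (1/n_{1} + 1/n_{2})] = √[225.33 × 0.5] = 10.6145
Step 3:
Using the values of x̄_{1}, x̄_{2}, t_{α} and SE, we have
23 ± 1.943 × 10.6145 = 23 ± 20.624
Hence, the required confidence interval of estimation with 90% confidence level based on difference of two means can be given as [2.376, 43.624].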
Example: 20
A simple random sample of 10 electronics firms is asked in a questionnaire to state the amount of money spent on employee training programme during the year just ended and during a year a decade ago.
Construct a 95% confidence interval for the mean difference in expenditures for employee training programme by the 10 firms.
Step 1:
Based on the given data, find the differences d = x – y; then find the mean and SD of the values of d.
Note:
We can choose either [x – y] or [y – x] as d, provided that the sum of the d values is positive.
Step 2:
The interval estimation can be given as d̄ ± t_{α}[ν]* SE[d̄]
Find SE[d̄]
Step 3:
Using the values of d̄, t_{α} and SE[d̄], we have
Step 4:
The required confidence interval of estimation with 95% confidence level and 9 df is μ_{d}: [–0.0861, 3.0861]
12.22 DETERMINING THE SAMPLE SIZE
Deciding the proper sample size is an integral part of any sampling study where inferences need to be made.
Error
It is defined as the absolute difference between the parameter being estimated and the point estimate obtained from the sample.
Evaluation of sample size for a mean
Known elements: σ^{2},
To be estimated: μ ~ N [μ, σ^{2}]
The error can be defined as,
E = |x̄ – μ| ... [1]
By definition
Z = [x̄ – μ]/[σ/√n] ... [2]
Equations [1] & [2] imply that,
E = Zσ/√n, that is, √n = Zσ/E ... [3]
Squaring both sides of [3], we have
n = Z^{2}σ^{2}/E^{2} ... [4]
Thus, [4] gives the sample size required to attain the tolerable error with the required degree of confidence.
Note 1:
When σ^{2} is not known, we can make use of the sample variance s^{2}, and the sample size n is defined as n = t^{2}s^{2}/E^{2}
The value of t can be read from the t-table for the level of significance α and [n – 1] degrees of freedom.
Note 2:
The sample size for a proportion can be defined as n = Z^{2}PQ/E^{2}, where Q = 1 – P.
When P is not known, it can be assumed that P = 0.5.
Note 3:
For a two sample case [n_{1} = n_{2} = n], the size of each sample can be defined as n = Z^{2}[σ_{1}^{2} + σ_{2}^{2}]/d^{2}
where d is equal to one half the width of the desired confidence interval.
Note 4:
For two sample proportions, the size of each sample can be defined as n = Z^{2}[p_{1}q_{1} + p_{2}q_{2}]/d^{2}
where d is equal to one half the width of the desired confidence interval and assume that n_{1} = n_{2} = n.
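The sample-size rules in the notes above lend themselves to a short numerical sketch. The Python functions below are illustrative only: their names and parameters are assumptions made here, and the formulas coded are the standard forms n = Z^{2}σ^{2}/E^{2}, n = Z^{2}PQ/E^{2}, n = Z^{2}[σ_{1}^{2} + σ_{2}^{2}]/d^{2} and n = Z^{2}[p_{1}q_{1} + p_{2}q_{2}]/d^{2}. Each result is rounded up, because a fractional sample size must be increased to the next whole unit to keep the error within the stated limit.

```python
import math

def n_for_mean(z, sigma, error):
    """n = Z^2 * sigma^2 / E^2, rounded up."""
    return math.ceil((z * sigma / error) ** 2)

def n_for_proportion(z, error, p=0.5):
    """n = Z^2 * P * Q / E^2, rounded up; P defaults to 0.5 when unknown."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

def n_for_two_means(z, var1, var2, d):
    """Per-group n = Z^2 * (var1 + var2) / d^2; d is half the interval width."""
    return math.ceil(z ** 2 * (var1 + var2) / d ** 2)

def n_for_two_proportions(z, p1, p2, d):
    """Per-group n = Z^2 * (p1*q1 + p2*q2) / d^2; d is half the interval width."""
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2)

# Inputs taken from the worked examples of this section
print(n_for_mean(z=1.645, sigma=35, error=25))                   # 6
print(n_for_two_means(z=1.96, var1=5, var2=7, d=0.5))            # 185
print(n_for_proportion(z=2.58, error=0.02))                      # 4161
print(n_for_two_proportions(z=1.96, p1=0.20, p2=0.25, d=0.05))   # 534
```

Taking P = 0.5 when P is unknown maximizes PQ, so it yields the most conservative (largest) sample size.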
Example: 21
Evaluate the sample size n needed to find a 90% confidence interval for the purchase price of TVS in various retail stores in a given area such that the sample mean differs from the population mean by no more than 25. Assume that σ is known and equal to 35.
Given: α = 0.1; Z_{0.1} = 1.645; σ = 35; E = 25
Step 2:
n = Z^{2}σ^{2}/E^{2} = [1.645 × 35/25]^{2} = [2.303]^{2} = 5.30
The sample size should be at least 6 in order to attain the error of 25 with the required 90% confidence level.
Example: 22
A researcher wishes to know whether the mean length of employment with the current firm at time of retirement is different for men and women. The researcher would like to have a confidence interval estimate of the difference between the population means. The specifications are a confidence interval width of 1 year and 95% confidence. Pilot samples yielded variances of 5 and 7. The researcher wants samples of equal size. What size sample should be drawn from each population?
Step 1:
Given α = 5% = 0.05; Z_{0.05} = 1.96; σ_{1}^{2} = 5; σ_{2}^{2} = 7; d = 1/2 = 0.5
Step 2:
n = Z^{2}[σ_{1}^{2} + σ_{2}^{2}]/d^{2} = [1.96]^{2} × [5 + 7]/[0.5]^{2} = 184.4
Step 3:
A sample of at least 185 men and an independent sample of at least 185 women are needed.
Example: 23
A cigarette manufacturer wishes to conduct a survey using a random sample to estimate the proportion of smokers who would switch to the company’s newly developed low-tar brand. The sampling error should not be more than 0.02 above or below the actual proportion, with a 99% degree of confidence.
Step 1:
Given α = 0.01; Z_{0.01} = 2.58; E = 0.02. Since P is not known, assume P = 0.5 and Q = 0.5.
Step 2:
n = Z^{2}PQ/E^{2} = [2.58]^{2} × 0.5 × 0.5/[0.02]^{2} = 4160.25
Hence, the sample size should be at least 4161 members in order to attain the error 0.02 with the required 99% confidence level.
Example: 24
The weight of cement bags follows a normal distribution with SD 0.2 kg. Find how large the value of n should be taken so that error can be plus or minus 0.05 of the actual value with a confidence level of 90%.
Step 1:
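Error = E = 0.05; σ = 0.2; α = 0.1; Z_{0.1} = 1.645
Step 2:
Then the value of n can be given as
n = Z^{2}σ^{2}/E^{2} = [1.645 × 0.2/0.05]^{2} = [6.58]^{2} = 43.3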
The sample size should be at least 44, so that the mean weight of cement bags can be estimated within ± 0.05 kg of the actual value with a 90% confidence level.
Example: 25
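For two populations of consumers, a researcher wants to estimate the difference between the proportions who have used a particular brand of coffee. A confidence coefficient of 0.95 and an interval width of 0.10 are desired. Estimates of p_{1} and p_{2} are 0.20 and 0.25, respectively. How large should the sample size be [n_{1} = n_{2}]?
Step 1:
Given that α = 0.05; Z_{0.05} = 1.96; p_{1} = 0.20; q_{1} = 0.80; p_{2} = 0.25; q_{2} = 0.75; d = 0.10/2 = 0.05
Step 2:
n = Z^{2}[p_{1}q_{1} + p_{2}q_{2}]/d^{2} = [1.96]^{2} × [0.16 + 0.1875]/[0.05]^{2} = 533.98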
The researcher should draw a sample size of at least 534 from each population.
Example: 26
A medical researcher proposes to estimate the mean serum cholesterol level of a certain population of middle-aged men, based on a random sample of the population. He asks a statistician for advice. The ensuing discussion reveals that the researcher wants to estimate the population mean to within ± 6 mg/dL or less, with 95% confidence. Thus, the standard error of the mean should be 3 mg/dL or less. Also, the researcher believes that the standard deviation of serum cholesterol in the population is probably about 40 mg/dL. How large a sample does the researcher need to take?
Step 1:
Given that α = 0.05; σ = 40; SE = 3
Step 2:
We know that SE = σ/√n
That is, 3 = 40/√n, so that n = [40/3]^{2} = 177.78
The researcher should take a sample of size at least 178.
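The step from a target standard error to a sample size can be sketched as follows. The function name n_for_target_se is hypothetical, and the relation SE = σ/√n is assumed, so the smallest admissible n is [σ/SE]^{2} rounded up.

```python
import math

def n_for_target_se(sigma, target_se):
    """Smallest n for which sigma / sqrt(n) does not exceed target_se."""
    return math.ceil((sigma / target_se) ** 2)

# Inputs of Example 26: sigma = 40 mg/dL, target SE = 3 mg/dL
print(n_for_target_se(sigma=40, target_se=3))  # 178
```

Halving the target standard error quadruples the required sample size, since n grows with the square of σ/SE.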
EXERCISES
 A zoologist measured tail length in 86 individuals, all in the one-year age group, of the deer mouse Peromyscus. The mean length was 60.43 mm and the standard deviation was 3.06 mm. Verify that the 95% confidence interval for the mean is [59.77, 61.09].
 There is an old folk belief that the sex of a baby can be guessed before birth on the basis of its heart rate. In an investigation to test this theory, foetal heart rates were observed for mothers admitted to a maternity ward. The results [in beats per minute] are summarized in the table.
Construct a 95% confidence interval for the difference in population means.
 As part of a large study of serum chemistry in healthy people, the following data were obtained for the serum concentration of uric acid in men and women aged 18–55 years.
Serum Uric Acid [mmol/l]
        Men     Women
n       530     420
x̄       0.354   0.263
S       0.058   0.051
Construct a 95% confidence interval for the difference in population means.
 An agronomist measured the heights of n corn plants. The mean height was 220 cm and the standard deviation was 15 cm. Calculate the standard error of the mean if
 n = 25
 n = 100
 As part of a study of the treatment of anemia in cattle, researchers measured the concentration of selenium in the blood of 36 cows that had been given a dietary supplement of selenium [2 mg/day] for one year. The cows were all the same breed [Santa Gertrudis] and had borne their first calf during the year. The mean selenium concentration was 6.21 μg/dL and the standard deviation was 1.84 μg/dL. Construct a 95% confidence interval for the population mean.
 In a study of larval development in the tufted apple budmoth [Platynota idaeusalis], an entomologist measured the head widths of 50 larvae. All 50 larvae had been reared under identical conditions and had moulted six times. The mean head width was 1.20 mm and the standard deviation was 0.14 mm. Construct a 90% confidence interval for the population mean.
 A group of 101 patients with end-stage renal disease were given the drug epoetin. The mean hemoglobin level of the patients was 10.3 [g/dL], with an SD of 0.9. Construct a 95% confidence interval for the population mean.
 A pharmacologist measured the concentration of dopamine in the brains of several rats. The mean concentration was 1,269 ng/g and the standard deviation was 145 ng/g. What was the standard error of the mean if
 8 rats were measured?
 30 rats were measured?
 The diameter of the stem of a wheat plant is an important trait because of its relationship to breakage of the stem, which interferes with harvesting the crop. An agronomist measured stem diameter in eight plants of the Tetrastichon cultivar of soft red winter wheat. All observations were made three weeks after flowering of the plant. The stem diameters [mm] were as follows:
The mean of these data is 22.75 and the standard deviation is 0.238.
 Calculate the standard error of the mean.
 Construct a 95% confidence interval for the population mean.
 For the 28 lamb birth weights, the mean is 5.1679 kg, the SD is 0.6544 kg and the SE is 0.1237 kg. Construct [a] a 95% confidence interval for the population mean [b] a 99% confidence interval for the population mean.
 Ferulic acid is a compound that may play a role in disease resistance in corn. A botanist measured the concentration of soluble ferulic acid in corn seedlings grown in the dark or in a light/dark photoperiod. The results [nmol acid per g tissue] were as shown in the table.
        Dark    Photoperiod
n       4       4
x̄       92      115
S       13      13
Construct the 95% confidence interval for the difference in ferulic acid concentration under the two lighting conditions.
 Prothrombin time is a measure of the clotting ability of blood. For 10 rats treated with an antibiotic and 10 control rats, the prothrombin times [in seconds] were reported as follows:
        Antibiotic   Control
n       10           10
x̄       25           23
S       10           8
Construct a 90% confidence interval for the difference in population means [Assume that the two populations from which the data came are normally distributed].
 A dendritic tree is a branched structure that emanates from the body of a nerve cell. In a study of brain development, researchers examined brain tissue from seven adult guinea pigs. The investigators randomly selected nerve cells from a certain region of the brain and counted the number of dendritic branch segments emanating from each selected cell. A total of 36 cells were selected, and the resulting counts were as follows:
Construct a 95% confidence interval for the population mean.
 In evaluating a forage crop, it is important to measure the concentration of various constituents in the plant tissue. In a study of the reliability of such measurements, a batch of alfalfa was dried, ground and passed through a fine screen. Five small [0.3 g] aliquots of the alfalfa were then analyzed for their content of insoluble ash. The results [g/kg] were as follows:
For these data, calculate the mean, the standard deviation and the standard error of the mean.
 Six healthy three-year-old female Suffolk sheep were injected with the antibiotic gentamicin, at a dosage of 10 mg/kg body weight. Their blood serum concentrations [μg/mL] of gentamicin 1.5 hours after injection were as follows.
For these data, the mean is 28.7 and the standard deviation is 4.6. Construct a 95% confidence interval for the population mean.
 Human beta-endorphin [HBE] is a hormone secreted by the pituitary gland under conditions of stress. A researcher conducted a study to investigate whether a program of regular exercise might affect the resting [unstressed] concentration of HBE in the blood. He measured blood HBE levels, in January and again in May, on ten participants in a physical fitness program. The results were as shown in the table. HBE Level [pg/mL].
Construct a 95% confidence interval for the population mean difference in HBE levels between January and May.
 Given N = 2696, n = 100 and 5 defectives in the sample, evaluate the 99% confidence interval for the proportion of defective articles in the whole batch.
 Doctors who have developed a new drug for the treatment of a certain disease treat a group of 400 patients suffering from the disease with the new drug. They treat another group of 400 patients with an alternative drug. At the end of two weeks, 320 of the patients receiving the new drug recover, whereas 240 of those taking the alternative drug recover. Construct the 95% confidence interval for the difference between the true proportions of patients who might be expected to respond to the two drugs.
 What are type I and type II errors in testing of hypothesis?
 Explain the following:
 Simple random sampling
 Stratified random sampling
 Systematic sampling
 Sampling is a necessity under certain conditions – illustrate by a suitable example.
 What are the types of hypothesis? Compare and contrast them.
 Explain in detail the steps involved in the testing of hypothesis.
 Distinguish between complete enumeration and sample survey.
 How far is the latter more advantageous than the former and why?
 Briefly explain the principal steps involved in sample survey.
 Explain the concepts of sampling distribution and standard error.
 Discuss the role of standard errors in large sample survey.
 Explain briefly the reasons for the increasing popularity of sampling methods. Explain briefly any two methods of sampling which help us to obtain a representative sample.
 What do you mean by sampling? What are the types of sampling?
 A researcher is planning to compare the effects of two different types of lights on the growth of bean plants. She expects that the means of the two groups will differ by about 1 inch and that in each group the standard deviation of plant growth will be around 1.5 inches. Consider the guideline that the anticipated SE for each experimental group should be no more than one-fourth of the anticipated difference between the two group means. How large should the sample be [for each group] in order to meet this guideline?
 Data from two samples gave the following results:
        Sample 1   Sample 2
n       6          12
x̄       40         50
S       4.3        5.7
Compute the standard error of [x̄_{1} – x̄_{2}] and the range for the population mean with 5% level of significance.
 Compute the standard error of [x̄_{1} – x̄_{2}] for the following data.
        Sample 1   Sample 2
n       10         10
x̄       125        217
S       44.2       28.7
 Compute the standard error of [x̄_{1} – x̄_{2}] and the range for the population mean with 5% level of significance.
        Sample 1   Sample 2
n       5          7
x̄       44         47
S       6.5        8.4
 Suppose the sample sizes were doubled, but the means and SDs stayed the same, as follows. Compute the standard error of [x̄_{1} – x̄_{2}] and the range for the population mean with 5% level of significance.
        Sample 1   Sample 2
n       10         14
x̄       44         47
S       6.5        8.4
ANSWER THE QUESTIONS
 Write short notes on sampling.
 The probability distribution of a sample statistic is known as _______________.
 The procedure for obtaining a sample from a prescribed population prior to collecting any data is referred to as _______________.
 Parameter refers to the _______________ of the population.
 Parameter is otherwise known as_______________
 State any two advantages of sampling.
 State any two disadvantages of sampling.
 Define the term nonsampling errors.
 A sample can be classified in to_______________major types.
 2
 3
 4
 None
 State any two random sampling methods.
 State any two nonrandom sampling methods.
 Define the term sampling distribution.
 State the relationships between the sample statistics and the population parameter.
 Highlight the term ‘standard error’.
 The population is said to be finite, if it is_______________.
 countable
 uncountable
 None
 What do you mean by confidence interval?
 What do you mean by level of significance?
 Define the term table value for the test statistic.
 ‘When the sample statistics are known, it is possible for us to evaluate the range for the population mean’ – Comment on this_______________.
 Deciding the proper_______________is an integral part of any sampling study_______________.
ANSWERS
 A sample is any subset of a given population. It is possible to estimate the population parameters from the limited sample statistics with the help of statistical methods and concepts. This falls under the category of statistical inference [Inductive statistics]. The inferential process is not error free, because the estimation or inference is based on the limited data obtained from samples. The main purpose of sampling is to allow us to make use of the information gathered from the sample to draw inferences about the entire population.
 Sampling distribution
 Sample design
 Characteristics
 Statistic
 Refer Section 12.6
 Refer Section 12.6
 Refer Section 12.7
 (a)
 Refer Section 12.9
 Refer Section 12.9.2
 Refer Section 12.12
 Refer Section 12.13
 The standard deviation of a sampling distribution is referred to as the standard error
 (a)
 Refer Section 12.17
 The permitted error % is known as level of significance
 The statistical table value for the statistical distribution referred based on the α level
 True
 Sample size