# Chapter 1: Introduction to Statistics and Its Biological Applications – Biostatistics


#### Objectives

After completing this chapter, you should be able to understand:

• The definition, meaning and significance of statistics.
• The role of statistics in biological studies.
• The two classifications of statistics.
• The different phases of the statistical decision-making process.
• The limitations of statistics.

##### 1.1 INTRODUCTION

The word ‘statistics’ is derived from the German word Statistik, which in turn comes from the Latin status, meaning ‘state’. Its original meaning was the political state, and this derivation suggests the subject's origin: administering a state required the collection and analysis of data on population and property for purposes of war and finance. Today, statistics is useful in virtually every field of social activity and scientific research.

The term statistics has two meanings: (i) statistical data, that is, numerical facts collected in a systematic way; and (ii) statistical principles and methods, which have been developed to handle such data. The large body of figures on the Indian population found in census reports is ‘statistics’ in the first sense of the word. On the other hand, the methods of collecting the data, choosing samples for measurement, classifying and tabulating the collected data, analysing and correlating them, and interpreting them together form ‘statistical methods’, the second sense of the word.

Statistics applied to biological problems is called biostatistics or biometry.

Croxton and Cowden define statistics as ‘the collection, presentation, analysis and interpretation of numerical data’. The stages named in this definition are called the phases of a statistical investigation.

According to Bowley, ‘Statistics may be called the science of counting’.

As per Boddington, ‘Statistics is the science of estimates and probabilities’.

Spiegel states that statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting and analyzing data, as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.

Following this definition, statistics can be said to include the study of:

• Methods of collecting statistical data, either directly by the researchers (for example, through mailed cards) or indirectly from existing published sources.
• Methods of evaluating the reliability of the data.
• Sampling methods.
• Methods of classifying the data usefully and logically on the basis of quantity, quality, time or geographical region.
• Methods of presenting the data in the form of easily understood tables, graphs and other diagrams.
• Methods of calculating averages, measures of variation, skewness, and correlation or association, to understand the basic characteristics of the data.
• Principles involved in interpreting the data, that is, drawing valid conclusions from an analysis of the data.
• Principles involved in forecasting on the basis of existing data.

#### Classification of statistics

The study of statistics can be classified into two broad areas: descriptive statistics and inferential statistics.

#### Descriptive statistics

Descriptive statistics can be defined as the set of methods involving the collection, presentation, characterization and summarization of a set of data by means of numerical descriptions.
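As a small illustration of descriptive statistics, the following Python sketch summarizes a data set by a few numerical descriptions: its size, mean, median, sample variance, standard deviation and range. The clutch sizes are hypothetical numbers invented for the example, and the function name `describe` is only illustrative; the calculations themselves use Python's standard `statistics` module.

```python
import statistics

# Hypothetical data: number of eggs found in each of 10 birds' nests
clutch_sizes = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

def describe(data):
    """Summarize a data set by a few numerical descriptions."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(data),        # sample standard deviation
        "min": min(data),
        "max": max(data),
    }

summary = describe(clutch_sizes)
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```

For these made-up values the mean is 5.2 eggs and the median is 5, so ten raw figures are reduced to a handful of descriptive numbers.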

#### Inferential statistics

Inferential statistics can be defined as the set of methods that allow the estimation or testing of a characteristic or attribute of a population, or the making of a judgment or decision concerning a population, based only on sample results.
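Inferential statistics can be sketched in the same way. The example below estimates a population mean from a sample and attaches an approximate 95% confidence interval to the estimate. The haemoglobin values are hypothetical, the helper `mean_confidence_interval` is an illustrative name, and the interval uses the normal approximation (z ≈ 1.96); for a sample this small, an interval based on the t distribution would be the more usual choice.

```python
import math
import statistics

# Hypothetical sample: haemoglobin levels (g/dL) of 12 randomly chosen adults
sample = [13.1, 14.2, 12.8, 15.0, 13.7, 14.5, 13.9, 12.5, 14.8, 13.3, 14.0, 13.6]

def mean_confidence_interval(data, confidence=0.95):
    """Estimate the population mean from a sample, with an approximate
    confidence interval based on the normal distribution."""
    n = len(data)
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)                 # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m, m - z * se, m + z * se

est, low, high = mean_confidence_interval(sample)
print(f"Estimated population mean: {est:.2f} g/dL (95% CI {low:.2f} to {high:.2f})")
```

The point is that a statement about the whole population (its mean lies, with stated confidence, within an interval) is drawn from sample results alone.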

##### 1.2 IS STATISTICS A SCIENCE?

Science is an organized body of knowledge, and statistics is the science of making decisions in the face of uncertainty. However, strictly speaking, statistics is not a science like the physical sciences. To quote Croxton and Cowden again, ‘Statistics should not be thought of as a subject correlative with physics, chemistry, economics or sociology. Statistics is not a science; it is a scientific method.’ Statistical methods are an indispensable tool for the research worker in all sciences: physical, biological, or social. Wherever there are numerical data, the methods of statistics are useful.

##### 1.3 APPLICATION OF STATISTICS IN BIOLOGY

Statistical methods are used in the collection, analysis and interpretation of quantitative data. Though these methods are used in every area of scientific investigation, they are especially useful to biologists. In fact, there is hardly any branch of biology in which statistics does not come in handy as a tool for the efficient and effective management of biological data. The application of statistics to biology can be viewed as follows:

To reach a decision, the necessary data must first be collected; statistics then helps one make the decision from those data. Biological investigations today are mostly quantitative, and a large share of biological observations consists of numerical facts called data. Objective methods are therefore needed to help the biologist present and verify research data.

#### 1.3.1 Phases of the Statistical Decision-Making Process

Industry and government statisticians generally divide their work into four phases: study design, data collection, data analysis and action on the results. These phases of the statistical decision-making process follow one another in the sequence shown below:

Study design → Data collection → Data analysis → Action on results

##### 1.4 RESPONSIBILITY OF THE DECISION MAKER

Using statistics to solve problems in biological research requires the involvement of a number of different people. The person who understands the functional aspect of the problem is as important as the statistician or the researcher. The phases and steps shown in the diagram above indicate the important responsibilities of the manager and the statistician.

[Figure: Sharing of responsibilities for the statistical decision-making process]

#### 1.5.1 Functions of Statistics

Statistical methods are a helpful device for understanding the nature of any phenomenon, provided they are used carefully.

• For example, statistics can simplify complex data. The marks of 5000 students in a college make little sense by themselves. But when averages such as the mean mark and ratios such as the pass percentage are calculated, they give us a good idea of the students' standard.
• Similarly, a diagram showing the trend of a company's sales or profits indicates how well the company is functioning. Statistics can also extend a person's experience and test the validity of the conclusions drawn from that experience.
• Statistical methods can compare data and measure the relationship between two factors. For instance, a mere list of the prices on a given day has little significance. But when these prices are compared with those of the previous year by means of index numbers, the price trend becomes clear.
• With the help of statistical methods, one can also find out the relationship between rainfall and crop yield, money in circulation and the price level, vaccination and immunity to disease, and so on.
• Statistical methods can also test the laws of other sciences. For example, to verify whether the demand for a commodity falls when its price rises (the law of demand), we use statistical data covering a number of commodities.
• In the same way, statistical methods can be used to verify whether smoking causes cancer, whether tuberculosis can be prevented by taking particular medicines, whether eye defects are hereditary, whether ammonium sulphate increases crop production, and so on.
• Moreover, statistical methods help in formulating government and business policies and in evaluating the achievements and progress of a country or company.
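Three of the functions listed above (simplifying data, comparing data, and measuring a relationship) can be illustrated with a short Python sketch. All figures are hypothetical, the pass mark of 40 is an assumption, the price index is the simplest kind (a single price relative with last year as base 100), and `pearson_r` is an illustrative helper that computes Pearson's correlation coefficient directly from its definition.

```python
import math
import statistics

# All figures below are hypothetical, invented only to illustrate the ideas.

# 1. Simplifying data: a list of marks reduced to two summary figures
marks = [35, 62, 48, 71, 29, 55, 80, 44, 67, 39]
PASS_MARK = 40  # assumed pass mark

mean_mark = statistics.mean(marks)
pass_percentage = 100 * sum(m >= PASS_MARK for m in marks) / len(marks)

# 2. Comparing data: a simple price index with last year as the base (= 100)
price_last_year = 50.0
price_this_year = 56.0
price_index = 100 * price_this_year / price_last_year

# 3. Measuring a relationship: Pearson's correlation coefficient
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rainfall_mm = [600, 650, 700, 750, 800]
crop_yield = [2.1, 2.4, 2.6, 2.9, 3.1]  # tonnes per hectare

print(f"Mean mark: {mean_mark}, pass percentage: {pass_percentage}%")
print(f"Price index: {price_index}")
print(f"Correlation between rainfall and yield: {pearson_r(rainfall_mm, crop_yield):.3f}")
```

Ten marks become a mean of 53 and a pass percentage of 70%, and an index of 112 says at once that the price rose by 12% over the year.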

#### 1.5.2 Limitations of Statistics

Statistical methods have their own limitations which are as follows:

• Statistical methods do not deal with individual items.

They deal only with mass data and throw light on the characteristics of the entire group. We can find the per capita income of a country by statistical calculations, but we cannot know the extent of a pauper's misery. Likewise, the mean mark of a class does not reveal the intelligence of its best student.

• A single statistic cannot establish the character of a group; it should be confirmed by other statistics and evidence.

Just because a particular school has a higher pass percentage, one cannot conclude that its students are more intelligent. One reason may be that the school prevented below-average students from taking the final examination. Similarly, suppose two companies, A and B, made the same profit this year, but company A made a higher profit last year and company B a lower one. Then company A is declining and company B is progressing, yet this year's profit alone does not reveal it. Before drawing any conclusion from statistical data, we should study their whole background and all the related data.
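The company example can be made concrete with a few lines of Python. The profit figures are hypothetical (in arbitrary units) and `percent_change` is an illustrative helper; the sketch simply shows that identical figures for this year hide opposite trends, which appear only when last year's figures are brought in.

```python
# Hypothetical profits (arbitrary units) of the two companies in the example
profits = {
    "A": {"last_year": 120.0, "this_year": 100.0},
    "B": {"last_year": 80.0, "this_year": 100.0},
}

def percent_change(old, new):
    """Percentage change from an old value to a new one."""
    return 100 * (new - old) / old

for company, p in profits.items():
    change = percent_change(p["last_year"], p["this_year"])
    if change > 0:
        trend = "progressing"
    elif change < 0:
        trend = "declining"
    else:
        trend = "steady"
    print(f"Company {company}: this year {p['this_year']}, change {change:+.1f}% ({trend})")
```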

• Statistical methods can measure only quantitative data.

They cannot directly measure qualitative characteristics such as culture, friendship, health, skill, pessimism or honesty. To evaluate such qualities, we use related quantitative features instead, such as age to measure youth, marks to measure intelligence, or income to measure prosperity.

• Statistical methods must be handled only by experts.

Statistical methods are a double-edged weapon. Decisions taken without adequate statistical expertise may lead to wrong conclusions.

##### 1.6 DISTRUST OF STATISTICS

Because vested interests have misused statistics for selfish purposes and have later been exposed, people tend to distrust statistics. The popular distrust of statistics is generally expressed in the following remarks:

• Statistics can prove anything.
• Statistics is like clay, of which one can make a god or a devil as one likes.

In statistics, importance is often given only to the figures, irrespective of who prepared them and how. Interested parties exploit this: statistics is misused, and wrong inferences are presented to the people. Statistical tools are also sometimes misused through ignorance. Usually, a given data set is not checked for reliability, and a table generated from false information leads to false conclusions. When false figures are expressed very precisely, people believe them blindly. Statistics is also abused when faulty generalizations are made, owing to a lack of knowledge of statistical methods or to individual bias. Having come across a number of such wrong inferences, people naturally tend to distrust all statistics.

Thus statistics can be misused if handled unscientifically. It is a very useful but very delicate tool; like a drug, it may cause harm if used badly. To use statistics properly, one should make sure that the figures are properly collected and suitable for the problem under investigation, that the complete background of the data is known, and that the inferences drawn are logical.

#### 1.7.1 Law of Statistical Regularity

It is possible to study a part of a population (a sample) and, from it, estimate statistically the characteristics of the whole population. This is possible because of the regularity that occurs in life and nature. For example, if an unbiased die is thrown 1000 times, each face will turn up approximately the same number of times. Similarly, to study the change in the wage rate of factory workers in India, it is not necessary to study every factory worker in India; it may be enough to study a sample of, say, 25% of them. From the outcome, one can closely estimate the changes in the earnings of all factory workers, provided the sample is selected properly, so that it represents all categories of factory workers.

It can be concluded that if a moderately large number of items is selected at random from a very large population, the sample is likely to have the characteristics of the entire population from which it is drawn. This is known as the principle of statistical regularity. Sampling is based on this law, which also helps in making estimates for the future.
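The principle of statistical regularity can be observed in a simple simulation. The Python sketch below, which is only an illustration under stated assumptions, throws a simulated fair die 1000 times and then draws a random 25% sample from a simulated population of daily wages. The wage population is generated from a normal distribution (mean 500, standard deviation 80) and is entirely hypothetical; the fixed random seed only makes the run repeatable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Part 1: throw a fair (unbiased) die 1000 times and count each face
throws = [random.randint(1, 6) for _ in range(1000)]
face_counts = {face: throws.count(face) for face in range(1, 7)}
print("Face counts:", face_counts)  # each should be near 1000 / 6, about 167

# Part 2: a simulated population of daily wages (hypothetical, not real data)
population = [random.gauss(500, 80) for _ in range(20_000)]
sample = random.sample(population, len(population) // 4)  # random 25% sample

print(f"Population mean wage: {statistics.mean(population):.1f}")
print(f"Sample mean wage:     {statistics.mean(sample):.1f}")
```

Each face count lands near 167, and the mean of the randomly chosen sample lies close to the mean of the whole simulated population.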

#### 1.7.2 Law of Inertia of Large Numbers

The principle of large numbers rests on reasoning similar to that of the principle of statistical regularity. Consider coin tossing: if we toss a coin three times, we may get three heads or even three tails. If we repeat the experiment a large number of times, say one million times, nearly half the tosses will be heads and half tails. This shows that large numbers are more stable than small ones: the bigger the sample, the closer the study results are likely to be to the actual values for the population.
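The coin-tossing argument can be checked by simulation. In the Python sketch below, a simulated fair coin is tossed an increasing number of times and the proportion of heads is printed; the helper name `proportion_of_heads` is illustrative, and the fixed seed only makes the run repeatable. With three tosses the proportion can only be 0, 1/3, 2/3 or 1, while with a million tosses it settles very close to 0.5.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def proportion_of_heads(n_tosses):
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (3, 10, 100, 10_000, 1_000_000):
    print(f"{n:>9,} tosses: proportion of heads = {proportion_of_heads(n):.4f}")
```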

In statistics, inferences and forecasts can be made because the two laws stated above hold. When a forecast proves wrong, one possible reason is an insufficient sample size.

##### EXERCISES
1. Define the term ‘statistics’.
2. Explain the business applications of statistics.
3. ‘Statistics can prove anything’ – comment on this statement.
4. State the limitations of statistics.
5. Why is statistics essential?
6. ‘Statistics cannot be viewed as science’ – comment on this statement.
7. Explain the principle of statistical regularity and the principle of large numbers and their importance in sampling.
##### ANSWER THE QUESTIONS
1. The word ‘statistics’ is derived from the_____________.
2. State the sampling methods.
3. Statistics can be classified as_____________.
4. Statistics is a Science.   (a) Yes   (b) No   (c) None
5. A statistical method doesn’t have any limitations.   (a) Yes   (b) No   (c) None
6. The two basic statistical laws are_____________.
7. Statistics applied to biological problems is called_____________.
##### ANSWERS
1. German word Statistik (from the Latin status)
2. Classification and presentation
3. Descriptive statistics and Inferential statistics
4. Yes
5. No
6. Law of statistical regularity and law of inertia of large numbers
7. Biostatistics or biometry