Chapter 7: Correlation and Regression Analysis – Biostatistics

Objectives

After completing this chapter, you will understand the following:

  • The definition, meaning and significance of the correlation coefficient and the rank correlation coefficient.
  • The construction of regression lines.
  • The use of regression lines to estimate values.
  • The implications of these methods for decision-making in biological studies.
7.1 INTRODUCTION

We shall now study two [bivariate] or more [multivariate] variables simultaneously and attempt to find the relationship among them in quantitative or qualitative form. In practice, many variables are related in this way: crop yield per acre and fertilizer, height and weight, birth rate and death rate, blood pressure readings obtained by two different methods, age of elephants and annual maintenance cost, quantity of pesticide applied and intensity of food poisoning, dietary component and plasma lipid level, size of crop and percentage of worms, age and blood pressure, and antibiotics and bacteria.

This methodology of studying the strength of the relationship among variables was developed by Sir Francis Galton and Karl Pearson.

 

7.2 CORRELATION

Correlation is a statistical measure used to evaluate the strength and direction of the relationship between two or more variables under study. Here the term ‘relationship’ refers to the tendency of the variables to move together. The variables may move in the same direction or in opposite directions. The correlation is said to be positive if the variables move in the same direction, and negative if they move in opposite directions. If the variables show no tendency to move together in either direction, they are said to be unrelated.

It is classified into

  1. simple correlation,
  2. rank correlation and
  3. group correlation.

7.2.1 Simple Correlation/Correlation

This measure can be evaluated for a discrete series of quantitative data. It is denoted by r. The value of r lies in the closed interval [–1 ≤ r ≤ 1]. If r is close to 1, the variables are said to be positively correlated or directly related [if X increases, Y also increases, and if X decreases, Y also decreases]. If r is close to –1, they are said to be negatively correlated or inversely related [if X increases, Y decreases, and if X decreases, Y increases]. If r is 0, the variables are said to be uncorrelated [a change in X is not accompanied by any linear change in Y, and vice versa].

7.2.2 Rank Correlation

This measure can be evaluated for a discrete series of qualitative data. It is denoted by R. The value of R lies in the closed interval [–1 ≤ R ≤ 1].

7.2.3 Group Correlation

This measure can be evaluated for a continuous series of grouped data. It is denoted by r. The value of r lies in the closed interval [–1 ≤ r ≤ 1].

 

Note:

The larger the absolute value of r, the stronger the linear relationship between Y and X. If r = –1 or r = +1, the regression line will include all data points and the line will be a perfect fit.

7.2.4 Assumptions for Karl Pearson’s Coefficient of Correlation

  1. The relationship between the two series [X and Y] is linear [the amount of variation in X bears a constant ratio to the corresponding amount of variation in Y].
  2. Either one of the series is dependent on the other or both are dependent on the third series.
  3. Correlation analysis is applied to most scientific data where inferences are to be made. In agriculture, amount of fertilizers and crops’ yields are correlated. In economics, prices and demand or money and prices. In medicine, use of cigarettes and incidence of lung cancer or use of new drug and the percentage of cases cured. In sociology, unemployment and crime or welfare expenditure and labour efficiency. In demography, wealth and fertility and so on.
  4. The correlation coefficient r, like other statistics of the sample, is tested to see how far the sample results may be generalized to the parent population.

7.2.5 Limitations of Correlation

  1. Interpretation of this analysis needs expertise regarding the statistical concepts and the background of data.
  2. Correlation in statistics is studied by scatter diagrams and regression lines/coefficient of correlation.

7.2.6 Properties of Correlation

  1. It is independent of any change of origin of reference and the units of measurement.
  2. Its value lies in the interval [–1, 1].
  3. It is a constant value, which helps to measure the relationship between two variables.

7.2.7 Scatter Diagram

The scatter diagram is a valuable graphic device for showing the existence of correlation between two variables. Represent the variable X on the x-axis and Y on the y-axis and mark the coordinate points [x, y]; the existence of correlation can then be studied from the way the points cluster. The direction and tightness of the scatter reveal the nature and strength of the correlation between the variables.

The scatter diagrams for r = 1 and 0 < r < 1 show that the path is linear and the variables move in the same direction. This indicates that the correlation is positive [the relationship between the variables is direct].

The scatter diagrams for r = –1 and –1 < r < 0 show that the path is linear and the variables move in opposite directions. This indicates that the correlation is negative [the relationship between the variables is inverse].

The scatter diagram for r = 0 indicates that the variables have no linear relationship: the points are scattered without direction or follow a curve.

7.3 KARL PEARSON’S COEFFICIENT OF CORRELATION

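Consider the pairs of values [X1, Y1], [X2, Y2], … , [Xn, Yn] of the variables X and Y. Then, the covariance of these two variables X and Y can be defined as

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$

The standard deviations of X and Y can be given by

$$\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad \sigma_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

The correlation coefficient r can be defined as

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

Equivalent alternate formulae for r

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

Value of r using assumed mean

To derive the result, we make use of the fact that the correlation coefficient is independent of the choice of origin. Take $d_X = X - a$ and $d_Y = Y - b$, where a is any one value of X and b is any one value of Y. Then

$$r = \frac{n\sum d_X d_Y - \sum d_X \sum d_Y}{\sqrt{\left[n\sum d_X^2 - \left(\sum d_X\right)^2\right]\left[n\sum d_Y^2 - \left(\sum d_Y\right)^2\right]}}$$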
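The claim that r does not depend on the choice of origin can be checked numerically. The following is a minimal Python sketch, not the book's worked method. It computes r once from the definition [covariance divided by the product of the standard deviations, with divisor n] and once from deviations about assumed means. The data are the blood pressure readings tabulated in Example 15 of this chapter; the assumed means a = 148 and b = 140 and the function names are arbitrary choices made here for illustration.

```python
import math

# Systolic readings from Example 15 (method 1 = X, method 2 = Y).
X = [132, 138, 144, 146, 148, 152, 158, 130, 162, 168]
Y = [130, 134, 132, 140, 150, 144, 150, 125, 160, 150]


def pearson_r(xs, ys):
    """r = cov(X, Y) / (sigma_X * sigma_Y), using divisor n throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def pearson_r_assumed_mean(xs, ys, a, b):
    """Same coefficient computed from deviations about assumed means a and b."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    num = n * sum(u * v for u, v in zip(dx, dy)) - sum(dx) * sum(dy)
    den = math.sqrt(
        (n * sum(u * u for u in dx) - sum(dx) ** 2)
        * (n * sum(v * v for v in dy) - sum(dy) ** 2)
    )
    return num / den


r_direct = pearson_r(X, Y)
r_shifted = pearson_r_assumed_mean(X, Y, a=148, b=140)  # a, b chosen arbitrarily
print(round(r_direct, 3), round(r_shifted, 3))
```

Both routes give the same coefficient, which is why a convenient working mean can be used to keep hand calculations small.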

Example: 1

  1. In trying to evaluate the effectiveness of antibiotics in killing bacteria, a research institution compiled the following information.

    Calculate the correlation coefficient.

     

    Here n = 6; ΣX = 84; ΣY = 39.6

Direct method

Since the value of r is positive [r ≈ 0.74], antibiotics and bacteria are positively related.

 

Example: 2

The following table shows the ages [X] and systolic blood pressure [Y] of 8 persons:

Find the value of r.

Here, n = 8; ΣX = 395; ΣY = 1,070.

The age and the blood pressure level are positively related with correlation 0.22.

 

Example: 3

In a study of the effect of dietary component on plasma lipid composition, the following ratios were obtained on a sample of experimental animals.

Obtain the correlation coefficient.

Let the variables X and Y denote the two ratios observed [dietary component and plasma lipid], respectively.

Here n = 8; ΣX = 23; ΣY = 16.

The dietary component and the plasma lipid composition are negatively related, with correlation –0.29.

 

Example: 4

Calculate Karl Pearson’s coefficient of correlation for the following data using 20 as the working mean for price and 70 as the working mean for demand:

Let the variables X and Y denote the price and the demand, respectively.

The assumed means are given as a = 20 and b = 70.

Here, n = 9.

The correlation value is –0.828; it implies that the demand and the price are negatively related.

 Example: 5

A computer, while calculating the value of r between two variables X [advertising expenditure] and Y [sales level] from 25 sets of values, gives n = 25; ΣX = 125; ΣY = 100; ΣX² = 650; ΣY² = 460; and ΣXY = 508. At the time of checking, it was found that two sets of values were wrongly entered.

Evaluate the correct value of r.

Given,

n = 25; ΣX = 125; ΣY = 100; ΣX² = 650; ΣY² = 460 and ΣXY = 508. First, we have to find the corrected sums, that is, subtract the incorrect values from each total and add the correct values.

Corrected values:

Similarly proceeding,

Hence, the corrected value of the correlation coefficient is [2/3] or 0.67.

7.4 COEFFICIENT OF CORRELATION FOR A GROUPED DATA

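In grouped data, the information is given in a correlation table. In each cell of the table, the deviations of x and of y from their respective averages [or assumed means] are multiplied and written within brackets. This product is then multiplied by the frequency f of that cell. Adding all such values gives Σf dx dy, and the correlation coefficient is

$$r = \frac{n\sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{\left[n\sum f d_x^2 - \left(\sum f d_x\right)^2\right]\left[n\sum f d_y^2 - \left(\sum f d_y\right)^2\right]}}, \qquad n = \sum f$$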

Example: 6

The following table gives the distribution of the total population and of those who are totally or partially blind among them. Find out if there is any relation between age and blindness.

Age [years]    No. of persons [in ‘000]    Blind
0–10           100                         45
10–20          60                          40
20–30          40                          40
30–40          36                          40
40–50          24                          36
50–60          11                          22
60–70          6                           18
70–80          3                           15

Construct a modified table giving the blind as a proportion of the population in each age group.

Let A = 45; h = 10; n = 8.

There is a close positive correlation between age and blindness.
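As a rough numerical check of this conclusion, the sketch below recomputes r in Python from the table of Example 6. It is an illustration under stated assumptions, not the book's working: each age class is represented by its midpoint, and blindness is measured as the ratio of blind persons to population in each class. Because r is independent of the units of measurement, the scale chosen for that ratio does not affect the result. The variable and function names are illustrative.

```python
import math

# Class midpoints of the age groups 0-10, 10-20, ..., 70-80 (an assumption:
# each class is represented by its midpoint).
age_mid = [5, 15, 25, 35, 45, 55, 65, 75]
persons_thousands = [100, 60, 40, 36, 24, 11, 6, 3]
blind = [45, 40, 40, 40, 36, 22, 18, 15]

# Blind persons relative to population in each class; any constant rescaling
# (per cent, per lakh) leaves r unchanged.
blind_rate = [b / p for b, p in zip(blind, persons_thousands)]


def pearson_r(xs, ys):
    """Product-moment correlation from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(age_mid, blind_rate)
print(f"r = {r:.2f}")
```

The value comes out close to +1, in line with the statement that age and blindness are closely and positively correlated.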

 

Example: 7

Find the coefficient of correlation between the ages of husbands and the ages of wives given here in the form of a two-way frequency table.

 

Age of husbands [in years]

h = 5; dx = [X – A]/h; dy = [Y – B]/h; A = 32.5; B = 27.5.
Σf dx dy = 128; Σf dx = –85; Σf dy = –110
Σf dx² = 145; Σf dy² = 184; n = Σf = 95.
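Once the step-deviation sums are available, the remaining arithmetic is mechanical. The sketch below applies the usual product-moment formula to the sums exactly as they are listed above; the function name `grouped_r` and its parameter names are hypothetical, and the two-way table itself is not reproduced here, so the result depends entirely on those listed sums.

```python
import math


def grouped_r(n, s_fdxdy, s_fdx, s_fdy, s_fdx2, s_fdy2):
    """Correlation coefficient from step-deviation sums of a two-way table."""
    num = n * s_fdxdy - s_fdx * s_fdy
    den = math.sqrt((n * s_fdx2 - s_fdx ** 2) * (n * s_fdy2 - s_fdy ** 2))
    return num / den


# Sums as listed for Example 7.
r = grouped_r(n=95, s_fdxdy=128, s_fdx=-85, s_fdy=-110, s_fdx2=145, s_fdy2=184)
print(round(r, 3))
```

Because dx and dy are both divided by the same class width h, and r is independent of the units of measurement, no rescaling by h is needed at the end.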

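Note: Show that r lies between +1 and –1.

Let $x_i = X_i - \bar{X}$ and let $y_i = Y_i - \bar{Y}$. Then

$$\sum x_i^2 \sum y_i^2 - \left(\sum x_i y_i\right)^2 = \frac{1}{2}\sum_{i}\sum_{j}\left(x_i y_j - x_j y_i\right)^2 \qquad [1]$$

Because each term in the RHS of [1] is a perfect square, it implies that LHS ≥ 0.

By definition,

$$r^2 = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} \qquad [2]$$

Dividing the LHS of [1] by $\sum x_i^2 \sum y_i^2$,

$$1 - \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} \ge 0 \qquad [3]$$

Using [2] in [3], we have [1 – r²] ≥ 0; r² ≤ 1

r ≤ +1 and r ≥ –1; it implies that –1 ≤ r ≤ 1.

Hence, the correlation coefficient lies in the closed interval [–1, 1].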

7.5 PROBABLE ERROR OF THE COEFFICIENT OF CORRELATION

Normally, we use sample data to evaluate the correlation coefficient. So, whenever the result is interpreted, it is necessary to check how reliably the sample correlation represents the population’s coefficient. This is determined by the probable error, which is evaluated using the result

Probable error = 0.6745 * [standard error of r]

where the standard error of r is

$$\text{S.E.}(r) = \frac{1 - r^2}{\sqrt{n}}$$

Here r is the correlation coefficient and n is the number of pairs of items. The interpretation is that if P.E. of r = ±a, where ‘a’ is a constant, then the range of the correlation of the population can be evaluated approximately as [r – a, r + a].

This probable error calculation can be used only when the data are normal or near to normal, and the sample should be selected without bias. In relation to the probable error, the significance of the coefficient of correlation may be judged as follows:

The coefficient of correlation is significant if it is more than six times the probable error, or if the probable error is small and r exceeds 0.5. It is not significant at all if it is less than the probable error.

Example: 8

Calculate the correlation coefficient and its probable error from the following results:

And find its probable error.

Given,

By definition,

The correlation coefficient is 0.75; it implies that the variables are positively related. The probable error of r is 0.0851.
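The probable error calculation can be sketched in a few lines of Python, taking the standard error of r as [1 – r²]/√n. The number of pairs is not available in the text above, so n = 12 is an assumption made here: it is a sample size for which the formula gives a value close to the 0.0851 quoted. The function name is illustrative.

```python
import math


def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r**2) / sqrt(n)."""
    standard_error = (1 - r ** 2) / math.sqrt(n)
    return 0.6745 * standard_error


r = 0.75
n = 12  # assumed number of pairs; not stated in the surviving text
pe = probable_error(r, n)
lower, upper = r - pe, r + pe   # approximate range for the population value
significant = abs(r) > 6 * pe   # rule of thumb quoted in this section
print(round(pe, 4), round(lower, 3), round(upper, 3), significant)
```

Under this assumption r is more than six times its probable error, so by the rule stated in this section the correlation would be judged significant.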

 

Example: 9

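Calculate the coefficient of correlation between X and Y.

                                   X series    Y series
No. of items                       15          15
Arithmetic mean                    25          18
Squares of deviation from mean     136         138

The sum of the products of the deviations of the X and Y series from their respective means is 122.

Given,

n1 = n2 = 15; $\bar{X}$ = 25; $\bar{Y}$ = 18; $\sum (X - \bar{X})^2$ = 136; $\sum (Y - \bar{Y})^2$ = 138; $\sum (X - \bar{X})(Y - \bar{Y})$ = 122.

By definition,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{122}{\sqrt{136 \times 138}} \approx 0.89$$

The relationship between the variables is positive.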

 

Example: 10

Evaluate the correlation coefficient for the following data:

 

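ΣX = 24; ΣY = 44; n = 4; ΣX² = 164; ΣY² = 574 and ΣXY = 306.

By definition,

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{4(306) - (24)(44)}{\sqrt{\left[4(164) - 24^2\right]\left[4(574) - 44^2\right]}} = \frac{168}{\sqrt{80 \times 360}} \approx 0.99$$

The variables are positively related.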

7.6 RANK CORRELATION

Pearson’s correlation coefficient r gives a numerical measure of the degree of relationship that exists between two variables X and Y. However, it needs numerical measurements and requires that the joint distribution of X and Y be normal. Both restrictions can be overcome by the rank correlation coefficient, which is based on the ranking of the variates. This was introduced by Charles Edward Spearman in 1904. It helps in dealing with qualitative characteristics such as beauty and intelligence, and it is most suitable when the variables can be arranged in order of merit. It is denoted by R.

Consider n pairs [X1, Y1], [X2, Y2], … , [Xn, Yn].

Rank the elements of the X series by comparing each element with every other.

Let the ranks be R1, R2, … , Rn.

Similarly for the Y series, let the ranks be S1, S2, … , Sn.

Let $d_i = R_i - S_i$ be the difference between the two ranks of the i-th pair. Then the rank correlation coefficient is

$$R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

Note for repeated ranks

The above-given formula holds good if the ranks are not repeated. If a rank is repeated m times, then the value [m[m² – 1]/12] should be added to Σdi². This must be done for each repeated rank.

 

Merits of rank correlation coefficient

  1. It is simple to understand and easy to evaluate.
  2. It is very much useful for qualitative type of data.
  3. It can be evaluated also for a quantitative type of data.
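The ranking procedure and the adjustment for repeated ranks can be sketched in Python as follows. This is an illustration rather than the book's worked method. It assumes the formula R = 1 – 6[Σd² + Σm[m² – 1]/12]/[n[n² – 1]] and gives tied values the average of the rank positions they occupy. Since rank correlation can also be evaluated for quantitative data, the sketch reuses the blood pressure readings of Example 15, where the value 150 occurs three times in the second series. All function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_R(xs, ys):
    """Rank correlation with the repeated-rank adjustment added to sum of d^2."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))


method_1 = [132, 138, 144, 146, 148, 152, 158, 130, 162, 168]
method_2 = [130, 134, 132, 140, 150, 144, 150, 125, 160, 150]
R = spearman_R(method_1, method_2)
print(round(R, 3))
```

Ranking from the smallest value upward or from the largest downward gives the same R, provided both series are ranked in the same direction.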

Example: 11

  1. Two referees in a flower beauty competition rank the 10 types of flowers as follows:

    Use the rank correlation coefficient and find out what degree of agreement is there between the referees.

    n = 10. By definition,

    Since the given data set contains ranks, evaluate the difference in ranks.

    The rank correlation coefficient is positive; it implies that the variables are positively related.

Example: 12

Ten competitors in a flower beauty contest are ranked by three judges in the following order:

Use the rank correlation coefficient to determine which pair has the nearest approach to common taste in deciding flower beauty.

Since the data set contains ranks, first evaluate the rank correlation coefficient between [J1, J2], [J2, J3], and [J3, J1].

Judges 1 and 3 have the nearest approach to common taste in beauty.

 

Example: 13

Find the rank correlation coefficient of the following data:

Consider the data given and rank it.

Series A:

98 is repeated 3 times; the corresponding rank positions are 7, 8 and 9.

Rank [98] = [7 + 8 + 9]/3 = 8.

Series B:

73 is repeated 2 times; the corresponding rank positions are 6 and 7.

Rank [73] = [6 + 7]/2 = 6.5

As per Spearman’s modified formula for repeated values, the term [m[m² – 1]/12] should be added to Σd² for each repeated value, where m is the number of times the value is repeated.

Hence,

The variables are positively related.

 

Example: 14

The coefficient of rank correlation between marks in mathematics and statistics of a class is 9/11 and the sum of the squares of the differences in ranks is 30. Find the number of students in the class.

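Given R = 9/11 and Σd² = 30.

Find the value of n.

By definition,

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \qquad [1]$$

Using the given values in the relation [1],

$$\frac{9}{11} = 1 - \frac{6 \times 30}{n(n^2 - 1)}; \qquad \frac{180}{n(n^2 - 1)} = \frac{2}{11}; \qquad n(n^2 - 1) = 990 = 10 \times 99$$

Comparing the factors on the LHS and RHS, it implies that n = 10.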

Hence, the number of students in the class is 10.
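The same answer can be found by a direct search, which is a useful check when the factors are not obvious. The sketch below assumes the rank correlation formula R = 1 – 6Σd²/[n[n² – 1]] and uses exact fractions so that the comparison with 9/11 involves no rounding. The function name and the search limit `max_n` are hypothetical choices.

```python
from fractions import Fraction


def class_size(R, sum_d2, max_n=1000):
    """Smallest n >= 2 with 1 - 6*sum_d2 / (n(n^2 - 1)) equal to R, else None."""
    for n in range(2, max_n + 1):
        if 1 - Fraction(6 * sum_d2, n * (n * n - 1)) == R:
            return n
    return None


n = class_size(Fraction(9, 11), 30)
print(n)  # 10
```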

7.7 REGRESSION EQUATIONS

7.7.1 Regression

The word regression was first used by Sir Francis Galton in his investigation regarding heredity. Literally, regression means stepping back, but the term is not used in this sense in statistics. Here it is a mathematical measure that describes the average relationship between two variables, and it is used to predict the expected value of one variable when the value of the other is given. Of the two variables, one is treated as the independent variable and the other as the dependent variable.

The relationship stated above can be expressed in the form of a linear equation in two variables. Of the two variables, say X and Y, either one can be treated as dependent on the other:

(a) X depends on Y; (b) Y depends on X.

7.7.2 Regression Equation Y depends on X

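Consider n pairs of data [X1, Y1], [X2, Y2], … [Xn, Yn] and let the linear equation representing these n data be

$$Y = aX + b \qquad [1]$$

Taking the summation on either side of [1],

$$\sum Y = a\sum X + nb \qquad [2]$$

Multiply on both sides of [1] by X.

$$XY = aX^2 + bX \qquad [3]$$

Take the summation on either side of [3],

$$\sum XY = a\sum X^2 + b\sum X \qquad [4]$$

[2] and [4] are two linear equations with two unknowns a and b.

Divide [2] by n on both sides, we have

$$\bar{Y} = a\bar{X} + b \qquad [5]$$

Subtracting [5] from [1], we have

$$Y - \bar{Y} = a(X - \bar{X}) \qquad [6]$$

Dividing [4] by n and eliminating b using [5], we have

$$a = \frac{\frac{1}{n}\sum XY - \bar{X}\bar{Y}}{\frac{1}{n}\sum X^2 - \bar{X}^2} \qquad [7]$$

By definition,

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\bar{Y} = r\sigma_X\sigma_Y, \qquad \sigma_X^2 = \frac{1}{n}\sum X^2 - \bar{X}^2 \qquad [8]$$

Comparing [7] and [8], we have

$$a = \frac{r\sigma_X\sigma_Y}{\sigma_X^2} = r\frac{\sigma_Y}{\sigma_X}$$

Using the value of a in [6],

$$Y - \bar{Y} = r\frac{\sigma_Y}{\sigma_X}(X - \bar{X}) \qquad [9]$$

[9] is the required regression equation of Y on X.

It is used to estimate the most likely values of Y when the X value is known.

Here, the value $r\sigma_Y/\sigma_X$ is called the regression coefficient of the regression equation of Y on X and can be denoted by bYX. Then, [9] can be expressed as

$$Y - \bar{Y} = b_{YX}(X - \bar{X})$$

Similarly proceeding, we can get the regression equation of X on Y as

$$X - \bar{X} = r\frac{\sigma_X}{\sigma_Y}(Y - \bar{Y}) \qquad [10]$$

The value $r\sigma_X/\sigma_Y$ is called the regression coefficient of the regression equation of X on Y and can be denoted by bXY. Then, [10] can be expressed as

$$X - \bar{X} = b_{XY}(Y - \bar{Y})$$

[9] and [10] are the required two regression equations.

Multiplying the like sides of $b_{YX} = r\sigma_Y/\sigma_X$ and $b_{XY} = r\sigma_X/\sigma_Y$, we have

$$b_{YX}\, b_{XY} = r^2$$

Note:

  1. The variances $\sigma_X^2$ and $\sigma_Y^2$ are always positive, so bYX, bXY and r all have the same sign.
  2. The two regression equations [9] and [10] imply that the two lines pass through the common point $[\bar{X}, \bar{Y}]$.
  3. To get the values of the two means, it is sufficient to solve the given two regression equations.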

Example: 15

Blood pressure readings by two different methods were made in 10 patients with essential hypertension. The systolic readings by the two methods are shown in the following table. The clinician wished to investigate the relationship between the two measurements. You are required to find out whether there is any correlation between the two methods of measurement. Is it positive or negative? Is it high or low? Also construct the two regression lines.

Systolic blood pressure readings [mm Hg] by two methods in 10 patients with essential hypertension

Patient    Method 1    Method 2
1          132         130
2          138         134
3          144         132
4          146         140
5          148         150
6          152         144
7          158         150
8          130         125
9          162         160
10         168         150

Let X and Y be the two random variables denoting the blood pressure readings obtained by method 1 and method 2, respectively. Evaluate the necessary summations using the given data.

Here n = 10; ΣX = 1,478; ΣY = 1,415

The correlation is positive and high.

By definition,

Similarly,

The regression equation Y on X is

 

The regression equation X on Y is

[1] and [2] are the required two regression equations.

 

Example: 16

Construct the regression lines between pesticides and food poisoning. Find the value of Y when X = 10.

Quantum of pesticides applied [in kg] X    Intensity of food poisoning Y
17                                         36
13                                         46
15                                         35
16                                         24
6                                          12
11                                         18
14                                         27
9                                          22
7                                          2
12                                         8

Evaluate the necessary summations using the given data.

Here, n = 10; ΣX = 120; ΣY = 230

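By definition, with $\bar{X}$ = 12 and $\bar{Y}$ = 23,

$$b_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{315}{126} = 2.5$$

Similarly,

$$b_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{315}{1{,}672} \approx 0.188$$

The regression equation Y on X is

$$Y - 23 = 2.5(X - 12); \qquad Y = 2.5X - 7 \qquad [1]$$

The regression equation X on Y is

$$X - 12 = 0.188(Y - 23); \qquad X \approx 0.188Y + 7.67 \qquad [2]$$

[1] and [2] are the required two regression equations.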

Given x = 10, to find the value of y.

Put X = 10 in equation [1]; Y = 2.5 * 10 – 7 = 18.

When the pesticides level X = 10, the corresponding intensity level of food poisoning Y is 18.
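The fitting of both lines in Example 16 can be reproduced with a short Python sketch. It is offered as a check on the arithmetic, not as the book's procedure. It computes each regression coefficient as the sum of products of deviations divided by the relevant sum of squared deviations, which is algebraically the same as r multiplied by the ratio of the standard deviations. The function and variable names are illustrative.

```python
def regression_lines(xs, ys):
    """Return (b_yx, a_yx, b_xy, a_xy) for Y = b_yx*X + a_yx and X = b_xy*Y + a_xy."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sxy / sxx
    b_xy = sxy / syy
    # Both lines pass through the point of means, which fixes the intercepts.
    return b_yx, mean_y - b_yx * mean_x, b_xy, mean_x - b_xy * mean_y


pesticide = [17, 13, 15, 16, 6, 11, 14, 9, 7, 12]
poisoning = [36, 46, 35, 24, 12, 18, 27, 22, 2, 8]

b_yx, a_yx, b_xy, a_xy = regression_lines(pesticide, poisoning)
y_at_10 = b_yx * 10 + a_yx      # estimate of Y when X = 10
r_squared = b_yx * b_xy         # product of the two regression coefficients
print(b_yx, a_yx, y_at_10, round(b_xy, 3), round(r_squared, 3))
```

The Y on X line agrees with Y = 2.5X – 7 and the estimate of 18 at X = 10. The product of the two coefficients equals r², so it must lie between 0 and 1.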

Example: 17

The following table shows the methyl mercury intake and whole blood mercury values in 10 subjects exposed to methyl mercury through consumption of contaminated fish.

Methyl mercury intake [mg Hg/day] X    Mercury in whole blood [mg/g] Y
180                                    90
200                                    120
230                                    130
410                                    290
600                                    310
550                                    300
580                                    175
600                                    380
250                                    70
115                                    100

You are required to construct the two regression equations. Also evaluate the value of X given Y = 295. Evaluate the necessary summations using the given data.

Here, n = 10; ΣX = 3,715; ΣY = 1,965

By definition,

Similarly,

The regression equation Y on X is

The regression equation X on Y is

[1] and [2] are the required two regression equations.

Given Y = 295, to find the value of X.

Put Y = 295 in [2]; X = 1.49 * 295 + 79.53 = 519.08.

When the mercury in whole blood Y = 295 mg/g, the corresponding value of methyl mercury intake X is 519.08 mg Hg/day.

 

Example: 18

The correlation coefficient between supply [Y] and price [X] of a commodity is 0.60. If σX = 150, σY = 200, mean [X] = 10 and mean [Y] = 20, find the equations of the regression lines of Y on X and X on Y.

[MBA, 1998]

Given r = 0.6; σX = 150, σY = 200, mean [X] = 10 and mean [Y] = 20.

By definition,

$$b_{YX} = r\frac{\sigma_Y}{\sigma_X} = 0.6 \times \frac{200}{150} = 0.8; \qquad b_{XY} = r\frac{\sigma_X}{\sigma_Y} = 0.6 \times \frac{150}{200} = 0.45$$

The regression equation Y on X is

$$Y - 20 = 0.8(X - 10)$$

The regression equation X on Y is

$$X - 10 = 0.45(Y - 20)$$

The regression equation Y on X is Y = 0.8X + 12.

The regression equation X on Y is X = 0.45Y + 1.

 

Example: 19

In a partially destroyed laboratory record of an analysis of correlation data, the following results only are legible:

Regression equations: 8X – 10Y + 66 = 0; 40X – 18Y = 214.

 

What were

  • the mean values of X and Y.
  • The correlation coefficient between X and Y.
  • If σX2 = 9, find the value of σY

[MBA 1999]

Consider the two regression equations,

$$8X - 10Y + 66 = 0 \qquad [1]$$

$$40X - 18Y = 214 \qquad [2]$$

We have to choose one equation for X on Y and the other one for Y on X.

Since the magnitude of the coefficient of Y in [1] dominates that of the coefficient of X, choose [1] for Y on X and [2] for X on Y.

[1] can be rewritten as,

$$Y = 0.8X + 6.6 \qquad [3]$$

[2] can be rewritten as,

$$X = 0.45Y + 5.35 \qquad [4]$$

Comparing [3] with the actual equation

 

Y = bYX * X + C1

 

we have, bYX = 0.8

In the same way, comparing [4] with the actual equation

 

X = bXY * Y + C2

 

we have, bXY = 0.45

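By definition,

$$b_{YX} = r\frac{\sigma_Y}{\sigma_X} \qquad [5]$$

and

$$b_{XY} = r\frac{\sigma_X}{\sigma_Y} \qquad [6]$$

Multiplying the like sides of [5] and [6] we have,

$$b_{YX}\, b_{XY} = r^2; \qquad r^2 = 0.8 \times 0.45 = 0.36; \qquad r = \pm 0.6$$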

Since both the regression coefficients are positive, the value of the correlation coefficient must be positive.

Hence, the value of correlation coefficient is 0.6.

To get the mean values of X and Y, solve the two given [1] and [2] for X and Y. The value of X is taken to be the mean value of X and the value of Y is taken to be the mean value of Y.

 

 

5 * [1] – [2] implies that –32Y = –544; Y = 17.

 

Using the values of Y = 17 in [1] we have X = 13.

Hence,

The mean of X is 13 and the mean of Y is 17.

Given σX² = 9, so σX = 3.

Using the values of σX, r and bYX in the relation bYX = r σY/σX,

$$0.8 = 0.6 \times \frac{\sigma_Y}{3}; \qquad \sigma_Y = 4$$

Note:

Where neither coefficient clearly dominates in the given equations, choose one equation for Y on X and the other for X on Y on a trial-and-error basis. The selection must satisfy the condition bYX * bXY ≤ 1. If this condition fails, reverse the selection and proceed.

 

Example: 20

Two lines of regressions are given by x + 2y = 5 and 2x + 3y = 8. Calculate the value of mean of x, mean of y and r.

Consider the given regression equations,

$$x + 2y = 5 \qquad [1]$$

$$2x + 3y = 8 \qquad [2]$$

Neither equation shows a pure dominance of one variable: the coefficient of y dominates in magnitude in both equations. Choosing [1] for Y on X on a trial-and-error basis,

$$y = -0.5x + 2.5 \qquad [3]$$

[3] implies that byx = –0.5

Choose the second equation for X on Y.

$$x = -1.5y + 4 \qquad [4]$$

Then we have, bxy = –1.5

 

bxy *byx = [–3/2][–1/2] = 3/4 ≤ 1

 

Hence, the selection is correct. [If bxy * byx > 1, change the selection of equation for Y on X and X on Y then proceed.]

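By definition,

$$b_{yx} = r\frac{\sigma_y}{\sigma_x} \quad [5]; \qquad b_{xy} = r\frac{\sigma_x}{\sigma_y} \quad [6]$$

Multiplying the like sides of [5] and [6] we have,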

 

r2 = [–0.5 ] * [–1.5] = 0.75; r = ± 0.866.

 

Since both the regression coefficients are negative, the value of the correlation coefficient must be negative.

Hence, the value of correlation coefficient is –0.866.

To get the mean values of x and y, solve the two given [1] and [2] for x and y. The value of x is taken to be the mean value of x and the value of y is taken to be the mean value of y.

Multiplying [3] and [4] based on like sides,

Both bxy and byx are < 0; it implies that r value should be negative,

 

r = –0.866.

 

Solving [1] and [2], we have x = 1 and y = 2.

Hence, the mean of x = 1 and the mean of y = 2.
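The trial-and-error selection used in Examples 19 and 20 can be written as a small routine. The sketch below is one possible arrangement, not the book's algorithm. Each line is supplied as coefficients [a, b, c] of a·x + b·y = c; the routine tries the first line as Y on X, accepts the choice if the product of the two regression coefficients lies between 0 and 1, and otherwise swaps the roles. The means are obtained as the point of intersection. The function name and the coefficient convention are assumptions made here.

```python
import math


def analyse_regression_pair(line_1, line_2):
    """Each line is (a, b, c), meaning a*x + b*y = c.

    Returns (mean_x, mean_y, r) after choosing which line is Y on X.
    """
    for y_on_x, x_on_y in ((line_1, line_2), (line_2, line_1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope when solved for y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope when solved for x
        product = b_yx * b_xy
        if 0 <= product <= 1:           # admissible selection
            r = math.copysign(math.sqrt(product), b_yx)
            break
    else:
        raise ValueError("no assignment satisfies 0 <= b_yx * b_xy <= 1")

    # The lines intersect at (mean_x, mean_y); solve by Cramer's rule.
    (a1, b1, c1), (a2, b2, c2) = line_1, line_2
    det = a1 * b2 - a2 * b1
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det
    return mean_x, mean_y, r


# Example 19: 8X - 10Y = -66 and 40X - 18Y = 214
print(analyse_regression_pair((8, -10, -66), (40, -18, 214)))
# Example 20: x + 2y = 5 and 2x + 3y = 8
print(analyse_regression_pair((1, 2, 5), (2, 3, 8)))
```

The sign of r is taken from the regression coefficients, since both coefficients and r always share the same sign.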

EXERCISES
  1. Distinguish between correlation coefficient r and rank correlation coefficient R.
  2. Analyse critically the assumptions underlying the Karl Pearson’s correlation coefficient.
  3. Calculate the coefficient of correlation between age group and rate of mortality from the following data.
  4. Ten competitors in a beauty contest are ranked by three judges. Find which pair of judges has the nearest approach to common taste in beauty.
  5. Given the regression lines as 3x + 2y = 26 and 6x + y = 31. Find their point of intersection and interpret it. Also find the correlation coefficient between x and y.
  6. If the Karl Pearson’s coefficient of correlation is 0.95 and the SD of x and y are 3 and 7, what is the covariance of x, y?
  7. Calculate Spearman’s coefficient of rank correlation for the following data:
  8. Find the rank correlation coefficient of the following data:
  9. The weights Y of potassium bromide that will dissolve in 100 g of water at X °C are given below. Fit an equation of the form Y = a + bX by the method of least squares. Use this relation to estimate the weight [Y] when X = 150 °C.
  10. Assume that we conduct an experiment with eight fields planted with corn, four of which receive no nitrogen fertilizer. The resulting corn yields are shown in the table as bags per acre:

    Field    Nitrogen [Kg]    Corn yields [bags/acre]
    1        0                12
    2        0                36
    3        0                6
    4        0                18
    5        80               128
    6        80               112
    7        80               112
    8        80               76

    • Compute a linear regression equation by least squares.
    • Predict corn yield for a field treated with 60 pounds of fertilizer.
  11. Find the linear regression equation of percentage worms [Y] on size of the crop [X] based on the following seven observations.
  12. The following table shows the ages [X] and systolic blood pressure [Y] of eight persons:

    Fit a linear regression equation of Y on X and estimate the blood pressure of a 70-year-old person.

  13. In trying to evaluate the effectiveness of antibiotics in killing bacteria, a research institution compiled the following information.

    Calculate the regression equation of bacteria on antibiotics. Estimate the probable killings of bacteria when the antibiotics are used in 20 mg.

  14. From the following data, ascertain whether the birth and death rate of fish that have been reared in the laboratory are correlated.

    Month       Birth rate    Death rate
    January     100           90
    February    104           95
    March       110           98
    April       125           100
    May         130           102
    June        140           115
    July        145           135

  15. Some health researchers have reported an inverse relationship between the central nervous system malformations and the hardness of the related water supplies. Suppose the data were collected on a sample of nine geographic areas with the following results.

    CNS malformation rate [per 1,000 births]    Water hardness [ppm]
    9                                           120
    8                                           130
    5                                           90
    1                                           150
    4                                           160
    2                                           100
    3                                           140
    6                                           80
    7                                           200

    Compute coefficient of correlation. What are your conclusions?

  16. The body weight [X lbs] and food consumption [Y, 350-day food consumption, lbs] of white leghorn is given in the following table:

    Show the relationship between body weight and food consumption.

  17. The following data give the yield of maize grain [in kgs] per plot of size 10 × 4 sq.m for different doses of nitrogen applications.

    Calculate the correlation coefficient and draw your inference.

  18. Calculate the correlation coefficient between height of father and son from the following data:
  19. Calculate the coefficient of correlation between age of elephants and annual maintenance cost.

    Age of elephants [years]    Annual maintenance cost [rupees]
    2                           1,600
    3                           1,500
    5                           1,800
    9                           1,900
    8                           1,700
    10                          2,100
    12                          2,000

  20. The following are the results of some experiments:

    Age of fish [weeks]    Fish reared [no.]    Fish achieving required weight [no.]
    10–11                  200                  150
    11–12                  300                  250
    12–13                  50                   20
    13–14                  150                  110
    14–15                  100                  80
    15–16                  200                  190
    16–17                  250                  220

    Calculate the coefficient of correlation between age and the fish that achieved the required weight in the experiments.

ANSWER THE QUESTIONS
  1. ____________________ helps us to find the relationship among the variables in quantitative/qualitative form.
  2. This methodology of studying the strength of relationship among the variables is given by ____________________.
  3. ____________________ is a statistical measure used to evaluate the strength and degree of the relationship among the two or more variables under study.
  4. Correlation is classified into ____________________.
  5. The value of correlation [r] lies in the closed interval ____________________.
  6. ____________________ is used to find the association of the quantitative type of data.
  7. ____________________ is used to find the association of the qualitative type of data.
  8. If the data type is continuous, the association can be studied using the method of ____________________.
  9. State the properties of correlation.
  10. The ____________________ is a very valuable graphic device to show the existence of correlation between the two variables.
  11. The value of r can be computed using the relation ____________________.
  12. The standard error of r = ____________________.
  13. The relationship for computing the ____________________ is [0.6745 *[standard error of r]].
  14. Define the term ____________________.
  15. The word ____________________ was first used by ____________________ in his investigation regarding heredity.
  16. ____________________ is used to predict the expected value of one variable if the value for another one is given.
  17. ____________________ is used to express the relationship that exists between any two variables in the form of a linear equation.
  18. The structure of the regression equation can be given as ____________________.
  19. Both the regression coefficients bxy and byx should be of ____________________.
    • same sign
    • opposite in sign
    • none
  20. When the covariance is positive, then the values of both ____________________ and ____________________ are positive.
ANSWERS
  1. Bivariate or multivariate analysis
  2. Sir Francis Galton and Karl Pearson
  3. Correlation
  4. Simple correlation, rank correlation and group correlation
  5. [–1 ≤ r ≤ 1]
  6. Simple correlation
  7. Rank correlation
  8. Group correlation
  9. Refer Section 7.2.6
  10. Scatter diagram
  11. [Covariance/{SD[x] * SD[y]}]
  12. [1 – r²]/√n
  13. Probable error
  14. Rank correlation
  15. Regression and Sir Francis Galton
  16. Regression
  17. Regression
  18. Y – Ȳ = bYX [X – X̄] and X – X̄ = bXY [Y – Ȳ]
  19. Same sign
  20. byx and bxy