Example: If the coefficient of correlation between two series is +0.9 and its probable error is 0.0128, what is the value of n?
Solution: P.E.(r) = 0.6745 (1 − r²)/√n. From the given data:
0.0128 = 0.6745 (1 − (0.9)²)/√n
0.0128 = 0.6745 × (1 − 0.81)/√n = 0.128155/√n
√n = 0.128155/0.0128 ≈ 10
n = (10)² = 100
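The arithmetic above can be checked with a short Python sketch. The function name sample_size_from_probable_error is hypothetical and chosen only for illustration; the formula is the probable-error formula used in the solution, rearranged for n.

```python
def sample_size_from_probable_error(r, pe):
    """Solve P.E.(r) = 0.6745 * (1 - r**2) / sqrt(n) for n."""
    root_n = 0.6745 * (1 - r ** 2) / pe  # this is sqrt(n)
    return root_n ** 2


n = sample_size_from_probable_error(0.9, 0.0128)
print(round(n))  # rounds to the whole number of pairs used in the worked solution
```

The unrounded result is slightly above 100 because 0.128155/0.0128 is a little more than 10; the worked solution rounds √n to 10.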
Friday, 30 December 2011
Mathematical Properties of Coefficient of Correlation
The following are the important mathematical properties of the coefficient of correlation, r.
The coefficient of correlation lies between −1 and +1. Its absolute value cannot exceed unity.
Symbolically, −1 ≤ r ≤ +1.
Proof of the property is given below:
Let x and y denote the deviations of the X and Y series from their respective arithmetic means, and let σ_x and σ_y be their standard deviations. Then,
∑(x/σ_x + y/σ_y)² = (∑x²)/σ_x² + (∑y²)/σ_y² + (2∑xy)/(σ_x σ_y)
But (∑x²)/σ_x² = n, because σ_x² = (∑x²)/n ∴ (∑x²)/σ_x² = (∑x²)/(∑x²) × n = n
Similarly, (∑y²)/σ_y² = n, and
(2∑xy)/(σ_x σ_y) = 2nr, because r = (∑xy)/(n σ_x σ_y)
As such, ∑(x/σ_x + y/σ_y)² = n + n + 2nr = 2n + 2nr = 2n(1 + r)
But ∑(x/σ_x + y/σ_y)² is a sum of squares of real quantities and as such cannot be negative. At the least it can be 0.
So 2n(1 + r) ≥ 0
Therefore r cannot be less than −1, that is, −1 ≤ r. Similarly, by expanding ∑(x/σ_x − y/σ_y)², it can be proved that its value is 2n(1 − r), which also cannot be negative, and hence r cannot be greater than +1, that is, r ≤ +1.
Hence −1 ≤ r ≤ +1.
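The two identities used in the proof can be checked numerically. The following is a minimal sketch, not part of the original proof; the function name identity_sides and the two short series are made up purely for illustration.

```python
import math


def identity_sides(X, Y):
    """Return r and both sides of the two identities used in the proof."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [v - mean_x for v in X]  # deviations from the mean
    y = [v - mean_y for v in Y]
    sd_x = math.sqrt(sum(d * d for d in x) / n)
    sd_y = math.sqrt(sum(d * d for d in y) / n)
    r = sum(a * b for a, b in zip(x, y)) / (n * sd_x * sd_y)
    plus = sum((a / sd_x + b / sd_y) ** 2 for a, b in zip(x, y))
    minus = sum((a / sd_x - b / sd_y) ** 2 for a, b in zip(x, y))
    return r, plus, 2 * n * (1 + r), minus, 2 * n * (1 - r)


# Hypothetical data, chosen only to exercise the identities.
r, plus, rhs_plus, minus, rhs_minus = identity_sides([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, plus, rhs_plus, minus, rhs_minus)
```

Both sums of squares are non-negative for any data, which is what forces r to stay between −1 and +1.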
Sometimes the values of two variables appear to be inter-related. Such a relationship is likely to be found in two series relating to the heights and weights of a group of persons. It may be observed that weights increase with increase in heights, so that tall people tend to be heavier than short people. Similarly, if data are collected about the prices of a commodity and the quantities sold at different prices, two series would be obtained. One variable would be the various prices of the commodity and the other variable would be the quantities sold at these prices. In two such series we are again likely to find some relationship. With an increase in the price of the commodity the quantity sold generally decreases. We can thus conclude that there is some relationship between price and demand. Such relationships can be found in many types of series, for example, prices and supply, heights and weights of persons, prices of sugar and sugarcane, ages of husbands and wives, etc.
correlation
A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:
N = 25, ∑X = 125, ∑X² = 650,
∑Y = 100, ∑Y² = 460, ∑XY = 508.
It was, however, discovered at the time of checking that two pairs of observations had not been copied correctly. They were taken as (6, 14) and (8, 6), while the correct values were (8, 12) and (6, 8). Prove that the correct value of the correlation coefficient should be 2/3. (I.C.W.A., Final, 1977)
Solution: Corrected ∑X = 125 − 6 − 8 + 8 + 6 = 125
Corrected ∑Y = 100 − 14 − 6 + 12 + 8 = 100
Corrected ∑X² = 650 − 6² − 8² + 8² + 6² = 650
Corrected ∑Y² = 460 − 14² − 6² + 12² + 8² = 436
Corrected ∑XY = 508 − (6 × 14) − (8 × 6) + (8 × 12) + (6 × 8) = 520
Now the corrected value of the coefficient of correlation is
Corrected r = (N∑XY − (∑X)(∑Y))/(√(N∑X² − (∑X)²) √(N∑Y² − (∑Y)²))
= ((25 × 520) − (125 × 100))/(√(25 × 650 − (125)²) √(25 × 436 − (100)²))
= 500/√(625 × 900) = 500/(25 × 30) = 500/750 = 2/3
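The correction procedure can be sketched in Python as follows: remove the contribution of each wrongly copied pair from the running totals, add the contribution of each correct pair, and then apply the formula for r. The function name corrected_r and its parameter names are hypothetical; the totals and pairs are those given in the problem.

```python
import math


def corrected_r(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy, wrong_pairs, right_pairs):
    """Adjust the totals for miscopied pairs, then compute r from the totals."""
    for x, y in wrong_pairs:  # take out what was wrongly included
        sum_x -= x
        sum_y -= y
        sum_x2 -= x * x
        sum_y2 -= y * y
        sum_xy -= x * y
    for x, y in right_pairs:  # put in what should have been included
        sum_x += x
        sum_y += y
        sum_x2 += x * x
        sum_y2 += y * y
        sum_xy += x * y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r = corrected_r(25, 125, 100, 650, 460, 508,
                wrong_pairs=[(6, 14), (8, 6)],
                right_pairs=[(8, 12), (6, 8)])
print(r)  # approximately 0.6667, i.e. 2/3
```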
Calculation of Pearson's Coefficient of Correlation
Direct Method No. 1
The steps involved are as follows:
(1) Find the means of the two series (X̄ and Ȳ).
(2) Find the deviations of each item of a series from its mean (x and y). Here x = (X − X̄) and y = (Y − Ȳ).
(3) Square these deviations and total them (∑x² and ∑y²).
(4) Multiply the respective deviations of the two series and total them (∑xy).
(5) Substitute the above values in the following formula:
r = (∑xy)/(n × √((∑x²)/n) × √((∑y²)/n)) = (∑xy)/(n σ1 σ2)
or, equivalently,
r = (∑xy)/√(∑x² × ∑y²)
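The five steps of the direct method can be sketched in Python as below. The function name pearson_direct and the two short series are hypothetical, used only to illustrate the steps.

```python
import math


def pearson_direct(X, Y):
    """Pearson's r by the direct method: deviations taken from the actual means."""
    n = len(X)
    mean_x = sum(X) / n                            # step 1: means
    mean_y = sum(Y) / n
    x = [v - mean_x for v in X]                    # step 2: deviations
    y = [v - mean_y for v in Y]
    sum_x2 = sum(d * d for d in x)                 # step 3: squared deviations, totalled
    sum_y2 = sum(d * d for d in y)
    sum_xy = sum(a * b for a, b in zip(x, y))      # step 4: products of deviations, totalled
    return sum_xy / math.sqrt(sum_x2 * sum_y2)     # step 5: substitute in the formula


# Hypothetical data: here ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r = 6/√60.
r = pearson_direct([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

The sketch uses the second form of the formula, r = ∑xy/√(∑x² × ∑y²), because n cancels and the standard deviations need not be computed separately.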
Solved Example: Calculate the coefficient of correlation from the following data by Spearman's Rank Differences method:
Prices of Tea ($)    Prices of Coffee ($)    Prices of Tea ($)    Prices of Coffee ($)
75                   120                     60                   110
88                   134                     80                   140
95                   150                     81                   142
70                   115                     50                   100
Solution: Calculation of Coefficient of Rank Correlation
Prices of Tea (X)    R1    Prices of Coffee (Y)    R2    R1 − R2 (d)    d²
75                   4     120                     4     0              0
83                   7     134                     5     +2             4
95                   8     150                     8     0              0
70                   3     115                     3     0              0
60                   2     110                     2     0              0
80                   5     140                     6     −1             1
81                   6     142                     7     −1             1
50                   1     100                     1     0              0
n = 8                      n = 8                         ∑d = 0         ∑d² = 6
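The table stops at ∑d² = 6. As a sketch of the remaining step, the Python below ranks both series and applies the standard rank-differences formula for untied ranks, r = 1 − 6∑d²/(n(n² − 1)), which is assumed here because it is not written out in the text above. The function names are hypothetical. The tea price printed as 88 in the problem and as 83 in the solution table has rank 7 either way, so the result does not depend on which figure is used.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; this simple version assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank(X, Y):
    """Return (sum of d squared, rank correlation) for two untied series."""
    n = len(X)
    r1, r2 = ranks(X), ranks(Y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


tea = [75, 88, 95, 70, 60, 80, 81, 50]
coffee = [120, 134, 150, 115, 110, 140, 142, 100]
sum_d2, rho = spearman_rank(tea, coffee)
print(sum_d2, round(rho, 4))  # 6 0.9286
```

With n = 8 and ∑d² = 6 this gives 1 − 36/504, or about +0.93, a high degree of positive rank correlation between the two price series.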
Calculation Of Coefficient
Karl Pearson, the great biologist and statistician, has given a formula for the calculation of coefficient of correlation. According to it the coefficient of correlation of two variables is obtained by dividing the sum of the products of the corresponding deviations of the various items of the two series from their respective means by the product of their standard deviations and the number of pairs of observations.
Thus, if x1, x2, x3, ..., xn are the deviations of the various items of the first variable from its mean value and y1, y2, y3, ..., yn are the corresponding deviations of the second variable from its mean value, the sum of the products of these corresponding deviations would be ∑xy. If, further, the standard deviations of the two variables are σ1 and σ2 respectively, and n is the number of pairs of observations, Karl Pearson's coefficient of correlation, represented by r, would be
r = (∑xy)/(n σ1 σ2)
It is clear from the above formula that if ∑xy is positive, the coefficient of correlation would also be a positive figure, indicating positive correlation between the two series. If, on the other hand, ∑xy is negative, the coefficient of correlation would also be negative, indicating that the correlation between the two series is negative. ∑xy would be positive if, generally, positive deviations in one series are associated with positive deviations in the other, and negative deviations with negative deviations. The value of ∑xy would be negative if, generally, the positive deviations of one variable are associated with the negative deviations of the other variable and vice versa. If positive and negative deviations of one variable are indifferently associated with the deviations of the other variable, the value of ∑xy would be 0 or near it, indicating absence of correlation between the two series. The value of this coefficient of correlation is always between +1 and −1. Its absolute value cannot exceed unity.
The above formula of Karl Pearson is based on the study of covariance between two series. The covariance between two series is written as follows:
Covariance = (∑xy)/n
where x and y stand for the deviations of the two series from their respective means.
To study correlation, the covariance of the two series is divided by the product of their standard deviations. Thus,
r = (covariance of the two series)/√((variance of series 1)(variance of series 2))
= Covariance/(σ1 × σ2) = (∑xy)/(n σ1 σ2)
This formula is known as the Product Moment Formula of Coefficient of Correlation.
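The relation between the sign of the covariance and the sign of r can be illustrated with a short Python sketch. The function name covariance_and_r and both data sets (heights with weights, prices with quantities sold) are hypothetical, made up only to show one positive and one negative case.

```python
import math


def covariance_and_r(X, Y):
    """Return (covariance, r), dividing by n as in the product moment formula."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y)) / n
    sd_1 = math.sqrt(sum((a - mean_x) ** 2 for a in X) / n)
    sd_2 = math.sqrt(sum((b - mean_y) ** 2 for b in Y) / n)
    return cov, cov / (sd_1 * sd_2)


# Hypothetical heights and weights: the deviations share the same sign.
cov_pos, r_pos = covariance_and_r([150, 160, 170, 180], [50, 58, 66, 74])

# Hypothetical prices and quantities sold: the deviations have opposite signs.
cov_neg, r_neg = covariance_and_r([10, 20, 30, 40], [90, 70, 60, 30])

print(cov_pos, r_pos)
print(cov_neg, r_neg)
```

Dividing the covariance by σ1 × σ2 leaves its sign unchanged and only rescales it into the range −1 to +1, which is why the sign of ∑xy alone decides the direction of the correlation.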