Correlation in Statistics || Degree of Correlation || Karl Pearson's Coefficient of Correlation || Spearman's Rank Correlation || What is the difference between Probable Error and Standard Error?

Correlation


Q.1 What exactly is correlation? Describe the various types and degrees of correlation.

Two variables are said to be correlated if they fluctuate in such a way that a change in one is accompanied by a change in the other. A rise in the price of a commodity, for example, reduces the demand for it, and vice versa. Similarly, there is some relationship between a husband's age and his wife's age, and between rainfall and agricultural productivity.

The variables are considered to be correlated if a change in one induces a commensurate change in the other. "Correlation is typically used to describe the analysis of covariation of two or more variables," explains A. M. Tuttle.

(i)                Positive and Negative Correlation: A positive correlation exists when changes in two related series occur in the same direction, i.e., an increase in one variable is associated with an increase in the other. For example, as a father's age increases, the son's age also increases. If the two related series move in opposite directions, i.e., an increase in one variable is associated with a decrease in the other, the correlation is said to be negative.

(ii)              Linear and Non-Linear Correlation: A correlation is said to be linear if the amount of change in one variable bears a constant ratio to the amount of change in the other variable. If such paired values are plotted on graph paper, the points lie along a straight line. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, the correlation is said to be non-linear or curvilinear.

For example, doubling the amount of rainfall does not necessarily double the amount of produce.

 

(iii)            Simple, Partial, and Multiple Correlation: Simple correlation is involved when only two variables are studied. Multiple correlation studies the joint influence of two or more independent variables on one dependent variable. For example, if the combined effect of rainfall, soil, and temperature on the yield of potatoes per acre is studied, it is a case of multiple correlation.

Partial correlation, on the other hand, involves more than two variables but studies the relationship between only two of them, with the influence of the other variables held constant.

Degree of Correlation: The degree of correlation is judged from the value of the coefficient of correlation.

The coefficient always lies between +1 and -1: +1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates the absence of correlation; values in between indicate a high, moderate, or low degree of correlation.

Q.2 Explain the Mathematical Methods for Determining Correlation.

The correlation coefficient can be calculated by the following methods:

(i)                Karl Pearson's Coefficient of Correlation (r): Karl Pearson's coefficient of correlation (r) is the measure most commonly used in practice. It assumes a linear relationship between the two series.

This approach is regarded as the best measure, since it indicates both the direction of the relationship, positive or negative, and the degree of correlation, which always lies between +1 and -1.

Computation of Karl Pearson's Correlation Coefficient:

Direct Method

Short Cut Method 
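The two formulas are not reproduced in the text above, so here is a minimal Python sketch of both computations (the function names and sample figures are illustrative assumptions, not from the original). The direct method takes deviations from the actual means; the short-cut method takes deviations from assumed means and adjusts for the difference:

```python
from math import sqrt

def pearson_direct(x, y):
    """Direct method: deviations are taken from the actual means.
    r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))"""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_shortcut(x, y, ax, ay):
    """Short-cut method: deviations are taken from assumed means ax and ay.
    r = (N*sum(dx*dy) - sum(dx)*sum(dy)) /
        sqrt((N*sum(dx^2) - (sum(dx))^2) * (N*sum(dy^2) - (sum(dy))^2))"""
    n = len(x)
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    num = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    den = sqrt((n * sum(a * a for a in dx) - sum(dx) ** 2) *
               (n * sum(b * b for b in dy) - sum(dy) ** 2))
    return num / den

prices = [10, 12, 14, 16, 18]
demand = [40, 38, 35, 32, 30]
print(pearson_direct(prices, demand))            # about -0.997 (strong negative correlation)
print(pearson_shortcut(prices, demand, 12, 36))  # same value; the assumed means need not be exact
```

Both routines return the same value of r for the same data; the short-cut form only eases hand computation when the actual means are inconvenient fractions.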

(ii)    Spearman's Rank Correlation: When exact numerical measurement of the variables is not possible, ranks are used instead. These ranks are assigned according to the magnitude of the data, and the coefficient of rank correlation is calculated from the ranks rather than the actual values.

Charles Edward Spearman, a British psychologist, devised this method in 1904. It is particularly useful for assessing qualitative attributes such as honesty, attractiveness, talent, and knowledge.

Rank Correlation Calculation:

(a) First, assign ranks to the values in both series. Ranking may begin either from the highest value or from the lowest value (for example, the highest value may be given rank 1). However, whichever end we begin with, the same procedure must be used for both variables.

(b) Calculate the difference between the two ranks for each item, D = R1 - R2.

(c) Square each rank difference, i.e. D², and obtain the total ΣD².

In the Event of Tied (Equal) Ranks: If some values in a distribution are equal, the average of the ranks they would otherwise occupy is assigned to those items.

The correlation formula will be as follows:

M → No. of items whose ranks are common.
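The adjusted formula is not reproduced in the text; the standard version is rho = 1 - 6[ΣD² + Σ(m³ - m)/12] / (N(N² - 1)), where m is the number of items sharing a common rank (the M referred to above). A minimal Python sketch under that assumption, with illustrative function names and data, is:

```python
def average_ranks(values):
    """Assign ranks (1 = largest value); tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        positions = [i + 1 for i, o in enumerate(order) if o == v]
        ranks.append(sum(positions) / len(positions))
    return ranks

def spearman_rank(x, y):
    """rho = 1 - 6*(sum(D^2) + tie correction) / (N*(N^2 - 1)),
    where D = R1 - R2 and the correction adds (m^3 - m)/12 for each group of m tied ranks."""
    r1, r2 = average_ranks(x), average_ranks(y)
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = 0.0
    for ranks in (r1, r2):
        for v in set(ranks):
            m = ranks.count(v)          # m = number of items sharing a common rank
            correction += (m ** 3 - m) / 12
    return 1 - 6 * (d_sq + correction) / (n * (n ** 2 - 1))

marks_judge_a = [80, 64, 75, 40, 55, 64]
marks_judge_b = [82, 60, 68, 48, 50, 70]
print(spearman_rank(marks_judge_a, marks_judge_b))   # about 0.886
```

With the sample marks above there is one pair of tied values in the first series, so a correction of (2³ - 2)/12 = 0.5 is added to ΣD² before the formula is applied.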

 

(iii)   Concurrent Deviation Method: This is the simplest method. It is useful for studying the relationship between short-term fluctuations in time series. It is not always necessary to know the exact degree of correlation; often it is enough to know whether the correlation is positive or negative.

The concurrent deviation method is well suited to this purpose.

Correlation Calculation:

- Compare each item in both series with the item before it. If it is greater, mark (+); if it is smaller, mark (-); and if it is equal, mark (=).

- Compare the signs of the two series pair by pair (equivalently, multiply the deviation signs of the two series).

- Count the pairs in which the signs agree; this count (C) is the number of concurrent deviations.

Use the following formula:

Where –

C → No. of concurrent deviations

N → No. of pairs of deviations (i.e., number of items - 1)
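The formula referred to above is not shown in the text; the standard expression is r_c = ±√(±(2C - N) / N), where the same sign is taken inside and outside the root so that the quantity under it stays positive. A minimal Python sketch follows (the function names and the sample series are illustrative assumptions):

```python
from math import sqrt

def sign_of_change(series):
    """Mark each item (+), (-) or (=) relative to the item before it."""
    return ['+' if b > a else '-' if b < a else '=' for a, b in zip(series, series[1:])]

def concurrent_deviation(x, y):
    """Coefficient of concurrent deviations:
    r_c = +/- sqrt(+/- (2C - N) / N),
    where C = number of positions at which both series move the same way
    and N = number of pairs of deviations (items - 1).
    The same sign is used inside and outside the square root."""
    sx, sy = sign_of_change(x), sign_of_change(y)
    n = len(sx)
    c = sum(1 for a, b in zip(sx, sy) if a == b)   # concurrent deviations
    value = (2 * c - n) / n
    return sqrt(value) if value >= 0 else -sqrt(-value)

prices = [368, 384, 385, 361, 347, 384, 395, 403, 400]
demand = [22, 21, 24, 20, 22, 26, 24, 29, 28]
print(concurrent_deviation(prices, demand))   # 0.5
```

With the sample series above, five of the eight deviation pairs agree in sign, so C = 5, N = 8 and r_c = +0.5, indicating a moderate positive association.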

Q.3 What is the difference between Probable Error and Standard Error? What are the guidelines for determining the significance of the coefficient of correlation?

The term "probable error" refers to an anticipated error in Karl Pearson's coefficient of correlation, which aids in determining the upper and lower limits of potential correlation. With the aid of probable error, it is possible to establish the dependability of the coefficient's value because it is dependent on random sampling conditions.

The likelihood of a mistake is calculated using the following formula:

Where –

            r → Coefficient of correlation.

            N → No. of items
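The formula itself is not reproduced in the text above; the standard textbook expression, using r and N as defined, is:

P.E. = 0.6745 × (1 - r²) / √N

The constant 0.6745 is the factor that converts the standard error into the probable error, so that half of the observations of a normal sampling distribution lie within one probable error of the mean.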

Standard Error: The standard error is now preferred over the probable error. For any sample statistic, the standard error is the standard deviation of its sampling distribution. Other things being equal, the standard error should be as small as possible.

The smaller the standard error, the less the sampling distribution varies and the more reliable the result.
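For the coefficient of correlation the standard error is usually written as S.E. = (1 - r²) / √N, so the probable error quoted above is simply 0.6745 × S.E.; this relationship is standard, although the text does not state it explicitly.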

The following are the rules for determining the significance of the coefficient of correlation:

(i)      The correlation is significant if the coefficient of correlation is more than six times the probable error (r > 6 P.E.).

(ii)     The correlation is not significant if r is less than the probable error (r < P.E.).

(iii)    The correlation is also not significant if r is less than 0.3 times the probable error (r < 0.3 P.E.).
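Putting the probable error formula and these rules together, a small Python check might look like the sketch below (the function names are illustrative; only the first two rules are coded, and any other case is reported as inconclusive):

```python
from math import sqrt

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N), where N is the number of pairs of observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def significance_of_r(r, n):
    """Judge r against its probable error using the rules listed above."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        return "significant (r > 6 P.E.)"
    if abs(r) < pe:
        return "not significant (r < P.E.)"
    return "inconclusive by these rules"

print(significance_of_r(0.8, 25))   # P.E. is about 0.049, 6 P.E. is about 0.29, so r = 0.8 is significant
```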

