13
2
The usual correlation coefficient (in 2d) measures how well a set of points can be described by a line, and if yes, its sign tells us whether we have a positive or negative correlation. But this assumes that coordinates of the points can actually interpreted quantitatively for instance as measurements.
If you cannot do that but you can still order the coordinates, there is the rank correlation coefficient: It measures how well the points can be described by a monotonic function.
Challenge
Given a list of 2d points, determine their rank correlation coefficient.
Details
- You can assume the input to be positive integers (but you don't have to), or any other "sortable" values.
- The points can be taken as a list of points, or two lists for the x- and y-coordinates or a matrix or 2d array etc.
- The output must be a floating point or rational type, as it should represent a real number between 0 and 1.
Definitions
Rank: Given a list of numbers X=[x(1),...,x(n)]
we can assign a positive number rx(i)
called rank to each entry x(i)
. We do so by sorting the list and assigning the index of x(i)
in the sorted list rx(i)
. If two or more x(i)
have the same value, then we just use the arithmetic mean of all the corresponding indices as rank. Example:
List: [21, 10, 10, 25, 3]
Indices sorted: [4, 2, 3, 5, 1]
The number 10
appears twice here. In the sorted list it would occupy the indices 2
and 3
. The arithmetic mean of those is 2.5
so the ranks are
Ranks: [4, 2.5, 2.5, 5, 1]
Rank Correlation Coefficient: Let [(x(1),y(1)),(x(2),y(2)),...,(x(n),y(n))]
be the given points where each x(i)
and y(i)
is a real number (wlog. you can assume it is an integer)
For each i=1,...,n
we compute the rank rx(i)
and ry(i)
of x(i)
and y(i)
respectively.
Let d(i) = rx(i)-ry(i)
be the rank difference and let S
be the sum S = d(1)^2 + d(2)^2 + ... + d(n)^2
. Then the rank correlation coefficient rho
is given by
rho = 1 - 6 * S / (n * (n^2-1))
Example
x y rx ry d d^2
21 15 4 5 -1 1
10 6 2&3 -> 2.5 2 0.5 0.25
10 7 2&3 -> 2.5 3 -0.5 0.25
25 11 5 4 1 1
3 5 1 1 0 0
rho = 1 - 6 * (1+0.25+0.25+1)/(5*(5^2-1)) = 0.875
From wikipedia : "Only if all n ranks are distinct integers, it can be computed using the popular formula"
– rahnema1 – 2017-08-10T05:09:27.090What do you want to say with that? – flawr – 2017-08-10T11:32:02.980
I say that the formula that you provided is for the special cases where the ranks are integers according to wikipedia. However you used the formula for the ranks such as
2.5
. – rahnema1 – 2017-08-10T11:34:18.387Well that is if you're using integers in the first place. And even if you're doing so, you're still gonna get a good approximation. Many authors even use the formula of this challenge as a definition. Furthermore keep in mind that a ranking is unstable and does not necessarily have a such an impactful meaning as an usual correlation coefficient. But all this is irrelevant for this challenge. – flawr – 2017-08-10T12:12:09.900