Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test in which the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] A value close to 1 tends to indicate that the data are highly clustered, randomly distributed data tend to give values around 0.5, and regularly spaced (evenly distributed) data tend to give values close to 0.

Preliminaries

A typical formulation of the Hopkins statistic follows.[2]

Let $X$ be the set of $n$ data points.
Consider a random sample (without replacement) of $m \ll n$ data points with members $x_i$.
Generate a set $Y$ of $m$ uniformly randomly distributed data points.
Define two distance measures,
$u_i$, the distance of $y_i \in Y$ from its nearest neighbour in $X$, and
$w_i$, the distance of each of the $m$ randomly chosen $x_i \in X$ from its nearest neighbour in $X$ (other than itself).

Definition

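With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:

$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$

where $m$ is the number of sampled points, $u_i$ is the nearest-neighbour distance of the $i$-th uniformly generated point, and $w_i$ is the nearest-neighbour distance of the $i$-th sampled data point.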
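
As a rough illustration, the computation can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a reference implementation: the function name hopkins_statistic, the choice of Euclidean distance, and drawing the uniform points from the axis-aligned bounding box of the data are all assumptions that the formulation above does not fix. The function returns the sum of the d-th powers of the nearest-neighbour distances of m uniform points, divided by that sum plus the corresponding sum for m sampled data points.

```python
import math
import random


def hopkins_statistic(points, m, rng=None):
    """Estimate the Hopkins statistic for a list of d-dimensional points.

    points: list of equal-length tuples (the data set, n points)
    m:      sample size, with 0 < m < n (ideally much smaller than n)
    rng:    optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    n = len(points)
    d = len(points[0])
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")

    # Assumption: uniform points are drawn from the bounding box of the data.
    lows = [min(p[k] for p in points) for k in range(d)]
    highs = [max(p[k] for p in points) for k in range(d)]

    # u: distance from each uniform point to its nearest neighbour in the data.
    u = []
    for _ in range(m):
        y = tuple(rng.uniform(lows[k], highs[k]) for k in range(d))
        u.append(min(math.dist(y, x) for x in points))

    # w: distance from each sampled data point to its nearest *other* point.
    w = []
    for i in rng.sample(range(n), m):  # sampling without replacement
        w.append(min(math.dist(points[i], points[j])
                     for j in range(n) if j != i))

    sum_u = sum(ui ** d for ui in u)
    sum_w = sum(wi ** d for wi in w)
    return sum_u / (sum_u + sum_w)


# Usage: two tight clusters versus uniformly random points in the unit square.
rng = random.Random(0)
clustered = [(rng.gauss(cx, 0.01), rng.gauss(cy, 0.01))
             for cx, cy in [(0.2, 0.2), (0.8, 0.8)] for _ in range(100)]
uniform = [(rng.random(), rng.random()) for _ in range(200)]
print(hopkins_statistic(clustered, 20, rng))
print(hopkins_statistic(uniform, 20, rng))
```

With data like this, the clustered set would be expected to score close to 1 and the uniformly random set somewhere around 0.5, although individual runs vary with the random sample. The brute-force nearest-neighbour search is only practical for small data sets; a spatial index would be the usual replacement for larger ones.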


Notes and references

  1. Hopkins, Brian; Skellam, John Gordon (1954). "A new method for determining the type of distribution of plant individuals". Annals of Botany. Annals Botany Co. 18 (2): 213–227.
  2. Banerjee, A. (2004). "Validating clusters using the Hopkins statistic". IEEE International Conference on Fuzzy Systems: 149–153. doi:10.1109/FUZZY.2004.1375706.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.