Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Tanimoto coefficient (aka Jaccard index).

For sets X and Y the Tversky index is a number between 0 and 1 given by

,

Here, denotes the relative complement of Y in X.

Further, are parameters of the Tversky index. Setting produces the Tanimoto coefficient; setting produces the Sørensen–Dice coefficient.

If we consider X to be the prototype and Y to be the variant, then corresponds to the weight of the prototype and corresponds to the weight of the variant. Tversky measures with are of special interest.[2]

Because of the inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed a variant of the original formulation has been proposed using max and min functions [3] .

,

,

,

This formulation also re-arranges parameters and . Thus, controls the balance between and in the denominator. Similarly, controls the effect of the symmetric difference versus in the denominator.

Notes

  1. Tversky, Amos (1977). "Features of Similarity" (PDF). Psychological Review. 84 (4): 327–352. doi:10.1037/0033-295x.84.4.327.
  2. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
  3. Jimenez, S., Becerra, C., Gelbukh, A. SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, p.194-201, June 7–8, 2013, Atlanta, Georgia, USA.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.