Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. It can be seen as a generalization of the Sørensen–Dice coefficient and the Tanimoto coefficient (also known as the Jaccard index).

For sets X and Y the Tversky index is a number between 0 and 1 given by

$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \alpha\,|X \setminus Y| + \beta\,|Y \setminus X|}.$$

Here, $X \setminus Y$ denotes the relative complement of Y in X.

Further, $\alpha, \beta \ge 0$ are parameters of the Tversky index. Setting $\alpha = \beta = 1$ produces the Tanimoto coefficient; setting $\alpha = \beta = 0.5$ produces the Sørensen–Dice coefficient.
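As a minimal sketch of the definition, the following Python function computes the index as the size of the intersection divided by that size plus `alpha` times the number of elements only in X plus `beta` times the number of elements only in Y. The function name `tversky_index` and the example sets are illustrative assumptions, and returning 0.0 when the denominator is zero is a convention chosen here, not something the definition specifies.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index of two sets, with non-negative weights alpha and beta.

    Computes |X & Y| / (|X & Y| + alpha * |X - Y| + beta * |Y - X|).
    """
    x, y = set(x), set(y)
    common = len(x & y)   # |X intersect Y|
    only_x = len(x - y)   # |X \ Y|, relative complement of Y in X
    only_y = len(y - x)   # |Y \ X|, relative complement of X in Y
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumption: the 0/0 case is left undefined by the formula;
        # this sketch returns 0.0 by convention.
        return 0.0
    return common / denominator


# Hypothetical example sets.
X = {1, 2, 3, 4}
Y = {3, 4, 5}

# alpha = beta = 1 reproduces the Tanimoto (Jaccard) coefficient |X & Y| / |X | Y|.
jaccard = len(X & Y) / len(X | Y)
print(tversky_index(X, Y, 1, 1), jaccard)

# alpha = beta = 0.5 reproduces the Sorensen-Dice coefficient 2|X & Y| / (|X| + |Y|).
dice = 2 * len(X & Y) / (len(X) + len(Y))
print(tversky_index(X, Y, 0.5, 0.5), dice)
```

For these sets the intersection has 2 elements, X has 2 elements of its own and Y has 1, so the two special cases evaluate to 2/5 and 4/7 respectively.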

If we consider X to be the prototype and Y to be the variant, then $\alpha$ corresponds to the weight of the prototype and $\beta$ corresponds to the weight of the variant. Tversky measures with $\alpha + \beta = 1$ are of special interest.[2]
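The asymmetry between prototype and variant can be illustrated with a short Python sketch. It assumes the weights 1 for the prototype and 0 for the variant (one case where the two weights sum to 1); the names `tversky_index`, `prototype` and `variant` are hypothetical and chosen only for this example. With these weights the index reduces to the fraction of the first argument's elements that also occur in the second, so swapping the arguments changes the result.

```python
def tversky_index(x, y, alpha, beta):
    """|X & Y| / (|X & Y| + alpha * |X - Y| + beta * |Y - X|)."""
    x, y = set(x), set(y)
    common = len(x & y)
    denominator = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption: return 0.0 when the formula would be 0/0.
    return common / denominator if denominator else 0.0


# Hypothetical feature sets.
prototype = {1, 2, 3, 4}
variant = {3, 4, 5}

# Weight 1 on the first argument's unmatched elements, 0 on the second's.
forward = tversky_index(prototype, variant, alpha=1, beta=0)
backward = tversky_index(variant, prototype, alpha=1, beta=0)

print(forward)   # 2 of the 4 prototype elements are in the variant
print(backward)  # 2 of the 3 variant elements are in the prototype
```

Here the first call evaluates to 2/4 and the second to 2/3, which shows why the order of the arguments matters.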

Because of the inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed, a variant of the original formulation using max and min functions has been proposed.[3]

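$$S(X, Y) = \frac{|X \cap Y|}{|X \cap Y| + \beta\left(\alpha a + (1 - \alpha) b\right)},$$

$$a = \min\left(|X \setminus Y|, |Y \setminus X|\right),$$

$$b = \max\left(|X \setminus Y|, |Y \setminus X|\right).$$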

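This formulation also re-arranges the parameters $\alpha$ and $\beta$. Thus, $\alpha$ controls the balance between $|X \setminus Y|$ and $|Y \setminus X|$ in the denominator. Similarly, $\beta$ controls the effect of the symmetric difference $|X \,\triangle\, Y|$ versus $|X \cap Y|$ in the denominator.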
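The symmetric variant can be sketched in Python as follows, assuming the denominator is the intersection size plus `beta * (alpha * a + (1 - alpha) * b)`, where `a` and `b` are the smaller and larger of the two set-difference sizes. The function name `symmetric_tversky_index` and the example sets are illustrative assumptions, and returning 0.0 for a zero denominator is a convention chosen here.

```python
def symmetric_tversky_index(x, y, alpha, beta):
    """Symmetric Tversky variant built from min and max of the difference sizes."""
    x, y = set(x), set(y)
    common = len(x & y)
    only_x = len(x - y)
    only_y = len(y - x)
    a = min(only_x, only_y)  # smaller of |X \ Y| and |Y \ X|
    b = max(only_x, only_y)  # larger of |X \ Y| and |Y \ X|
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    if denominator == 0:
        # Assumption: the 0/0 case is mapped to 0.0 by convention.
        return 0.0
    return common / denominator


# Hypothetical example sets.
X = {1, 2, 3, 4}
Y = {3, 4, 5}

# Swapping the arguments leaves the result unchanged.
print(symmetric_tversky_index(X, Y, 0.3, 1.0))
print(symmetric_tversky_index(Y, X, 0.3, 1.0))

# With alpha = 0.5 the bracket is half the symmetric difference, so
# beta = 1 gives the Sorensen-Dice value and beta = 2 the Jaccard value.
print(symmetric_tversky_index(X, Y, 0.5, 1.0))
print(symmetric_tversky_index(X, Y, 0.5, 2.0))
```

Because `min` and `max` do not depend on the order of their arguments, the function is symmetric in X and Y for any choice of the two parameters.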

Notes

  1. Tversky, Amos (1977). "Features of Similarity" (PDF). Psychological Review. 84 (4): 327–352. doi:10.1037/0033-295x.84.4.327.
  2. http://www.daylight.com/dayhtml/doc/theory/theory.finger.html
  3. Jimenez, S.; Becerra, C.; Gelbukh, A. (2013). "SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity". Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, June 7–8, 2013. pp. 194–201.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.