Weissman score

The Weissman score is a fictional efficiency metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Mishra, a graduate student, at the request of producers for HBO's television series Silicon Valley, a television show about a fictional tech start-up.[1][2][3][4] It compares both required time and compression ratio of measured applications, with those of a de facto standard according to the data type.

The formula is the following; where r is the compression ratio, T is the time required to compress, the overlined ones are the same metrics for a standard compressor, and alpha is a scaling constant.[1]

Weissman score was used in Dropbox Tech Blog to explain real-world work on lossless compression.[5]

Example

This example shows the score for the data of Hutter Prize,[6] using the paq8f as a standard and 1 as the scaling constant.

Application Compression ratio Compression time [min] Weissman score
paq8f 5.467600 300 1.000000
raq8g 5.514990 420 0.720477
paq8hkcc 5.682593 300 1.039321
paq8hp1 5.692566 300 1.041145
paq8hp2 5.750279 300 1.051701
paq8hp3 5.800033 300 1.060801
paq8hp4 5.868829 300 1.073383
paq8hp5 5.917719 300 1.082325
paq8hp6 5.976643 300 1.093102
paq8hp12 6.104276 540 0.620247
decomp8 6.261574 540 0.63623
decomp8 6.276295 540 0.637726

Limitations

Although the value is relative to the standards against which it is compared, the unit used to measure the times changes the score (see examples 1 and 2). This is a consequence of the requirement that the argument of the logarithmic function must be dimensionless. The multiplier also can't have a numeric value of 1 or less, because the logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6); that would result in scores of value 0 (even with changes), undefined, or negative (even if better than positive).

Examples

#Standard compressorScored compressorWeissman scoreObservations
Compression ratioCompression timeLog (compression time)Compression ratioCompression timeLog (compression time)
1 2.12 min0.301033.43 min0.4771211×(3.4/2.1)×(0.30103/0.477121)=1.021506 Change in unit or scale, changes the result.
2 2.1120 s2.0791813.4180 s2.2552731×(3.4/2.1)×(2.079181/2.255273)=1.492632
3 2.21 min03.31.5 min0.1760911×(3.3/2.2)×(0/0.176091)=0 If time is 1, its log is 0; then the score can be 0 or infinity.
4 2.20.667 min−0.1760913.31 min01×(3.3/2.2)×(−0.176091/0)=infinity
5 1.60.5 h−0.301032.91.1 h0.0413931×(2.9/1.6)×(−0.30103/0.041393)=−13.18138 If time is less than 1, its log is negative; then the score can be negative.
6 1.61.1 h0.0413931.60.9 h−0.0457571×(1.6/1.6)×(0.041393/−0.045757)=−0.904627
gollark: I guess that's conceivable, although it should break differently.
gollark: Oh! Maybe you have a SQLite version predating FTS somehow?
gollark: Oh, there was a hacky patch to Prologue for URL encoding, I think.
gollark: Nim's ecosystem is very immature, so I had to do a lot of hacky patching.
gollark: No, that's the other one.

See also

References

  1. Perry, Tekla (July 28, 2014). "A Fictional Compression Metric Moves Into the Real World". Retrieved January 25, 2016.
  2. Perry, Tekla (July 25, 2014). "A Made-For-TV Compression Algorithm". Retrieved January 25, 2016.
  3. Sandberg, Elise (April 12, 2014). "HBO's 'Silicon Valley' Tech Advisor on Realism, Possible Elon Musk Cameo". The Hollywood Reporter. Retrieved June 10, 2014.
  4. Jurgensen, John; Rusli, Evelyn M. (April 3, 2014). "There's a New Geek in Town: HBO's 'Silicon Valley'". The Wall Street Journal. Retrieved June 10, 2014.
  5. "Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend". Dropbox Tech Blog. Retrieved 2017-06-24.
  6. Hutter, Marcus (July 2016). "Contestants". Retrieved January 25, 2016.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.