Decode a variable-length quantity

16

1

A variable-length quantity (also referred to as VLQ or uintvar) is a way to encode up to a 28 bit integer value using only as many bytes as necessary. This was used in MIDI file format as a way to minimize the size of certain event data.

The way it works is fairly simple. As a big-endian series of bytes, the most significant bit (MSB) of each byte is a 1 to indicate that another VLQ byte follows. The remaining 7 bits of each byte make up the decoded value.

Example (from Wikipedia):

[ 0x86, 0xc3, 0x17 ] => 106903

enter image description here Additional references: Wikipedia, Some Guy.

Challenge:

Given a variable-length quantity, convert it to it's integer value.

Input:

A list of one to four bytes or a 32-bit value type representing a valid VLQ of an integer.

Output:

The integer value of the VLQ input.

Rules and scoring:

  • This is code-golf, so shortest answer in bytes for each language wins.
  • Standard rules and default I/O rules apply.
  • Loopholes forbidden (of course).
  • Please provide link with a test for your code (TIO.run, etc).
  • A clear explanation for your answer is highly recommended.
  • Built-ins that handle this conversion are not banned, however not using them is a lot more interesting.

Test cases:

Input (VLQ)                   Output (int)

[                   0x00 ] => 0
[                   0x07 ] => 7
[                   0x7f ] => 127
[             0x81, 0x00 ] => 128
[             0xC0, 0x00 ] => 8192
[             0xff, 0x7f ] => 16383
[       0x81, 0x80, 0x00 ] => 16384
[       0x86, 0xc3, 0x17 ] => 106903
[       0xbd, 0x84, 0x40 ] => 1000000
[       0xff, 0xff, 0x7f ] => 2097151 
[ 0xC0, 0x80, 0x80, 0x00 ] => 134217728  
[ 0xFF, 0xFF, 0xFF, 0x7F ] => 268435455  

Note: you are not required to use hex literals to represent a byte as your input or output. You may use decimal literal ([ 129, 128, 0 ]), integer (0x80818000) or any other reasonable byte/octet representation if better suited to your platform. Format is flexible as long as it represents 1-4 byte/octets.

Golf away!

640KB

Posted 2019-08-07T15:52:18.877

Reputation: 7 149

Is the last byte of the input guaranteed to be < 128? Or do we need to support cases such as [0x01, 0x80, 0x02] => 1? – Arnauld – 2019-08-07T16:07:56.353

Inputs are guaranteed to be valid VLQ values, so last byte will always be MSB = 0. – 640KB – 2019-08-07T16:09:03.337

5@Arnauld I see better now what you were getting at, and in retrospect the case you mention would have been good to have required. In MIDI, for example, you'll be reading a stream of data and so you don't necessarily know which byte is the last one. The way it is written, guaranteeing that the last byte in the list is the last byte in the VLQ makes the MSB irrelevant and in fact kind of defeats the purpose of VLQ. As in, you wouldn't necessarily be able to use these (valid for the challenge) routines for MIDI file decoding. Totally my bad! Should have kept this in sandbox much longer... – 640KB – 2019-08-07T17:35:30.850

5This is partly my bad as well because I think I've seen the challenge in the sandbox and didn't pay enough attention. But yes, that's what I had in mind: given a list of bytes, either output the first VLQ value or output all VLQ values. – Arnauld – 2019-08-07T17:40:19.977

I like this challenge, though it may have been more interesting if the input was an arbitrary number of bytes, and you had to return a list of values (of different length than the input). This would have tested the "variable length" aspect of the encoding. – Jonah – 2019-08-08T01:39:26.400

Can we take the input-list in reversed order? – Kevin Cruijssen – 2019-08-08T14:41:04.670

1@KevinCruijssen, a variable length quantity is supposed to be a stream of bytes, where technically you only know when it ends by reaching the MSB=0 marker. I know the rules didn't quite capture that, but by the definition of VLQ I'm going to have to say no. – 640KB – 2019-08-08T14:47:42.457

1@gwaugh Ok, that makes sense and I can understand the reasoning. I'll modify my MathGolf answer accordingly. :) – Kevin Cruijssen – 2019-08-08T14:48:25.120

do we need to handle garbage bytes before or after the VLQ ends? – qwr – 2019-08-09T19:46:19.983

so if we use a 32 bit int as input can we pad the other bytes with zeros? It's not clear to me how input is taken. From your example 0x818000 it looks like the rest is zero padded. – qwr – 2019-08-09T19:50:26.617

@qwr, I see what you mean. In the case of taking input as a 32 bit integer then my example is incorrect, it should be 0x80818000. I will update. – 640KB – 2019-08-09T20:04:07.543

Answers

11

APL (dzaima/APL), 8 bytes

128(⊣⊥|)

Try it online!

How:

128(⊣⊥|) ⍝ Anonymous function
128   |  ⍝ Input modulo 128
    ⊣⊥   ⍝ Decoded from base 128

J. Sallé

Posted 2019-08-07T15:52:18.877

Reputation: 3 233

5

Pari/GP, 24 bytes

a->fromdigits(a%128,128)

Try it online!

alephalpha

Posted 2019-08-07T15:52:18.877

Reputation: 23 988

4

Wolfram Language (Mathematica), 25 bytes

Fold[128#+#2&,#~Mod~128]&

Try it online!

Wolfram Language (Mathematica), 25 bytes

#~Mod~128~FromDigits~128&

Try it online!

Roman

Posted 2019-08-07T15:52:18.877

Reputation: 1 190

4

J, 10 bytes

128#.128|]

Try it online!

Taking inspiration from J Salle's APL answer.

  • 128|] Remainder of the input numbers divided by 128
  • 128#. Interpreted as the digits of a base 128 number

Jonah

Posted 2019-08-07T15:52:18.877

Reputation: 8 729

3

Jelly, 6 bytes

%Ø⁷ḅØ⁷

Try it online!

Equivalent to alephalpha's Pari/GP answer.

Method: Given \$n\$, output \$n \mod 128\$, converted from base \$128\$ to decimal.

Mr. Xcoder

Posted 2019-08-07T15:52:18.877

Reputation: 39 774

3

05AB1E, 6 bytes

žy%žyβ

Try it online!

ಠ_ಠ Why do both Jelly and 05AB1E have 2-byte constants for 128? Well... It's \$2^7\$. But surely those bytes could be used for something better, right?

Mr. Xcoder

Posted 2019-08-07T15:52:18.877

Reputation: 39 774

Yeah, that builtin is useless nowadays. In the legacy version I could see some use for it, only when you have a digit before it in your program. Otherwise you could just use 7o. Nowadays you can compress certain 3-byte integers (range [101,355]) in 2 bytes though, so 128 can be ƵR anyway.. I've also wondered the same thing about the 2-byte builtin for 16.. Usually you'd just use the literal, or otherwise we'd have 4o/4n/ if a digit is behind it in the program. Only when a digit is before the 16, which I don't think would happen, the builtin is useful.. – Kevin Cruijssen – 2019-08-08T07:41:00.373

3

Stax, 8 bytes

å→♂á╣>nt

Run and debug it

Algorithm:

  • Convert each byte to fixed-width 7 bit array.
  • Flatten bit arrays into one.
  • Convert bit array back to number.

recursive

Posted 2019-08-07T15:52:18.877

Reputation: 8 616

2

JavaScript (ES6), 29 bytes

-2 bytes thanks to @Shaggy

Takes input as an array of bytes.

a=>a.map(p=c=>p=p<<7|c&127)|p

Try it online!

Arnauld

Posted 2019-08-07T15:52:18.877

Reputation: 111 334

2

APL+WIN, 22 bytes

Prompts for a vector of integers:

2⊥¯32↑,0 1↓⍉(8⍴2)⊤¯4↑⎕

Try it online! Courtesy of Dyalog Classic

Explanation:

¯4↑⎕ Pad the vector to 4 integers from the right.

⍉(8⍴2)⊤ Convert to a matrix of 8 bit values.

,0 1↓ drop the MSBs and flatten to a vector.

2⊥¯32↑ pad bit vector to 32 bits starting at LSB and convert back to integer.

Graham

Posted 2019-08-07T15:52:18.877

Reputation: 3 184

2

Stax, 12 bytes

ü╫ôà¡k2Wù}a☺

Run and debug it at staxlang.xyz!

Unpacked (14 bytes) and explanation:

rk128%128i^#*+
r                 Reverse 'cuz big-endian
 k                Fold from the left using block:
  128%              Modulize by 128
      128i^#        Push 128 to the power of (one plus the iteration index)
            *       Multiply
             +      Add to the total
                  Implicit print

Stax has builtin base conversion, but it only works on strings. It almost works on lists of integers, though; the problem is in Stax's handling of 0.

A string is a list of integers. When you're using such a list as a string, any zeroes are automatically converted to 32 as a nice shorthand for spaces. Since the builtin |b for base conversion treats its operand as a string rather than as a raw list of integers, any case with a zero will fail.

10 bytes, fails on zeroes

Ç┘_A♥∙QZ►╣
{128%m128|b    Unpacked
{128%m         Modulize each by 128
      128|b    Convert from base 128

Run and debug it at staxlang.xyz!

Khuldraeseth na'Barya

Posted 2019-08-07T15:52:18.877

Reputation: 2 608

1I see where that issue came from. {:B7)m$:b packs to 8, and seems to work as well, although it's kind of an exotic use of$. – recursive – 2019-08-08T18:34:35.853

@recursive That's clever. Post it as a separate answer. – Khuldraeseth na'Barya – 2019-08-08T18:41:27.230

Done. https://codegolf.stackexchange.com/a/189554/527

– recursive – 2019-08-08T18:50:07.403

2

Python 3, 58 49 bytes

-9 bytes thanks to @Chas and @ar4093

lambda x:int("".join(bin(a|128)[3:]for a in x),2)

Try it online!

or

lambda x:int("".join(f"{a%128:07b}"for a in x),2)

Try it online!

Input via list of integers.

Python's bin function adds "0b" to the beginning of the string, so those have to be taken off before they can be concatenated. It also doesn't keep leading zeros, so if there aren't any (aka the last byte) those have to be added back in. And if there is a leading one (aka all but the last byte) that must be removed as well. Thanks to @Chas for figuring out that by always setting the first bit, I can just remove the first three characters and be done.

Apparently (according to @ar4093) the format function allows not only not having the '0b' prefix, but also removing the first bit and padding to 7 characters all at the same time.

Hiatsu

Posted 2019-08-07T15:52:18.877

Reputation: 679

Welcome to CGCC, which will make you a much worse programmer! :). I had the same thought as you at first. You can save 9 bytes with bin(a|128)[3:] as you then don't need the zfill.

– Chas Brown – 2019-08-08T04:03:32.340

As you said, with the format function you can save 9 bytes: bin(a)[2:].zfill(8)[1:] --> f"{a%128:07b}" – ar4093 – 2019-08-09T07:15:35.573

2

C (gcc), 48 bytes

Takes an integer in big-endian order as input, which is the same order as a byte array.

f(i,j){for(j=0;j|=i&127,i&128;j<<=7,i>>=8);i=j;}

Try it online!

C (gcc), 53 bytes

If a byte array is needed:

f(c,j)char*c;{for(j=0;j|=*c&127,*c++&128;j<<=7);c=j;}

Try it online!

ErikF

Posted 2019-08-07T15:52:18.877

Reputation: 2 149

When I compile your TIO testing code with -Os it only handles the last two cases. I don't know enough C to tell why. – qwr – 2019-08-09T19:49:15.480

@qwr TIO defaults to -O0, which allows you to (usually) store a return value into the first parameter. This is a quirk^Wfeature in code golfing, but doesn't work with higher optimization levels. – ErikF – 2019-08-09T21:39:25.203

You can shave off a byte by replacing &128 with >>7. – G. Sliepen – 2019-08-17T12:07:02.697

2

MathGolf, 14 bytes

ê♣%hrx♣▬^mÅε*Σ

Input as integers.

Try it online.

I have the feeling this can be shorter.. It's a bit annoying that MathGolf has a 1-byte builtin for the constant 128, but no base-conversion (except for binary/hexadecimal).

Explanation:

ê              # Take the inputs as integer-array
 ♣%            # Take modulo-128 on each value in this array
   h           # Push the length of the array (without popping the array itself)
    r          # Pop and push a list in the range [0, length)
     x         # Reverse the array to (length, 0]
      ♣▬       # Take 128 to the power of each value in this array
        ^      # Zip the two lists together to create pairs
         mÅ    # Map each pair to (using the following two commands):
           ε*  #  Reduce by multiplication (so basically the product of the pair)
             Σ # After multiplying each pair, take the total sum
               # (after which the entire stack joined together is output implicitly)

Kevin Cruijssen

Posted 2019-08-07T15:52:18.877

Reputation: 67 575

Base conversion has been on the table since day 1, but there are some other overhauls regarding base conversion that I want to address at the same time. For example, I don't want to different operators to convert to/from hexadecimal strings. – maxb – 2019-08-14T12:55:43.543

1

Japt, 10 8 bytes

Takes input as an array of integers.

muIÑ ìIÑ

Try it or run all test cases (header in both converts from input format used in challenge)

Saved 2 bytes by taking inspiration from alephalpha's solution.

muIÑ ìIÑ     :Implicit input of integer array
m            :Map
 u           :  Modulo
  I          :    64
   Ñ         :    Multiply by 2
     ì       :Convert to decimal
      IÑ     :  From base 64*2

Shaggy

Posted 2019-08-07T15:52:18.877

Reputation: 24 623

1

PHP, 42 bytes

foreach($argv as$a)$r=$r<<7|$a&127;echo$r;

Try it online! and verify all test cases.

Input via command line args, output to STDOUT.

640KB

Posted 2019-08-07T15:52:18.877

Reputation: 7 149

1

Charcoal, 11 bytes

I↨﹪θ¹²⁸¦¹²⁸

Try it online! Link is to verbose version of code. Takes input as an array. Explanation:

   θ        Input array
  ﹪ ¹²⁸    Elementwise modulo 128
 ↨      ¹²⁸ Convert from base 128
I          Cast to string for implicit print

Neil

Posted 2019-08-07T15:52:18.877

Reputation: 95 035

1

Python 2, 42 bytes

f=lambda a:a>[]and a[-1]%128+f(a[:-1])*128

Try it online!

Chas Brown

Posted 2019-08-07T15:52:18.877

Reputation: 8 959

1

Windows Batch, 76 bytes

:a
@set/ax="%1&127",y=y*128+x
@if %2. neq . @shift&goto:a
@echo %y%

Pass parameters prefixed with "0x" and space between (e.g. 0xC0 0x80 0x80 0x00).

peter ferrie

Posted 2019-08-07T15:52:18.877

Reputation: 804

Nicely done. Obviously for anyone else testing it, you'll need to make sure to an do a @set y=,ax= between runs. – 640KB – 2019-08-16T19:27:34.847

1just the "set y=" is enough. – peter ferrie – 2019-08-16T19:30:10.693