Move to the printable ASCII front

19

1

Background

The move-to-front transform (MTF) is a data encoding algorithm designed to improve the performance of entropy encoding techniques.

In the bzip2 compression algorithm, it is applied after the Burrows–Wheeler transform (as seen in Burrows, Wheeler and Back), with the objective of turning groups of repeated characters into small, easily compressible non-negative integers.

Definition

For the purpose of this challenge, we'll define the printable ASCII version of the MTF as follows:

Given an input string s, take an empty array r, the string d of all printable ASCII characters (0x20 to 0x7E) and repeat the following for each character c of s:

  1. Append the index of c in d to r.

  2. Move c to the front of d, i.e., remove c from d and prepend it to the remainder.

Finally, we take the elements of r as indexes in the original d and fetch the corresponding characters.

Step-by-step example

INPUT: "CODEGOLF"

0. s = "CODEGOLF"
   d = " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = []
1. s = "ODEGOLF"
   d = "C !\"#$%&'()*+,-./0123456789:;<=>?@ABDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35]
2. s = "DEGOLF"
   d = "OC !\"#$%&'()*+,-./0123456789:;<=>?@ABDEFGHIJKLMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47]
3. s = "EGOLF"
   d = "DOC !\"#$%&'()*+,-./0123456789:;<=>?@ABEFGHIJKLMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37]
4. s = "GOLF"
   d = "EDOC !\"#$%&'()*+,-./0123456789:;<=>?@ABFGHIJKLMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37 38]
5. s = "OLF"
   d = "GEDOC !\"#$%&'()*+,-./0123456789:;<=>?@ABFHIJKLMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37 38 40]
6. s = "LF"
   d = "OGEDC !\"#$%&'()*+,-./0123456789:;<=>?@ABFHIJKLMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37 38 40 3]
7. s = "F"
   d = "LOGEDC !\"#$%&'()*+,-./0123456789:;<=>?@ABFHIJKMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37 38 40 3 45]
8. s = ""
   d = "FLOGEDC !\"#$%&'()*+,-./0123456789:;<=>?@ABHIJKMNPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
   r = [35 47 37 38 40 3 45 41]

OUTPUT: "COEFH#MI"

Task

Write a program or function that implements the printable ASCII MTF (as defined above).

Test cases

Input:  Programming Puzzles & Code Golf
Output: Prpi"do lp%((uz rnu&3!P/o&$U$(p

Input:  NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN BATMAN!
Output: Na! !! !! !! !! !! !! !! !! !! !! !! !! !! !! !!"DDUP"%'

Input:  Two more questions and I have bzip2 in less than 100 bytes!
Output: Twp#o"si$sv#uvq(u$(l#o#W!r%w+$pz,xF%#,"x(. #0--'$GG ".z(**:

Additional rules

  • You cannot use any built-in operator that computes the MTF of a string.

  • Your code may print a trailing newline if you choose STDOUT for output.

  • Your code has to work for any input of 1000 or less printable ASCII characters (0x20 to 0x7E).

  • Standard code golf rules apply. The shortest submission in bytes wins.

Dennis

Posted 2015-07-03T22:06:18.887

Reputation: 196 637

1"Nanananana DDUP!" just isn't as catchy as "Batman!"... – Doorknob – 2015-07-03T22:12:34.527

8@Doorknob: But Batman isn't easily compressible. – Dennis – 2015-07-03T22:13:56.460

Can we output the result in a function return instead of printing it to STDOUT? – Fatalize – 2015-07-03T22:39:54.453

@Fatalize: That's the most natural form of output for functions, so yes. By the way, we have defaults for I/O, so unless the question explicitly says otherwise, that's always allowed.

– Dennis – 2015-07-03T22:43:09.760

Answers

6

CJam, 20

'¡,q{_C#c' ,C+@|}fC;

Try it online

Explanation:

'¡,      make a string of characters with codes from 0 to 160 (a modified "d")
         could have been to 126 but stackexchange doesn't like the DEL character
q        read the input (s)
{…}fC    for each character C in s
  _      duplicate the d string
  C#     find the index of C in d
  c      convert to character (this is the result)
  ' ,    make a string of characters from 0 to 31
  C+     append C to the string
  @      bring d to the top
  |      set union, preserving order; effectively, C is moved to position 32
         this is the updated d string
;        pop the last d

aditsu quit because SE is EVIL

Posted 2015-07-03T22:06:18.887

Reputation: 22 326

6

Ostrich, 46 45 chars

Don't have a version number in the header because this is actually just the latest commit. I added the O (ascii code to string) operator after releasing the latest version (but still before this challenge was posted).

{a95,{32+O}%:d3@{:x\.3@?3@\+\x-x\+}/;{d=}%s*}

Explanation:

a             this is the "r" array (a is short for [], empty array)
95,{32+O}%:d  this is the "d" array
3@{...}/      for each character in the input (as an "argument")...
  :x            store in variable x (stack is now [r d c])
  \.3@?         find index in d     (stack is now [r d idx])
  3@\+          append index to r   (stack is now [d modified_r])
  \x-           remove char from d, and then...
  x\+           prepend char to d   (stack is now [modified_r modified_d])
;             throw away modified_d
{d=}%         map r to indices of (original) d
s*            join (s is short for ``, empty string)

Doorknob

Posted 2015-07-03T22:06:18.887

Reputation: 68 138

I'm wondering if PPCG is turning from "code this task in the most conscise way possible in your favourite language" to "design your own programming language to solve the typical code golf task shorter than golfscript" – John Dvorak – 2015-07-05T12:30:54.137

1@AlexA. ... wait, huh, it's spelled that way? my entire life has been a lie – Doorknob – 2015-07-06T00:41:35.103

@JanDvorak Ostrich is almost identical to GolfScript. Only real reason I created it is because a.) GolfScript annoyingly does not have a REPL and b.) there are a few missing operators/features (floating point, I/O, etc). And language design is fun anyway! – Doorknob – 2015-07-06T00:43:15.183

3

Python 3, 88

*d,=range(127)
for c in input():y=d.index(ord(c));d[:32]+=d.pop(y),;print(chr(y),end='')

Using some ideas from my CJam solution.
-4 bytes belong to Sp3000 :)

aditsu quit because SE is EVIL

Posted 2015-07-03T22:06:18.887

Reputation: 22 326

2

SWI-Prolog, 239 197 189 bytes

a(S):-l([126],X),a(S,X,[],R),b(R,X).
a([A|T],X,S,R):-nth0(I,X,A,Z),(a(T,[A|Z],[I|S],R);R=[I|S]).
b([A|T],X):-(b(T,X);!),nth0(A,X,E),put(E).
l([B|R],Z):-A is B-1,X=[A,B|R],(A=32,Z=X;l(X,Z)).

Example: a(`Two more questions and I have bzip2 in less than 100 bytes!`). outputs:

Twp#o"si$sv#uvq(u$(l#o#W!r%w+$pz,xF%#,"x(. #0--'$GG ".z(**:

(and true . after it, obviously)

Note: your SWI-Prolog version has to be one of the newer ones in which the backquote ` represents codes strings. Code strings used to be represented with double-quotes " in older versions.

Fatalize

Posted 2015-07-03T22:06:18.887

Reputation: 32 976

2

Python 2, 137 110 104

Wasn't hard to implement, but maybe still golfable?

Try it here

e=d=map(chr,range(32,127))
r=""
for c in raw_input():n=e.index(c);r+=d[n];e=[e[n]]+e[:n]+e[n+1:]
print r

mbomb007

Posted 2015-07-03T22:06:18.887

Reputation: 21 944

1I think you're better off doing a list map e=d=map(chr,range(32,127)) in Python 2, though you have to tweak the e to handle a list. – xnor – 2015-07-04T09:37:44.737

@xnor Thanks. I also tried using e=[e.pop(n)]+e, but it doesn't work. Why is that? – mbomb007 – 2015-07-06T17:25:18.353

You've got e=d=, so when you pop from e you're also popping from d. Try d=e[:]. – Sp3000 – 2015-07-06T17:47:20.697

1But at this point it's probably better to just do n=e.index(ord(c));r+=chr(n+32); and drop d – Sp3000 – 2015-07-06T17:51:37.697

1

Pyth, 24 bytes

JK>95CM127s@LKxL~J+d-Jdz

Demonstration. Test Harness.

The first bit. JK>95CM127 sets up the necessary list and saves it to J and K. ~J+d-Jd performs the list updating, while xL ... z maps the input characters to their positions in the list. Finally, s@LK converts those indexes to characters in the original list.

isaacg

Posted 2015-07-03T22:06:18.887

Reputation: 39 268

1

Haskell, 120 bytes

e#s=[b|(b,a)<-zip[0..]s,a==e]!!0
a=[' '..'~']
f=snd.foldl(\(d,r)e->(e:take(e#d)d++tail(drop(e#d)d),r++[a!!(e#d)]))(a,[])

Usage example: f "CODEGOLF" -> "COEFH#MI"

How it works: # is an index function that returns the position of e in s (can't use Haskell's native elemIndex because of an expensive import). The main function f follows a fold pattern where it updates the position string d and result string r as it walks through the input string.

nimi

Posted 2015-07-03T22:06:18.887

Reputation: 34 639