15
2
Background
The Burrows–Wheeler transform (BWT) is a reversible permutation of the characters of a string that results in large runs of similar characters for certain types of strings such as plain text. It is used, for example, in the bzip2 compression algorithm.
The BWT is defined as follows:
Given an input string such as codegolf
, compute all possible rotations of it and sort them in lexicographical order:
codegolf
degolfco
egolfcod
fcodegol
golfcode
lfcodego
odegolfc
olfcodeg
The BWT of the string codegolf
is the string consisting of the last character of each string in that order, i.e., the last column of the block above. For codegolf
, this yields fodleocg
.
By itself, this transformation isn't reversible, since the strings codegolf
and golfcode
result in the same string. However, if we know that the string ends with an f
, there is only one possible preimage.
Task
Implement an involutive program or function that reads a single string from STDIN or as command-line or function argument and prints or returns either the BWT or its inverse of the input string.
If the input string contains no spaces, your submission should append a single space to the input and compute the BWT.
If the input string already contains a space, it should compute the preimage of the BWT that has a trailing space and strip that space.
Examples
INPUT: ProgrammingPuzzles&CodeGolf
OUTPUT: fs&e grodllnomzomaiCrGgPePzu
INPUT: fs&e grodllnomzomaiCrGgPePzu
OUTPUT: ProgrammingPuzzles&CodeGolf
INPUT: bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
OUTPUT: <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
INPUT: <#Q6(LFksq*MD"=L0<f^*@I^;_6nknNp;pWPBc@<A^[JZ?\B{qKc1u%wq1dU%;2)?*nl+U(yvuwZl"KIl*mm5:dJi{\)8YewB+RM|4o7#9t(<~;^IzAmRL\{TVH<bb]{oV4mNh@|VCT6X)@I/Bc\!#YKZDl18WDIvXnzL2Jcz]PaWux[,4X-wk/Z`J<,/enkm%HC*44yQ,#%5mt2t`1p^0;y]gr~W1hrl|yI=zl2PKU~2~#Df"}>%Io$9^{G_:\[)v<viQqwAU--A#ka:b5X@<2!^=R`\zV7H\217hML:eiD2ECETxUG}{m2:$r'@aiT5$dzZ-4n)LQ+x7#<>xW)6yWny)_zD1*f @F_Yp,6!ei}%g"&{A]H|e/G\#Pxn/(}Ag`2x^1d>5#8]yP>/?e51#hv%;[NJ"X@fz8C=|XHeYyQY=77LOrK3i5b39s@T*V6u)v%gf2=bNJi~m5d4YJZ%jbc!<f5Au4J44hP/(_SLH<LZ^%4TH8:R
OUTPUT: bt4{2UK<({ZyJ>LqQQDL6!d,@:~L"#Da\6%EYp%y_{ed2GNmF"1<PkB3tFbyk@u0#^UZ<52-@bw@n%m5xge2w0HeoM#4zaT:OrI1I<|f#jy`V9tGZA5su*b7X:Xn%L|9MX@\2W_NwQ^)2Yc*1b7W<^iY2i2Kr[mB;,c>^}Z]>kT6_c(4}hIJAR~x^HW?l1+^5\VW'\)`h{6:TZ)^#lJyH|J2Jzn=V6cyp&eXo4]el1W`AQpHCCYpc;5Tu@$[P?)_a?-RV82[):[@94{*#!;m8k"LXT~5EYyD<z=n`Gfn/;%}did\fw+/AzVuz]7^N%vm1lJ)PK*-]H~I5ixZ1*Cn]k%dxiQ!UR48<U/fbT\P(!z5l<AefL=q"mx_%C:2=w3rrIL|nghm1i\;Ho7q+44D<74y/l/A)-R5zJx@(h8~KK1H6v/{N8nB)vPgI$\WI;%,DY<#fz>is"eB(/gvvP{7q*$M4@U,AhX=JmZ}L^%*uv=#L#S|4D#<
INPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
INPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
OUTPUT: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Additional rules
You cannot use any built-in operators that compute the BWT (or its inverse) of a string.
You cannot use any built-in operators that invert functions.
Your code may print a trailing newline if you choose STDOUT for output, even if it doesn't support trailing newlines in the input.
Your code has to work for any input of 500 or less printable ASCII characters (0x20 to 0x7E), including at most one space.
For any of the possible inputs described above, your code must finish in under ten minutes on my machine (Intel Core i7-3770, 16 GiB RAM). The last test case should be the slowest, so make sure you time your code with that one.
For most languages and approaches, it should be easy to comply with the time limit. This rule is solely meant to prevent implementing either transform as a brute-force inverse of the other.
Standard code golf rules apply. The shortest submission in bytes wins.
"You cannot use any built-in operators that invert functions." I'd be very surprised if such a thing existed! – Hugh Allen – 2015-06-07T15:01:37.050
4@HughAllen J has
inv
. – Dennis – 2015-06-07T15:02:58.653It might work for trivial or builtin functions, but how could it work for general user-defined functions? Is this "inv" documented somewhere? – Hugh Allen – 2015-06-07T15:09:24.783
@HughAllen: I don't really know J, but here is an example of
– Dennis – 2015-06-07T15:16:54.723inv
working on a user-defined function. I'm not sure ifinv
would work for the BWT, but better safe than sorry.Related, see http://codegolf.stackexchange.com/questions/4771/text-compression-and-decompression-nevermore/4779#4779 for a full bzip2.
– Keith Randall – 2015-06-08T02:42:24.500@KeithRandall: bzip2 in under 300 bytes. Nice! – Dennis – 2015-06-08T02:54:45.613