Permutation Encoding

7

2

Write a program which can encode text to avoid reusing characters, and convert back.

Both normal and encoded forms are restricted to a particular character set: the space character with code point 32, the tilde character ~ with code point 126, and all characters between. This is 95 total characters. It's a printable subset of ASCII.

The encoded text can be up to 95 characters long. The normal text can be up to 75 characters long, because above 75 it would be impossible to uniquely encode all messages.

Encoding

There are some requirements for the encoding:

  • Uses only characters from the character set described above
  • Uses each character at most once
  • Uniquely encodes all valid messages

Other than that, you're free to pick any variant.

Input

Input will consist of 2 items:

  • A boolean: false to encode, true to decode
  • A string: the message to be encoded/decoded

How you receive the input is up to you.

The boolean can be as a boolean or integer in the range [-2, 2] in your language's format. If the esoteric language doesn't have booleans or integers, use the closest analogue. I'm not doing the "or any 2 distinct values" because then you could use an encoding and decoding program as the values and evaluate them.

Output

Output should consist of only the encoded/decoded string. Leading and trailing newlines are okay.

Scoring

Your score is the length of your program, shorter is better.

I still encourage you to try harder languages even if they won't win.

Testing

Testing will generally be done with Try it Online!. For convenience, please provide a link to your program with some test input already filled in.

These are the steps to test a program:

  • Pick a valid normal string.
  • Encode it using the program.
  • Check that the encoded form is valid.
  • Decode it using the program.
  • Check that the final result matches the original string.

This can be repeated with other test strings for confidence.

If you show a working case and no failing cases have been found, we will assume your program works.

Reference Implementation

I am providing a reference implementation that shows one possible method, in hopes of making this problem more approachable. I used it to generate the examples. I recommend you try the problem yourself before looking at how I did it, that way you may independently come up with a better mapping.

Examples

Since you can choose a different encoding, as well as different input/output methods, your program might not match these examples.

Empty

Input:

0

Output:

(none)

Input:

1

Output:

(none)

Small

Input:

0
Hello, world!

Output:

 #mvcD_YnEeX5?

Input:

1
 #mvcD_YnEeX5?

Output:

Hello, world!

Max Size

Input:

0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Output:

C-,y=6Js?3TOISp+975zk}.cwV|{iDjKmd[:/]$Q1~G8vAneoh #&<LYMPNZB_x;2l*r^(4E'tbU@q>a!\WHuRFg0"f%X`)

Input:

1
C-,y=6Js?3TOISp+975zk}.cwV|{iDjKmd[:/]$Q1~G8vAneoh #&<LYMPNZB_x;2l*r^(4E'tbU@q>a!\WHuRFg0"f%X`)

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extras

How did I get the 75 characters limit? I could have brute force checked it of course, but there's a slightly more mathematical method for counting the number of possible encoded messages for a given character set:

$$ F(n)=\sum_{k=0}^{n}{\frac{n!}{(n-k)!}}=e\Gamma(n+1,1)\approx en! $$

Here are the relevant bounds. There's plenty more valid encoded messages than normal messages, but still enough normal messages that you'll need a variable length encoding.

$$ F(95)\approx 2.81\times 10^{148} \\ {95}^{75}\approx 2.13\times 10^{148} \\ 95!\approx 1.03\times 10^{148} $$

The problem was inspired by a phenomenon on the Discord chat platform. In Discord, you can "react" to chat messages, which adds a little emoji underneath. You can react many times on a single chat message. Each emoji can only be added once. They remember the order they were added and display oldest to newest, left to right. You can add some non-emojis too. People sometimes write short messages in reactions, and when their message has repeat characters they're forced to get creative, usually they opt for lookalike characters.

EPICI

Posted 2018-09-03T02:17:07.193

Reputation: 379

1

Welcome to PPCG! Great first challenge! Here's a link to the Tour if you haven't seen it, and the Sandbox where you can get challenges reviewed before posting.

– pizzapants184 – 2018-09-03T10:09:19.780

Do we really need to use TIO? If so, then it sounds very restricting. – Erik the Outgolfer – 2018-09-03T15:30:28.470

1I ask for a link to some online compiler/interpreter since it makes it convenient for others to test. It doesn't have to be TIO. If you don't link at all, it will be frustrating trying to test your submission, it's hard to tell if the submission doesn't work or we're just running it wrong. – EPICI – 2018-09-03T16:31:36.680

Answers

3

Python 3, 339 338 bytes

lambda b,s:b and G(f(s))or F(g(s))
a=''.join(chr(32+i)for i in range(95))
f=lambda s:95*f(s[1:])+ord(s[0])-31if s else 0
F=lambda n:chr((n-1)%95+32)+F((n-1)//95)if n else ''
g=lambda s,d=a:1+d.find(s[0])+len(d)*g(s[1:],d.replace(s[0],''))if s else 0
G=lambda n,d=a:d[(n-1)%len(d)]+G((n-1)//len(d),d.replace(d[(n-1)%len(d)],''))if n else''

Try it online!

Same length, uses a list instead of a string for the ASCII chars Try it online!.

Overview: WIP

  • Because the number of printable ASCII strings with 75 or fewer characters is 21570794844237986466892014672640809417999612393958961235356099871415520891159490775110725336452276983325050565474469409911975201387750975629116626495, which is less than the number of valid encoded strings (28079792812953073919740255779032193344167751681188124122118245935431845577725060598682278885127314791932249387894125026034584206856782082066829102875)

    • All we need to do is theoretically sort the valid encoded strings and the valid non-encoded strings, take the index in the correct list of our given input, then return the string at that index in the other list.
  • f: Converts an input (non-encoded) string to an integer based on its index in the theoretical list of ASCII strings sorted by length and then lexicographically.

    • The empty string is 0

    • Single-char strings are 1 to 95

    • Two-char strings are 96 to 9120, etc.

  • F: Undoes f, converts an integer to a (non-encoded) string.

  • g: Converts an input (encoded) string to an integer (<= 28079792812953073919740255779032193344167751681188124122118245935431845577725060598682278885127314791932249387894125026034584206856782082066829102875 for valid inputs) based on its index in the theoretical list of valid encoded strings sorted by length and then lexicographically.

    • The empty string is 0

    • Single-char strings are 1 to 95

    • Two-char strings are 96 to 9025, etc.

  • G: undoes g, converts an integer to an (encoded) string.

  • Main function: Unnamed lambda b,s:b and G(f(s))or F(g(s)): Uses the boolean b to determine whether to encode G(f(s)) or decode F(g(s)) the string, and does so.

pizzapants184

Posted 2018-09-03T02:17:07.193

Reputation: 3 174

2

Charcoal, 94 bytes

≔⁰ζ¿N«≔γδ≔¹εFS«≧⁺×ε⊕⌕γιζ≧×Lγε≔Φγ¬⁼κιγ»W櫧δ⊖ζ≔÷⊖ζ⁹⁵ζ»»«F⮌S≔⁺×⁹⁵ζ⊕⌕γιζWζ«≦⊖ζ§γζ≔Φγ¬⁼κ§γζγ≧÷⊕Lγζ

Try it online! Link is to verbose version of code. Explanation:

≔⁰ζ

Set z, which represents the permutation index, to zero, as both sides of the condition need it.

¿N«

If the first input (as a number) is non-zero, then we're doing encoding.

≔γδ

Save a copy of the predefined ASCII character set g in d, as we need to change it.

≔¹ε

Assign 1 to e. This represents the place value of each encoded character.

FS«

Loop over the encoded characters.

≧⁺×ε⊕⌕γιζ

Update the permutation index.

≧×Lγε

Update e with the place value of the next encoded character.

≔Φγ¬⁼κιγ»

Remove the character from the encoding set.

Wζ«

Repeat while the permutation number is non-zero.

§δ⊖ζ

Print the next decoded character.

≔÷⊖ζ⁹⁵ζ»»

Divide the bijective representation by 95. This completes decoding.

«F⮌S

Loop over the plain text in reverse, to avoid having to calculate the powers of 95.

≔⁺×⁹⁵ζ⊕⌕γιζ

Convert the plain text as bijective base 95 to the permutation index.

Wζ«

Repeat while the permutation index is non-zero.

≦⊖ζ

Decrement the permutation index.

§γζ

Print the next encoded character.

≔Φγ¬⁼κ§γζγ

Remove that character from the encoding set.

≧÷⊕Lγζ

Divide the permutation index by the previous length of the encoding set.

Neil

Posted 2018-09-03T02:17:07.193

Reputation: 95 035

1

Haskell, 346 472 bytes

After fixing of the bug noticed by Anders Kaseorg the program unfortunately grew larger.

(x:y)?f|f x=(0,x)|0<1=(p+1,q)where(p,q)=y?f
a!n=take n a++drop(n+1)a
x#f=fst$f?(x==)
a x y z=y!!(q z-1)+b z s x
b[]_ _=0
b(x:y)f c=u(x#f)+v f*(b y(c f x)c)
c=const
e=divMod
f n|n<2=1|1<2=n*f(n-1)
g y z x=h(x-r)l s y where(l,r)=z?(x<)
h n l f c|l<1=[]|0<1=f!!w m:h d(l-1)(c f m)c where(d,m)=n`e`v f
i=r(k^)
k=95
o=r(\x->f k`div`f(k-x))
q=length
r f=scanl(+)0$map f[1..k]
s=[' '..'~']
u=toInteger
w=fromInteger
v=u.q
x=g(\a->(\b->a!w b))o.a c i
y=g c i.a(\a->(\b->a!(b#a)))o

Try it online!

Use x "Hello World" to encode and y "TR_z" to decode.

How?

We can enumerate all possible input sequences and all possible encoded sequences, so the encoding is rather simple:

1) determine the number for the input sequence; 2) construct the output sequence corresponding to that number.

(The decoding is done in the reverse order).

Explanation

This is an original (pre-golf) version for ease of understanding.

-- for ord
import Data.Char

-- all possible characters
s = [' ' .. '~']

-- get the first position in a list where the predicate returns True
(x:y) `firstPosWhere` fn
    | fn x = 0
    | 0<1 = 1 + firstPosWhere y fn

-- the same as above but return both the position and the list element
(x:y) `firstPosAndElemWhere` fn
    | fn x = (0, x)
    | 0 < 1 = (p + 1, q) where (p, q) = firstPosAndElemWhere y fn

-- the sets of numbers corresponding to input strings and to encoded strings are divided into ranges
inputRanges = scanl (+) 0 [95 ^ x | x <- [1..95]]
outputRanges = scanl (+) 0 [fac 95 `div` fac (95 - x)  | x <- [1..95]]

-- convert an input string into a number
toN s = inputRanges !! (length s - 1) + toN' s
toN' [] =0
toN' (x:y) = toInteger (ord x - 32) + 95 * toN' y

-- convert a number into an (unencoded) string
fromN n = fromN' (n - r) l where
    (l, r) = inputRanges `firstPosAndElemWhere` (n <)
    fromN' n l
        | l < 1 = []
        | 0 < 1 = s !! fromInteger m : fromN' d (l - 1) where
            (d, m) = n `divMod` 95

-- factorial
fac n|n<2=1|1<2=n*fac(n-1)

-- remove nth element from a list
a `without` n = take n a ++ drop (n + 1) a

-- construct an encoded string corresponding to a number
encode n = encode' (n - r) l s where
    (l, r) = outputRanges `firstPosAndElemWhere` (n <)
    encode' n l f
        | l < 1 = []
        | otherwise = (f !! m') : encode' d (l - 1) (f `without` m') where
            (d, m) = n `divMod` (toInteger $ length f)
            m' = fromInteger m

-- convert an encoded string back into number
decode z = r + decode' z s  where
    r = outputRanges !! (length z - 1)
    decode' [] _ = 0
    decode' (x:y) f = toInteger p + (toInteger $ length f) * (decode' y (f `without` p)) where
        p = f `firstPosWhere` (x ==)

Max Yekhlakov

Posted 2018-09-03T02:17:07.193

Reputation: 601

1('~' <$ [1..75]) ## 0 gives a division by zero exception. (The string of ~s in your tests is only 65 long, and the string you label as “Longer Than 75” isn’t.) – Anders Kaseorg – 2018-09-08T01:26:41.810

0

Clean, 182 bytes

import StdEnv,Data.List
$b s=hd[v\\(u,v)<-if(b)id flip zip2(iterate?[])[[]:[p\\i<-inits[' '..'~'],t<-tails i,p<-permutations t|p>[]]]|u==s]
?[h:t]|h<'~'=[inc h:t]=[' ': ?t]
?[]=[' ']

Try it online!

Enumerates every valid encoding and every valid message, returning one or the other depending on the order of the arguments zip2 receives. Explained:

$ b s                         // function $ of flag `b` and string `s` is
 = hd [                       // the first element in the list of
  v                           // the value `v`
  \\ (u, v) <-                // for every pair of `u` and `v` from
   if(b)                      // if `b` is true
    id                        // do nothing
    flip                      // otherwise, flip the arguments to
   zip2                       // a function which creates pairs applied to
    (iterate a [])            // repeated application of ? to []
    [[]: [                    // prepend [] to
     p                        // the value `p`
     \\ i <- inits [' '..'~'] // for every prefix `i` in the character set
     , t <- tails i           // for every suffix `t` of `i`
     , p <- permutations t    // for every permutation of the characters in `t`
     | p > []                 // where `p` isn't empty
    ]]
   | u == s                   // where the second element `u` equals `s`
  ]
? [h: t]                      // function ? of a list
 | h < '~'                    // if the first element isn't a tilde
  = [inc h: t]                // increment it and return the list
 = [' ': ? t]                 // otherwise change it to a space
? []                          // function ? when the list is empty
 = [' ']                      // return a singleton list with a space

It's really, really slow! Faster (in practice) version below (but with similar time complexity):

import StdEnv,Data.List,Data.Func,StdStrictLists,StdOverloadedList

increment :: [Char] -> [Char]
increment []
    = [' ']
increment ['~':t]
    = [' ':increment t]
increment [h:t]
    = [inc h:t]

chars =: hyperstrict [' '..'~']

? :: *Bool [Char] -> [Char]
? b s
    = let
        i :: *[[Char]]
        i = inits (chars)
        t :: *[[Char]]
        t = filter (not o isEmpty) (concatMap tails i)
        p :: *[[Char]]
        p = concatMap permutations t
    in (if(b) encode decode) [] [[]:p]
where
    encode :: [Char] *[[Char]] -> [Char]
    encode d l
        | d == s
            = Hd l
        | otherwise
            = encode (increment d) (Tl l)
    decode :: [Char] *[[Char]] -> [Char]
    decode d l
        | Hd l == s
            = d
        | otherwise
            = decode (increment d) (Tl l)

Try it online!

Οurous

Posted 2018-09-03T02:17:07.193

Reputation: 7 916