OOP: Overlapping Oriented Programming

32

4

One of the lesser known programming paradigms which seems rather fitting for code golfing is Overlapping Oriented Programming (OOP) *. When writing partly identical code, many bytes can be saved by simply overlapping the identical parts and remembering in some way where the two original code lines start. Your task is to write two overlapping programs or functions compress and decompress with the following specification:

*Do not use in production code, probably.

compress

compress takes two strings in any convenient format and overlaps them as much as possible. That is a string s with minimal length is returned such that both input strings are substrings of s. Additionally, some output which identifies the start and end indices of both string is returned.

Examples: (The exact IO-format is up to you)

compress("abcd", "deab") -> "deabcd" ((2,5),(0,3))
compress("abcd", "bc")   -> "abcd" ((0,3),(1,2))
compress("abc", "def")   -> "abcdef" ((0,2),(3,5)) or "defabc" ((3,5),(0,2))

decompress

decompress computes the inverse function of compress, that is given a string and two start and end indices (in the format in which they are returned by your compress), return the two original strings. You need only handle valid inputs. The following equality should hold for all strings s1, s2:

(s1, s2) == decompress (compress (s1, s2))

Examples: (inverses of compress examples)

decompress "deabcd" ((2,5),(0,3)) -> "abcd" "deab" 
decompress "abcd" ((0,3),(1,2))   -> "abcd" "bc"

decompress "abcdef" ((0,2),(3,5)) -> "abc" "def"   
 or (whichever version your "compress" generates)
decompress "defabc" ((3,5),(0,2)) -> "abc" "def"

Scoring

Your score is the size of the string returned by calling compress("<code of compress>", "<code of decompress>"). As this is a lower score is better.

Example:

Assume the code for your function compress is c=abcd and the code for decompress is d=efghi. Then compress("c=abcd", "d=efghi") yields "c=abcd=efghi" (and the indices, but those don't affect scoring) so the score is length "c=abcd=efghi" = 12.

Additional Rules

  • In the spirit of this challenge, your compress and decompress must overlap in at least one character. You may achieve this trivially by adding a comment, but note that doing so will increase your score and there might be shorter solutions using inherently overlapping code.
  • compress and decompress should be able to handle strings containing any printable ASCII characters as well as all characters you used to define compress and decompress.
  • The indices can be zero- or one-indexed.
  • Your programs or functions don't have to actually be named compress and decompress.

Laikoni

Posted 2017-01-25T12:08:49.730

Reputation: 23 676

Can you use different command line arguments to run you compress and decompress code? – MildlyMilquetoast – 2017-01-25T17:01:32.767

Sure. You have to give two programs and the site policy allows command line arguments as long as they are counted, so you can give different command line arguments for each of your programs. – Laikoni – 2017-01-25T17:24:44.913

Answers

25

GNU Prolog, 105 points

s(U,L/M,C):-prefix(A,C),length(A,M),suffix(U,A),length(U,L).
o(A-B,C-X-Y):-length(C,_),s(A,X,C),s(B,Y,C).

(This requires GNU Prolog because prefix and suffix are not portable.)

Prolog has one interesting, major advantage for this challenge; you can write a function to handle multiple call patterns (i.e. not only can you give the function an input to get a corresponding output, you can give the function an output to get the corresponding input). As such, we can define a function that can handle both compression and decompression, leading to a 105-byte submission that defines a function o that works both ways round. (Incidentally, I mostly wrote it as a decompressor – as it's simpler – and got the compressor "for free". ) In general, we could expect a very short program in Prolog for this task, if not for the fact that it's so bad at string handling (both in terms of missing primitives, and in terms of the primitives in question having terribly long names).

The first argument to o is a tuple of strings, e.g. "abcd"-"deab". The second argument has a form like "deabcd"-4/6-4/4; this is a fairly standard nested tuple, and means that the string is "deabcd", the first string has length 4 and ends at the sixth character, the second string has length 4 and ends at the fourth character. (Note that a string in GNU Prolog is just a list of character codes, which makes debugging annoying because the implementation prefers the latter interpretation by default.) If you give o one argument, it'll output the other for you (thus working as a compressor if you give the first argument, and a decompressor if you give the second). If you give it both arguments, it'll verify that the compressed representation matches the strings given. If you give it zero arguments, it'll generate output like this:

| ?- o(X,Y).
X = []-[]
Y = []-0/0-0/0 ? ;

X = []-[]
Y = [_]-0/0-0/0 ? ;

X = []-[A]
Y = [A]-0/0-1/1 ? ;

…many lines later…

X = [A]-[B,A,C]
Y = [B,A,C]-1/2-3/3 ? ;

The above description of the I/O format is pretty much entirely just a description of the program, too; there's not very much in the program at all. The only subtlety is to do with evaluation order hints; we need to ensure that the program doesn't just use a search strategy that's guaranteed to terminate, but also produces the shortest possible output string.

When compressing, we start with length(C,_) ("C has a length"), which is a trick that I've used in many Prolog and Brachylog answers; if this is the first thing Prolog sees, it'll cause it to prioritise reducing the length of C over anything else. This ensures that we have a minimum-length C. The ordering of the constraints in s is carefully chosen so that the search will take a finite time for each possible candidate length of C; A is constrained by C (we don't know C, but we do know the target value we have for its length), M by A, U by A, and L by U, so none of the searches can take unlimited time.

When decompressing, we're given C directly by the user. This again ensures the program will run in finite time, due to the same sequence of constraints. (People who are aware of Prolog's evaluation order will note that the definition of s is very inefficient when decompressing; placing length(A,M) and length(U,L) first would be faster, but moving length(A,M) to the start could cause an infinite loop when compressing because neither A nor M is bound by anything at the time.)

user62131

Posted 2017-01-25T12:08:49.730

Reputation:

13

Brachylog, 50 46 bytes

{Ċ∧Lċ₂l∧Lgj:?z{tT∧?h~cṪhlI∧ṪbhTl:I+-₁:I↔}ᵐ:L}

Try it online!

Decompress:

~{Ċ∧Lċ₂l∧Lgj:?z{tT∧?h~cṪhlI∧ṪbhTl:I+-₁:I↔}ᵐ:L}

Try it online!

Saved 5 bytes thanks to @ais523

Explanation

The good side of declarative languages is that we can reuse the same code for both compressing and decompressing. As such, the code for compress is exactly the same as for decompress, with an additional ~ at the beginning.

This ~ tells Brachylog to reverse the order of arguments (that is, use the Input as output and the Output as input). Since compress has no ~, it actually runs the predicate in the standard order. Since decompress has only one, it runs it with the Input as output and the Output as input.

That way, we can use the same code (modulo that extra ~) to both compress and decompress: compressing is providing the two strings as input and a variable as output, and decompressing is providing the indexes and compressed string as output and a variable as Input.

Obviously this also means that we have to be a bit explicit about our compressing code, so that the interpreter is capable of running it "backwards". This is why the compressor itself is a bit long.

Here is a breakdown of the code for Compress (and thus also of decompress):

{……………………………………………………………………}   Call that predicate the normal way (with swapped arguments
                                 for decompress)
   Ċ                           Input has two elements
   ∧Lċ₂l                       L is a string of any length (measuring its length forces it to
                                 take a specific length from 0 to +inf)
   ∧Lgj                        The list [L,L]
       :?z                     The list [[L, First elem of Input],[L,second elem of input]]
          {………………………………}ᵐ:L    Final output is the [M,L] where M is the result of mapping
                                 the predicate below on both elements of the zip
           tT                  The second element of the input is T
           ∧?h~cṪ              Anticoncatenate the first element of the input into [A,B,C]
           hlI                 I = length(A)
           ∧ṪbhTl:I+-₁         J = length(T) + I - 1
           :I↔                 Output = [I,J]

Fatalize

Posted 2017-01-25T12:08:49.730

Reputation: 32 976

4

Jelly, 58 50 bytes

-1 byte thanks to ais523 (use for a two-byte string)

This may well be quite golfable...

Compression takes two string arguments and returns a list:
[[[startA, lengthA], [startB, lengthB]], compressedString]

w³;w⁴$
0;J⁸ḣ;€
ç;ç@ÑẠ$ÐfLÐṂṪµ³,⁴L€ż@Ñ,

Decompression takes one argument (such a list) and returns two* strings:

,
⁾ṫḣżFv
Ḣç€Ṫ

Overlapped code:

w³;w⁴$
0;J⁸ḣ;€
ç;ç@ÑẠ$ÐfLÐṂṪµ³,⁴L€ż@Ñ,
⁾ṫḣżFv
Ḣç€Ṫ

One-indexed.

* this may not be obvious due to Jelly's implicit print formatting, so the code at TryItOnline linked to above has an extra byte (a Y at the end) to insert a line feed between the two in the printed output.

I started with a single program, which used the depth of the first (or only) argument to decide between compression and decompression, but having an unused link in the decompression code (the first line) and a single overlapping byte is shorter by seven bytes.

How?

ç;ç@ÑẠ$ÐfLÐṂṪµ³,⁴L€ż@Ñ, - Compression: stringA, stringB
ç                       - call the last link (2) as a dyad
  ç@                    - call the last link (2) as a dyad with reversed arguments
 ;                      - concatenate (gives all overlapping strings)
       Ðf               - filter keep:
      $                 -     last two links as a monad
    Ñ                   -         call the next link (1) as a monad
     Ạ                  -         All? (no zeros exist in that result)
          ÐṂ            - filter keep with minimal:
         L              -     length
            Ṫ           - tail (for when more than one exists)
             µ          - monadic chain separation (we now have the compressed string)
              ³,⁴       - [stringA, stringB]
                 L€     - length of €ach
                   ż@   - zip with reversed arguments with
                     Ñ  - next link (1) as a monad with the compressed string
                      , - paired with the compressed string

J0;⁸ḣ;€ - Link 2, possible overlaps: stringL, stringR
J       - range(length(stringL)) - [1,2,...,length(stringL)]
 0;     - zero concatenate       - [0,1,2,...,length(stringL)]
   ⁸    - stringL
    ḣ   - head (vectorises)      - [empty string, first char, first two, ..., stringL]
     ;€ - concatenate €ach with stringR

w³;w⁴$ - Link 1, substring indexes: stringX
w³     - first index of first program argument in stringX or 0 if not found
  ;    - concatenated with
     $ - last two links as a monad
   w⁴  -     first index of second program argument in stringX or 0 if not found
Ḣñ€Ṫ - Decompression: [[[startA, lengthA], [startB, lengthB]], compressedString], ?
Ḣ    - head - [[startA, lengthA], [startB, lengthB]]
   Ṫ - tail - compressedString
 ç€  - call the last link (2) as a dyad for €ach of the left list
     -- extra Y atom at TIO joins the resulting list of two strings with a line feed.

⁾ṫḣżFv - Link 2, extract a substring: [start, length], string
⁾ṫḣ    - string "ṫḣ"
   ż   - zip with [start, length] to yield [['ṫ', start],['ḣ', length]]
    F  - flatten, making a list of characters
     v - evaluate as Jelly code with the string as an argument
       - this evaluates as string.tail(start).head(length) yielding the substring

, - Link 1: only here to make an overlap with the compression program.

Jonathan Allan

Posted 2017-01-25T12:08:49.730

Reputation: 67 804

“ṫḣ” can be golfed by 1 byte by using Jelly's syntax for 2-character strings. – None – 2017-01-25T22:39:22.370

This is a question completely unrelated to the answer per se, but do you write the explanation of the code by hand or is there a tool generating it from code? – tfrascaroli – 2017-01-26T07:37:44.283

@tfrascaroli I write it by hand – Jonathan Allan – 2017-01-26T09:05:32.160