Un-de-duplicating strings

33

1

Introduction

Let's observe the following string:

AABBCCDDEFFGG

You can see that every letter has been duplicated, except for the letter E. That means that the letter E has been de-duplicated. So, the only thing we need to do here is to reverse that process, which gives us the following un-de-duplicated string:

AABBCCDDEEFFGG

Let's take a harder example:

AAAABBBCCCCDD

You can see that there is an uneven number of consecutive B's, so that means that one of the BB was de-duplicated from the original string. We only need to un-de-duplicate this letter, which gives us:

AAAABBBBCCCCDD


The challenge

Given a non-empty de-duplicated string, consisting of only alphabetic characters (either only uppercase or only lowercase), return the un-de-duplicated string. You can assume that there will always be at least one de-duplicated character in the string.


Test cases

AAABBBCCCCDDDD    -->    AAAABBBBCCCCDDDD
HEY               -->    HHEEYY
AAAAAAA           -->    AAAAAAAA
N                 -->    NN
OOQQO             -->    OOQQOO
ABBB              -->    AABBBB
ABBA              -->    AABBAA

This is , so the shortest valid submission in bytes wins!

Adnan

Posted 2016-12-15T22:24:08.423

Reputation: 41 965

@mbomb007 Yes, that would result into AABBBB. – Adnan – 2016-12-15T22:35:04.277

@mbomb007 Ah, I was not aware of that. Thanks! – Adnan – 2016-12-15T22:38:42.693

1I'm not sure I understand the challenge. Why does ABBB map to AABBBB, not AABBBBBB? – Dennis – 2016-12-15T23:14:33.287

2@Dennis If you seperate each group of characters into groups of 2, you would get the following: A BB B. The characters that aren't paired (and therefore not duplicated) need to be duplicated, resulting in AA BB BB, which is the un-de-duplicated string. – Adnan – 2016-12-15T23:33:17.370

Got it. Thanks! – Dennis – 2016-12-15T23:43:15.327

8So: Ensure that every run of characters has an even number of elements by adding at most one element to the run? – Mad Physicist – 2016-12-16T12:13:08.543

1@MadPhysicist Yes, that is correct – Adnan – 2016-12-16T14:30:56.573

Answers

20

MATL, 7 bytes

Y'to+Y"

Try it online! Or verify all test cases.

Let's take 'ABBA' as example input.

Y'   % Implicit input. Run-length decoding
     % STACK: 'ABA', [1 2 1]
t    % Duplicate top of the stack
     % STACK: 'ABA', [1 2 1], [1 2 1]
o    % Modulo 2
     % STACK: 'ABA', [1 2 1], [1 0 1]
+    % Add, element-wise
     % STACK: 'ABA', [2 2 2]
Y"   % Run-length encoding. Implicit display
     % STACK: 'AABBAA'

Luis Mendo

Posted 2016-12-15T22:24:08.423

Reputation: 87 464

11

Retina, 11 bytes

(.)\1?
$1$1

Try it online - contains all test cases

mbomb007

Posted 2016-12-15T22:24:08.423

Reputation: 21 944

1I expected Retina to win. – Adám – 2016-12-16T14:53:04.520

@Adám Yeah, it's pretty short, but that MATL answer is great. All the golfing languages ended up with shorter solutions. – mbomb007 – 2016-12-16T14:57:52.323

8

Perl, 16 bytes

15 bytes of code + -p flag.

s/(.)\1?/$1$1/g

To run it:

perl -pe 's/(.)\1?/$1$1/g' <<< 'HEY'

Dada

Posted 2016-12-15T22:24:08.423

Reputation: 8 279

7

Haskell, 36 bytes

u(a:b:c)=a:a:u([b|a/=b]++c)
u x=x++x

Usage example: u "OOQQO" -> "OOQQOO".

If the string has at least 2 elements, take two copies of the first and append a recursive call with

  • the second element an the rest if the first two elements differ or
  • just the rest

If there are less than two elements (one or zero), take two copies of the list.

nimi

Posted 2016-12-15T22:24:08.423

Reputation: 34 639

6

Brachylog, 17 bytes

@b:{~b#=.l#e,|}ac

Try it online!

Explanation

Example input: "ABBB"

@b                  Blocks: Split into ["A", "BBB"]
  :{          }a    Apply the predicate below to each element of the list: ["AA", "BBBB"]
                c   Concatenate: "AABBBB"

    ~b#=.             Output is the input with an additional element at the beginning, and
                        all elements of the output are the same (e.g. append a leading "B")
        .l#e,         The length of the Output is an even number
             |        Or: Input = Output (i.e. do nothing)

Fatalize

Posted 2016-12-15T22:24:08.423

Reputation: 32 976

4

JavaScript (ES6), 37 30 bytes

Saved 7 bytes by using the much more efficient '$1$1' like [other] [answers] did

s=>s.replace(/(.)\1?/g,'$1$1')

Test cases

let f =

s=>s.replace(/(.)\1?/g,'$1$1')

console.log(f("AAABBBCCCCDDDD"));   //  -->  AAAABBBBCCCCDDDD
console.log(f("HEY"));              //  -->  HHEEYY
console.log(f("AAAAAAA"));          //  -->  AAAAAAAA
console.log(f("N"));                //  -->  NN
console.log(f("OOQQO"));            //  -->  OOQQOO
console.log(f("ABBB"));             //  -->  AABBBB
console.log(f("ABBA"));             //  -->  AABBAA

Arnauld

Posted 2016-12-15T22:24:08.423

Reputation: 111 334

4

Ruby, 21 bytes

20 bytes plus the -p flag.

gsub(/(.)\1?/){$1*2}

Value Ink

Posted 2016-12-15T22:24:08.423

Reputation: 10 608

4

Mathematica, 41 bytes

s=StringReplace;s[s[#,a_~~a_->a],b_->b~~b]&

Unnamed function that inputs a string and outputs a string. Completely deduplicate then completely undeduplicate. Not real short, but I couldn't do better for now.

Greg Martin

Posted 2016-12-15T22:24:08.423

Reputation: 13 940

4

Befunge 98, 24 bytes

#@~#;:::#@,~-:!j;$,;-\,;

Try it Online!

$ can be easily replaced with -, and the 2nd @ with ;.

I think this can be golfed further due to the - at the beginning of both -, (or $, above) and -\,.

How?

Stack notation:  bottom [A, B, C, D] top

#@~     Pushes the first character onto the stack (C henceforth) and ends if EOF
#;      No-op to be used later
:::     Now stack is [C, C, C, C]

#@,~    Prints C, and if EOF is next (odd consecutive Cs), prints again and ends
        Lets call the next character D

-       Now stack is [C, C, C-D]
:!j;    If C == D, go to "$," Else, go to "-\,"

===(C == D)===

$,      C == D (i.e. a pair of Cs) so we discard top and print C (Stack is now [C])
;-\,;   Skipped, IP wraps, and loop starts again

===(C != D)===

-       Stack is [C, C-(C-D)]  By expanding: [C, C - C + D] or just [C, D]
\,      Prints C (Stack is now [D])

;#@~#;  This is skipped, because we already read the first character of a set of Ds,
        and this algorithm works by checking the odd character in a set of
        consecutive similar characters. We already read D, so we don't
        need to read another character.

MildlyMilquetoast

Posted 2016-12-15T22:24:08.423

Reputation: 2 907

3

Java 7, 58 bytes

String c(String s){return s.replaceAll("(.)\\1?","$1$1");}

Ungolfed:

String c(String s){
  return s.replaceAll("(.)\\1?", "$1$1");
}

Test code:

Try it here.

class M{
  static String c(String s){return s.replaceAll("(.)\\1?","$1$1");}

  public static void main(String[] a){
    System.out.println(c("AABBCCDDEFFGG"));
    System.out.println(c("AAAABBBCCCCDD"));
    System.out.println(c("AAABBBCCCCDDDD"));
    System.out.println(c("HEY"));
    System.out.println(c("AAAAAAA"));
    System.out.println(c("N"));
    System.out.println(c("OOQQO"));
    System.out.println(c("ABBB"));
    System.out.println(c("ABBA"));
  }
}

Output:

AABBCCDDEEFFGG
AAAABBBBCCCCDD
AAAABBBBCCCCDDDD
HHEEYY
AAAAAAAA
NN
OOQQOO
AABBBB
AABBAA

Kevin Cruijssen

Posted 2016-12-15T22:24:08.423

Reputation: 67 575

2

PHP, 65 bytes, no regex

while(""<$c=($s=$argv[1])[$i])if($c!=$s[++$i]||!$k=!$k)echo$c.$c;

takes input from command line argument. Run with -r.

regex? In PHP, the regex used by most answers duplicates every character. would be 44 bytes:

<?=preg_replace("#(.)\1?#","$1$1",$argv[1]);

Titus

Posted 2016-12-15T22:24:08.423

Reputation: 13 814

2

Brain-Flak 69 Bytes

Includes +3 for -c

{((({}<>))<>[({})]<(())>){((<{}{}>))}{}{(<{}{}>)}{}}<>{({}<>)<>}<>

Try it Online!

Explanation:

Part 1:
{((({}<>))<>[({})]<(())>){((<{}{}>))}{}{(<{}{}>)}{}}<>

{                                                  }   # loop through all letters
 (   {}     [ {} ]<(())>){((<{}{}>))}{}                # equals from the wiki   
                                                       # but first:
  ((  <>))<>                                           # push the top letter on the other 
                                                       # stack twice  
             (  )                                      # push the second letter back on
                                       {        }      # if they were equal:
                                        (<    >)       # push a 0 to exit this loop
                                          {}{}         # after popping the 1 from the 
                                                       # comparison and the next letter
                                                       # (the duplicate)
                                                 {}    # pop the extra 0
                                                    <> # switch stacks

Part 2 (at this point, everything is duplicated in reverse order):
{({}<>)<>}<>

{        }   # for every letter:
 ({}<>)      # move the top letter to the other stack
       <>    # and switch back
          <> # Finally switch stacks and implicitly print

Riley

Posted 2016-12-15T22:24:08.423

Reputation: 11 345

1

Jelly, 9 bytes

Œgs€2ṁ€€2

Try it online!

Dennis

Posted 2016-12-15T22:24:08.423

Reputation: 196 637

1

Racket 261 bytes

(let((l(string->list s))(r reverse)(c cons)(e even?)(t rest)(i first))(let p((l(t l))(ol(c(i l)'())))
(cond[(empty? l)(list->string(if(e(length ol))(r ol)(r(c(i ol)ol))))][(or(equal?(i ol)(i l))(e(length ol)))
(p(t l)(c(i l)ol))][(p(t l)(c(i l)(c(i ol)ol)))])))

Ungolfed:

(define (f s)
  (let ((l (string->list s)))
    (let loop ((l (rest l))
               (ol (cons (first l) '())))
      (cond
        [(empty? l)
         (list->string(if (even? (length ol))
                          (reverse ol)
                          (reverse (cons (first ol) ol))))]
        [(or (equal? (first ol) (first l)) 
             (even? (length ol)))
         (loop (rest l) (cons (first l) ol))]
        [else
         (loop (rest l) (cons (first l) (cons (first ol) ol)))] ))))

Testing:

(f "ABBBCDDEFFGGG")

Output:

"AABBBBCCDDEEFFGGGG"

rnso

Posted 2016-12-15T22:24:08.423

Reputation: 1 635

1

V 10 bytes

ͨ.©±½/±±

TryItOnline

Just a find and replace regex like all of the rest in the thread. The only difference is that I can replace anything that would require a \ in front of it with the character with the same ascii value, but the high bit set. (So (, 00101000 becomes ¨, 10101000)

nmjcman101

Posted 2016-12-15T22:24:08.423

Reputation: 3 274

1

Perl 6, 17 bytes

s:g/(.)$0?/$0$0/

with -p command-line switch

Example:

$ perl6 -pe 's:g/(.)$0?/$0$0/' <<< 'AAABBBCCCCDDDD
> HEY
> AAAAAAA
> N
> OOQQO
> ABBB
> ABBA'
AAAABBBBCCCCDDDD
HHEEYY
AAAAAAAA
NN
OOQQOO
AABBBB
AABBAA

Brad Gilbert b2gills

Posted 2016-12-15T22:24:08.423

Reputation: 12 713

1

Python3, 102 94 bytes

from collections import*
lambda s:"".join(c*(s.count(c)+1&-2)for c in OrderedDict.fromkeys(s))

Thanks to xnor for saving 8 bytes! -> bithack.

Yytsi

Posted 2016-12-15T22:24:08.423

Reputation: 3 582

This doesn't keep the letters in the right order. – xnor – 2016-12-16T05:19:36.607

@xnor Thanks for mentioning! Fixed. – Yytsi – 2016-12-16T05:33:48.427

Looks good. You can write the expression x+x%2 as x&-2. – xnor – 2016-12-16T05:54:56.573

@xnor I tried s.count(c)&-2 and it returned an empty string... :/ Any thoughts? – Yytsi – 2016-12-16T06:07:59.950

It's working for me. Did you put it out it in parens? – xnor – 2016-12-16T07:16:27.900

@xnor Here is what I tried. What did I do wrong?

– Yytsi – 2016-12-16T07:41:15.267

1Oh, you're right and I made a mistake. I think x+1&-2 should do it. Evens go to themselves and odds round up to evens. – xnor – 2016-12-16T08:09:45.673

dict preserves order in 3.6 and above. https://docs.python.org/3.6/whatsnew/3.6.html#new-dict-implementation – mbomb007 – 2016-12-16T14:51:44.710

1

05AB1E, 10 bytes

.¡vy¬ygÉ×J

Try it online!

Explanation

.¡           # split string into groups of the same char
  v          # for each group
   y         # push the group
    ¬        # push the char the group consists of
     yg      # push the length of the group
       É     # check if the length of the group is odd
        ×    # repeat the char is-odd times (0 or 1)
         J   # join to string

Emigna

Posted 2016-12-15T22:24:08.423

Reputation: 50 798

1

R, 81 bytes

r=rle(el(strsplit(scan(,""),"")));cat(do.call("rep",list(r$v,r$l+r$l%%2)),sep="")

Reads a string from stdin, splin into vector of characters and perform run-length encoding (rle). Subsequently repeat the each values from the rle, the sum of the lengths and the lengths mod 2.

If we can read input separated by space (implicitly as a vector/array of characters) then we can skip the splitting part and the program reduces to 64 bytes:

r=rle(scan(,""));cat(do.call("rep",list(r$v,r$l+r$l%%2)),sep="")

Billywob

Posted 2016-12-15T22:24:08.423

Reputation: 3 363

1

><> (Fish) 39 bytes

0v ;oo:~/:@@:@=?!voo
 >i:1+?!\|o !:  !<

Pretty sure this can be golfed a lot using a different technique.

It takes an input and compares against the current stack item, if it's different it'll print the first stack item twice, if the same it prints them both.

The stack when empty gets supplied with a 0 which prints nothing so can be appended on whenever.

Teal pelican

Posted 2016-12-15T22:24:08.423

Reputation: 1 338

1

Pyth, 15 bytes

Vrz8p*+hN%hN2eN

Verify all test cases here.

Thanks to Luis Mendo for the methodology.

Explanation

Vrz8p*+hN%hN2eN    z autoinitializes to the input
 rz8               run-length encode the input, returned as list of tuples (A -> [[1,"A"]])
V                  for every element N in this list
      +hN          add the head element of N (the number in the tuple)
         %hN2      to the head element of N mod 2
     *       eN    repeat the tail element of N that many times (the letter in the tuple)
    p              print repeated character without trailing newline

As is often the case, I feel like this could be shorter. I think there should be a better way to extract elements from the list than what I am using here.

Mike Bufardeci

Posted 2016-12-15T22:24:08.423

Reputation: 1 680

1

PowerShell, 28 bytes

$args-replace'(.)\1?','$1$1'

Try it online! (includes all test cases)

Port of the Retina answer. The only points of note are we've got $args instead of the usual $args[0] (since the -replace will iterate over each item in the input array, we can golf off the index), and the '$1$1' needs to be single quotes so they're replaced with the regex variables rather than being treated as PowerShell variables (which would happen if they were double-quote).

AdmBorkBork

Posted 2016-12-15T22:24:08.423

Reputation: 41 581

1

C, 67 bytes

i;f(char*s,char*d){i=*s++;*d++=i;*d++=i;*s?f(i-*s?s:++s,d):(*d=0);}

Call with:

int main()
{
    char *in="AAABBBCCCCDDDD";
    char out[128];
    f(in,out);
    puts(out);
}

Steadybox

Posted 2016-12-15T22:24:08.423

Reputation: 15 798

1

brainfuck, 22 bytes

,
[
  [>->+<<-]
  >[>..<<]
  >,
]

Try it online.

Prints the current character twice, unless it's equal to a character that was just printed twice.

Mitch Schwartz

Posted 2016-12-15T22:24:08.423

Reputation: 4 899