Removing unique elements from string

I came across this question because finding the unique characters in a string seems to be a very common use case. But what if we want to get rid of them?

The input contains only lowercase letters from a to z. The input length may be from 1 to 1000 characters.

Example:
input: helloworld
output: llool

Objective: Shortest code wins
Language: Any of the top 20 of TIOBE languages
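
For reference, an ungolfed version of the task can be sketched in Python. This is only an illustration of the requirement, not an entry; the name `drop_unique` is made up here and does not come from the question.

```python
from collections import Counter

def drop_unique(s):
    """Return s without the characters that occur exactly once."""
    counts = Counter(s)  # one pass to count every character
    # Keep a character only if it appears more than once in the whole input.
    return "".join(c for c in s if counts[c] > 1)

print(drop_unique("helloworld"))  # llool
```

The order of the surviving characters is the order they had in the input, which matches the example above.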

user14742

Posted 2012-10-10T23:57:26.087

Reputation: 239

Answers

Perl, 28 24 characters (includes 1 for 'p' option)

s/./$&x(s!$&!$&!g>1)/eg

Usage:

> perl -pe 's/./$&x(s!$&!$&!g>1)/eg'
helloworld
llool

At first I thought I could do this with negative look-ahead and negative look-behind, but it turns out that negative look-behinds must have a fixed length. So I went for nested regexes instead. With thanks to mob for the $& tip.
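
A rough Python analogue of the nested-substitution idea, as a sketch rather than a translation of the Perl: every character is replaced by itself repeated zero or one times, depending on whether it occurs more than once in the whole input. The name `drop_unique_regex` is hypothetical.

```python
import re

def drop_unique_regex(s):
    # For each matched character, multiply it by a boolean:
    # True (1) keeps it, False (0) turns it into the empty string.
    return re.sub(".", lambda m: m.group() * (s.count(m.group()) > 1), s)

print(drop_unique_regex("helloworld"))  # llool
```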

Gareth

Posted 2012-10-10T23:57:26.087

Reputation: 11 678

+1. I naively thought I could take this thing with my Ruby answer. – Steven Rumbalski – 2012-10-11T14:27:08.753

i tried this on chinese text and it did not do the trick. =( – ixtmixilix – 2012-10-12T00:12:34.153

@ixtmixilix - then run perl with the -CDS option – mob – 2012-10-12T00:38:11.630

@ixtmixilix I don't know enough about unicode and Perl's support of it to suggest a way to make it work with chinese text I'm afraid. Luckily for me the question says only lower case a to z. – Gareth – 2012-10-12T00:38:41.863

Replace all the $1 with $& and you can lose a couple pairs of parentheses. – mob – 2012-10-12T00:39:01.410

(GolfScript, 15 13 characters)

:;{.;?);>?)},

GolfScript is not one of the top 20, but a codegolf without GolfScript... (run it yourself)

Previous Version: (run script)

1/:;{;\-,;,(<},

Howard

Posted 2012-10-10T23:57:26.087

Reputation: 23 109

1:;? You're deliberately trying to confuse newbies, aren't you? ;) – Peter Taylor – 2012-10-11T09:17:39.680

@PeterTaylor You're right. I should have chosen a ) - it would make it a smiley then :). Unfortunately, I didn't find a way to even eliminate the digit 1. (Note for GolfScript newbies: you may replace any ; in the code with an x (or any other letter or digit - or any character not used in the script otherwise). In this special case ; is just a variable name - and does not have the meaning "pop and discard". In GolfScript almost all tokens are variables anyways, and using predefined symbols is a great way to make scripts even more unreadable for outsiders ;-).) – Howard – 2012-10-11T11:09:43.307

Another 13-char solution: :a{]a.@--,(}, – Ilmari Karonen – 2014-04-11T19:01:16.787

J, 12 characters

Having entered a valid Perl answer, here's an invalid (language not in the TIOBE top 20) answer.

a=:#~1<+/@e.

Usage:

   a 'helloworld'
llool

Declares a verb a which outputs only the non-unique items.
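
The verb can be read as "build a boolean mask of the items that occur more than once, then copy the items the mask selects". Assuming that reading is right, the same mask-and-copy idea looks like this in Python (`non_unique` is a made-up name):

```python
from itertools import compress

def non_unique(s):
    # True where the item occurs more than once in the whole argument.
    mask = [s.count(c) > 1 for c in s]
    # Copy only the items selected by the mask.
    return "".join(compress(s, mask))

print(non_unique("helloworld"))  # llool
```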

Gareth

Posted 2012-10-10T23:57:26.087

Reputation: 11 678

GolfScript (14 chars)

:x{{=}+x\,,(},

Online demo

Might not qualify to win, but it's useful to have a yardstick.

Peter Taylor

Posted 2012-10-10T23:57:26.087

Reputation: 41 901

Ruby 46 40 36

gets.chars{|c|$><<c if$_.count(c)>1}

Steven Rumbalski

Posted 2012-10-10T23:57:26.087

Reputation: 1 353

You may save 4 chars if you inline s and use $_ for the second appearance (the space before is then dispensable). – Howard – 2012-10-11T11:22:00.983

@Howard: Nice catch. Thanks. I have about zero experience with Ruby. – Steven Rumbalski – 2012-10-11T14:03:19.893

Japt, 6 5 bytes

ÆèX É

-1 byte thanks to @Oliver

Try it online!

Quintec

Posted 2012-10-10T23:57:26.087

Reputation: 2 801

Welcome to Japt! There is actually a shortcut for o@: Æ – Oliver – 2019-02-25T18:29:55.693

@Oliver Another shortcut that I missed, cool, thanks :) – Quintec – 2019-02-25T20:13:59.657

@Oliver, the better question is how the feck did I miss it?! :\ – Shaggy – 2019-02-27T18:23:54.210

Brachylog (v2), 8 bytes

⊇.oḅlⁿ1∧

Try it online!

Function submission. Technically noncompeting because the question has a limitation on what languages are allowed to compete (however, several other answers have already ignored the restriction).

Explanation

⊇.oḅlⁿ1∧
⊇         Find {the longest possible} subset of the input
  o       {for which after} sorting it,
   ḅ        and dividing the sorted input into blocks of identical elements,
    lⁿ1     the length of a resulting block is never 1
 .     ∧  Output the subset in question.
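
The same declarative description can be brute-forced in Python. This is only a sketch of the search described above, not of how Brachylog evaluates it: it tries subsequences from longest to shortest and returns the first one whose sorted form has no block of length 1. It is exponential in the input length, so it is only suitable for short strings; `longest_valid_subsequence` is a hypothetical name.

```python
from itertools import combinations, groupby

def longest_valid_subsequence(s):
    # Try candidate subsequences from the longest down to the empty one.
    for size in range(len(s), -1, -1):
        for sub in combinations(s, size):   # keeps the input order
            blocks = groupby(sorted(sub))   # blocks of identical elements
            # Accept the candidate if no block has length exactly 1.
            if all(len(list(block)) != 1 for _, block in blocks):
                return "".join(sub)

print(longest_valid_subsequence("helloworld"))  # llool
```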

ais523

Posted 2012-10-10T23:57:26.087

Reputation: 11

Why do you CW all your solutions? – Shaggy – 2019-02-25T20:11:19.460

@Shaggy: a) because I'm fine with other people editing them, b) to avoid gaining reputation if they're upvoted. In general I think the gamification of Stack Exchange is a huge detriment to the site – there's sometimes a negative correlation between the actions that you can take to improve rep and the actions you can take to actually improve the site. Additionally, being at a high reputation count sucks; the site keeps nagging you to do admin tasks, and everything you do is a blunt instrument (e.g. when you're at low rep you can suggest an edit, at high rep it just gets forced through). – ais523 – 2019-02-25T20:24:31.217

Python 2.7 (52 51), Python 3 (52)

I didn't expect it to be so short.

2.7: a=raw_input();print filter(lambda x:a.count(x)>1,a)

~~3.0: a=input();print''.join(i for i in a if a.count(x)>1)~~

raw_input(): reads the input as a string (in Python 2, input() is equivalent to eval(raw_input())).
(In Python 3, input() behaves like Python 2's raw_input().)

filter(lambda x:a.count(x)>1,a): keeps only the characters of a that occur in a more than once (a.count(x)>1).
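
In Python 3, `filter` returns an iterator instead of a string, so the same approach needs a `''.join(...)` around it. An ungolfed sketch, with the input passed as an argument instead of being read from stdin (the function name `keep_repeated` is made up):

```python
def keep_repeated(a):
    # filter() yields the characters that occur more than once;
    # join() turns the resulting iterator back into a string.
    return "".join(filter(lambda x: a.count(x) > 1, a))

print(keep_repeated("helloworld"))  # llool
```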

beary605

Posted 2012-10-10T23:57:26.087

Reputation: 3 904

If you use python 3 instead, you can use input() rather than raw_input(). Although you have to add one character for a closing bracket, since print is a function in python 3. – Strigoides – 2012-10-16T02:03:36.977

@Strigoides: I have added a Python 3 code snippet to my answer. – beary605 – 2012-10-16T02:18:51.030

Python 3's filter returns an iterator... You'll need to do ''.join(...) – JBernardo – 2012-10-16T04:23:01.247

@JBernardo: :( Dang. Thanks for notifying me. As you can see, I don't use 3.0. – beary605 – 2012-10-16T05:20:23.827

Perl 44

$l=$_;print join"",grep{$l=~/$_.*$_/}split""

Execution:

perl -lane '$l=$_;print join"",grep{$l=~/$_.*$_/}split""' <<< helloworld
llool

flodel

Posted 2012-10-10T23:57:26.087

Reputation: 2 345

K, 18

{x@&x in&~1=#:'=x}

tmartin

Posted 2012-10-10T23:57:26.087

Reputation: 3 917

You can save a byte using 1<# instead of ~1=# – J. Sendra – 2019-02-25T21:00:09.647

sed and coreutils (128)

Granted this is not part of the TIOBE list, but it's fun (-:

<<<$s sed 's/./&\n/g'|head -c -1|sort|uniq -c|sed -n 's/^ *1 (.*)/\1/p'|tr -d '\n'|sed 's:^:s/[:; s:$:]//g\n:'|sed -f - <(<<<$s)

De-golfed version:

s=helloworld
<<< $s sed 's/./&\n/g'        \
| head -c -1                  \
| sort                        \
| uniq -c                     \
| sed -n 's/^ *1 (.*)/\1/p'   \
| tr -d '\n'                  \
| sed 's:^:s/[:; s:$:]//g\n:' \
| sed -f - <(<<< $s)

Explanation

The first sed converts the input into one character per line. After sort and uniq -c have counted each character, the second sed keeps the characters that occur only once. The third sed writes a sed script that deletes those unique characters. The last sed executes the generated script on the original input.
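
The "generate a deletion script, then run it" structure of the pipeline can be sketched in Python as follows. This is an analogue under the assumption that the generated script amounts to deleting one character class; `drop_unique_generated` is a hypothetical name.

```python
import re
from collections import Counter

def drop_unique_generated(s):
    # Count the characters and keep those seen exactly once.
    unique = [c for c, n in Counter(s).items() if n == 1]
    if not unique:
        return s  # nothing to delete; an empty [] class would be invalid
    # Build the deletion pattern, in the spirit of the generated s/[...]//g.
    pattern = "[" + "".join(re.escape(c) for c in unique) + "]"
    # Apply the generated pattern to the original input.
    return re.sub(pattern, "", s)

print(drop_unique_generated("helloworld"))  # llool
```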

Thor

Posted 2012-10-10T23:57:26.087

Reputation: 2 526

Java 8, 90 bytes

s->{for(char c=96;++c<123;s=s.matches(".*"+c+".*"+c+".*")?s:s.replace(c+"",""));return s;}

Try it online.

Explanation:

s->{                         // Method with String as both parameter and return-type
  for(char c=96;++c<123;     //  Loop over the lowercase alphabet
    s=s.matches(".*"+c+".*"+c+".*")?
                             //   If the String contains the character more than once
       s                     //    Keep the String as is
      :                      //   Else (only contains it once):
       s.replace(c+"",""));  //    Remove this character from the String
  return s;}                 //  Return the modified String
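
The same loop over the alphabet, sketched in Python for comparison. It uses a direct count instead of the regex test, which is a swapped-in technique; `drop_unique_by_letter` is a made-up name.

```python
from string import ascii_lowercase

def drop_unique_by_letter(s):
    for c in ascii_lowercase:      # loop over 'a'..'z'
        if s.count(c) < 2:         # absent or present exactly once
            s = s.replace(c, "")   # remove it (a no-op when absent)
    return s

print(drop_unique_by_letter("helloworld"))  # llool
```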

Kevin Cruijssen

Posted 2012-10-10T23:57:26.087

Reputation: 67 575

PowerShell, 59 bytes

"$args"-replace"[^$($args|% t*y|group|?{$_.Count-1}|% n*)]"

Try it online!

Less golfed:

$repeatedChars=$args|% toCharArray|group|?{$_.Count-1}|% name
"$args"-replace"[^$repeatedChars]"

Note: $repeatedChars is an array. By default, PowerShell joins array elements with a space when converting an array to a string, so the regexp contains spaces (in this example, [^l o]). The spaces do not affect the result because the input string contains only letters.
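
The negated-character-class idea can be sketched in Python too. The space-joined class is kept here only to mirror the note above; the guard for the empty case is an addition of this sketch, because Python rejects an empty `[^]` class. `keep_repeated_class` is a hypothetical name.

```python
import re
from collections import Counter

def keep_repeated_class(s):
    repeated = [c for c, n in Counter(s).items() if n > 1]
    if not repeated:
        return ""  # every character is unique, so nothing survives
    # Joining with spaces mirrors the array-to-string conversion;
    # the extra spaces are harmless for input made of letters only.
    pattern = "[^" + " ".join(repeated) + "]"
    # Delete everything that is not one of the repeated characters.
    return re.sub(pattern, "", s)

print(keep_repeated_class("helloworld"))  # llool
```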

mazzy

Posted 2012-10-10T23:57:26.087

Reputation: 4 832

APL (Dyalog Extended), 8 bytes (SBCS)

Anonymous tacit prefix function.

∊⊢⊆⍨1<⍧⍨

Try it online!

⍧⍨ count-in selfie (count occurrences of argument elements in the argument itself)

1< Boolean mask where one is less than that

⊢⊆⍨ partition the argument by that mask (beginning a new partition on 1s and removing on 0s)

∊ ϵnlist (flatten)
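
The four steps map fairly directly onto Python, shown here as a sketch with `groupby` standing in for the partition step (`partition_and_flatten` is a made-up name):

```python
from itertools import groupby

def partition_and_flatten(s):
    counts = [s.count(c) for c in s]   # occurrences of each element in s
    mask = [n > 1 for n in counts]     # True where 1 is less than the count
    # Partition: keep the runs where the mask is True, drop the others.
    parts = ["".join(ch for _, ch in run)
             for keep, run in groupby(zip(mask, s), key=lambda p: p[0])
             if keep]
    return "".join(parts)              # flatten the partitions

print(partition_and_flatten("helloworld"))  # llool
```

For "helloworld" the kept partitions are "llo", "o" and "l", which flatten to "llool".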

Adám

Posted 2012-10-10T23:57:26.087

Reputation: 37 779

JavaScript, 45 bytes

s=>[...s].filter(c=>s.match(c+'.*'+c)).join``

kamoroso94

Posted 2012-10-10T23:57:26.087

Reputation: 739

R, 70 bytes

a=utf8ToInt(scan(,''));intToUtf8(a[!a%in%names(table(a)[table(a)<2])])

Try it online!

A poor attempt, even from a TIOBE top 20 language. I know something can be done about the second half, but at the moment, any golfs escape me.

Sumner18

Posted 2012-10-10T23:57:26.087

Reputation: 1 334

JavaScript (Node.js), 82 bytes

p=>[...p].map((v,i,a)=>a.filter(f=>f==v).length).reduce((a,c,i)=>c>1?a+=p[i]:a,[])

Try it online!

Kamil Naja

Posted 2012-10-10T23:57:26.087

Reputation: 121

You can use .join`` instead of .join(""). – recursive – 2019-02-25T21:35:24.840

JavaScript, 34 bytes

Input as a string, output as a character array.

s=>[...s].filter(x=>s.split(x)[2])

Try It Online!

Shaggy

Posted 2012-10-10T23:57:26.087

Reputation: 24 623

Nice. Here's an alternate 34-byter using map: https://tio.run/##BcExDoAgDADAvzgYOtjBHT9CTCSICqmUWKL8vt5l/3oJT6ptKrxHPayKXRwiyoq3r6bbRVAqpWY6uHkdxw4auAhTROLTHGa4IhF//NA@AGZOZdtAfw – Oliver – 2019-02-25T21:41:08.663

@Oliver, not quite. – Shaggy – 2019-02-25T22:05:51.003

Mathematica 72 63

Ok, Mathematica isn't among the top 20 languages, but I decided to join the party anyway.

x is the input string.

"" <> Select[y = Characters@x, ! MemberQ[Cases[Tally@y, {a_, 1} :> a], #] &]

DavidC

Posted 2012-10-10T23:57:26.087

Reputation: 24 524

Python (56)

Here's another (few chars longer) alternative in Python:

a=raw_input();print''.join(c for c in a if a.count(c)>1)

If you accept output as a list (e.g. ['l', 'l', 'o', 'o', 'l']), then we could boil it down to 49 characters:

a=raw_input();print[c for c in a if a.count(c)>1]

arshajii

Posted 2012-10-10T23:57:26.087

Reputation: 2 142

Hey, >1 is a good idea! May I incorporate that into my solution? – beary605 – 2012-10-11T02:32:19.017

@beary605 Sure no problem at all - easy way to trim a character off :D – arshajii – 2012-10-11T02:43:39.443

Perl (55)

@x=split//,<>;$s{$_}++for@x;for(@x){print if($s{$_}>1)}

Reads from stdin.

QuasarDonkey

Posted 2012-10-10T23:57:26.087

Reputation: 111

Ocaml, 139 133

Uses ExtLib's ExtString.String

open ExtString.String
let f s=let g c=fold_left(fun a d->a+Obj.magic(d=c))0 s in replace_chars(fun c->if g c=1 then""else of_char c)s

Non-golfed version

open ExtString.String
let f s =
  let g c =
    fold_left
      (fun a c' -> a + Obj.magic (c' = c))
      0
      s
  in replace_chars
  (fun c ->
    if g c = 1
    then ""
    else of_char c)
  s

The function g returns the number of occurrences of c in the string s. The function f replaces all chars either by the empty string or the string containing the char depending on the number of occurrences. Edit: I shortened the code by 6 characters by abusing the internal representation of bools :-)
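
For readers who do not know OCaml, the two functions can be sketched in Python. Here `count_char` plays the role of g; the names are chosen for this illustration. No trick is needed for the bool-to-int step, since a Python bool already is an int.

```python
from functools import reduce

def count_char(c, s):
    # Fold over s, adding 1 (True) for every character equal to c.
    return reduce(lambda acc, d: acc + (d == c), s, 0)

def f(s):
    # Map each char to "" when it occurs once, otherwise to itself.
    return "".join("" if count_char(c, s) == 1 else c for c in s)

print(f("helloworld"))  # llool
```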

Oh, and ocaml is 0 on the TIOBE index ;-)

ReyCharles

Posted 2012-10-10T23:57:26.087

Reputation: 525

f*** the TIOBE index. – ixtmixilix – 2012-10-12T00:04:03.603

I agree. Also, thanks for the upvote. Now I can comment :-) – ReyCharles – 2012-10-12T00:17:47.037

C# – 77 characters

Func<string,string>F=s=>new string(s.Where(c=>s.Count(d=>c==d)>1).ToArray());

If you accept the output as an array, it boils down to 65 characters:

Func<string,char[]>F=s=>s.Where(c=>s.Count(d=>c==d)>1).ToArray();

Mormegil

Posted 2012-10-10T23:57:26.087

Reputation: 1 148

PHP - 70

while($x<strlen($s)){$c=$s[$x];echo substr_count($s,$c)>1?$c:'';$x++;}

This assumes $s = 'helloworld'.

hengky mulyono

Posted 2012-10-10T23:57:26.087

Reputation: 11

C++, 139 bytes

string s;cin>>s;string w{s}; auto l=remove_if(begin(s),end(s),[&w](auto&s){return count(begin(w),end(w),s)==1;});s.erase(l,end(s));cout<<s;

ungolfed:

#include <algorithm>
#include <string>
#include <iostream>

int main() {
  using namespace std;
  string s;
  cin >> s;
  const string w{s};
  auto l = remove_if(begin(s), end(s), [&w](auto& s) {
                                         return count(begin(w), end(w), s) == 1;
                                       });
  s.erase(l, end(s));
  cout << s;
  return 0;
}

zelcon

Posted 2012-10-10T23:57:26.087

Reputation: 121

PHP - 137

Code

implode('',array_intersect(str_split($text),array_flip(array_filter(array_count_values(str_split($text)),function($x){return $x>=2;}))));

Normal Code

$text   = 'helloworld';
$filter = array_filter(array_count_values(str_split($text)), function($x){return $x>=2;});
$output = implode('',array_intersect(str_split($text),array_flip($filter)));

echo $output;

Wahyu Kristianto

Posted 2012-10-10T23:57:26.087

Reputation: 101

PHP - 83 78

<?for($a=$argv[1];$i<strlen($a);$r[$a[$i++]]++)foreach($ras$k=>$c)if($c>1)echo$k

Improved version:

<?for($s=$argv[1];$x<strlen($s);$c=$s[$x++]) echo substr_count($s,$c)>1?$c:'';

Of course this needs notices to be turned off

Edit: Improvement inspired by @hengky mulyono

I am so bad at codegolf :)

milo5b

Posted 2012-10-10T23:57:26.087

Reputation: 169

Asked: 2012-10-10T23:57:26.087

Viewed: 1 621 times

Active: 2019-02-26T01:33:19.580
