Removing unique elements from string

12

1

I came up with this question because finding the unique characters in a string seems to be a very common use case. But what if we want to get rid of them?

The input contains only lowercase letters a to z, and its length is between 1 and 1000 characters.

Example:
input: helloworld
output: llool
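For reference, here is a plain (non-golfed) Python sketch of the task. It is not taken from any answer. It counts every character once with `collections.Counter`, then keeps the characters whose count is greater than 1, in their original order. The name `remove_unique` is only illustrative.

```python
from collections import Counter

def remove_unique(s: str) -> str:
    """Return s with every character that occurs exactly once removed."""
    counts = Counter(s)  # one pass to count each character
    # second pass keeps characters seen more than once, in input order
    return "".join(c for c in s if counts[c] > 1)

print(remove_unique("helloworld"))  # llool
```

Counting up front keeps this linear in the input length. Several golfed answers below call `s.count(c)` for each character instead, which is quadratic but still fast for inputs of at most 1000 characters.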

Objective: Shortest code wins
Language: Any of the top 20 languages in the TIOBE index

user14742

Posted 2012-10-10T23:57:26.087

Reputation: 239

Answers

7

Perl, 24 characters (previously 28; includes 1 for the 'p' option)

s/./$&x(s!$&!$&!g>1)/eg

Usage:

> perl -pe 's/./$&x(s!$&!$&!g>1)/eg'
helloworld
llool

At first I thought I could do this with negative look-ahead and negative look-behind, but it turns out that negative look-behinds must have a fixed length. So I went for nested regexes instead. With thanks to mob for the $& tip.
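The inner `s!$&!$&!g` replaces the current character with itself, so the string is unchanged. Its return value is the number of replacements, which equals the number of occurrences. `$&x(...)` then repeats the character once or zero times. The sketch below is a rough Python analogue of this trick, under those assumptions, and is not Gareth's code. It uses `re.subn`, which returns the new string together with the number of substitutions, in place of Perl's `s///g`. The name `keep_repeated` is hypothetical.

```python
import re

def keep_repeated(s: str) -> str:
    def keep_if_repeated(m):
        ch = m.group(0)
        # Replace ch with itself: the text is unchanged, but subn also reports
        # how many replacements it made, i.e. how often ch occurs in s.
        _, occurrences = re.subn(re.escape(ch), lambda same: same.group(0), s)
        # Like Perl's $& x (count > 1): emit the character once or not at all.
        return ch * (occurrences > 1)
    # The outer substitution visits every character, like s/./.../eg.
    return re.sub(".", keep_if_repeated, s)
```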

Gareth

Reputation: 11 678

+1. I naively thought I could win this with my Ruby answer. – Steven Rumbalski – 2012-10-11T14:27:08.753

I tried this on Chinese text and it didn't work. =( – ixtmixilix – 2012-10-12T00:12:34.153

@ixtmixilix - then run perl with the -CDS option – mob – 2012-10-12T00:38:11.630

@ixtmixilix I'm afraid I don't know enough about Unicode and Perl's support for it to suggest a way to make it work with Chinese text. Luckily for me, the question says only lowercase a to z. – Gareth – 2012-10-12T00:38:41.863

Replace all the $1 with $& and you can lose a couple of pairs of parentheses. – mob – 2012-10-12T00:39:01.410

12

(GolfScript, 13 characters, previously 15)

:;{.;?);>?)},

GolfScript is not one of the top 20, but a codegolf without GolfScript... (run it yourself)

Previous Version: (run script)

1/:;{;\-,;,(<},

Howard

Reputation: 23 109

1:;? You're deliberately trying to confuse newbies, aren't you? ;) – Peter Taylor – 2012-10-11T09:17:39.680

@PeterTaylor You're right. I should have chosen a ) - it would make it a smiley then :). Unfortunately, I didn't find a way to even eliminate the digit 1. (Note for GolfScript newbies: you may replace any ; in the code with an x, or any other letter or digit, or any character not otherwise used in the script. In this special case ; is just a variable name and does not have the meaning "pop and discard". In GolfScript almost all tokens are variables anyway, and using predefined symbols is a great way to make scripts even more unreadable for outsiders ;-).) – Howard – 2012-10-11T11:09:43.307

Another 13-char solution: :a{]a.@--,(}, – Ilmari Karonen – 2014-04-11T19:01:16.787

7

J, 12 characters

Having entered a valid Perl answer, here's an invalid (language not in the TIOBE top 20) answer.

a=:#~1<+/@e.

Usage:

   a 'helloworld'
llool

Declares a verb a that outputs only the non-unique items.
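The J verb follows a count, mask, copy pattern. It counts how often each character occurs, compares the counts using `1<`, and copies (`#~`) the characters where the comparison holds. The Python sketch below follows that pattern, assuming the counts come from an equality table of the string against itself. The name `keep_repeated` is hypothetical.

```python
def keep_repeated(s: str) -> str:
    # Equality table: table[i][j] is True when s[i] == s[j] (quadratic, fine for short input)
    table = [[a == b for b in s] for a in s]
    # Summing a row counts how often s[i] occurs (each True counts as 1)
    counts = [sum(row) for row in table]
    # Boolean mask "1 < count", then copy only the characters where the mask holds
    mask = [1 < n for n in counts]
    return "".join(c for c, keep in zip(s, mask) if keep)
```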

Gareth

Reputation: 11 678

5

GolfScript (14 chars)

:x{{=}+x\,,(},

Online demo

Might not qualify to win, but it's useful to have a yardstick.

Peter Taylor

Reputation: 41 901

4

Ruby, 36 characters (previously 46, then 40)

gets.chars{|c|$><<c if$_.count(c)>1}

Steven Rumbalski

Reputation: 1 353

You may save 4 chars if you inline s and use $_ for the second appearance (the space before is then dispensable). – Howard – 2012-10-11T11:22:00.983

@Howard: Nice catch. Thanks. I have about zero experience with Ruby. – Steven Rumbalski – 2012-10-11T14:03:19.893

2

Japt, 5 bytes (previously 6)

ÆèX É

-1 byte thanks to @Oliver

Try it online!

Quintec

Reputation: 2 801

Welcome to Japt! There is actually a shortcut for o@: Æ – Oliver – 2019-02-25T18:29:55.693

@Oliver Another shortcut that I missed, cool, thanks :) – Quintec – 2019-02-25T20:13:59.657

@Oliver, the better question is how the feck did I miss it?! :\ – Shaggy – 2019-02-27T18:23:54.210

2

Brachylog (v2), 8 bytes

⊇.oḅlⁿ1∧

Try it online!

Function submission. Technically noncompeting because the question limits which languages are allowed to compete (however, several other answers have already ignored the restriction).

Explanation

⊇.oḅlⁿ1∧
⊇         Find {the longest possible} subset of the input
  o       {for which after} sorting it,
   ḅ        and dividing the sorted input into blocks of identical elements,
    lⁿ1     the length of a resulting block is never 1
 .     ∧  Output the subset in question.
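Brachylog finds the answer by search rather than by construction. The Python sketch below illustrates that declarative idea, assuming (as the explanation says) that longer subsets are tried first. It enumerates subsequences from longest to shortest and returns the first one whose sorted runs of identical characters all have length at least 2. This is exponential, so it is only practical for short inputs, nowhere near the 1000-character limit. The function name is hypothetical.

```python
from itertools import combinations, groupby

def keep_repeated_bruteforce(s: str) -> str:
    # Try subsequences from longest to shortest (exponential: small inputs only)
    for size in range(len(s), -1, -1):
        for idx in combinations(range(len(s)), size):
            sub = "".join(s[i] for i in idx)
            # Sort, split into runs of identical characters, and require that
            # no run has length 1 (the empty subsequence always qualifies)
            if all(len(list(run)) != 1 for _, run in groupby(sorted(sub))):
                return sub
    return ""
```

The longest valid subsequence is exactly the set of repeated characters, because any subsequence that contains a unique character has a run of length 1 for it.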

ais523

Reputation: 11

Why do you CW all your solutions? – Shaggy – 2019-02-25T20:11:19.460

@Shaggy: a) because I'm fine with other people editing them, b) to avoid gaining reputation if they're upvoted. In general I think the gamification of Stack Exchange is a huge detriment to the site – there's sometimes a negative correlation between the actions that you can take to improve rep and the actions you can take to actually improve the site. Additionally, being at a high reputation count sucks; the site keeps nagging you to do admin tasks, and everything you do is a blunt instrument (e.g. when you're at low rep you can suggest an edit, at high rep it just gets forced through). – ais523 – 2019-02-25T20:24:31.217

2

Python 2.7 (51, previously 52), Python 3 (54)

I didn't expect it to be so short.

2.7: a=raw_input();print filter(lambda x:a.count(x)>1,a)

3.0: a=input();print(''.join(i for i in a if a.count(i)>1))

raw_input() reads the input as a string (in Python 2, input() is equivalent to eval(raw_input())). In Python 3, input() behaves like Python 2's raw_input().

filter(lambda x:a.count(x)>1,a) keeps only the characters of a that occur in a more than once (a.count(x)>1).
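In Python 2, `filter` applied to a string returns a string, which keeps the 2.7 version short. In Python 3, `filter` returns an iterator, so the result has to be joined back into a string. Here is a minimal Python 3 sketch of the same filter approach; the name `keep_repeated` is hypothetical.

```python
def keep_repeated(a: str) -> str:
    # filter() yields the characters of a that occur more than once;
    # in Python 3 it is lazy, so join the iterator back into a string.
    return "".join(filter(lambda x: a.count(x) > 1, a))

print(keep_repeated("helloworld"))  # llool
```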

beary605

Reputation: 3 904

If you use Python 3 instead, you can use input() rather than raw_input(), although you have to add one character for the closing parenthesis, since print is a function in Python 3. – Strigoides – 2012-10-16T02:03:36.977

@Strigoides: I have added a Python 3 code snippet to my answer. – beary605 – 2012-10-16T02:18:51.030

Python 3's filter returns an iterator... You'll need to do ''.join(...) – JBernardo – 2012-10-16T04:23:01.247

@JBernardo: :( Dang. Thanks for notifying me. As you can see, I don't use 3.0. – beary605 – 2012-10-16T05:20:23.827

2

Perl 44

$l=$_;print join"",grep{$l=~/$_.*$_/}split""

Execution:

perl -lane '$l=$_;print join"",grep{$l=~/$_.*$_/}split""' <<< helloworld
llool

flodel

Reputation: 2 345

2

K, 18

{x@&x in&~1=#:'=x}

tmartin

Reputation: 3 917

You can save a byte using 1<# instead of ~1=# – J. Sendra – 2019-02-25T21:00:09.647

2

sed and coreutils (130)

Granted, sed is not on the TIOBE list, but it's fun (-:

<<<$s sed 's/./&\n/g'|head -c -1|sort|uniq -c|sed -n 's/^ *1 \(.*\)/\1/p'|tr -d '\n'|sed 's:^:s/[:; s:$:]//g\n:'|sed -f - <(<<<$s)

De-golfed version:

s=helloworld
<<< $s sed 's/./&\n/g'          \
| head -c -1                    \
| sort                          \
| uniq -c                       \
| sed -n 's/^ *1 \(.*\)/\1/p'   \
| tr -d '\n'                    \
| sed 's:^:s/[:; s:$:]//g\n:'   \
| sed -f - <(<<< $s)

Explanation

The first sed puts each input character on its own line, so that sort and uniq -c can count the characters. The second sed keeps only the characters that occur once. The third sed turns them into a sed script that deletes those characters, and the last sed runs the generated script on the original input.
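The pipeline's strategy can be sketched in Python: count characters by sorting, collect the ones that occur once, then delete them with a generated character-class substitution. Here `itertools.groupby` over the sorted characters plays the role of `sort | uniq -c`, and `re.sub` with the generated class plays the role of the generated `s/[...]//g` script. The function name is hypothetical. The class is built without escaping because the input is restricted to a to z.

```python
import re
from itertools import groupby

def delete_unique(s: str) -> str:
    # sort | uniq -c: sorting puts identical characters in adjacent runs
    counts = [(ch, len(list(run))) for ch, run in groupby(sorted(s))]
    # second sed + tr: the characters whose count is exactly 1, as one string
    unique = "".join(ch for ch, n in counts if n == 1)
    if not unique:
        return s  # nothing to delete (an empty class "[]" would not be a valid regex)
    # generated script s/[unique]//g, applied to the original input
    return re.sub("[" + unique + "]", "", s)

print(delete_unique("helloworld"))  # llool
```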

Thor

Reputation: 2 526

1

Java 8, 90 bytes

s->{for(char c=96;++c<123;s=s.matches(".*"+c+".*"+c+".*")?s:s.replace(c+"",""));return s;}

Explanation:

Try it online.

s->{                         // Method with String as both parameter and return-type
  for(char c=96;++c<123;     //  Loop over the lowercase alphabet
    s=s.matches(".*"+c+".*"+c+".*")?
                             //   If the String contains the character more than once
       s                     //    Keep the String as is
      :                      //   Else (only contains it once):
       s.replace(c+"",""));  //    Remove this character from the String
  return s;}                 //  Return the modified String
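The Java answer loops over the alphabet rather than over the input. This works because the input is restricted to a to z. The Python sketch below uses the same strategy, with `str.count` standing in for the `.*c.*c.*` regex test; that is a substitution of technique, not a translation of the regex. The name `keep_repeated` is hypothetical.

```python
from string import ascii_lowercase

def keep_repeated(s: str) -> str:
    # Visit each letter a..z once, as the Java loop over chars 97..122 does
    for c in ascii_lowercase:
        if s.count(c) == 1:       # the letter occurs exactly once...
            s = s.replace(c, "")  # ...so delete it; otherwise keep s as is
    return s
```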

Kevin Cruijssen

Reputation: 67 575

1

PowerShell, 59 bytes

"$args"-replace"[^$($args|% t*y|group|?{$_.Count-1}|% n*)]"

Try it online!

Less golfed:

$repeatedChars=$args|% toCharArray|group|?{$_.Count-1}|% name
"$args"-replace"[^$repeatedChars]"

Note: $repeatedChars is an array. When PowerShell converts an array to a string, it joins the elements with a space by default (the separator comes from $OFS). So the regex contains spaces (in this example, [^l o]). The spaces do not affect the result because the input contains letters only.
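Here is the same build-a-class-then-replace idea as a Python sketch; the name is hypothetical. It collects the repeated characters, then deletes everything outside the negated class. Joining the characters without a separator avoids the stray spaces mentioned above. The case where nothing repeats is handled separately, because `[^]` is not a valid Python regex.

```python
import re
from collections import Counter

def keep_repeated(s: str) -> str:
    # Characters that occur more than once, like group | ?{$_.Count-1} | % name
    repeated = "".join(c for c, n in Counter(s).items() if n > 1)
    if not repeated:
        return ""  # every character is unique, so everything is deleted
    # Delete every character that is NOT in the class of repeated characters
    return re.sub("[^" + re.escape(repeated) + "]", "", s)
```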

mazzy

Reputation: 4 832

1

APL (Dyalog Extended), 8 bytes (SBCS)

Anonymous tacit prefix function.

∊⊢⊆⍨1<⍧⍨

Try it online!

⍧⍨ count-in selfie (count occurrences of argument elements in the argument itself)

1< Boolean mask where one is less than that

⊢⊆⍨ partition the argument by that mask (beginning a new partition on 1s and removing on 0s)

∊ ϵnlist (flatten)
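The APL function keeps the characters under the mask by partitioning and then flattening. The Python sketch below has the same two-step shape, assuming the partition semantics described above: runs where the mask is 1 form groups, and positions where it is 0 are dropped. The name `keep_repeated` is hypothetical.

```python
from collections import Counter
from itertools import groupby

def keep_repeated(s: str) -> str:
    counts = Counter(s)                # occurrences of each character in s itself
    mask = [1 < counts[c] for c in s]  # Boolean mask: count greater than 1
    # Split s into runs where the mask is constant; keep only the runs under 1s
    parts = ["".join(c for c, _ in run)
             for keep, run in groupby(zip(s, mask), key=lambda p: p[1]) if keep]
    return "".join(parts)              # flatten the partitions back into a string
```

For "helloworld" the partitions are "llo", "o" and "l", which flatten to "llool".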

Adám

Reputation: 37 779

1

JavaScript, 45 bytes

s=>[...s].filter(c=>s.match(c+'.*'+c)).join``

kamoroso94

Reputation: 739

1

R, 70 bytes

a=utf8ToInt(scan(,''));intToUtf8(a[!a%in%names(table(a)[table(a)<2])])

Try it online!

A poor attempt, even from a TIOBE top 20 language. I know something can be done about the second half, but at the moment, any golfs escape me.

Sumner18

Reputation: 1 334

1

JavaScript (Node.js), 82 bytes

p=>[...p].map((v,i,a)=>a.filter(f=>f==v).length).reduce((a,c,i)=>c>1?a+=p[i]:a,[])

Try it online!

Kamil Naja

Reputation: 121

You can use .join`` instead of .join(""). – recursive – 2019-02-25T21:35:24.840

1

JavaScript, 36 bytes

Input as a string, output as a character array.

s=>[...s].filter(x=>s.split(x)[2]+1)
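Splitting on a character produces one more piece than the number of occurrences, so a character repeats exactly when the split has at least three pieces. The Python sketch below applies this test. It counts the pieces instead of testing the truthiness of the third piece, which can be an empty string (for example, when the input ends with a doubled character). The name `keep_repeated` is hypothetical.

```python
def keep_repeated(s: str) -> list:
    # s.split(x) has (occurrences of x) + 1 pieces; three or more means x repeats.
    # Count the pieces rather than testing piece [2], which may be "".
    return [x for x in s if len(s.split(x)) > 2]
```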

Try It Online!

Shaggy

Reputation: 24 623

@Oliver, not quite. – Shaggy – 2019-02-25T22:05:51.003

1

Mathematica, 63 characters (previously 72)

OK, Mathematica isn't among the top 20 languages, but I decided to join the party anyway.

x is the input string.

"" <> Select[y = Characters@x, ! MemberQ[Cases[Tally@y, {a_, 1} :> a], #] &]

DavidC

Reputation: 24 524

1

Python (56)

Here's another (few chars longer) alternative in Python:

a=raw_input();print''.join(c for c in a if a.count(c)>1)

If you accept output as a list (e.g. ['l', 'l', 'o', 'o', 'l']), then we could boil it down to 49 characters:

a=raw_input();print[c for c in a if a.count(c)>1]

arshajii

Reputation: 2 142

Hey, >1 is a good idea! May I incorporate that into my solution? – beary605 – 2012-10-11T02:32:19.017

@beary605 Sure no problem at all - easy way to trim a character off :D – arshajii – 2012-10-11T02:43:39.443

1

Perl (55)

@x=split//,<>;$s{$_}++for@x;for(@x){print if($s{$_}>1)}

Reads from stdin.

QuasarDonkey

Reputation: 111

1

OCaml, 133 characters (previously 139)

Uses ExtLib's ExtString.String

open ExtString.String
let f s=let g c=fold_left(fun a d->a+Obj.magic(d=c))0 s in replace_chars(fun c->if g c=1 then""else of_char c)s

Non-golfed version

open ExtString.String
let f s =
  let g c =
    fold_left
      (fun a c' -> a + Obj.magic (c' = c))
      0
      s
  in replace_chars
  (fun c ->
    if g c = 1
    then ""
    else of_char c)
  s

The function g returns the number of occurrences of c in the string s. The function f replaces each character either by the empty string or by the string containing that character, depending on its number of occurrences. Edit: I shortened the code by 6 characters by abusing the internal representation of bools :-)

Oh, and OCaml is 0 on the TIOBE index ;-)
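The `Obj.magic` cast turns a `bool` into an `int`. This relies on OCaml representing `false` and `true` the same way as 0 and 1. Python needs no such trick, because `bool` is a subclass of `int`. Below is a sketch of the answer's `g`/`f` structure in Python under that substitution; only the names `f` and `g` come from the answer.

```python
def f(s: str) -> str:
    def g(c: str) -> int:
        # Number of occurrences of c in s: each True adds 1, each False adds 0
        return sum(d == c for d in s)
    # Replace each character by "" if it is unique, or by itself otherwise
    return "".join("" if g(c) == 1 else c for c in s)
```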

ReyCharles

Reputation: 525

f*** the TIOBE index. – ixtmixilix – 2012-10-12T00:04:03.603

I agree. Also, thanks for the upvote. Now I can comment :-) – ReyCharles – 2012-10-12T00:17:47.037

1

C# – 77 characters

Func<string,string>F=s=>new string(s.Where(c=>s.Count(d=>c==d)>1).ToArray());

If you accept the output as an array, it boils down to 65 characters:

Func<string,char[]>F=s=>s.Where(c=>s.Count(d=>c==d)>1).ToArray();

Mormegil

Reputation: 1 148

1

PHP - 70

while($x<strlen($s)){$c=$s[$x];echo substr_count($s,$c)>1?$c:'';$x++;}

This assumes $s = 'helloworld'.

hengky mulyono

Reputation: 11

0

C++, 139 bytes

string s;cin>>s;string w{s}; auto l=remove_if(begin(s),end(s),[&w](auto&s){return count(begin(w),end(w),s)==1;});s.erase(l,end(s));cout<<s;

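Ungolfed:

#include <algorithm>
#include <iostream>
#include <string>

int main() {
  using namespace std;
  string s;
  cin >> s;
  // remove_if rearranges s, so count occurrences in an untouched copy
  const string w{s};
  // shift the characters that occur more than once to the front;
  // l is the new logical end of the kept characters
  auto l = remove_if(begin(s), end(s), [&w](const auto& c) {
    return count(begin(w), end(w), c) == 1;
  });
  s.erase(l, end(s));  // drop the leftover tail
  cout << s;
  return 0;
}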

zelcon

Reputation: 121

0

PHP - 137

Code

implode('',array_intersect(str_split($text),array_keys(array_filter(array_count_values(str_split($text)),function($x){return $x>=2;}))));

Readable version

$text   = 'helloworld';
// count each character and keep only those that occur 2 or more times
$filter = array_filter(array_count_values(str_split($text)), function($x){return $x>=2;});
// keep, in input order, the characters that are keys of $filter
$output = implode('',array_intersect(str_split($text),array_keys($filter)));

echo $output;

Wahyu Kristianto

Reputation: 101

0

PHP - 75 (previously 83, then 78)

<?for($a=$argv[1];$i<strlen($a);$r[$a[$i++]]++)foreach($r as$k=>$c)if($c>1)echo$k

Improved version:

<?for($s=$argv[1];$x<strlen($s);)echo substr_count($s,$c=$s[$x++])>1?$c:'';

Of course this needs notices to be turned off

Edit: Improvement inspired by @hengky mulyono

I am so bad at codegolf :)

milo5b

Reputation: 169