De-Parenthesizing a String

25

Given a properly-parenthesized string as input, output a list of all nonempty substrings within matching parentheses (or outside of all parentheses), with nested parentheses removed. Each substring should be the sequence of characters in exactly the same matching parentheses. Substrings should be listed in order of depth, and substrings of the same depth should be listed in the order they occur in the string. Assume the input is always correctly parenthesized.

You may assume that the input contains only lower-case ASCII letters and parentheses.

Your answer should be a function that, when given a string, returns a list of strings.

Examples:

                   'a(b)c(d)e' -> ['ace', 'b', 'd']
                   'a(b(c)d)e' -> ['ae', 'bd', 'c']
                  'a((((b))))' -> ['a', 'b']
                        'a()b' -> ['ab']
                            '' -> []
                           'a' -> ['a']
          '(((a(b)c(d)e)f)g)h' -> ['h', 'g', 'f', 'ace', 'b', 'd']
'ab(c(((d)ef()g)h()(i)j)kl)()' -> ['ab', 'ckl', 'hj', 'efg', 'i', 'd']

Fewest bytes wins.

redstonerodent

Posted 2015-11-16T03:49:44.383

Reputation: 351

Are 'i' and 'd' in the correct order in the last test case? – PurkkaKoodari – 2015-11-16T08:24:50.427

@Pietu1998 i is less deeply nested than d. – feersum – 2015-11-16T08:31:01.580

@feersum Oh, right. – PurkkaKoodari – 2015-11-16T08:31:57.857

1

Would you mind allowing the other standard submission types as well, in particular full programs? Not all languages have a concept of functions. For default consensus see http://meta.codegolf.stackexchange.com/a/2422/8478 and http://meta.codegolf.stackexchange.com/questions/2447/default-for-code-golf-input-output-methods.

– Martin Ender – 2015-11-16T20:17:37.537

@MartinBüttner Sure. I'm pretty new to code golf; I didn't realize that was an issue. – redstonerodent – 2015-11-16T20:43:16.973

2@redstonerodent The wording I tend to use is "You may write a program or function, taking input via STDIN (or closest alternative), command-line argument or function argument and outputting the result via STDOUT (or closest alternative), function return value or function (out) parameter." and in your case "The output may be in any convenient, unambiguous, flat list format." – Martin Ender – 2015-11-16T20:46:27.810

Answers

11

JavaScript ES6, 91 93 104 133 148

Edit2 2 bytes saved thx user81655

Edit Using more strings and less arrays

Test running the snippet below in an EcmaScript 6 compliant browser

f=s=>[...s].map(c=>l+=c<')'||-(o[l]=(o[l]||'')+c,c<'a'),o=[],l=0)&&(o+'').match(/\w+/g)||[]

// Less golfed

u=s=>{
  o=[]; l=0;
  [...s].map(c=>{
    if (c>'(') // letters or close bracket
      o[l]=(o[l]||'')+c, // add letter or close bracket to current level string
      l-=c<'a' // if close bracket, decrement level
    else
      ++l // open bracket, increment level
  })
  o = o+'' // collapse array to comma separated string
  return o.match(/\w+/g)||[] // fetch non empty strings into an array
}

// TEST
console.log=x=>O.innerHTML+=x+'\n'

;[ 'a(b)c(d)e'                    // ['ace', 'b', 'd']
 , 'a(b(c)d)e'                    // ['ae', 'bd', 'c']
 , 'a((((b))))'                   // ['a', 'b']
 , 'a()b'                         // ['ab']
 , ''                             // []
 , 'a'                            // ['a']
 , '(((a(b)c(d)e)f)g)h'           // ['h', 'g', 'f', 'ace', 'b', 'd']
 , 'ab(c(((d)ef()g)h()(i)j)kl)()' // ['ab', 'ckl', 'hj', 'efg', 'i', 'd']
].forEach(t=>console.log(t +" -> " + f(t)))
<pre id=O></pre>

edc65

Posted 2015-11-16T03:49:44.383

Reputation: 31 086

Save 2 bytes with c=>l+=c<')'||-(o[l]=(o[l]||'')+c,c<'a'),. – user81655 – 2015-11-16T22:37:01.617

@user81655 nice, thanks – edc65 – 2015-11-16T22:42:42.127

8

Julia, 117 86 83 bytes

v->(while v!=(v=replace(v,r"(\(((?>\w|(?1))*)\))(.*)",s"\g<3> \g<2>"))end;split(v))

It's a regex solution.

Ungolfed:

function f(v)
  w=""
  while v!=w
    w=v
    v=replace(v,r"(\(((?>\w|(?1))*)\))(.*)",s"\g<3> \g<2>"))
  end
  split(v)
end

r"(\(((?>\w|(?1))*)\))(.*)" is a recursive ((?1) recurses group 1) regex that will match the first outermost balanced parentheses (that don't contain unbalanced/reversed parentheses), with the second group containing everything inside the parentheses (not including the parentheses themselves) and the third group containing everything after the parentheses (until the end of the string).

replace(v,r"...",s"\g<3> \g<2>") will then move the second group to the end of the string (after a space, to act as a delimiter), with the relevant parentheses removed. By iterating until v==w, it is ensured that the replace is repeated until there are no parentheses left. Because matches are moved to the end, and then the next match goes for the first parenthesis, the result will be the string broken down in order of depth.

Then split returns all of the non-whitespace components of the string in the form of an array of strings (that have no whitespace).

Note that w="" is used in the ungolfed code to make sure that the while loop runs at least once (except if the input string is empty, of course), and is not needed in the golfed form.

Thanks to Martin Büttner for assistance with saving 3 bytes.

Glen O

Posted 2015-11-16T03:49:44.383

Reputation: 2 548

Neat, I've arrived at the same solution independently in Retina. It's 44 bytes there, but as it stands full-program solutions are not allowed. :/ – Martin Ender – 2015-11-16T20:25:03.217

You can save three bytes by using \w instead of [^()]. – Martin Ender – 2015-11-16T21:38:27.033

@MartinBüttner - thanks. I'd actually considered that, but I was worried that I'd overlooked something and that it would fail on some edge case. If you say it's fine, though, then it's fine. – Glen O – 2015-11-17T02:22:51.373

6

Python, 147 bytes

def f(s):
 d=0;r=[['']for c in s]
 for c in s:
  if c=='(':d+=1;r[d]+=['']
  elif c==')':d-=1
  else:r[d][-1]+=c
 return[i for i in sum(r,[])if i]

Unit tests:

assert f('a(b)c(d)e') == ['ace', 'b', 'd']
assert f('a(b(c)d)e') == ['ae', 'bd', 'c']
assert f('a((((b))))') == ['a', 'b']
assert f('a()b') == ['ab']
assert f('') == []
assert f('a') == ['a']
assert f('(((a(b)c(d)e)f)g)h') == ['h', 'g', 'f', 'ace', 'b', 'd']
assert f('ab(c(((d)ef()g)h()(i)j)kl)()') == ['ab', 'ckl', 'hj', 'efg', 'i', 'd']

I like this puzzle; it's very cute!

Quuxplusone

Posted 2015-11-16T03:49:44.383

Reputation: 625

4

Pyth, 32 bytes

fTscR)uX0.<GJ-FqLH`()@[Hdk)Jzmkz

Test suite

Loosely based off of @Quuxplusone's approach. Builds up space separated lists of the characters at each depth, then splits them and filters out the empty groups. The working list is rotated to keep the current depth's list in front at all times.

isaacg

Posted 2015-11-16T03:49:44.383

Reputation: 39 268

4

Retina, 44 41 bytes

+`\(((\w|(\()|(?<-3>.))*).(.*)
$4 $1
S_` 

Run with the -s flag. Notice the space at the end of the last line.

I came up with this solution independently of Glen O but it turns out to be identical. The idea is to match the first pair of parentheses, remove it, and insert its contents at the end of the output (repeatedly). Due to .NET's lack of recursion in regex, I had to use balancing groups which is four bytes longer.

If you don't understand the first regex, let me refer you to my SO answer about balancing groups. Since the input is guaranteed to be correctly parentheses, we can save two bytes by matching ) with . instead of \). Then we simply match the rest of the string with (.*). $4 $1 first writes back said rest of the string (omitting both the parentheses and their content), and then the content of the parentheses after a space. The +` tells Retina to repeat this step until the string stops changing (which only happens once all parentheses have been removed).

Empty parentheses will result in two consecutive spaces, so finally we split the entire string on spaces (S` activates split mode and the regex is a single space). The _ option tells Retina to omit empty parts of the split, so we don't include the empty results in the output.

Martin Ender

Posted 2015-11-16T03:49:44.383

Reputation: 184 808

3

Common Lisp, 160

(lambda(x)(labels((g(l)(cons(#1=format()"~(~{~A~}~)"(#2=remove-if'listp l))(mapcan #'g(#2#'atom l)))))(remove""(g(read-from-string(#1#()"(~A)"x))):test'equal))))

This could be four bytes less if the case conversion wasn't necessary. The idea is to add left and right parenthesis to each side of the input string, treat it as a list, write the top level elements of the list to a string, and then process the sublists the same way.

Joshua Taylor

Posted 2015-11-16T03:49:44.383

Reputation: 660

2

Haskell, 114 112 111 bytes

')'%(h:i:t)=("":i):t++[h]
'('%l=last l:init l
c%((h:i):t)=((c:h):i):t
g x=[a|a<-id=<<foldr(%)(x>>[[""]])x,a>""]

Usage example: g "ab(c(((d)ef()g)h()(i)j)kl)()" -> ["ab","ckl","hj","efg","i","d"].

I'm going backward through the input string. The intermediate data structure is a list of list of strings. The outer list is per level and the inner list is per group within a level, e.g. [["ab"],["ckl"],["hj"],["efg","i"],["d"]] (note: the real list has a lot of empty strings in-between). It all starts with a number of empty strings equal to the length of the input - more than enough, but empty lists are filtered out anyway. The outer lists either rotates on (/) or adds the character to the front element. ) also starts a new group.

Edit: @Zgarb has found a byte to save.

nimi

Posted 2015-11-16T03:49:44.383

Reputation: 34 639

1

Sed, 90 bytes

:
s/^(\w*)\((.*)\n?(.*)/\1\n\3_\2/M
s/(\n\w*_)(\w*)\)(.*)/\3\1\2/M
t
s/[_\n]+/,/g
s/,$//

Uses extended regexes (-r flag), accounted for by +1 byte. Also, this uses a GNU Extension (the M flag on the s command).

Sample Usage:

$ echo 'ab(c(((d)ef()g)h()(i)j)kl)()' | sed -r -f deparenthesize.sed
ab,ckl,hj,efg,i,d

Explanation: Since sed does not support stuff like recursive regexes, manual work is required. The expression is split up into multiple lines, each representing a level of nesting depth. The individual expressions on the same depth (and hence on the same line) are separated by an _. The script works through the input string one bracket at a time. The remaining input is always kept on the end of the line that corresponds to the current nesting level.

matz

Posted 2015-11-16T03:49:44.383

Reputation: 141

0

Python, 161 bytes

Here's what I came up with, a one-line functional python solution:

p=lambda s:filter(None,sum([''.join([s[i]for i in range(len(s))if s[:i+1].count('(')-s[:i+1].count(')')==d and s[i]!=')']).split('(')for d in range(len(s))],[]))

This challenge was inspired by https://github.com/samcoppini/Definition-book, which outputs a long string with word defined in parentheticals. I wanted to write code that would give me each sentence, with parenthesis removed. The functional solution is too slow to be effective on long strings, but imperative solutions (like @Quuxplusone's solution) are much faster.

redstonerodent

Posted 2015-11-16T03:49:44.383

Reputation: 351