Decompose string into blocks

7

Your program's input is a string containing whitespaces, parentheses, and other characters. The string is assumed to be parenthesed correctly, i.e. each right parenthesis matches a unique left parenthesis and vice versa : so the program is allowed to do anything on incorrectly parenthesed strings, such as )abc, (abc or )abc(. The character interval between a left parenthesis and the corresponding right parenthesis (including the parentheses) is called a parenthesed interval. Your program's job is to decompose the string into blocks according to the rules below, and output the result as a "vertical list", with one block per line.

Now, two characters in the string are in the same block if either

(a) there is no whitespace character strictly between them

or

(b) there is a parenthesed interval containing them both.

Also,

(c) Whitespaces not in any parenthesed interval are mere separators and to be discarded in the decomposition. All the other characters (including whitespaces) are retained.

For example, if the input is

I guess     () if(you say ( so)), I'll have((  ))to pack my (((things))) and go

The output should be

I
guess
()
if(you say ( so)),
I'll
have((  ))to
pack
my
(((things)))
and
go

Another useful test (thanks to NinjaBearMonkey) :

Hit the ( Road (Jack Don't) come ) back (no (more ) ) no more

The output should be

Hit
the
( Road (Jack Don't) come )
back
(no (more ) )
no 
more

The shortest code in bytes wins.

Ewan Delanoy

Posted 2016-07-09T14:31:31.193

Reputation: 995

You might want to have something like ((foo) bar) in the test case where all parentheses aren't closed at once. – NinjaBearMonkey – 2016-07-09T15:37:42.110

1@NinjaBearMonkey if(you say ( so)), – Neil – 2016-07-09T15:42:00.877

1A test case including something like a( b )c( d )e would also be good. – Martin Ender – 2016-07-09T19:22:15.160

1Can the input start or end with spaces? – Martin Ender – 2016-07-09T19:27:56.830

Answers

3

MATL, 22 bytes

j0G40=G41=-YsG32=*g(Yb

Try it online!

Explanation

This uses a builtin function for splitting a string at spaces. To avoid splitting at spaces between matching parentheses, these are first detected and replaced by character 0. This is not detected as a space by the string-splitting function, but is treated as a space by the implicit display function.

j        % Take input as a string. 
0        % Push 0
G40=     % True for occurrences of '(' 
G41=     % True for occurrences of ')' 
-Ys      % Subtract and cumulative sum. Positive values correspond to characters
         % between matching parentheses
G32=     % True for occurrences of space
*g       % Multiply, convert to logical. Gives true for spaces between matching
         % parentheses
(        % Assign 0 to those positions. This replaces spaces between matching
         % parentheses by character 0
Yb       % Split at spaces, treating consecutive spaces as a single space.
         % Character 0 does not cause splitting, but is displayed as a space

Luis Mendo

Posted 2016-07-09T14:31:31.193

Reputation: 87 464

@Ewan Thanks for the accept! However, although you can accept another answer at any time (if a shorter answer shows up), it's common to leave the challenge unaccepted for a few days, to encourage people to answer – Luis Mendo – 2016-07-10T07:25:29.780

8

Retina, 31 bytes

S-_` (?=([^)]|(\()|(?<-2>.))+$)

Try it online!

This is a standard exercise in balancing groups. We simply match all spaces from which we can reach the end of the string while passing only correctly matched parentheses (since the ones we don't want to remove are followed by an unmatched closing parenthesis), and split the input around those matches. The _ option removes empty segments in case there are several spaces in a row and the - suppresses the default behaviour to include captured substrings in the result.

Martin Ender

Posted 2016-07-09T14:31:31.193

Reputation: 184 808

7

Retina, 30 bytes

!`(\(()|(?<-2>)\)|(?(2).|\S))+

1 byte shorter...

Try it here.

jimmy23013

Posted 2016-07-09T14:31:31.193

Reputation: 34 042

3

JavaScript (ES6), 61 bytes

s=>s.replace(/[()]| +/g,c=>(c<`(`?n:c<`)`?++n:n--)?c:`
`,n=0)

Works by matching all parentheses and runs of spaces, maintaining a count of the nesting depth and replacing the run with a newline if the nesting depth is zero.

Neil

Posted 2016-07-09T14:31:31.193

Reputation: 95 035

2

APL, 55 53 bytes

{⍵⊂⍨((⍵≠' ')∨~b)×+\1,¯1↓(b←0=+\1 ¯1 0['()'⍳⍵])∧⍵=' '}

To be run in ⎕ml←3. Usual trick: cumulative scans of 1s and -1s to identify the areas enclosed by parentheses. Where the cumulative sum is 0, then we are outside parentheses. Boolean and the vector with the boolean vector with 1s where there are spaces. Rotate by 1 to the right inserting a 1 on the left. Build another running sum where each change identifies a segment. We're almost done, but we still need to remove the spaces at level 0. That's what the left side of the multiplication does: it zeroes out the level 0 spaces in the running sum. The dyadic enclose ⊂⍨ splits every time the running sum changes while throwing away the zeroes.

lstefano

Posted 2016-07-09T14:31:31.193

Reputation: 850

Umm... I don't see a second usage of s in your expression, so you should be able to save two bytes by removing the s←. – Zacharý – 2016-12-04T00:45:11.827

Indeed! Well spotted! – lstefano – 2016-12-05T12:06:47.873

I assume you brute-forced the year game, and just got lucky that Dan missed something? – Zacharý – 2017-07-31T22:11:20.640

My guess is that all those who scored less than 750 in the Year Game 2016 brute forced it. In fact, I know all the Italian ones did, because I know them personally. So why did we get different results? I'll leave the question without an answer. I am sure you noticed the little commute up there. – lstefano – 2017-08-02T06:13:51.900

I've heard from Adám what happened there ... FINNS! – Zacharý – 2017-08-02T23:28:12.793

Also, I regret making so many comments on your answers. I got spammed with 13, about 10 from you. Taste of my own medicine. And to how I need to "watch my words", my method of golfing other people's code is to say the least. I look completely at little syntax things, which sometimes causes the errors in my golfing suggestions that you see. And when I see one mistake that happens more than once, I go on a golfing frenzy like that. Also, I was hopped up on that 3rd place win. – Zacharý – 2017-08-02T23:34:32.253

(Clarification to my "FINNS!" comment, the Finnish "winners" had code that resulted in a character vector.) – Zacharý – 2017-08-02T23:36:52.677

Don't sweat it... yes... it was a bit disappointing to see that the winners had found a loophole in the rules of the context. We scratched our heads about how they managed to save more characters when we thought our brute force search should have covered all the search tree, until we saw the winning entries. We tried to complain to Fiona but she said the rules did not explicitly say that the result had to be numeric, only that it had too look like a number. Ah well... we all learned a lesson: Fiona is more careful with her wording now and we know we cannot take anything for granted. :-) – lstefano – 2017-08-03T08:47:35.233

1

Perl, 80 bytes (79 + -n flag)

$r=qr/[^\(\)]*(\((??{$r})\))?[^\(\)]*/;/((([^ \(]*(\($r\))?)+ *)(?{say$^N}))+/

Run with :

perl -M5.010 -ne '$r=qr/[^\(\)]*(\((??{$r})\))?[^\(\)]*/;/((([^ \(]*(\($r\))?)+ *)(?{say$^N}))+/'

It might not be easy to read, so here are the explanations :

# The first regex ($r) is meant to match anything between two matching parenthesis
$r = qr/ [^\(\)]*         # match anything but parenthesis
         (\(               # match an opening parenthesis
           (??{$r})        # reccursive call to the same regex
          \)               # match a closing parenthesis
         )?               # the 3 previous lines are optionnal
         [^\(\)]*         # match anything but parenthesis
       /x;       

# The second regex kinda split the input according to the rules
/(
    (
     ([^ \(]*           # match anything but spaces and parenthesis
      (\($r\))?          # optionnal : match an opening parenthesis, then let the 
                         #   first regex match anything until the matching closing parenthesis.
     )+                 # repeat (no spaces out of parenthesis has been found yet)
     \ *               # match the space(s) that delimite the outputed lines.
    )                  
    (?{say$^N})       # execute the code 'say $^N' which prints the content 
                      #   of the last closed group (which is the line above)
 )+                 # repeat as many times as possible
/x 

Dada

Posted 2016-07-09T14:31:31.193

Reputation: 8 279

1

C (32-bit), 115 113 bytes

s,*P;main(q,c){char*p=P=1[P=c],*o=p;for(;c=*p++;c^=32,s+=c^9?c==8:-1,*o=s|c?c^32:10,o+=s|c||q,q=c);*o=0;puts(P);}

Compile as 32-bit C, and pass input string as command-line argument. E.g.:

~/golf/ λ gcc -m32 -w golf.c && ./a.out "Hit the ( Road (Jack Don't) come ) back (no (more ) ) no more"
Hit
the
( Road (Jack Don't) come )
back
(no (more ) )
no
more

orlp

Posted 2016-07-09T14:31:31.193

Reputation: 37 067