Fix unbalanced brackets

15

2

Consider the alphabet A ="()[]{}<>". Think of the characters in the alphabet as opening and closing brackets of 4 different types. Let's say that a string over the alphabet A is balanced if

  • it is empty -or-
  • it is not empty -and- the number of opening and closing brackets of each type match -and- each opening bracket has a corresponding closing bracket of the same type to the right of it -and- the substring between each pair of corresponding brackets is balanced.

For example, the strings "", "()", "<[]>", "{}{}[[]]" are balanced, but the strings "(", "{>", ")(", "<>>", "[(])" are not balanced.

Task: Given a (possibly unbalanced) input string S over the alphabet A, find all possible ways to insert minimal number of brackets to make the string balanced. More precisely,

Input: any string S over the alphabet A.
Output: all possible balanced strings of the shortest length over the same alphabet containing S as a subsequence.

Some examples (a string to the left of an arrow denotes an input, the list of strings to the right of the arrow denotes the corresponding output):

  • """"
  • "()[]""()[]"
  • "[""[]"
  • ">""<>"
  • "{}]""[{}]", "{}[]"
  • "<<""<><>", "<<>>"

Note that:

  • The output always contains at least one string.
  • The output contains only those types of brackets that occur in the input string.
  • All strings in the output are of the same length.
  • If an input string is balanced, then the output consists of the same single string.

Implementation requirements:

  • Your implementation may be in any programming language and should be as short as possible (in terms of character count or byte count - which one is less). You may represent the source in ASCII, Unicode or any other alphabet normally used in your language. If a program in your language has a shorter standard encoding as a sequence of bytes (like x86 assembly), you may choose to claim its length based on byte count, although you may submit the program in a hexadecimal form (e.g. 7E-FF-02-...) and/or a textual form using standard mnemonics. If browsers cannot represent characters that you used in the program, you may submit a graphical image of the source and/or a sequence of keystrokes that can be used to type it, but still may claim its length based on character count.
  • Any required whitespace or newline characters are included in the character count. If you want, you may submit both an "official", shortest version that is used to claim the length and a "non-official", human-readable version with additional spaces, newlines, formatting, comments, parentheses, meaningful identifier names, etc.
  • The implementation may be a complete program/script, a function (along with any helper functions or definitions, if needed), a functional expression (e.g. lambda, anonymous function, code block, several built-in functions combined with a function composition operator, etc) or any similar construct available in your language.
  • You may use any standard library functions that normally available in your language, but any required import, using, open or other similar constructs or fully qualified names must be included in the source unless your language imports them by default.
  • The program may not communicate with any Internet services to obtain the result, intermediate data, or to download some parts of the algorithm.
  • If your language supports some form of compressed expressions (like Uncompress["..."] in Mathematica) you may use them to make the program shorter.
  • If your language does not support strings or you do not want to use them, you may use character or integer arrays (or other collections) instead. If your language does not support integers (e.g. lambda-calculus), you may use a suitable encoding for them (like Church numerals).
  • For input and output you may use whatever mechanism you prefer: standard input/output streams, user prompt, console output, function parameters and return value, Turing machine tape, reading/writing memory blocks, etc.
  • You may assume that an input is well-formed (e.g. that it does not include any characters except ()[]{}<>) and do not need to validate it. The program behavior may be undefined if the input is not well-formed.
  • The output strings may be printed delimited by commas, semicolons, spaces, newlines or other delimiters normally used in your language, with or without surrounding quotation marks, or returned as an array, list, sequence, a single delimited string or any other suitable collection.
  • The output strings may be returned in any order, but must not contain duplicates.
  • The program execution time must be polynomial with respect to the length of its output.

Vladimir Reshetnikov

Posted 2013-06-22T00:31:48.517

Reputation: 2 293

Question was closed 2016-04-07T18:56:02.963

"The program execution time must be polynomial with respect to the length of its output." -- which means that my "try every position x char, then every two positions x chars ..." algorithm is provably too slow :-( – John Dvorak – 2013-06-22T05:48:13.767

1is it even possible in polynomial time? – John Dvorak – 2013-06-22T06:23:58.077

Jan Dvorak: I don't think so: for < there is only 1 solution, for << there are 2 solutions, for <<< there are 5 solutions, for <<<< there are already 14 solutions. This number increases exponential. – Johannes Kuhn – 2013-06-22T16:58:43.747

1@JohannesKuhn That is why it is required to be polynomial w.r.t output (not input) length. – Vladimir Reshetnikov – 2013-06-22T19:34:46.207

Answers

4

Tcl, 212 characters.

eval [zlib i [encoding convertt unicode 兕櫭ッﰌꞟ膸줙ꎠ?곚ၻ菇몐䞝Ӫ᷇ꌃ齷靤녂閉璓陷ョ䎅汚组䕿䎢犄ﲕⷑ稑䞙⠧赧용㽫硣ꖠᓋ蘊湏뢈鱰蒺䡎ྚ嬣᪢꼍ⴚ벎艚㶼큔㱅麓傧ᶘ쳖岠ꇻ퇇䯺蛼෪ᾜ❂᠗移ﺶ䱔엻똆诖ꙧꈦ屪蘦㝖ᝎﴝ䏐駺鞜䓲䣎娯෺顡瑼曾⛁⩤暪鬙쩘覥喼\ud88eႂ嬲꫺\udfdf쒚쾟⒆鱔壌杗殰킞茶嵊䴪ꥁ⽠濶켖덼㘕拘⛇끓ᛱ㐼\ud927碀꣟禝悕衫瀇辸幇鬫瞽\udf90䵸㿕]]

You count characters, so I use 1 character for 2 bytes. (Except the invalid characters, that I replaced now)
The zlib is pretty self explanatory. i is just the the abbreviation of inflate.
eval. Well. I don't have to explain that?

After decoding it becomes

proc v i {set ::m {};t $i
set r [list [split $i {}]]
lmap c [split $::m {}] {set c [dict get {\{ \} \} \{ {[} \] \] {[} ( ) ) ( < > > <} $c]
lmap a $r[set r {}] {for {set i 0} \$i<=[llength $a] incr\ i {lappend r [linsert $a $i $c]}}}
lsort -u [lmap a [lmap a $r {join $a {}}] {if {![t $a]} continue set\ a}]
}
proc t i {if {{}ne$i} {lmap r {{^(.*){(.*?)}(.*)$} {^(.*)\((.*?)\)(.*)$} {^(.*)<(.*?)>(.*)$} {^(.*)\[(.*?)\](.*)$}} {if {[regexp $r $i - 1 2 3]} {return [expr {[t $1$3]&[t $2]}]}};append ::m $i} {return 1};list 0}

Which will be passed to eval.
It defines 2 commands: v, the main program, and t which checks the input and place unmatched braces in the global variable m. v calls t with the input, insert the corresponding brace at all possible positions and use t again to filter the valid ones out.

Input: 1. Parameter for v
Output: A Tcl list with each possible valid permutation. They have all the same length.

Example:

% v <<
<<>> <><>

Johannes Kuhn

Posted 2013-06-22T00:31:48.517

Reputation: 7 122

Perhaps you should show the unencoded version with those invalid characters replaced (there's plenty of characters outside the ASCII range if you want to) as well as the encoded version – John Dvorak – 2013-06-22T05:36:54.653

can you please post the source code as it is after inflation? – John Dvorak – 2013-06-22T07:29:40.327

@JanDvorak Posted a decoded version. It still requires Tcl 8.6 because I use lmap not only as replacement for foreach. – Johannes Kuhn – 2013-06-22T12:13:48.297

Thanks. I was kinda hoping I would be able to read your code and extract its algorithm, but still nope :-) – John Dvorak – 2013-06-22T16:28:39.147

@JanDvorak It's simple: The regexp split the string into 3 parts (on a matching brace). The first and last part together (concat) are tested if it is a balanced, the middle part ist tested too. If no regexp match, then the input only contains unmatched characters, append them to m and return false. I put the proc t ungolfed to idone

– Johannes Kuhn – 2013-06-22T16:49:21.577

@JohannesKuhn Could you comment on "I use 1 character for 2 bytes"? Not every pair of bytes is a valid Unicode character. – Vladimir Reshetnikov – 2013-06-24T18:21:59.277

@VladimirReshetnikov Yeah, you are correct. Just a few more characters. This also solved my inability to post it. – Johannes Kuhn – 2013-06-24T19:16:46.303