Split a word into parts with equal scores

9

2

Assuming A=1, B=2... Z=26, and the value of a word is the sum of these letter values, it is possible to split some words into two pieces such that they have equal values.

For example, "wordsplit" can be broken into two pieces like so: ordsl wpit, because o+r+d+s+l = w+p+i+t.

This was a challenge given to us by my computing teacher - it is an old Lionhead Studios challenge, apparently. I have solved it in Python, and will post my answer shortly.

Challenge: The shortest program which can list all the possible splits which have equal scores. Note that it only has to list one for each group of letters - ordsl wpit is the same as rdosl wtip, for example. It is easier to list them in the order they come in the word.

Bonus:

  • If you highlight pairs where both words are valid English words (or some permutation of the letters is), using a word list of some kind. (This could be done by placing an asterisk next to each or some other method, but make it clear.)
  • Adding the option for removing duplicates (this should not be the default.)
  • Supporting more than two splits, for example, three, four or even n-way splits.

Thomas O

Posted 2011-02-03T19:28:34.923

Reputation: 3 044

Must the program support mixed case input? And if so can it discard the case for the output? – Nemo157 – 2011-02-03T19:54:42.077

@Nemo157 It may ignore case and does not have to preserve it on the output. – Thomas O – 2011-02-03T19:55:39.180

Can the program output extra stuff, as long as what the requested part of the output is is clear to a human? – J B – 2011-02-03T22:49:22.843

@J B Yes it can. – Thomas O – 2011-02-03T23:22:02.277

ok, I'll improve that Perl then ;) Thanks – J B – 2011-02-04T07:51:57.010

Answers

4

Perl, 115 118 123

@_=~/./g;for$i(1..1<<@_){$l=$
r;$i&1<<$_?$l:$r+=64-ord$_[$_
]for 0..$#_;$l-$r||$i&1<<$_&&
print$_[$_]for 0..$#_;say""}

Run with perl -nE '<code goes here>'. That 'n' is counted in the code size.

Respaced:

@_ = /./g;
for $i (1 .. 1<<@_) {
  $l = $r;
  $i & 1<<$_ ? $l : $r -= 64 - ord $_[$_] for 0 .. $#_;

  $l - $r      ||
  $i & 1<<$_   &&
  print $_[$_]
    for 0 .. $#_;

  say ""
}

With comments and variable names:

# split a line of input by character
@chars = /./g;

# generate all binary masks of same length
for $mask (1 .. 1<<@_) {

  # start at "zero"
  $left_sum = $right_sum;

  # depending on mask, choose left or right count
  # 1 -> char goes left; 0 -> char goes right
  $mask & 1<<$_ ? $left_sum : $right_sum
    -= 64 - ord $chars[$_]   # add letter value
      for 0 .. $#chars;      # for all bits in mask

  # if left = right
  $left_sum - $right_sum ||

  # if character was counted left (mask[i] = 1)
  $mask & 1<<$_          &&

  # print it
  print $chars[$_]

  # ...iterating on all bits in mask
    for 0 .. $#chars;

  # newline
  say ""
}

Some of the tricks used:

  • 1..1<<@_ covers the same bit range as 0..(1<<@_)-1 , but is shorter. (note that considering the problem from further away, including the range boundaries multiple times wouldn't result in a wrong output anyway)
  • $left_range and $right_range aren't reset to actual "0" numeric zero: since we just accumulate and compare them in the end, all we need is them to start at the same value.
  • subtracting 64-ord$_[$_] instead of adding ord$_[$_]-64 wins an invisible character: since it ends with a delimiter, it makes the space before for unnecessary.
  • Perl lets you assign to a variable determined by the ternary conditional operator: cond ? var1 : var2 = new_value.
  • boolean expressions chained with && and || are used instead of proper conditionals.
  • $l-$r is shorter than $l!=$r
  • will output a newline even on splits that don't balance. Empty lines are ok by the rules! I asked!

J B

Posted 2011-02-03T19:28:34.923

Reputation: 9 638

Care to explain for those of us who don't speak line noise? It looks like your using a binary mask approach similar to mine, and I see the 64 means '@' = 'A' - 1, and after that I'm pretty much lost. – dmckee --- ex-moderator kitten – 2011-02-04T01:18:17.933

Is this edit any better? – J B – 2011-02-04T07:55:12.030

Nice. I need to think about taking advantage of the the add each count to the left or the right sum thing. Should have been obvious, but I missed it. – dmckee --- ex-moderator kitten – 2011-02-04T17:36:50.700

3

J (109)

~.(/:{[)@:{&a.@(96&+)&.>>(>@(=/@:(+/"1&>)&.>)#[),}.@(split~&.>i.@#@>)@<@(96-~a.&i.)"1([{~(i.@!A.i.)@#)1!:1[1

Output for wordsplit:

┌─────┬─────┐
│lorw │dipst│
├─────┼─────┤
│diltw│oprs │
├─────┼─────┤
│iptw │dlors│
├─────┼─────┤
│dlors│iptw │
├─────┼─────┤
│oprs │diltw│
├─────┼─────┤
│dipst│lorw │
└─────┴─────┘

Explanation:

  • 1!:1[1: read a line from stdin
  • ([{~(i.@!A.i.)@#): get all permutations
  • "1: for each permutation:
  • (96-~a.&i.): get letter scores
  • }.@(split~&.>i.@#@>)@<: split each permutation of the scores at each possible space, except before the first and after the last number
  • >(>@(=/@:(+/"1&>)&.>)#[): see which permutations have matching halves and select these
  • {&a.@(96&+)&.>: turn the scores back into letters
  • ~.(/:{[): remove trivial variations (e.g. ordsl wpit and ordsl wpti)

marinus

Posted 2011-02-03T19:28:34.923

Reputation: 30 224

Some of your answers are duplicates. – DavidC – 2013-05-08T14:00:27.690

@DavidCarraher: Well, either I'm blind or this one isn't, nor are my recent answers. I've never copied other people's answers on purpose, though you could of course be right, I've posted here while drunk sometimes and not remembered until I got loads of downvotes and it turned out I'd submitted something that wasn't even close to correct. If you've seen me misbehave, maybe leave the comment where I'm misbehaving, then I will delete any offending answers; or maybe downvote them as that's what downvotes are for. – marinus – 2013-05-08T14:45:04.453

No slight was intended. I simply meant, for example, that your first answer, {"lorw", "dipst"} is a duplicate of your final answer, {"dipst","lorw"}. Only the order of the words is different. – DavidC – 2013-05-08T14:57:10.213

@DavidCarraher: whoops :P I thought you meant I'd copied someone's answer... anyway this question says (if I've interpreted it right) to remove the duplicates where the individual parts are just permutations of each other, but not to remove the ones where the order of the parts is different, i.e if {a,bc} is already found, to remove {a,cb} but not to remove {bc,a}. (And of course I'm not offended, if I actually /had/ duplicated someone's answer I'd prefer it if someone pointed it out.) – marinus – 2013-05-08T15:17:38.947

You seem to be right. The instructions say that it's ok to ignore order of words ("Note that it only has to list one for each group of letters"), but they do not require it. This may save me a few characters. Thanks. – DavidC – 2013-05-08T15:31:14.133

2

c99 -- 379 necessary characters

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int s(char*w,int l,int m){int b,t=0;for(b=0;b<l;++b){t+=(m&1<<b)?toupper(w[b])-64:0;}return t;}
void p(char*w,int l,int m){for(int b=0;b<l;++b){putchar((m&1<<b)?w[b]:32);}}
int main(){char w[99];gets(w);int i,l=strlen(w),m=(1<<l),t=s(w,l,m-1);
for(i=0;i<m;i++){if(s(w,l,i)==t/2){p(w,l,i);putchar(9);p(w,l,~i);putchar(10);}}}

The approach is pretty obvious. There is a function which sums a words according to a mask and one that prints it also according to a mask. Input from the standard input. One oddity is that the printing routine inserts spaces for letter not in the mask. A tab is used to separate the groups.

I do none of the bonus items, nor is it easily converted to support them.

Readable and commented:

#include <stdio.h>
#include <string.h>
#include <ctype.h>
int s(char *w, int l, int m){ /* word, length, mask */
  int b,t=0;                  /* bit and total */
  for (b=0; b<l; ++b){        
/*     printf("Summing %d %d %c %d\n",b,m&(1<<b),w[b],toupper(w[b])-'A'-1); */
    t+=(m&1<<b)?toupper(w[b])-64:0; /* Add to toal if masked (A-1 = @ = 64) */
  }
  return t;
}
void p(char *w, int l, int m){
  for (int b=0; b<l; ++b){ 
    putchar((m&1<<b)?w[b]:32);  /* print if masked (space = 32) */
  }
}
int main(){
  char w[99];
  gets(w);
  int i,l=strlen(w),m=(1<<l),t=s(w,l,m-1);
/*   printf("Word is '%s'\n",w); */
/*   printf("...length %d\n",l); */
/*   printf("...mask   0x%x\n",m-1); */
/*   printf("...total  %d\n",t); */
  for (i=0; i<m; i++){
/*     printf("testing with mask 0x%x...\n",i); */
    if (s(w,l,i)==t/2) {p(w,l,i); putchar(9); p(w,l,~i); putchar(10);}
    /* (tab = 9; newline = 10) */
  }
}

Validation

 $ wc wordsplit_golf.c
  7  24 385 wordsplit_golf.c
 $ gcc -std=c99 wordsplit_golf.c
 $ echo wordsplit | ./a.out
warning: this program uses gets(), which is unsafe.
 or sp          w  d  lit
wor   l            dsp it
 ords l         w    p it
w    p it        ords l  
   dsp it       wor   l  
w  d  lit        or sp   

dmckee --- ex-moderator kitten

Posted 2011-02-03T19:28:34.923

Reputation: 2 726

1

Ruby: 125 characters

r=->a{a.reduce(0){|t,c|t+=c.ord-96}}
f=r[w=gets.chomp.chars]
w.size.times{|n|w.combination(n).map{|s|p([s,w-s])if r[s]*2==f}}

Sample run:

bash-4.2$ ruby -e 'r=->a{a.reduce(0){|t,c|t+=c.ord-96}};f=r[w=gets.chomp.chars.to_a];w.size.times{|p|w.combination(p).map{|q|p([q,w-q])if r[q]*2==f}}' <<< 'wordsplit'
[["w", "o", "r", "l"], ["d", "s", "p", "i", "t"]]
[["w", "p", "i", "t"], ["o", "r", "d", "s", "l"]]
[["o", "r", "s", "p"], ["w", "d", "l", "i", "t"]]
[["w", "d", "l", "i", "t"], ["o", "r", "s", "p"]]
[["o", "r", "d", "s", "l"], ["w", "p", "i", "t"]]
[["d", "s", "p", "i", "t"], ["w", "o", "r", "l"]]

manatwork

Posted 2011-02-03T19:28:34.923

Reputation: 17 865

1

Mathematica 123 111

Finds all subsets of word that have 1/2 the "ascii total" of the word, d. Then finds the complements of those subsets.

d = "WORDSPLIT"

{#, Complement[w, #]}&/@Cases[Subsets@#,x_/;Tr@x==Tr@#/2]&[Sort[ToCharacterCode@d - 64]];
FromCharacterCode[# + 64] & /@ %

{{"IPTW", "DLORS"}, {"LORW", "DIPST"}, {"OPRS", "DILTW"}, {"DILTW", "OPRS"}, {"DIPST", "LORW"}, {"DLORS", "IPTW"}}

DavidC

Posted 2011-02-03T19:28:34.923

Reputation: 24 524

1

J, 66 chars

Using digits of base2 numbers to select every possible subset.

   f=.3 :'(;~y&-.)"{y#~a#~(=|.)+/"1((+32*0&>)96-~a.i.y)#~a=.#:i.2^#y'
   f 'WordSplit'
┌─────┬─────┐
│Worl │dSpit│
├─────┼─────┤
│Wdlit│orSp │
├─────┼─────┤
│Wpit │ordSl│
├─────┼─────┤
│ordSl│Wpit │
├─────┼─────┤
│orSp │Wdlit│
├─────┼─────┤
│dSpit│Worl │
└─────┴─────┘

randomra

Posted 2011-02-03T19:28:34.923

Reputation: 19 909

0

Lua - 195

a=io.read"*l"for i=0,2^#a/2-1 do z,l=0,""r=l for j=1,#a do b=math.floor(i/2^j*2)%2 z=(b*2-1)*(a:byte(j)-64)+z if b>0 then r=r..a:sub(j,j)else l=l..a:sub(j,j)end end if z==0 then print(l,r)end end

input must be in caps:

~$ lua wordsplit.lua 
>WORDSPLIT
WDLIT   ORSP
DSPIT   WORL
WPIT    ORDSL

mniip

Posted 2011-02-03T19:28:34.923

Reputation: 9 396

0

Python - 127

w=rawinput()
for q in range(2**len(w)/2):
 a=[0]*2;b=['']*2
 for c in w:a[q%2]+=ord(c)-96;b[q%2]+=c;q/=2
 if a[0]==a[1]:print b

and here a n-split Version with 182 bytes without duplicates:

n,w=input()
def b(q):
 a=[0]*n;b=['']*n
 for c in w:a[q%n]+=ord(c)-96;b[q%n]+=c;q/=n
 return a[0]==a[1] and all(b) and frozenset(b)
print set(filter(None,map(b,range(n**len(w)/n))))

Input is e.g.:

3, 'wordsplit'

Daniel

Posted 2011-02-03T19:28:34.923

Reputation: 1 801

0

My solution is below. It is an almost anti-golf in its size, but it works very well. It supports n-way splits (although computational time becomes very long for any more than about 3 splits) and it supports removing duplicates.

class WordSplitChecker(object):
    def __init__(self, word, splits=2):
        if len(word) == 0:
            raise ValueError, "word too short!"
        if splits == 0:
            raise ValueError, "splits must be > 1; it is impossible to split a word into zero groups"
        self.word = word
        self.splits = splits

    def solve(self, uniq_solutions=False, progress_notifier=True):
        """To solve this problem, we first need to consider all the possible
        rearrangements of a string into two (or more) groups.

        It turns out that this reduces simply to a base-N counting algorithm,
        each digit coding for which group the letter goes into. Obviously
        the longer the word the more digits needed to count up to, so 
        computation time is very long for larger bases and longer words. It 
        could be sped up by using a precalculated array of numbers in the
        required base, but this requires more memory. (Space-time tradeoff.)

        A progress notifier may be set. If True, the default notifier is used,
        if None, no notifier is used, and if it points to another callable,
        that is used. The callable must take the arguments as (n, count, 
        solutions) where n is the number of iterations, count is the total 
        iteration count and solutions is the length of the solutions list. The
        progress notifier is called at the beginning, on every 1000th iteration, 
        and at the end.

        Returns a list of possible splits. If there are no solutions, returns
        an empty list. Duplicate solutions are removed if the uniq_solutions
        parameter is True."""
        if progress_notifier == True:
           progress_notifier = self.progress 
        solutions = []
        bucket = [0] * len(self.word)
        base_tuple = (self.splits,) * len(self.word)
        # The number of counts we need to do is given by: S^N,
        # where S = number of splits,
        #       N = length of word.
        counts = pow(self.splits, len(self.word))
        # xrange does not create a list in memory, so this will work with very
        # little additional memory.
        for i in xrange(counts):
            groups = self.split_word(self.word, self.splits, bucket)
            group_sums = map(self.score_string, groups)
            if len(set(group_sums)) == 1:
                solutions.append(tuple(groups))
            if callable(progress_notifier) and i % 1000 == 0:
                progress_notifier(i, counts, len(solutions))
            # Increment bucket after doing each group; we want to include the
            # null set (all zeroes.)
            bucket = self.bucket_counter(bucket, base_tuple)
        progress_notifier(i, counts, len(solutions))
        # Now we have computed our results we need to remove the results that
        # are symmetrical if uniq_solutions is True.
        if uniq_solutions:
            uniques = []
            # Sort each of the solutions and turn them into tuples.  Then we can 
            # remove duplicates because they will all be in the same order.
            for sol in solutions:
                uniques.append(tuple(sorted(sol)))
            # Use sets to unique the solutions quickly instead of using our
            # own algorithm.
            uniques = list(set(uniques))
            return sorted(uniques)
        return sorted(solutions)

    def split_word(self, word, splits, bucket):
        """Split the word into groups. The digits in the bucket code for the
        groups in which each character goes in to. For example,

        LIONHEAD with a base of 2 and bucket of 00110100 gives two groups, 
        "LIHAD" and "ONE"."""
        groups = [""] * splits
        for n in range(len(word)):
            groups[bucket[n]] += word[n]
        return groups

    def score_string(self, st):
        """Score and sum the letters in the string, A = 1, B = 2, ... Z = 26."""
        return sum(map(lambda x: ord(x) - 64, st.upper()))

    def bucket_counter(self, bucket, carry):
        """Simple bucket counting. Ex.: When passed a tuple (512, 512, 512)
        and a list [0, 0, 0] it increments each column in the list until
        it overflows, carrying the result over to the next column. This could
        be done with fancy bit shifting, but that wouldn't work with very
        large numbers. This should be fine up to huge numbers. Returns a new
        bucket and assigns the result to the passed list. Similar to most
        counting systems the MSB is on the right, however this is an 
        implementation detail and may change in the future.

        Effectively, for a carry tuple of identical values, this implements a 
        base-N numeral system, where N+1 is the value in the tuple."""
        if len(bucket) != len(carry):
            raise ValueError("bucket and carry lists must be the same size")
        # Increase the last column.
        bucket[-1] += 1 
        # Carry numbers. Carry must be propagated by at least the size of the
        # carry list.
        for i in range(len(carry)):
            for coln, col in enumerate(bucket[:]):
                if col >= carry[coln]:
                    # Reset this column, carry the result over to the next.
                    bucket[coln] = 0
                    bucket[coln - 1] += 1
        return bucket

    def progress(self, n, counts, solutions):
        """Display the progress of the solve operation."""
        print "%d / %d (%.2f%%): %d solutions (non-unique)" % (n + 1, counts, (float(n + 1) / counts) * 100, solutions) 

if __name__ == '__main__':
    word = raw_input('Enter word: ')
    groups = int(raw_input('Enter number of required groups: '))
    unique = raw_input('Unique results only? (enter Y or N): ').upper()
    if unique == 'Y':
        unique = True
    else:
        unique = False
    # Start solving.
    print "Start solving"
    ws = WordSplitChecker(word, groups)
    solutions = ws.solve(unique)
    if len(solutions) == 0:
        print "No solutions could be found."
    for solution in solutions:
        for group in solution:
            print group,
        print

Sample output:

Enter word: wordsplit
Enter number of required groups: 2
Unique results only? (enter Y or N): y
Start solving
1 / 512 (0.20%): 0 solutions (non-unique)
512 / 512 (100.00%): 6 solutions (non-unique)
dspit worl
ordsl wpit
orsp wdlit

Thomas O

Posted 2011-02-03T19:28:34.923

Reputation: 3 044

1In my opinion answers that admittedly make zero attempt at meeting the goal of the original question (code brevity) are effectively off-topic. You admit that this answer has no relation to code-golf so rather than post it as an answer I'd suggest posting it somewhere else and putting a link to it in a comment on the question. – Jeff Swensen – 2011-02-04T15:41:27.147

2@Sugerman: It's a reference implementation, not an attempt to win the game. I prefer reference implementations as answers rather than taking up space at the top of the page, and I prefer them on-site to remove the risk of link rot. – dmckee --- ex-moderator kitten – 2011-02-04T17:42:28.917

@Sugerman: Why not submit it? It's better than nothing. It could be minimised, but why bother - I can't really accept my own question (well, I can, but it's not in the spirit of it.) – Thomas O – 2011-02-04T17:46:46.150

Because at the base of it, this is a Question & Answer site. Something that is not intended as an answer should not be posted as such. – Jeff Swensen – 2011-02-04T17:50:29.623

@Sugerman, How else would you suggest I share a working demo of it? Besides, I could shorten this. I just made no attempt at doing so because I couldn't win my own competition. – Thomas O – 2011-02-04T17:53:23.177

1As I said in my first comment, I would've linked to it in a comment on the Question or since you own the Question edit the link in there. There also nothing wrong with submitting an Answer to your own Question as long as you don't automatically accept your own Answer without considering all others (and voting results). – Jeff Swensen – 2011-02-04T18:11:01.843