Huffman golfing

10

2

Write a filter that converts text from standard input to a representation of its Huffman tree on standard output. In as few characters as possible.

  • you're free to consider newlines or not
  • output format is free, as long as it's some human-readable encoding of a tree. s-exps, ascii-art and .png, all good. RPN, .dot, I'll smile or frown, but that ought to be ok. Program cores and/or raw pointers , no way.
  • no canonicalization needed; if the characters are at the right depth, that's good enough.

Sample possible output for "this is an example of a huffman tree" (from wikipedia):

(((e . (n . (o . u))) . (a . (t . m))) .
 (((i . (x . p)) . (h . s)) . (((r . l) . f) . #\space)))

I can't reproduce all possible valid outputs with human-readable representation combinations in here (the margin is a bit too thin), but what should check is which characters end up at which number of bits:

  • 3 bits: a e [space]
  • 4 bits: f h i m n s t
  • 5 bits: l o p r u x

If [newline] is kept in, it appears appears in the "5 bits" level, but (any) two characters have to drop from there to 6 bits.

J B

Posted 2011-02-17T18:32:40.333

Reputation: 9 638

Question was closed 2015-10-30T14:53:53.410

Why finishing that quick? There're only 2 answers so far. – FUZxxl – 2011-02-18T08:49:41.547

No, didn't know that. Is this written up somewhere? – J B – 2011-02-18T09:29:05.933

I tried it out, but strange enough I could change the accepted answer, even if accepted for half a year. They possibly changed the rules. – FUZxxl – 2011-02-18T09:52:44.753

@George Edison: Is this new? I remember this was different before. – FUZxxl – 2011-02-20T15:30:39.293

Answers

2

GolfScript, 54 34 characters

:)1/.&.,({{),)@-,-}$(\(@[[\]]+}*~`

The script takes input from STDIN and prints the tree in the following form:

[[["a" "e"] [["t" "h"] ["i" "s"]]] [[["n" "m"] [["x" "p"] ["l" "o"]]] [[["u" "r"] "f"] " "]]]

You may try the code online.

Edit: In contrast to the longer version the character counts are not saved inside the tree but recalculated each time we need them.

Previous version with comments:

1/           # split text into chars    
..&          # create string with unique chars

\`{          # for each char
  {1$=},       # filter the original string for this char
  ,            # count number of occurences
  ]            # build data entry [char count]
}+%          # end of for-each loop

.,           # count number of distinct chars
({           # decrement and loop that many times
  {1=}$        # sort list by count field
  (\(@         # take first two elements of sorted list
  +~           # flatten to stack
  [[           # start new entry
    @+           # add count values
    @@[\]        # join nodes into a new tree
    \            # swap count and tree
  ]]+          # close entry and add back to list
}*           # end of for-loop

~~           # flatten array
;            # discard count
`            # transform to readable

Howard

Posted 2011-02-17T18:32:40.333

Reputation: 23 109

6

Ruby 1.9 - 160 138 113

f=Hash.new(0);$<.chars{|c|f[c]+=1}
f.all?{a,b,*f=f.sort_by(&:last);*a,i=a;*b,j=b;f<<[a,b,i+j];f[1]}
p f[0][0..-2]

Ungolfed:

f=STDIN.chars.inject(Hash.new(0)){ |f,c|
        f[c]+=1
        f
}
f=f.to_a.map &:reverse

while f.size > 1
        a, b, *f = f.sort_by &:first
        f = [[a[0]+b[0], a, b], *f]
end

p=->a{ a[1].is_a?(String) ? a[1] : [p[a[1]],p[a[2]]] }
p p[f[0]]

Output for this is an example of a huffman tree:

[[[[["l", "p"], ["r", "u"]], ["s", ["o", "x"]]], [["i", "n"], "e"]], [["a", ["h", "t"]], [["m", "f"], " "]]]

Arnaud Le Blanc

Posted 2011-02-17T18:32:40.333

Reputation: 2 286

3

Perl, 148

With the p command-line option (counted in):

$b{$&}++while/./g;for(@_=map[$b{$_},$_],keys%b;$#
_;@_=([$_[0][0]+$_[1][0],"($_[0][1]$_[1][1])"],@_
[2..$#_])){@_=sort{$$a[0]<=>$$b[0]}@_}$_=$_[0][1]

Sample use:

$ <<< 'this is an example of a huffman tree' perl -pe '$b{$&}++while/./g;for(@_=map[$b{$_},$_],keys%b;$#_;@_=([$_[0][0]+$_[1][0],"($_[0][1]$_[1][1])"],@_[2..$#_])){@_=sort{$$a[0]<=>$$b[0]}@_}$_=$_[0][1]'
(((ae)(((rx)h)((po)(ul))))(((nm)(ti))((sf) )))

Ungolfed:

# count characters
$b{$&}++ while /./g;

for(
    # init: convert hash to array of [freq,tree] pairs
    @_ = map [$b{$_},$_], keys %b;
    # as long as there are more than one elements left
    $#_;
    # merge the two leftmost nodes
    @_ = ( [ $_[0][0]+$_[1][0], "($_[0][1]$_[1][1])" ], @_[2..$#_] )
)
{
    # keep array in ascending order at all times
    @_=sort{$$a[0]<=>$$b[0]}@_
}

# set up for print
$_=$_[0][1]

J B

Posted 2011-02-17T18:32:40.333

Reputation: 9 638

2

Haskell (233 228 223)

Not completely shure whether it is valid.

import List
data T=Char:=Int|N T T Int
s(x:=_)=show x
s(N a b _)='(':s a++s b++")"
g(_:=a)=a
g(N _ _ a)=a
i[a]=a
i(a:b:x)=t$N a b(g a+g b):x
t=i.sortBy((.g).(compare.g))
main=interact$s.t.map(\x->x!!0:=length x).group.sort

Outputs:

  • The quick, brown fox jumps over the lazy dog.
((((('k''l')('i''j'))(('p''q')('m''n')))((('a''b')('.''T'))(('f''g')('c''d'))))(((('z'('x''y'))'e')((('v''w')('s''t'))'o'))((('r''u')(('\n'',')'h'))' ')))
  • this is an example of a huffman tree
((('a''e')(('h''i')(('o''p')('\n''l'))))((('s''t')('m''n'))((('x'('r''u'))'f')' ')))

Output format is like in the question, but whithout whitespace and whithout , between the two arms. I assume it's human readable.

FUZxxl

Posted 2011-02-17T18:32:40.333

Reputation: 9 656

It's perfectly human-readable. Assuming I'm human. – J B – 2011-02-18T08:45:16.943

http://codepad.org/mjME3TGV - Program error: pattern match failure: i [] – Nathan Osman – 2011-02-18T21:45:06.837

@George Edison: you need to provide an input string. Or alternately: well, that [] is a perfectly sensible representation of the empty tree. O:-) – J B – 2011-02-19T00:41:32.773

Adding support for the empty tree would make the whole beast much more complicated. – FUZxxl – 2011-02-20T15:30:11.960

1

Python 3.1.2, 132 chars

i=input();n=[(i.count(c),c)for c in set(i)]
while n[1:]:n.sort(key=lambda x:x[0]);(a,b),(c,d),*e=n;n=e+[(a+c,(b,d))]
print(n[0][1])

It can't handle the empty string.

Zachary Vance

Posted 2011-02-17T18:32:40.333

Reputation: 131