Fully-justify & hyphenate a block of text

26

3

Given  a width  and  a block  of
text containing possible hyphen-
ation points,  format it  fully-
justified (in monospace).

Fully justified means it is aligned on the left and the right, and is achieved by increasing the spacing between words until each line fits.

Related:

Input

You can take input in any format you like. You will be given:

  • A target width (in characters), in the range 5-100 (inclusive);
  • A block of text containing possibly hyphenated words. This could be a space-separated string, an array of words, or an array of arrays of word fragments (or any other data representation you desire).

A typical input might be:

Width: 25
Text:  There's no bu-si-ne-ss lik-e s-h-o-w busine-ss, n-o bus-iness I know.

Where the hyphens denote possible hyphenation points, and the spaces denote word boundaries. A possible alternative representation of the text:

[["There's"], ["no"], ["bu", "si", "ne", "ss"], ["lik", "e"], (etc.)]

Output

The input text with spaces added between words, newlines at the column width, and hyphenation points chosen to fully-justify it to the column width. For functions, an array of strings (one for each line) can be returned instead of using newline separation.

A possible output for the above input might be:

There's no  business like
show  business,  no  bus-
iness I know.

Note that all hyphens have been removed except the one in the final "bus-iness", which is kept to show that the word wraps to the next line, and was chosen to ensure the second line contains as much text as possible.

Rules

  • Within each line, the number of spaces between words cannot vary by more than 1, but where you insert the extra spaces is otherwise up to you:

    hello hi foo     bar    <-- not permitted (1,1,5)
    hello  hi foo    bar    <-- not permitted (2,1,4)
    hello  hi  foo   bar    <-- OK (2,2,3)
    hello  hi   foo  bar    <-- OK (2,3,2)
    hello   hi  foo  bar    <-- OK (3,2,2)
    
  • No line can begin or end with spaces (except the last line, which can end with spaces).

  • The last line should be left justified, containing single spaces between each word. It can be followed by arbitrary whitespace / a newline if desired, but this is not required.

  • Words will consist of A-Z, a-z, 0-9 and simple punctuation (.,'()&)

  • You can assume that no word fragment will be longer than the target width, and it will always be possible to fill lines according to the rules (i.e. there will be at least 2 word fragments on each line, or 1 word fragment which fills the line perfectly)

  • You must choose hyphenation points which maximise the number of word characters on earlier lines (i.e. words must be consumed greedily by lines), for example:

    This is an input stri-ng with hyph-en-at-ion poi-nts.
    
    This     is     an     input    stri-      <-- not permitted
    ng with hyphenation points.
    
    This  is an  input string  with hyph-      <-- not permitted
    enation points.
    
    This is an input  string with hyphen-      <-- OK
    ation points.
    
  • Shortest code in bytes wins

Examples

Width: 20
Text:  The q-uick brown fox ju-mp-s ove-r t-h-e lazy dog.

The quick  brown fox
jumps over the  lazy
dog.

Width: 32
Text: Given a width and a block of text cont-ain-ing pos-sible hyphen-ation points, for-mat it ful-ly-just-ified (in mono-space).

Given  a width  and  a block  of
text containing possible hyphen-
ation points,  format it  fully-
justified (in monospace).

Width: 80
Text:  Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.

Programming Puzzles &  Code Golf  is a question and answer  site for programming
puzzle enthusiasts  and code golfers.  It's built and run  by you as part of the
Stack Exchange network  of Q&A sites. With your help,  we're working together to
build a library of programming puzzles and their solutions.

Width: 20
Text:  Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.

Programming  Puzzles
&  Code  Golf  is  a
question and  answer
site for programming
puzzle   enthusiasts
and  code   golfers.
It's  built  and run
by  you  as  part of
the  Stack  Exchange
network    of    Q&A
sites.   With   your
help,  we're working
together to  build a
library of  program-
ming   puzzles   and
their solutions.

Width: 5
Text:  a b c d e f g h i j k l mm nn oo p-p qq rr ss t u vv ww x yy z

a b c
d e f
g h i
j k l
mm nn
oo pp
qq rr
ss  t
u  vv
ww  x
yy z

Width: 10
Text:  It's the bl-ack be-ast of Araghhhhh-hhh-h-hhh-h-h-h-hh!

It's   the
black  be-
ast     of
Araghhhhh-
hhhhhhhhh-
hhh!

Dave

Posted 2017-06-06T22:36:37.753

Reputation: 7 519

Yesss, finally another (text-based) [tag:typography] challenge :-) – ETHproductions – 2017-06-06T22:45:45.733

May we use built-ins and libraries? – Adám – 2017-06-06T22:59:25.543

1@Adám yes to builtins: there's no code restrictions, and shortest code wins. Though of course, it might make for a boring answer! As for libraries, you can as long as the library is freely available and you mark your answer as "language + library". Also the library version has to pre-date this challenge. – Dave – 2017-06-06T23:04:20.843

Mathematica: Hyphenation -> True, TextJustification -> 1, Done! ;-) – J42161217 – 2017-06-06T23:45:18.317

1In the event that a line can end with either a hyphen or a single character, e.g. anybod-y with width 7, may we choose to output either anybody or anybod-\ny? – darrylyeo – 2017-06-07T02:32:32.977

1@JonathanAllan yes; sorry, I'll fix that – Dave – 2017-06-07T06:53:38.097

3@darrylyeo no you have to output the full word in that case, since it must greedily have as many word characters as possible on each line. – Dave – 2017-06-07T06:56:42.403

Answers

7

JavaScript (ES6), 218 bytes

w=>s=>s.map((c,i)=>c.map((p,j)=>(k+p)[l="length"]-w-(b=!i|j>0)+(j<c[l]-1)<0?k+=b?p:" "+p:(Array(w-k[l]-b).fill(h=k.split` `).map((_,i)=>h[i%(h[l]-1)]+=" "),o.push(h.join` `+(b?"-":"")),k=p)),o=[],k="")&&o.join`
`+`
`+k

Takes arguments in currying syntax (f(width)(text)) and the text input is in the double array format described in the challenge. Strings are converted to that format via .split` `.map(a=>a.split`-`)). Also, the newlines are literal newlines inside template strings.

Un-golfed and rearranged

width=>string=> {
    out=[];
    line="";
    string.map((word,i)=> {
        word.map((part,j)=> {

            noSpaceBefore = i==0 || j>0;
            if ((line+part).length - width - noSpaceBefore + (j<word.length-1) < 0) {
                line += noSpaceBefore ? part : " "+part;
            }
            else {
                words=line.split` `;
                Array(width - line.length - noSpaceBefore).fill()
                    .map((_,i) => words[i % (words.length-1)] += " ");
                out.push(words.join(" ") + (noSpaceBefore? "-" : ""));
                line=part;
            }
        });
    });
    return out.join("\n") + "\n"+line
}

The idea here was to step through each part of the entire string and build up each line one part at a time. Once a line was complete, it increases the word spacing from left to right until all extra spaces are placed.

Test Snippet

f=
w=>s=>s.map((c,i)=>c.map((p,j)=>(k+p)[l="length"]-w-(b=!i|j>0)+(j<c[l]-1)<0?k+=b?p:" "+p:(Array(w-k[l]-b).fill(h=k.split` `).map((_,i)=>h[i%(h[l]-1)]+=" "),o.push(h.join` `+(b?"-":"")),k=p)),o=[],k="")&&o.join`
`+`
`+k
<style>*{font-family:Consolas,monospace;}</style>
<div oninput="O.innerHTML=f(+W.value)(S.value.split` `.map(a=>a.split`-`))">
Width: <input type="number" size="3" min="5" max="100" id="W">
Tests: <select id="T" style="width:20em" oninput="let x=T.value.indexOf(','),s=T.value;W.value=s.slice(0,x);S.value=s.slice(x+2)"><option></option><option>20, The q-uick brown fox ju-mp-s ove-r t-h-e lazy dog.</option><option>32, Given a width and a block of text cont-ain-ing pos-sible hyphen-ation points, for-mat it ful-ly-just-ified (in mono-space).</option><option>80, Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.</option><option>20, Pro-gram-ming Puz-zles & Code Golf is a ques-tion and ans-wer site for pro-gram-ming puz-zle enth-usi-asts and code golf-ers. It's built and run by you as part of the St-ack Exch-ange net-work of Q&A sites. With your help, we're work-ing to-g-et-her to build a lib-rary of pro-gram-ming puz-zles and their sol-ut-ions.</option><option>5, a b c d e f g h i j k l mm nn oo p-p qq rr ss t u vv ww x yy z</option><option>10, It's the bl-ack be-ast of Araghhhhh-hhh-h-hhh-h-h-h-hh</option></select><br>
Text: &nbsp;<textarea id="S" cols="55" rows="4"></textarea>
</div>
<pre id="O" style="border: 1px solid black;display:inline-block;"></pre>

Justin Mariner

Posted 2017-06-06T22:36:37.753

Reputation: 4 746

8

GNU sed -r, 621 bytes

Takes input as two lines: The width as a unary number first and the string second.

I'm certain this could be golfed much more but I've already dumped way too much time into it.

x;N
G
s/\n/!@/
:
/@\n/bZ
s/-!(.*)@ /\1 !@/
s/!(.*[- ])(@.*1)$/\1!\2/
s/@(.)(.*)1$/\1@\2/
s/-!(.*-)(@.*)\n$/\1!\2\n1/
s/(\n!@) /\1/
s/-!(.* )(@.*)\n$/\1!\2\n1/
s/-!(.*-)(@.*1)$/\1!\21/
s/!(.*)-@([^ ]) /\1\2!@ /
t
s/ !@(.*)\n$/\n!@\1#/
s/!(.*-)@(.*)\n$/\1\n!@\2#/
s/!(.*)(@ | @)(.*)\n$/\1\n!@\3#/
s/-!(.*[^-])@([^ ]) (.*)\n$/\1\2\n!@\3#/
s/!(.+)@([^ ].*)\n$/\n!@\1\2#/
/#|!@.*\n$/{s/#|\n$//;G;b}
:Z
s/-?!|@.*//g
s/ \n/\n/g
s/^/%/
:B
G
/%.*\n.+\n/!bQ
:C
s/%([^\n])(.*)1$/\1%\2/
tC
s/([^\n]+)%\n/%\1\n/
:D
s/%([^ \n]* )(.*)1$/\1 %\2/
tD
s/(^|\n)([^\n]+)%(.*1)$/\1%\2\3/
tD
s/%([^\n]*)\n(.*)\n$/\1\n%\2/
tB
:Q
s/%(.*)\n1*$/\1/

Try it online!

Explanation

The program works in two phases: 1. Split, and 2. Justify. For the below, suppose our input is:

111111111111
I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.

Setup

First we read the input, moving the first line (the width as a unary number) to the hold space (x), then appending the next line (N) and then a copy of the width from the hold space (G) to the pattern space. Since N left us with a leading \n we replace it with !@, which we'll use as cursors in Phase 1.

x;N
G
s/\n/!@/

Now the content of the hold space is 1111111111111 (and won't change henceforth) and the pattern space is (in the format of sed's "print unambiguously" l command):

!@I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$

Phase 1

In Phase 1, the main @ cursor advances one character at a time, and for each character a 1 is removed from the "counter" at the end of the pattern space. In other words, @foo\n111$, f@oo\n11$, fo@o\n1$, etc.

The ! cursor trails behind the @ cursor, marking places we could break if the counter reaches 0 in the middle of the line. A couple rounds would look like this:

!@I re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$
!I@ re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n11111111111$
!I @re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111111$

Here there's a pattern we recognize: a space immediately followed by the @ cursor. Since the counter is greater than 0, we advance the break marker, then keep advancing the main cursor:

I !@re-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111111$
I !r@e-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n111111111$
I !re@-mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n11111111$
I !re-@mem-ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111111$

Here's another pattern: -@, and we still have 7 in the counter, so we advance the break cursor again and keep advancing:

I re-!mem-@ber a time of cha-os, ru-ined dreams, this was-ted land.\n111$

Here's a different pattern: A hyphen immediately preceding the break cursor and another preceding the main cursor. We remove the first hyphen, advance the break cursor, and, since we removed a character, add 1 to the counter.

I remem-!@ber a time of cha-os, ru-ined dreams, this was-ted land.\n1111$

We keep advancing the main cursor:

I remem-!ber@ a time of cha-os, ru-ined dreams, this was-ted land.\n1$

Similar to before, but this time the main cursor precedes a space rather than following a hyphen. We remove the hyphen, but since we're also advancing the main cursor we neither increment no decrement the counter.

I remember !@a time of cha-os, ru-ined dreams, this was-ted land.\n1$
I remember !a@ time of cha-os, ru-ined dreams, this was-ted land.\n$

Finally, our counter has reached zero. Since the character after the main cursor is a space, we insert a newline and put both cursors immediately after it. Then we replenish the counter (G) and start again.

I remember a\n!@ time of cha-os, ru-ined dreams, this was-ted land.\n111111111111$

Phase 1 continues, advancing the cursors and matching various patterns, until the @ cursor reaches the end of the string.

# Phase 1
:
  # End of string; branch to :Z (end of phase 1)
  /@\n/bZ

  # Match -!.*@_
  s/-!(.*)@ /\1 !@/

  # Match [-_]@ and >0
  s/!(.*[- ])(@.*1)$/\1!\2/

  # Advance cursor
  s/@(.)(.*)1$/\1@\2/

  # Match -!.*-@ and 0; add 1
  s/-!(.*-)(@.*)\n$/\1!\2\n1/

  # Match \n!@_
  s/(\n!@) /\1/

  # Match -!.*_@ and 0; add 1
  s/-!(.* )(@.*)\n$/\1!\2\n1/

  # Match -!.*-@ and >0; add 1
  s/-!(.*-)(@.*1)$/\1!\21/

  # Match -@[^_]_
  s/!(.*)-@([^ ]) /\1\2!@ /

  # If there were any matches, branch to `:`
  t

  # Match _!@ and 0
  s/ !@(.*)\n$/\n!@\1#/

  # Match -@ and 0
  s/!(.*-)@(.*)\n$/\1\n!@\2#/

  # Match @_|_@ and 0
  s/!(.*)(@ | @)(.*)\n$/\1\n!@\3#/

  # Match -!.*[^-]@[^_]_ and 0
  s/-!(.*[^-])@([^ ]) (.*)\n$/\1\2\n!@\3#/

  # Match !.+@[^_] and 0
  s/!(.+)@([^ ].*)\n$/\n!@\1\2#/

  # Match marked line (#) or !@ and 0
  /#|!@.*\n$/{
    # Remove mark; append width and branch to `:`
    s/#|\n$//
    G
    b
  }

:Z

# Cleanup
s/-?!|@.*//g
s/ \n/\n/g

At the end of Phase 1, our pattern space looks like this:

I remember a\ntime of cha-\nos, ruined\ndreams, this\nwasted land.

Or:

I remember a
time of cha-
os, ruined
dreams, this
wasted land.

Phase 2

In Phase 2 we use % as a cursor and use the counter in a similar way, beginning like this:

%I remember a\ntime of cha-\nos, ruined\ndreams, this\nwasted land.\n111111111111$

First, we count the characters on the first line by advancing the cursor and removing 1s from the counter, after which we have;

I remember a%\ntime of cha-\nos, ruined\ndreams, this\nwasted land.\n$

Since the counter is 0, we don't do anything else on this line. The second line also has the same number of characters as the counter, so let's skip to the third line:

I remember a\ntime of cha-\nos, ruined%\ndreams, this\nwasted land.\n11$

The counter is greater than 0, so we move the cursor back to the beginning of the line. Then we find the first run of spaces and add a space, decrementing the counter.

I remember a\ntime of cha-\nos, % ruined\ndreams, this\nwasted land.\n1$

The counter is greater than 0; since the cursor is already in the last (only) run of spaces on the line, we move it back to the beginning of the line and do it again:

I remember a\ntime of cha-\nos,  % ruined\ndreams, this\nwasted land.\n$

Now the counter is 0, so we move the cursor to the beginning of the next line. We repeat this for every line except the last. That's the end of phase 2 and the end of the program! The final result is:

I remember a
time of cha-
os,   ruined
dreams, this
wasted land.
# Phase 2
# Insert cursor
s/^/%/
:B
  # Append counter from hold space
  G
  # This is the last line; branch to :Q (end of phase 1)
  /%.*\n.+\n/!bQ

  :C
    # Count characters
    s/%([^\n])(.*)1$/\1%\2/
    tC

  # Move cursor to beginning of line
  s/([^\n]+)%\n/%\1\n/

  :D
    # Add one to each space on the line as long as counter is >0
    s/%([^ \n]* )(.*)1$/\1 %\2/
    tD

    # Counter is still >0; go back to beginning of line
    s/(^|\n)([^\n]+)%(.*1)$/\1%\2\3/
    tD

    # Counter is 0; move cursor to next line and branch to :B
    s/%([^\n]*)\n(.*)\n$/\1\n%\2/
    tB

:Q

# Remove cursor, any remaining 1s
s/%(.*)\n1*$/\1/

Jordan

Posted 2017-06-06T22:36:37.753

Reputation: 5 001

This is incredible, but when I run it using gsed (GNU sed) 4.4 I get gsed: -e expression #1, char 16: ":" lacks a label. Can you add a note on exactly how you're invoking it? (I'm using printf "%s\n%s" "$1" "$2" | gsed -r '<code here>';) – Dave – 2017-06-12T20:28:28.723

@Dave That works for me in GNU sed 4.2. Here's a gist: https://gist.github.com/jrunning/91a7584d95fe10ef6b036d1c82bd385c Note that TiO's sed page doesn't seem to respect the -r flag, which is why the TiO link above goes to the bash page.

– Jordan – 2017-06-12T20:50:30.080

Ah, I hadn't noticed the TiO link. That'll do for me; have a +1! There are 2 small mistakes on the last example though (the "black beast" one): it prints the second-to-last line one character short, and misses the final ! (though since I missed ! from the list of possible special characters, I won't hold that against it). – Dave – 2017-06-12T20:58:30.420

5

JavaScript (ES6), 147 bytes

Takes input as (width)(text).

w=>F=(s,p=S=' ')=>(g=([c,...b],o='',h=c=='-')=>c?o[w-1]?c==S&&o+`
`+F(b):o[w+~h]?o+c+`
`+F(b):c>S?g(b,h?o:o+c):g(b,o+p)||g(b,o+p+c):o)(s)||F(s,p+S)

Try it online!

Commented

w =>                              // w = requested width
  F = (                           // F is a recursive function taking:
    s,                            //   s = either the input string (first iteration) or an
                                  //       array of remaining characters (next iterations)
    p =                           //   p = current space padding
    S = ' '                       //   S = space character
  ) => (                          //
    g = (                         // g is a recursive function taking:
      [c,                         //   c   = next character
          ...b],                  //   b[] = array of remaining characters
      o = '',                     //   o   = output for the current line
      h = c == '-'                //   h   = flag set if c is a hyphen
    ) =>                          //
      c ?                         // if c is defined:
        o[w - 1] ?                //   if the line is full:
          c == S &&               //     fail if c is not a space
          o + `\n` + F(b)         //     otherwise, append o + a linefeed and process the
                                  //     next line
        :                         //   else:
          o[w + ~h] ?             //     if this is the last character and c is a hyphen:
            o + c + `\n` + F(b)   //       append o + c + a linefeed and process the next
                                  //       line
          :                       //     else, we process the next character:
            c > S ?               //       if c is not a space:
              g(b, h ? o : o + c) //         append c if it's not a hyphen
            :                     //       else:
              g(b, o + p) ||      //         append either the current space padding
              g(b, o + p + c)     //         or the current padding and one extra space
      :                           // else:
        o                         //   success: return o
  )(s)                            // initial call to g() with s
  || F(s, p + S)                  // in case of failure, try again with a larger padding

Arnauld

Posted 2017-06-06T22:36:37.753

Reputation: 111 334

4

APL (Dyalog Unicode), 129 123 121 118 111 109 107 104 100 95 bytesSBCS

{⊃⌽m←⍺≥-⌿c⍪+\⊢⌿c←' -'∘.≠⍵:⊂⍵/⍨⊢⌿c⋄(⊂∊l⊣l[(⍺-≢l)⍴⍸' '=l],←⊃0⍴l←⍵/⍨n×⊣⌿c⊖⍨1⌽n),⍺∇⍵/⍨~n←⌽∨\⌽m>×⌿c}

Try it online!

ngn

Posted 2017-06-06T22:36:37.753

Reputation: 11 449

1

Stax, 51 50 bytes

éS┴k╖♀╡¶┐'▼jÆD╨◙♠ß╒└╧rî{♀EÜ╢a┌ùLiƒ7ÉøΩS╜╪åⁿM◙┴,○9H

Run and debug it

recursive

Posted 2017-06-06T22:36:37.753

Reputation: 8 616

1

Python 2, 343 bytes

W,T=input()
T+=' '
L,l=[],len
while T:
 p,r=0,''
 for i in range(l(T)):
  s=T[:i].replace('-','')
  if'-'==T[i]:s+='-'
  if T[i]in' -'and W-l(s)>=0:p,r=i,s
 R=r.split()
 if R:
  d,k=W-l(''.join(R)),0
  for j in range(d):
   R[k]+=' '
   k+=1
   if k==l(R)-1:k=0
  L+=[''.join(R)]
  T=T[p+1:]
print'\n'.join(L[:-1])
print' '.join(L[-1].split())

Try it online!

The  input  is a block of text
containing possibly hyphenated
words.  For  each space/hyphen
position  p  the code computes
l(p)  the  length  of the line
induced  by  slipping the text
to this space/hyphen. Then the
code choses the position p for
which  the  length l(p) is the
closest  to  the given width W
(and  l(p)<=W).  If l(p)<W the
code  adds spaces  fairly  in-
between  the  words to achieve
the length W.

mdahmoune

Posted 2017-06-06T22:36:37.753

Reputation: 2 605

Though input may be in any format you like, it should still come from STDIN or parameters. See defaults for I/O. We don't generally allow "input" to be from pre-assigned variables.

– mbomb007 – 2019-07-15T22:00:15.813

You can save a byte by doing print'\n'.join(L[:-1]) instead of for e in L[:-1]:print e – mbomb007 – 2019-07-15T22:02:32.403

@mbomb007 ok yes I’ll do the needed changes to respect the I/O – mdahmoune – 2019-07-15T22:38:23.917