Justify a text by adding spaces

Given this text:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Write the shortest program that produces the same text justified to 80 characters. The text above must come out exactly as:

Lorem ipsum dolor sit amet,  consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut  labore et  dolore magna aliqua.  Ut  enim ad  minim veniam,  quis
nostrud exercitation ullamco laboris nisi ut  aliquip ex  ea  commodo consequat.
Duis aute irure dolor in  reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.  Excepteur sint occaecat cupidatat non proident,  sunt in
culpa qui officia deserunt mollit anim id est laborum.

Rules:

  • Words must not be cut.
  • Extra spaces must be added, in this order of priority:
    • after a dot,
    • after a comma,
    • after the shortest words (shortest first, from left to right).
  • The result must not have more than 2 consecutive spaces.
  • The last line is not justified.
  • Lines must not begin with a comma or a dot.
  • Provide the output of your program.
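The rules can also be checked mechanically. Below is a hypothetical Ruby checker (the name check_justified is ours, not part of the challenge). It assumes every line except the last must be exactly 80 characters wide; it only verifies that a layout is legal, not which gaps received the extra spaces.

```ruby
# Hypothetical checker for the challenge rules; `check_justified` is an
# illustrative name, not part of the original challenge.
def check_justified(text, lines, width = 80)
  errors = []
  # Words must not be cut, dropped or reordered.
  errors << "words changed" unless lines.join(" ").split == text.split
  lines.each_with_index do |line, i|
    n = i + 1
    last = i == lines.size - 1
    # Every line but the last must be exactly `width` characters wide.
    errors << "line #{n}: width #{line.size}" if !last && line.size != width
    errors << "line #{n}: wider than #{width}" if line.size > width
    # At most one extra space per gap, i.e. never 3 spaces in a row.
    errors << "line #{n}: 3+ spaces in a row" if line.include?("   ")
    # The last line is not justified, so it keeps single spaces.
    errors << "line #{n}: last line is justified" if last && line.include?("  ")
    errors << "line #{n}: starts with a comma or dot" if line.start_with?(",", ".")
    errors << "line #{n}: leading or trailing space" if line != line.strip
  end
  errors
end
```

An empty result means the layout obeys the rules; otherwise each entry names the offending line.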

Winner: the shortest program.

Note: the input string is provided on STDIN as a single line (with no line feed or carriage return).
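For reference, here is an ungolfed Ruby sketch of the whole task. It assumes greedy line filling and reads the priority rules as: all words ending in a dot, then all words ending in a comma, then the remaining words from shortest to longest, left to right among equals. All method names are illustrative, and edge cases (such as a dot followed by punctuation rather than a word) may be handled differently from the golfed answers.

```ruby
# Ungolfed sketch of the challenge rules (method names are illustrative).
# Assumed priority for extra spaces: words ending in ".", then words ending
# in ",", then the other words from shortest to longest, left to right.

# Pack words greedily into lines no wider than `width`.
def wrap(words, width)
  lines = []
  words.each do |w|
    if lines.empty? || lines.last.join(" ").size + 1 + w.size > width
      lines << [w]
    else
      lines.last << w
    end
  end
  lines
end

# Sort key for the gap after words[i]; smaller keys get an extra space first.
def gap_priority(word, i)
  if word.end_with?(".") then [0, 0, i]
  elsif word.end_with?(",") then [1, 0, i]
  else [2, word.size, i]
  end
end

# Widen one line to exactly `width` by turning some single spaces into two.
def justify_line(words, width)
  gaps = Array.new(words.size - 1, " ")
  missing = width - (words.sum(&:size) + gaps.size)
  # Each gap can grow by one space at most (never 3 spaces in a row).
  unless (0..gaps.size).cover?(missing)
    raise ArgumentError, "cannot justify: #{words.join(' ')}"
  end

  gaps.each_index.sort_by { |i| gap_priority(words[i], i) }
      .first(missing)
      .each { |i| gaps[i] = "  " }
  words.zip(gaps).flatten.join # the last word pairs with nil, which joins as ""
end

# Justify every line except the last, which keeps single spaces.
def justify(text, width = 80)
  lines = wrap(text.split, width)
  lines.each_with_index.map do |words, i|
    i == lines.size - 1 ? words.join(" ") : justify_line(words, width)
  end
end
```

Appending puts justify($stdin.read) turns it into a filter for the single input line. On the sample paragraph it returns the six expected lines shown above, and it raises ArgumentError for a line that cannot reach 80 characters.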

Update:

The input string can be any text with reasonable word lengths (i.e. no word longer than 20 to 25 characters), such as:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. Praesent egestas leo in pede. Praesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac mauris sed pede pellentesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit. Ut velit mauris, egestas sed, gravida nec, ornare ut, mi. Aenean ut orci vel massa suscipit pulvinar. Nulla sollicitudin. Fusce varius, ligula non tempus aliquam, nunc turpis ullamcorper nibh, in tempus sapien eros vitae ligula. Pellentesque rhoncus nunc et augue. Integer id felis. Curabitur aliquet pellentesque diam. Integer quis metus vitae elit lobortis egestas. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Morbi vel erat non mauris convallis vehicula. Nulla et sapien. Integer tortor tellus, aliquam faucibus, convallis id, congue eu, quam. Mauris ullamcorper felis vitae erat. Proin feugiat, augue non elementum posuere, metus purus iaculis lectus, et tristique ligula justo vitae magna. Aliquam convallis sollicitudin purus. Praesent aliquam, enim at fermentum mollis, ligula massa adipiscing nisl, ac euismod nibh nisl eu lectus. Fusce vulputate sem at sapien. Vivamus leo. Aliquam euismod libero eu enim. Nulla nec felis sed leo placerat imperdiet. Aenean suscipit nulla in justo. Suspendisse cursus rutrum augue. Nulla tincidunt tincidunt mi. 
Curabitur iaculis, lorem vel rhoncus faucibus, felis magna fermentum augue, et ultricies lacus lorem varius purus. Curabitur eu amet.

Toto

Posted 2011-12-10T12:57:31.253

Reputation: 909

Why ask people to provide the output of their program? Are you that worried about people failing to check their results before posting? – Peter Taylor – 2011-12-10T14:05:31.317

I'm tempted to provide a php program which consists of the output text. ;-) Seriously though, the spaces on the second line of the output text seem to have been added to the spaces at random? Is there some pattern to it that I'm not seeing, and if not, how can we be expected to produce exactly that output for the given input? – Gareth – 2011-12-10T14:23:34.543

@Gareth: Sorry, my bad. I made a mistake: the extra space goes after the comma, not after "incididunt". Question edited. – Toto – 2011-12-10T15:36:46.717

@Peter Taylor: Just because I'm not able to test all languages. – Toto – 2011-12-10T15:38:29.117

Does the program have to work also for inputs other than that one paragraph of Lipsum? – Ilmari Karonen – 2011-12-10T17:03:12.150

@Ilmari Karonen: Yes, the input string can be anything. – Toto – 2011-12-10T17:06:52.587

Just after submitting my answer I saw that you changed the task by removing the restriction on the input string. Could you please update the question itself? I'll then try to rewrite my answer to work for any input. – Howard – 2011-12-10T17:32:46.103

What should the program do if the text cannot be justified within the rules given above (e.g. one word with more than 80 characters, or two words with 50 each, ...)? – Howard – 2011-12-10T17:35:41.430

@Howard: see my update – Toto – 2011-12-10T18:14:28.793

Thank you for clarifying. But also consider a text with, e.g., 4 words of 20 characters each. You cannot justify it according to your rules. – Howard – 2011-12-10T18:38:40.837

Answers

Perl, 94 chars

for(/(.{0,80}\s)/g){$i=1;$i+=!s/^(.*?\.|.*?,|(.*? )??\S{$i}) \b/$1  /until/
|.{81}/;chop;say}

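Run with perl -nM5.01. (The n is included in the character count. The line break inside the code is a literal newline that belongs to the regex /\n|.{81}/, not a formatting artifact.)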

The code above is the shortest I could make that handles every curveball I threw at it (such as one-letter words at the beginning of a line, input lines exactly 80 chars long, etc.) exactly according to the spec:

Lorem ipsum dolor sit amet,  consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut  labore et  dolore magna aliqua.  Ut  enim ad  minim veniam,  quis
nostrud exercitation ullamco laboris nisi ut  aliquip ex  ea  commodo consequat.
Duis aute irure dolor in  reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.  Excepteur sint occaecat cupidatat non proident,  sunt in
culpa qui officia deserunt mollit anim id est laborum.

I'm  tempted to  provide a  php  program which consists of  the output text. ;-)
Seriously though,  the spaces on the second line of the output text seem to have
been added to  the spaces at  random? Is  there some pattern to  it that I'm not
seeing,  and if  not,  how can we be expected to produce exactly that output for
the given input?

(With apologies to Gareth for using his comment as additional test input.)

The following 75-char version works well enough to produce the sample output from the sample input, but can fail for other inputs. Also, it leaves an extra space character at the end of each output line.

for(/(.{0,80}\s)/g){s/(.*?\.|.*?,|.*? ..) \b/$1  /until/.{81}/||s/
//;say}

Both versions will loop forever if they encounter input that they can't justify correctly. (In the longer version, replacing until with until$i>80|| would fix that at the cost of seven extra chars.)
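To make the 94-char version easier to follow, here is a rough Ruby transliteration of its loop (the method name perl_style_justify is ours). It takes one chunk as captured by /(.{0,80}\s)/, trailing whitespace included, and applies the same substitution. It assumes Ruby's regex engine treats this pattern as Perl does; ^ becomes \A because Ruby's ^ matches at every line start. The $i>80 guard described above is included so that unjustifiable lines terminate.

```ruby
# Rough Ruby transliteration of the 94-char Perl loop (name is ours).
# `chunk` is one match of /(.{0,80}\s)/: a line plus its trailing
# whitespace; only the final chunk contains the input's "\n".
def perl_style_justify(chunk, width = 80)
  chunk = chunk.dup # work on a copy instead of the caller's string
  i = 1
  # Perl: until /\n|.{81}/ : stop on the last line, or once the chunk is
  # width + 1 characters long (the line plus its trailing space).
  until chunk.include?("\n") || chunk.size > width
    # Perl: s/^(.*?\.|.*?,|(.*? )??\S{$i}) \b/$1  /
    # Double the space after the first dot, else the first comma, else the
    # first word of exactly i characters; if nothing matched, try i + 1.
    i += 1 unless chunk.sub!(/\A(.*?\.|.*?,|(.*? )??\S{#{i}}) \b/, '\1  ')
    # The $i>80 guard from the answer, so unjustifiable lines terminate.
    break if i > width
  end
  chunk.chop # drop the trailing space (or newline), like Perl's chop
end
```

Because a doubled space is no longer followed by a word boundary, each gap can be widened only once, which is how the "at most 2 consecutive spaces" rule falls out of the regex.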

Ilmari Karonen

Posted 2011-12-10T12:57:31.253

Reputation: 19 513

Ah, I should have started with a perl solution ;-) This language is of course really good for such a task. – Howard – 2011-12-10T19:03:24.157

I got the error "Quantifier in {,} bigger than 32766 in regex; marked by <-- HERE in m/^(.*?\.|.*?,|(.*? )??\S{ <-- HERE 32767}) \b/" for the second text. – Toto – 2011-12-11T13:43:07.777

@M42: That's because the second example text cannot be justified according to the rules. If I add in the $i>80 check, it expands the 11th line to "pede  pellentesque  fermentum.  Maecenas  adipiscing  ante  non  diam  sodales", which is only 78 chars long, and then gives up since each word (except the last) is followed by two spaces. – Ilmari Karonen – 2011-12-13T17:50:50.470
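The arithmetic behind this can be sketched directly: each gap may grow from one space to two, so a line can reach at most the sum of its word lengths plus twice the number of gaps. The helper names below are ours, for illustration only.

```ruby
# Width bounds for one line under the rules (helper names are ours).
# Single-spaced width: word lengths plus one space per gap.
def min_width(words)
  words.sum(&:size) + (words.size - 1)
end

# Widest possible: every gap doubled, since 3 spaces in a row are banned.
def max_width(words)
  words.sum(&:size) + 2 * (words.size - 1)
end

# A non-final line can be justified only if width lies in this range.
def justifiable?(words, width = 80)
  (min_width(words)..max_width(words)).cover?(width)
end
```

For the 11th line quoted above, min_width is 70 and max_width is 78, while adding the next word, "hendrerit.", would give 81, so the line break is forced and the line can never reach 80. Howard's example of 20-character words fares worse: only three fit per line, for at most 64 characters.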

Ruby, 146 characters

$><<gets.gsub(/(.{,80})( |$)/){$2>""?(s=$1+$/;(['\.',?,]+(1..80).map{|l|"\\b\\w{#{l}}"}).any?{|x|s.sub! /#{x} (?=\w)/,'\& '}while s.size<81;s):$1}

It prints exactly the desired output (see below) if the given text is fed into STDIN.

Lorem ipsum dolor sit amet,  consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut  labore et  dolore magna aliqua.  Ut  enim ad  minim veniam,  quis
nostrud exercitation ullamco laboris nisi ut  aliquip ex  ea  commodo consequat.
Duis aute irure dolor in  reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur.  Excepteur sint occaecat cupidatat non proident,  sunt in
culpa qui officia deserunt mollit anim id est laborum.
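An ungolfed reading of the 146-character answer, as a hedged sketch (the method name howard_justify is ours). Each chunk of at most 80 characters that is followed by a space is widened one space at a time, trying a dot first, then a comma, then whole words of length 1, 2, 3, ...; the chunk that ends the input is returned untouched. In Ruby, .{,80} is shorthand for .{0,80}. One deliberate difference: the golfed code loops forever when no pattern applies, so this version stops instead.

```ruby
# Ungolfed reading of the 146-character Ruby answer (method name is ours).
def howard_justify(text)
  # Chunks of up to 80 chars that end before a space or at end of line.
  text.gsub(/(.{,80})( |$)/) do
    line, sep = $1, $2
    next line if sep.empty? # the final chunk is returned unjustified
    s = line + "\n"         # the golfed code appends $/ (a newline)
    # Dot first, then comma, then whole words of length 1, 2, ..., 80.
    patterns = ['\.', ','] + (1..80).map { |l| "\\b\\w{#{l}}" }
    while s.size < 81       # 80 visible characters plus the newline
      # Try the patterns in order; the first one found followed by a single
      # space and a word character gets that space doubled. The golfed
      # version has no exit here and spins forever on unjustifiable lines.
      break unless patterns.any? { |x| s.sub!(/#{x} (?=\w)/, '\& ') }
    end
    s
  end
end
```

The lookahead (?=\w) is what stops a gap from being widened twice: once doubled, the space is followed by another space, not a word character.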

Edit: Just after submitting my first solution, I saw in the comments that the program must be able to process any input string. My previous answer was only 95 characters but did not fulfill this requirement:

r=gets.split;l=0;'49231227217b6'.chars{|s|r[l+=s.hex]+=' '};(r*' ').gsub(/(.{,80}) ?/){puts $1}
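The hard-coded string in this answer works as a list of relative offsets: each hex digit is the distance from one double-spaced word to the next, counted in word indices. A small sketch decoding it (method names are ours), valid only for the sample paragraph:

```ruby
# Decoding the 95-character answer's hard-coded string (names are ours).
# Each hex digit is the step from one doubled word index to the next.
def doubled_word_indices(code = "49231227217b6")
  l = 0
  code.chars.map { |c| l += c.hex } # running sum: 4, 13, 15, ...
end

# Same steps as the golfed answer, for the sample paragraph only.
def hardcoded_justify(text)
  words = text.split
  # A trailing space plus the joining space gives a double space.
  doubled_word_indices.each { |i| words[i] += " " }
  # Every justified line is now exactly 80 characters followed by one
  # space, so /(.{,80}) ?/ cuts the joined string in the right places.
  words.join(" ").scan(/(.{,80}) ?/).map(&:first).reject(&:empty?)
end
```

The decoded indices are 4, 13, 15, 18, 19, 21, 23, 30, 32, 33, 40, 51 and 57, which point at "amet,", "ut", "et", "aliqua.", "Ut", "ad", "veniam,", "ut", "ex", "ea", "in", "pariatur." and "proident,": exactly the words followed by two spaces in the expected output. That is also why the trick cannot generalize to other inputs.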

Howard

Posted 2011-12-10T12:57:31.253

Reputation: 23 109

If I'm not mistaken, you're using the same cheat as I thought of (encoding the locations of the double-spaced words in the example output). Note that M42 has clarified that the programs should cope with other inputs too. – Ilmari Karonen – 2011-12-10T17:36:46.987

@Ilmari Karonen Yes, I saw that after submitting. See my edit and comments above. Going back to the golf course... – Howard – 2011-12-10T17:38:05.547