Avoiding Rivers

48

2

Background

In typography, rivers are visual gaps in a block of text, which occur due to coincidental alignment of spaces. These are particularly annoying since your brain seems to pick them up more easily in peripheral vision, which constantly distracts your eyes.

As an example, take the following block of text, lines broken such that the line width does not exceed 82 characters:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor
incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet,
consectetur adipisicing elit, sed do eismod tempor incididunt ut labore et dolore
maga aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.

There is a river spanning six lines in the bottom right part, which I've highlighted in the following block:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor
incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
officia deserunt mollit anim id est laborum. Lorem█ipsum dolor sit amet,
consectetur adipisicing elit, sed do eismod tempor█incididunt ut labore et dolore
maga aliqua. Ut enim ad minim veniam, quis nostrud█exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute█irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla█pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui█officia deserunt mollit anim id
est laborum.

We can mitigate this by choosing a slightly different column width. E.g. if we layout the same text using lines no longer than 78 characters, there is no river longer than two lines:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor
incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor
sit amet, consectetur adipisicing elit, sed do eismod tempor incididunt ut
labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Note that for the purpose of this question we're only considering monospaced fonts, such that rivers are simply vertical columns of spaces. The length of a river is the number of lines it spans.

Aside: If you're interesting in river detection in proportional fonts, there are some interesting posts around the network.

The Challenge

You're given a string of printable ASCII characters (code point 0x20 to 0x7E) - i.e. a single line. Print this text, with a line width between 70 and 90 characters (inclusive), such that the maximum length of any river in the text is minimised. If there are multiple text widths with the same (minimal) maximum river length, choose the narrower width. The above example with 78 characters is the correct output for that text.

To break lines, you should replace space characters (0x20) with line breaks, such that the resulting lines have as many characters as possible, but not more than the chosen text width. Note that the resulting line break itself is not part of that count. As an example, in the last block above, Lorem[...]tempor contains 78 characters, which is also the text's width.

You may assume that the input will not contain consecutive spaces, and won't have leading or trailing spaces. You may also assume that no word (consecutive substring of non-spaces) will contain more than 70 characters.

You may write a program or function, taking input via STDIN, command-line argument or function argument and printing the result to STDOUT.

This is code golf, so the shortest answer (in bytes) wins.

Martin Ender

Posted 2014-11-05T09:27:37.447

Reputation: 184 808

I think in your 78 and 82 column wrap examples, the last and second-to-last lines are incorrect. In the 82 example, the last break should be between id and est, and in the 78 example it should be between in and culpa. Or am I doing something wrong? – Cristian Lupascu – 2014-11-05T13:10:45.987

@Optimizer The tie break is the text length, not the river length. – FryAmTheEggman – 2014-11-05T17:32:15.923

I guess it doesn't count as an official river, but in the example 78 characters max length, there seems to be a pretty long diagonal river in the top-ish left-ish area – markasoftware – 2014-11-06T03:33:56.097

Do we consider cases like this as continues rivers ?

– Optimizer – 2014-11-06T08:25:51.150

Great challenge! Hm, next one could be about having (not purely vertical ) rivers shaping subliminal letters ;) – Tobias Kienzler – 2014-11-06T09:51:00.600

@Optimizer I don't think any of the five existing solutions considers that, and I've defined rivers as a vertical column of space characters, so no. – Martin Ender – 2014-11-06T10:38:31.473

I doubt that they are specially treating it as not a river ... I am not, but I have a fix for that – Optimizer – 2014-11-06T10:39:19.660

@Optimizer I'll leave it up to you (and any other participant). It was an oversight in the spec. – Martin Ender – 2014-11-06T10:41:32.740

That DSP.SE post has some awesome answers! – wchargin – 2014-11-07T05:31:15.317

Can I have trailing white spaces on each lines in the output ? – Optimizer – 2014-11-07T12:14:45.460

@Optimizer Sorry, no, I think the spec for breaking lines is pretty precise about replacing those spaces with line breaks. – Martin Ender – 2014-11-07T12:15:58.757

Answers

7

CJam, 116 106 99 84 77 72 bytes

l:X;93,72>{:D;OOXS/{S+_2$+,D<{+}{@@);a+\}?}/a+}%{z'K*S/:!0a/1fb$W=}$0=N*

Takes the single line input and prints the correct output to STDOUT.

UPDATE : Improved a lot and removed redundant loops by doing all calculations in the sorting loop itself. Also fixed a bug in river length calculation.

Explanation soon (after I golf it even further)

Try it here

Optimizer

Posted 2014-11-05T09:27:37.447

Reputation: 25 836

@Optimizer You could use input from ARGV though, then you can do ea~ instead of X each time. Saves two bytes. – Martin Ender – 2014-11-05T18:43:27.400

12

Ruby 162 160 158 152 160 157 (demo)

i=gets+' '
(69..s=r=89).map{|c|w=i.scan(/(.{1,#{c}}\S) /).flatten
m=(0..c).map{|i|w.map{|l|l[i]}+[?x]}.join.scan(/ +/).map(&:size).max
m<s&&(s=m;r=w)}
puts r

The un-golfed version:

input = gets+' '

result = ''

(69..smallest_max=89).each{|w|
  #split text into words of at most w characters
  wrap = (input+' ').scan(/(.{1,#{w}}\S) /).flatten

  #transpose lines and find biggest "river"
  max_crt_river = (0..99).map{|i| wrap.map{|l|l[i]} }.flatten.join.scan(/ +/).max_by(&:size).size

  if max_crt_river < smallest_max
    smallest_max = max_crt_river
    result = wrap.join ?\n
  end
}
puts result

Cristian Lupascu

Posted 2014-11-05T09:27:37.447

Reputation: 8 369

@MartinBüttner %r{...} allows me to use string interpolation. I just tried 21.times, but it has some more implications down the road, and I haven't managed to reach a shorter solution. – Cristian Lupascu – 2014-11-05T16:17:40.920

@MartinBüttner You are right, it does work! I've edited my answer. Thanks! – Cristian Lupascu – 2014-11-05T16:25:35.893

This doesn't work with http://pastebin.com/vN2iAzNd

– Joshpbarron – 2014-11-07T09:28:13.473

@Joshpbarron Very well spotted! I fixed it now. – Cristian Lupascu – 2014-11-07T10:16:12.323

8

Bash+coreutils, 236 157 bytes

Edited with a different approach - quite a bit shorter than before:

a=(`for i in {71..91};{
for((b=1;b++<i;));{
fold -s$i<<<$@|cut -b$b|uniq -c|sort -nr|grep -m1 "[0-9]  "
}|sort -nr|sed q
}|nl -v71|sort -nk2`)
fold -s$a<<<$@

Reads the input string from the command line.

With 3 nested sorts, I shudder to think what the big-O time complexity is for this, but it does complete the example in under 10 seconds on my machine.

Digital Trauma

Posted 2014-11-05T09:27:37.447

Reputation: 64 644

8

APL (105)

{∊{1↓∊⍵,3⊃⎕TC}¨⊃G/⍨V=⌊/V←{⌈/≢¨⊂⍨¨↓⍉2≠⌿+\↑≢¨¨⍵}¨G←(K⊂⍨' '=K←' ',⍵)∘{×⍴⍺:(⊂z/⍺),⍵∇⍨⍺/⍨~z←⍵>+\≢¨⍺⋄⍺}¨70+⍳21}

Explanation:

  • (K⊂⍨' '=K←' ',⍵): Add a space in front of , then split on the spaces. Each word retains the space it begins with.
  • ∘{...}¨70+⍳21: with that value, for each number in the range [71, 91]: (Because of the way the words are split, each 'line' ends up with an extra space on the beginning, which will be removed later. The range is shifted by one to compensate for the extra space.)
    • ×⍴⍺:: if there are still words left,
      • z←⍵>+\≢¨⍺: get the length for each word, and calculate a running total of the length, per word. Mark with a 1 all the words that can be taken to fill up the next line, and store this in z.
      • (⊂z/⍺),⍵∇⍨⍺⍨~z: take those words, and then process what's left of the list.
    • ⋄⍺: if not, return (which is now empty).
  • G←: store the list of lists of lines in G (one for each possible line length).
  • V←{...}¨G: for each possibility, calculate the length of the longest river and store it in V:
    • +\↑≢¨¨⍵: get the length of each word (again), and make a matrix out of the lengths. Calculate the running total for each line on the rows of the matrix. (Thus, the extra space at the beginning of each line is ignored.)
    • 2≠⌿: for each column of the matrix, see if the current length of the line at that point does not match the line after it. If so, there is not a river there.
    • ⊂⍨¨↓⍉: split each column of the matrix by itself (on the 1s). This gives a list of lists, where for each river there will be a list [1, 0, 0, ...], depending on the length of the river. If there is no river, the list will be [1].
    • ⌈/≢¨: get the length of each river, and get the maximum value of that.
  • ⊃G/⍨V=⌊/V: from G, select the first item for which the length of the longest river is equal to the minimum for all items.
  • {1↓∊⍵,3⊃⎕TC}¨: for each line, join all of the words together, remove the fist item (the extra space from the beginning), and add a newline to the end.
  • : join all the lines together.

marinus

Posted 2014-11-05T09:27:37.447

Reputation: 30 224

That's 200 bytes, not 105. – user11153 – 2014-11-06T15:12:55.160

3

@user11153 I haven't specified UTF-8 as the encoding. The APL character set fits into a single codepage (and that codepage exists, i.e. there is an existing encoding by which each of those characters fits into a byte, and hence 105 is perfectly fine.

– Martin Ender – 2014-11-06T16:06:37.820

Good to know! :) – user11153 – 2014-11-06T18:49:32.207

3

Python, 314 bytes

Many thanks to SP3000, grc, and FryAmTheEggman:

b=range;x=len
def p(i):
 l=[];z=''
 for n in t:
  if x(z)+x(n)<=i:z+=n+' '
  else:l+=[z];z=n+' '
 return l+[z]*(z!=l[x(l)-1])
t=input().split();q=[]
for i in b(70,91):l=p(i);q+=[max(sum(x(l[k+1])>j<x(l[k])and l[k][j]is' '==l[k+1][j]for k in b(x(l)-1))for j in b(i))]
print(*p(q.index(min(q))+70),sep='\n')

user10766

Posted 2014-11-05T09:27:37.447

Reputation:

2More like Pi-thon – Optimizer – 2014-11-07T12:20:33.327

3

JavaScript (ES6) 194 202

Iterative solution, maybe shorter if made recursive

F=s=>{
  for(m=1e6,b=' ',n=70;n<91;n++)
    l=b+'x'.repeat(n),x=r=q='',
    (s+l).split(b).map(w=>
      (t=l,l+=b+w)[n]&&(
        l=w,r=r?[...t].map((c,p)=>x<(v=c>b?0:-~r[p])?x=v:v,q+=t+'\n'):[]
      )
    ),x<m&&(o=q,m=x);
  alert(o)
}

Explained

F=s=> {
  m = 1e9; // global max river length, start at high value
  for(n=70; n < 91; n++) // loop on line length
  {
    l=' '+'x'.repeat(n), // a too long first word, to force a split and start
    x=0, // current max river length
    q='', // current line splitted text
    r=0, // current river length for each column (start 0 to mark first loop)
    (s+l) // add a too long word to force a last split. Last and first row will not be managed
    .split(' ').map(w=> // repeat for each word 
      (
        t=l, // current partial row in t (first one will be dropped)
        (l += ' '+w)[n] // add word to partial row and check if too long
        &&
        (
          l = w, // start a new partial row with current word
          r=r? // update array r if not at first loop
          ( 
            q+=t+'\n', // current row + newline added to complete text 
            [...t].map((c,p)=>( // for each char c at position p in row t
              v = c != ' ' 
                ? 0 // if c is not space, reset river length at 0
                : -~r[p], // if c is space, increment river length
              x<v ? x=v : v // if current > max, update max
            ))
          ):[]  
        )  
      )
    )
    x < m && ( // if current max less than global max, save current text and current max
      o = q,
      m = x
    )
  }
  console.log(o,m)
}

Test in FireFox/FireBug console.

F('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.')

Output

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eismod tempor
incididunt ut labore et dolore maga aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor
sit amet, consectetur adipisicing elit, sed do eismod tempor incididunt ut
labore et dolore maga aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

edc65

Posted 2014-11-05T09:27:37.447

Reputation: 31 086

3

Python 3, 329 bytes

import re,itertools as s
def b(t,n):
 l=0;o=""
 for i in t.split():
  if l+len(i)>n:o=o[:-1]+'\n';l=0
  l+=len(i)+1;o+=i+' '
 return o
t=input();o={}
for n in range(90,69,-1):o[max([len(max(re.findall('\s+',x),default='')) for x in ["".join(i) for i in s.zip_longest(*b(t,n).split('\n'),fillvalue='')]])]=n
print(b(t,o[min(o)]))

Ungolfed version:

# Iterates over words until length > n, then replaces ' ' with '\n'
def b(t,n):
    l = 0
    o = ""
    for i in t.split():
        if l + len(i) > n:
            o = o[:-1] + '\n'
            l = 0
        l += len(i) + 1
        o += i + ' '
    return o

t = input()
o = {}
# range from 90 to 70, to add to dict in right order
for n in range(90,69,-1):
    # break text at length n and split text into lines
    temp = b(t,n).split('\n')
    # convert columns into rows
    temp = itertools.zip_longest(*temp, fillvalue='')
    # convert the char tuples to strings
    temp = ["".join(i) for i in temp]
    # findall runs of spaces, get longest run and get length
    temp = [len(max(re.findall('\s+',x),default='')) for x in temp]
    # add max river length as dict key, with line length as value
    o[max(temp)] = n

print(b(t,o[min(o)]))

erebos

Posted 2014-11-05T09:27:37.447

Reputation: 31