Patch the Paragraph

32

4

In the spirit of Patch the Image, here's a similar challenge but with text.

Challenge

Bit rot has afflicted your precious text! Given a paragraph composed of ASCII characters, with a rectangular hole somewhere in it, your program should try to fill in the hole with appropriate text, so that the paragraph blends as best as possible.

Further definitions

  • The hole will always be rectangular, and it may span multiple lines.
  • There will only ever be one hole.
  • Note that the hole does not necessarily fall on word boundaries (in fact, it usually won't).
  • The hole will be at most 25% of the input paragraph, but may overlap or extend past the "end" of the "normal" text (see the Euclid or Badger examples below).
  • Since finding the hole is not the main point of this challenge, it will be composed solely of hash marks # to allow for easy identification.
  • No other location in the input paragraph will have a hash mark.
  • Your code cannot use the "normal" text in the examples below - it will only receive and process the text with the hole in it.
  • The input can be as a single multi-line string, as an array of strings (one element per line), as a file, etc. -- your choice of what is most convenient for your language.
  • If desired, an optional additional input detailing the coordinates of the hole can be taken (e.g., a tuple of coordinates or the like).
  • Please describe your algorithm in your submission.

Voting

Voters are asked to judge the entries based on how well the algorithm fills in the text hole. Some suggestions include the following:

  • Does the filled in area match the approximate distribution of spaces and punctuation as the rest of the paragraph?
  • Does the filled in area introduce faulty syntax? (e.g., two spaces in a row, a period followed by a question mark, an erroneous sequence like , ,, etc.)
  • If you squint your eyes (so you're not actually reading the text), can you see where the hole used to be?
  • If there aren't any CamelCase words outside the hole, does the hole contain any? If there aren't any Capitalized Letters outside the hole, does the hole contain any? If There Are A Lot Of Capitalized Letters Outside The Hole, does the hole contain a proportionate amount?

Validity Criterion

In order for a submission to be considered valid, it must not alter any text of the paragraph outside of the hole (including trailing spaces). A single trailing newline at the very end is optional.

Test Cases

Format is original paragraph in a code block, followed by the same paragraph with a hole. The paragraphs with the hole will be used for input.

1 (Patch the Image)

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, based on the information outside of that patch. And it does a
quite good job, considering it is just a program. As a human, you can sometimes
see that something is wrong, but if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, #############information outside of that patch. And it does a
quite good job, co#############is just a program. As a human, you can sometimes
see that something#############t if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

2 (Gettysburg Address)

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here have thus far so nobly advanced. It
is rather for us to be here dedicated to the great task remaining before us-
that from these honored dead we take increased devotion to that cause for which
they gave the last full measure of devotion-that we here highly resolve that
these dead shall not have died in vain-that this nation, under God, shall have
a new birth of freedom-and that government of the people, by the people, for
the people, shall not perish from the earth.

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here h######################advanced. It
is rather for us to be here dedicated to the######################before us-
that from these honored dead we take increas######################use for which
they gave the last full measure of devotion-######################solve that
these dead shall not have died in vain-that ######################, shall have
a new birth of freedom-and that government of the people, by the people, for
the people, shall not perish from the earth.

3 (Lorem Ipsum)

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conse################irure dolor in reprehenderit
in voluptate velit esse cil################giat nulla pariatur. Excepteur
sint occaecat cupidatat non################in culpa qui officia deserunt
mollit anim id est laborum.

4 (Jabberwocky)

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

'Twas brillig, and the slithy toves
Did gyre a######### in the wabe;
All mimsy #########borogoves,
And the mome raths outgrabe.

5 (Euclid's proof of the Pythagorean Theorem)

1.Let ACB be a right-angled triangle with right angle CAB.
2.On each of the sides BC, AB, and CA, squares are drawn,
CBDE, BAGF, and ACIH, in that order. The construction of
squares requires the immediately preceding theorems in Euclid,
and depends upon the parallel postulate. [footnote 14]
3.From A, draw a line parallel to BD and CE. It will
perpendicularly intersect BC and DE at K and L, respectively.
4.Join CF and AD, to form the triangles BCF and BDA.
5.Angles CAB and BAG are both right angles; therefore C, A,
and G are collinear. Similarly for B, A, and H.
6.Angles CBD and FBA are both right angles; therefore angle ABD
equals angle FBC, since both are the sum of a right angle and angle ABC.
7.Since AB is equal to FB and BD is equal to BC, triangle ABD
must be congruent to triangle FBC.
8.Since A-K-L is a straight line, parallel to BD, then rectangle
BDLK has twice the area of triangle ABD because they share the base
BD and have the same altitude BK, i.e., a line normal to their common
base, connecting the parallel lines BD and AL. (lemma 2)
9.Since C is collinear with A and G, square BAGF must be twice in area
to triangle FBC.
10.Therefore, rectangle BDLK must have the same area as square BAGF = AB^2.
11.Similarly, it can be shown that rectangle CKLE must have the same
area as square ACIH = AC^2.
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

1.Let ACB be a right-angled triangle with right angle CAB.
2.On each of the sides BC, AB, and CA, squares are drawn,
CBDE, BAGF, and ACIH, in that order. The construction of
squares requires the immediately preceding theorems in Euclid,
and depends upon the parallel postulate. [footnote 14]
3.From A, draw a line parallel to BD and CE. It will
perpendicularly intersect BC and DE at K and L, respectively.
4.Join CF and AD, to form the triangles BCF and BDA.
5.Angles CAB and BAG are both right angles; therefore C, A,
and G are #############milarly for B, A, and H.
6.Angles C#############e both right angles; therefore angle ABD
equals ang############# both are the sum of a right angle and angle ABC.
7.Since AB#############FB and BD is equal to BC, triangle ABD
must be co#############iangle FBC.
8.Since A-#############ight line, parallel to BD, then rectangle
BDLK has t############# of triangle ABD because they share the base
BD and hav#############titude BK, i.e., a line normal to their common
base, conn#############rallel lines BD and AL. (lemma 2)
9.Since C #############with A and G, square BAGF must be twice in area
to triangl#############
10.Therefo############# BDLK must have the same area as square BAGF = AB^2.
11.Similar############# shown that rectangle CKLE must have the same
area as square ACIH = AC^2.
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

6 (Badger, Badger, Badger by weebl)

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger##################badger, badger,
badger##################badger, badger
Mushro##################
Badger##################badger, badger,
badger##################badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

AdmBorkBork

Posted 2016-07-29T12:52:31.387

Reputation: 41 581

may I assume the hole is at least three characters wide – Rohan Jhunjhunwala – 2016-08-03T00:36:50.903

@RohanJhunjhunwala Sure. Given the sizes of the text, that's a fairly safe assumption. – AdmBorkBork – 2016-08-03T12:24:47.440

The gettysburg example apparently contains em dashes, which aren't plain ascii. Just pointing that out since you said in your comments in one of the answers that you'd use plain ascii test cases. – SuperJedi224 – 2017-05-23T01:39:58.847

@SuperJedi224 Thanks - fixed. – AdmBorkBork – 2017-07-18T12:55:45.603

Answers

15

Python 2

I know that @atlasologist already posted a solution in Python 2, but the way my works is a bit different. This works by going through all holes, from top to bottom, left to right, looking 5 characters back and at the character above, and finding a character where these matches. If multiple characters are found, it picks the most common one. In case there are no characters found, it removes the above-character restriction. If there are still no characters found, it decreases the amount of characters it looks back, and repeats.

def fix(paragraph, holeChar = "#"):
    lines = paragraph.split("\n")
    maxLineWidth = max(map(len, lines))
    lines = [list(line + " " * (maxLineWidth - len(line))) for line in lines]
    holes = filter(lambda pos: lines[pos[0]][pos[1]] == holeChar, [[y, x] for x in range(maxLineWidth) for y in range(len(lines))])

    n = 0
    for hole in holes:
        for i in range(min(hole[1], 5), 0, -1):
            currCh = lines[hole[0]][hole[1]]
            over = lines[hole[0] - 1][hole[1]]
            left = lines[hole[0]][hole[1] - i : hole[1]]

            same = []
            almost = []
            for y, line in enumerate(lines):
                for x, ch in enumerate(line):
                    if ch == holeChar:
                        continue
                    if ch == left[-1] == " ":
                        continue
                    chOver = lines[y - 1][x]
                    chLeft = lines[y][x - i : x]
                    if chOver == over and chLeft == left:
                        same.append(ch)
                    if chLeft == left:
                        almost.append(ch)
            sortFunc = lambda x, lst: lst.count(x) / (paragraph.count(x) + 10) + lst.count(x)
            if same:
                newCh = sorted(same, key=lambda x: sortFunc(x, same))[-1]
            elif almost:
                newCh = sorted(almost, key=lambda x: sortFunc(x, almost))[-1]
            else:
                continue
            lines[hole[0]][hole[1]] = newCh
            break


    return "\n".join(map("".join, lines))

Here's the result of Badger, Badger, Badger:

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mushroom, mushroom, a-                 
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mushroom, mushroom, a- b               
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Mush-mushroom, a                       
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger 
Argh! Snake, a snake!                  
Snaaake! A snaaaake, oooh its a snake! 

Here's the result from the proof:

1.Let ACB be a right-angled triangle with right angle CAB.                 
2.On each of the sides BC, AB, and CA, squares are drawn,                  
CBDE, BAGF, and ACIH, in that order. The construction of                   
squares requires the immediately preceding theorems in Euclid,             
and depends upon the parallel postulate. [footnote 14]                     
3.From A, draw a line parallel to BD and CE. It will                       
perpendicularly intersect BC and DE at K and L, respectively.              
4.Join CF and AD, to form the triangles BCF and BDA.                       
5.Angles CAB and BAG are both right angles; therefore C, A,                
and G are the same areamilarly for B, A, and H.                            
6.Angles CAB and CA, sqe both right angles; therefore angle ABD            
equals angle ABD becaus both are the sum of a right angle and angle ABC.   
7.Since ABD because theFB and BD is equal to BC, triangle ABD              
must be construction ofiangle FBC.                                         
8.Since A-angle ABD becight line, parallel to BD, then rectangle           
BDLK has the same area  of triangle ABD because they share the base        
BD and have the base thtitude BK, i.e., a line normal to their common      
base, conngle and G, sqrallel lines BD and AL. (lemma 2)                   
9.Since C = BD × BK + with A and G, square BAGF must be twice in area     
to triangle FBC. (lemma                                                    
10.Therefore angle and  BDLK must have the same area as square BAGF = AB^2.
11.Similarly for B, A,  shown that rectangle CKLE must have the same       
area as square ACIH = AC^2.                                                
12.Adding these two results, AB^2 + AC^2 = BD × BK + KL × KC             
13.Since BD = KL, BD × BK + KL × KC = BD(BK + KC) = BD × BC             
14.Therefore, AB^2 + AC^2 = BC^2, since CBDE is a square.

And the result of Jabberwocky:

'Twas brillig, and the slithy toves
Did gyre and the mo in the wabe;   
All mimsy toves, anborogoves,      
And the mome raths outgrabe.       

Loovjo

Posted 2016-07-29T12:52:31.387

Reputation: 7 357

5That Badger one is pretty impressive, and Jabberwocky looks like it could be the legit poem. Nice work. – AdmBorkBork – 2016-07-29T16:29:06.417

6

Python 2

This is a pretty straight-forward solution. It creates a sample string composed of words that are between average word length A-(A/2) and A+(A/2), then it applies leading and trailing space trimmed chunks from the sample to the patch area. It doesn't handle capitalization, and I'm sure there's a curveball test case out there that would break it, but it does okay on the examples. See the link below to run all the tests.

I also put a patch in the code for good measure.

def patch(paragraph):
    sample = [x.split() for x in paragraph if x.count('#') < 1]
    length = max([x.count('#') for x in paragraph if x.find('#')])
    s = sum(####################
    sample,[####################
    ])      ####################
    avg=sum(####################
    len(w)  ####################
    for w in####################
    s)//len(s)
    avg_range = range(avg-(avg//2),avg+(avg//2))
    sample = filter(lambda x:len(x) in avg_range, s)
    height=0
    for line in paragraph:
        if line.find('#'):height+=1
        print line.replace('#'*length,' '.join(sample)[(height-1)*length:height*length].strip())
    print '\n'

Lorem Ipsum, original then patched:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conseore dolore magnairure dolor in reprehenderit
in voluptate velit esse cilenim minim quisgiat nulla pariatur. Excepteur
sint occaecat cupidatat nonnisi mollit aniin culpa qui officia deserunt
mollit anim id est laborum.

Try it

atlasologist

Posted 2016-07-29T12:52:31.387

Reputation: 2 945

3Hehe mushroger ... – AdmBorkBork – 2016-07-29T15:43:23.237

Well, it doesn't patch your code in an interesting way. – mbomb007 – 2016-08-02T15:10:07.770

@mbomb007 that's because of the other # characters in the code. – atlasologist – 2016-08-02T17:58:37.677

@atlasologist Even if you change them to something else like @, nothing interesting. – mbomb007 – 2016-08-02T18:04:50.940

4

Java Shakespeare

Who needs a grasp of standard english conventions? Just make your own! Just like the bard was allowed to make up his own words. This bot doesn't worry to much about correcting the cut off words, he really just inserts random words. The result is some beautiful Poetry. As a bonus feature the bard is of a higher caliber and can handle multiple holes provided they are the same size!


Sample Input

 From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender############bear his memory:
  But thou c############ thine own bright eyes,
  Feed'st th############ame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st was############ding:
    Pity the world, or else t############be,
    To eat the world's due, b############and thee.


                     2
  When forty winters shall besiege thy brow,
  And dig deep trenches in thy beauty's field,
  Thy youth's proud livery so gazed on now,
  Will be a tattered weed of small worth held:  
  Then being asked, where all thy beauty lies,
  Where all the treasure of thy lusty days;
  To say within thine own deep sunken eyes,
  Were an all-eating shame, and thriftless praise.
  How much more praise deserved thy beauty's use,
  If thou couldst answer 'This fair child of mine
  Shall sum my count, and make my old excuse'
  Proving his beauty by succession thine.
    This were to be new made when thou art old,
    And see thy blood warm when thou feel'st it cold.


                     3
  Look in thy glass and tell the face thou viewest,
  Now is the time that face should form another,
  Whose fresh repair if now thou not renewest,
  Thou dost beguile the world, unbless some mother.
  For where is she so fair whose uneared womb
  Disdains the tillage of thy husbandry?
  Or who is he so fond will be the tomb,
  Of his self-love to stop posterity?  
  Thou art thy mother's glass and she in thee
  Calls back the lovely April of her prime,
  So thou through windows of thine age shalt see,
  Despite of ############s thy golden time.
    But if th############mbered not to be,
    Die singl############image dies with thee.

Beautiful output

 From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender should of bear his memory:
  But thou c all mbered  thine own bright eyes,
  Feed'st th Proving Or ame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st was he Thou my ding:
    Pity the world, or else t So the the be,
    To eat the world's due, b t thine so and thee.


                     2
  When forty winters shall besiege thy brow,
  And dig deep trenches in thy beauty's field,
  Thy youth's proud livery so gazed on now,
  Will be a tattered weed of small worth held:  
  Then being asked, where all thy beauty lies,
  Where all the treasure of thy lusty days;
  To say within thine own deep sunken eyes,
  Were an all-eating shame, and thriftless praise.
  How much more praise deserved thy beauty's use,
  If thou couldst answer 'This fair child of mine
  Shall sum my count, and make my old excuse'
  Proving his beauty by succession thine.
    This were to be new made when thou art old,
    And see thy blood warm when thou feel'st it cold.


                     3
  Look in thy glass and tell the face thou viewest,
  Now is the time that face should form another,
  Whose fresh repair if now thou not renewest,
  Thou dost beguile the world, unbless some mother.
  For where is she so fair whose uneared womb
  Disdains the tillage of thy husbandry?
  Or who is he so fond will be the tomb,
  Of his self-love to stop posterity?  
  Thou art thy mother's glass and she in thee
  Calls back the lovely April of her prime,
  So thou through windows of thine age shalt see,
  Despite of  Look gazed s thy golden time.
    But if th When be, mbered not to be,
    Die singl repair the image dies with thee.

The last couple lines are deeply poetic if I do say so myself. It performs suprisingly well on the gettysburg address as well.

But, in a larger sense, we can not dedicate, we can not consecrate, we can not
hallow this ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The world will
little note, nor long remember what we say here, but it can never forget what
they did here. It is for us the living, rather, to be dedicated here to the
unfinished work which they who fought here h to of rather us of advanced. It
is rather for us to be here dedicated to the who be it, vain who before us 
that from these honored dead we take increas be dead the the what use for which
they gave the last full measure of devotion  dead government The solve that
these dead shall not have died in vain that  the take nor world , shall have
a new birth of freedom and that government of the people, by the people, for
the people, shall not perish from the earth.


Lets see what makes Shakespeare tick. Here is the code. Essentially he strives to build a vocabulary base from the input. He then uses these words and randomly places them in the hole (ensuring that it fits nicely). He is deterministic as he uses a fixed seed for randomness.

package stuff;

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Random;
import java.util.Scanner;
import java.util.Stack;

/**
 *
 * @author rohan
 */
public class PatchTheParagraph {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
Scanner in = new Scanner(System.in);
System.out.println("File Name :");
        String[] text = getWordsFromFile(in.nextLine());
System.out.println("==ORIGINAL==");
        for(String s:text){
    System.out.println(s);
}
                    int lengthOfHole= 0;
        int rows = 0;
            for(String s: text){
                s = s.replaceAll("[^#]", "");

//      System.out.println(s);
                if(s.length()>0){
                    lengthOfHole = s.length();
                rows++;
                }
            }
            ArrayList<String> words = new ArrayList<>();
            words.add("I");
            for(String s:text){
                String[] w = s.replaceAll("#", " ").split(" ");
for(String a :w){
    words.add(a);
            }

            }
                        Iterator<String> j = words.iterator();
            while(j.hasNext()){
                String o;
                if((o = j.next()).equals("")){
                    j.remove();
                }
            }
            System.out.println(words);
            Stack<String> out = new Stack<>();
            String hashRow = "";
            for(int i = 0;i<lengthOfHole;i++){
                hashRow+="#";
            }

        for(int i = 0;i<rows;i++){
            int length = lengthOfHole-1; 
            String outPut = " ";
            while(length>2){
String wordAttempt = words.get(getRandom(words.size()-1));
while(wordAttempt.length()>length-1){
 wordAttempt = words.get(getRandom(words.size()-1));
}           
length -= wordAttempt.length()+1;
            outPut+=wordAttempt;
                outPut+=" ";
            }
        out.push(outPut);
    }
System.out.println("==PATCHED==");
        for(String s : text){
            if(s.contains(hashRow)){
                System.out.println(s.replaceAll(hashRow,out.pop()));
            }else{
                System.out.println(s);
            }
        }
                                    }
public static final Random r = new Random(42);
    public static int getRandom(int max){
    return (int) (max*r.nextDouble());
}
    /**
     *
     * @param fileName is the path to the file or just the name if it is local
     * @return the number of lines in fileName
     */
    public static int getLengthOfFile(String fileName) {
        int length = 0;
        try {
            File textFile = new File(fileName);
            Scanner sc = new Scanner(textFile);
            while (sc.hasNextLine()) {
                sc.nextLine();
                length++;
            }
        } catch (Exception e) {
System.err.println(e);
        }
        return length;
    }

    /**
     *
     * @param fileName is the path to the file or just the name if it is local
     * @return an array of Strings where each string is one line from the file
     * fileName.
     */
    public static String[] getWordsFromFile(String fileName) {
        int lengthOfFile = getLengthOfFile(fileName);
        String[] wordBank = new String[lengthOfFile];
        int i = 0;
        try {
            File textFile = new File(fileName);
            Scanner sc = new Scanner(textFile);
            for (i = 0; i < lengthOfFile; i++) {
                wordBank[i] = sc.nextLine();
            }
            return wordBank;
        } catch (Exception e) {
            System.err.println(e);
            System.exit(55);
        }
        return null;
    }
}


Most of Shakespeare's poetry is public domain.

Rohan Jhunjhunwala

Posted 2016-07-29T12:52:31.387

Reputation: 2 569

Comments are not for extended discussion; this conversation has been moved to chat.

– Dennis – 2016-08-04T15:11:17.377

3

Python 2.7

Another Python solution with a different approach. My program sees the text as a Markov chain, where each letter is followed by another letter with a certain probability. So the first step is to build the table of probabilities. The next step is to apply that probabilities to the patch.

Full code, including one sample text is below. Because one example used unicode characters, I included an explicit codepage (utf-8) for compatibility with that example.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from collections import defaultdict
import numpy

texts = [
"""'Twas brillig, and the slithy toves
Did gyre a######### in the wabe;
All mimsy #########borogoves,
And the mome raths outgrabe."""
]

class Patcher:
    def __init__(self):
        self.mapper = defaultdict(lambda: defaultdict(int))

    def add_mapping(self, from_value, to_value):
        self.mapper[from_value][to_value] += 1

    def get_patch(self, from_value):
        if from_value in self.mapper:
            sum_freq = sum(self.mapper[from_value].values())
            return numpy.random.choice(
                self.mapper[from_value].keys(),
                p = numpy.array(
                    self.mapper[from_value].values(),dtype=numpy.float64) / sum_freq)
        else:
            return None

def add_text_mappings(text_string, patcher = Patcher(), ignore_characters = ''):
    previous_letter = text_string[0]
    for letter in text_string[1:]:
        if not letter in ignore_characters:
            patcher.add_mapping(previous_letter, letter)
            previous_letter = letter
    patcher.add_mapping(text_string[-1], '\n')

def patch_text(text_string, patcher, patch_characters = '#'):
    result = previous_letter = text_string[0]
    for letter in text_string[1:]:
        if letter in patch_characters:
            result += patcher.get_patch(previous_letter)
        else:
            result += letter
        previous_letter = result[-1]
    return result

def main():
    for text in texts:
        patcher = Patcher()
        add_text_mappings(text, patcher, '#')
        print patch_text(text, patcher, '#')
        print "\n"

if __name__ == '__main__':
    main()

Sample output for the Lorem Ipsum:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo conse Exe eut ccadamairure dolor in reprehenderit
in voluptate velit esse cilore indipserexepgiat nulla pariatur. Excepteur
sint occaecat cupidatat non upir alostat adin culpa qui officia deserunt
mollit anim id est laborum.

An extra poetic line in the Jabberwocky:

'Twas brillig, and the slithy toves
Did gyre and me the in the wabe;
All mimsy was
An inborogoves,
And the mome raths outgrabe.

agtoever

Posted 2016-07-29T12:52:31.387

Reputation: 2 661

Which example text has Unicode? They should all be straight ASCII. Please let me know, and I'll correct it. – AdmBorkBork – 2016-08-05T12:27:38.463

Python complains about the mınxomaτ in the first text, refering to PEP 263.

– agtoever – 2016-08-05T13:59:17.730

Ah - didn't even realize. I've edited that to be straight ASCII. Thanks for letting me know! – AdmBorkBork – 2016-08-05T14:08:28.023

2

C# 5 massive as ever

I threw this together, it's a bit of a mess, but it produces some OK results some of the time. It a mostly-deterministic algorithm, but with some (fixed-seed) randomness added to avoid it producing the same string for similar gaps. It goes to some effort to try to avoid just having columns of spaces either side of the gaps.

It works by tokenizing the input into words and punctuation (punctuation comes from a manually entered list, because I can't be bothered to work out if Unicode can do this for me), so that it can put spaces before words, and not before punctuation, because this is fairly typical. It splits on typical whitespace. In the vein of markov chains (I think), it counts how often each token follows each other token, and then doesn't compute probabilities for this (I figure that because the documents are so tiny, we would do better to bias toward things we see a lot where we can). Then we perform a breadth-first search, filling the space left by the hashes and the 'partial' words either side, with the cost being computed as -fabness(last, cur) * len(cur_with_space), where fabness returns the number of times cur has followed last for each appended token in the generated string. Naturally, we try to minimise the cost. Because we can't always fill the gap with words and punctuation found in the document, it also considers a number of 'special' tokens from certain states, including the partial strings on either side, which we bias against with arbitrarily increased costs.

If the BFS fails to find a solution, then we naively try to pick a random adverb, or just insert spaces to fill the space.

Results

All 6 can be found here: https://gist.github.com/anonymous/5277db726d3f9bdd950b173b19fec82a

The Euclid test-case did not go very well...

Patch the Image

In a popular image editing software there is a feature, that patches (The term
used in image processing is inpainting as @minxomat pointed out.) a selected
area of an image, that patches information outside of that patch. And it does a
quite good job, co the patch a is just a program. As a human, you can sometimes
see that something In a short it if you squeeze your eyes or just take a short
glance, the patch seems to fill in the gap quite well.

Jabberwocky

'Twas brillig, and the slithy toves
Did gyre and the in in the wabe;
All mimsy the mome borogoves,
And the mome raths outgrabe.

Badger

Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, mushroom, a-
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mushroom, badger, badger
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Mush-mushroom, a
Badger, badger, badger, badger, badger,
badger, badger, badger, badger, badger
Argh! Snake, a snake!
Snaaake! A snaaaake, oooh its a snake!

_I'm glad with how this one turned out... it's fortuate that "badger, badger," fits, or this one would not have done so well

Code

Run it with

csc ParaPatch.cs
ParaPatch.exe infile outfile

There is quite a lot of it. The only remotely interesting bit is the Fill method. I include the heap implementation, because .NET doesn't have one (WHY MS WHY?!).

using System;
using System.Collections.Generic;
using System.Linq;

namespace ParaPatch
{
    class Program
    {
        private static string[] Filler = new string[] { "may", "will", "maybe", "rather", "perhaps", "reliably", "nineword?", "definitely", "elevenword?", "inexplicably" }; // adverbs
        private static char[] Breaking = new char[] { ' ', '\n', '\r', '\t' };
        private static char[] Punctuation = new char[] { ',', '.', '{', '}', '(', ')', '/', '?', ':', ';', '\'', '\\', '"', ',', '!', '-', '+', '[', ']', '£', '$', '%', '^', '—' };

        private static IEnumerable<string> TokenizeStream(System.IO.StreamReader reader)
        {
            System.Text.StringBuilder sb = new System.Text.StringBuilder();

            HashSet<char> breaking = new HashSet<char>(Breaking);
            HashSet<char> punctuation = new HashSet<char>(Punctuation);

            while (!reader.EndOfStream)
            {
                int ci = reader.Read();
                if (ci == -1) // sanity
                    break;

                char c = (char)ci;

                if (breaking.Contains(c))
                {
                    if (sb.Length > 0)
                        yield return sb.ToString();
                    sb.Clear();
                }
                else if (punctuation.Contains(c))
                {
                    if (sb.Length > 0)
                        yield return sb.ToString();
                    yield return ""+c;
                    sb.Clear();
                }
                else
                {

                    sb.Append(c);
                }
            }

            if (sb.Length > 0)
                yield return sb.ToString();
        }

        private enum DocTokenTypes
        {
            Known,
            LeftPartial,
            RightPartial,
            Unknown,
        }

        private class DocToken
        {
            public DocTokenTypes TokenType { get; private set; }
            public string StringPart { get; private set; }
            public int Length { get; private set; }

            public DocToken(DocTokenTypes tokenType, string stringPart, int length)
            {
                TokenType = tokenType;
                StringPart = stringPart;
                Length = length;
            }
        }

        private static IEnumerable<DocToken> DocumentTokens(IEnumerable<string> tokens)
        {
            foreach (string token in tokens)
            {
                if (token.Contains("#"))
                {
                    int l = token.IndexOf("#");
                    int r = token.LastIndexOf("#");

                    if (l > 0)
                        yield return new DocToken(DocTokenTypes.LeftPartial, token.Substring(0, l), l);

                    yield return new DocToken(DocTokenTypes.Unknown, null, r - l + 1);

                    if (r < token.Length - 1)
                        yield return new DocToken(DocTokenTypes.RightPartial, token.Substring(r + 1), token.Length - r - 1);
                }
                else
                    yield return new DocToken(DocTokenTypes.Known, token, token.Length);
            }
        }

        private class State : IComparable<State>
        {
            // missing readonly params already... maybe C#6 isn't so bad
            public int Remaining { get; private set; }
            public int Position { get; private set; }
            public State Prev { get; private set; }
            public string Token { get; private set; }
            public double H { get; private set; }
            public double Fabness { get; private set; }
            public string FullFilling { get; private set; }

            public State(int remaining, int position, Program.State prev, double fabness, double h, string token, string toAdd)
            {
                Remaining = remaining;
                Position = position;
                Prev = prev;
                H = h;
                Fabness = fabness;
                Token = token;

                FullFilling = prev != null ? prev.FullFilling + toAdd : toAdd;
            }

            public int CompareTo(State other)
            {
                return H.CompareTo(other.H);
            }
        }

        public static void Main(string[] args)
        {
            if (args.Length < 2)
                args = new string[] { "test.txt", "testout.txt" };

            List<DocToken> document;
            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0], System.Text.Encoding.UTF8))
            {
                document = DocumentTokens(TokenizeStream(reader)).ToList();
            }

            foreach (DocToken cur in document)
            {
                Console.WriteLine(cur.StringPart + " " + cur.TokenType);
            }

            // these are small docs, don't bother with more than 1 ply
            Dictionary<string, Dictionary<string, int>> FollowCounts = new Dictionary<string, Dictionary<string, int>>();
            Dictionary<string, Dictionary<string, int>> PreceedCounts = new Dictionary<string, Dictionary<string, int>>(); // mirror (might be useful)

            HashSet<string> knowns = new HashSet<string>(); // useful to have lying around

            // build counts
            DocToken last = null;
            foreach (DocToken cur in document)
            {
                if (cur.TokenType == DocTokenTypes.Known)
                {
                    knowns.Add(cur.StringPart);
                }

                if (last != null && last.TokenType == DocTokenTypes.Known && cur.TokenType == DocTokenTypes.Known)
                {
                    {
                        Dictionary<string, int> ltable;
                        if (!FollowCounts.TryGetValue(last.StringPart, out ltable))
                        {
                            FollowCounts.Add(last.StringPart, ltable = new Dictionary<string, int>());
                        }

                        int count;
                        if (!ltable.TryGetValue(cur.StringPart, out count))
                        {
                            count = 0;
                        }
                        ltable[cur.StringPart] = count + 1;
                    }


                    {
                        Dictionary<string, int> ctable;
                        if (!PreceedCounts.TryGetValue(cur.StringPart, out ctable))
                        {
                            PreceedCounts.Add(cur.StringPart, ctable = new Dictionary<string, int>());
                        }

                        int count;
                        if (!ctable.TryGetValue(last.StringPart, out count))
                        {
                            count = 0;
                        }
                        ctable[last.StringPart] = count + 1;
                    }
                }

                last = cur;
            }

            // build probability grid (none of this efficient table filling dynamic programming nonsense, A* all the way!)
            // hmm... can't be bothered
            Dictionary<string, Dictionary<string, double>> fabTable = new Dictionary<string, Dictionary<string, double>>();
            foreach (var k in FollowCounts)
            {
                Dictionary<string, double> t = new Dictionary<string, double>();

                // very naive
                foreach (var k2 in k.Value)
                {
                    t.Add(k2.Key, (double)k2.Value);
                }

                fabTable.Add(k.Key, t);
            }

            string[] knarr = knowns.ToArray();
            Random rnd = new Random("ParaPatch".GetHashCode());

            List<string> fillings = new List<string>();
            for (int i = 0; i < document.Count; i++)
            {
                if (document[i].TokenType == DocTokenTypes.Unknown)
                {
                    // shuffle knarr
                    for (int j = 0; j < knarr.Length; j++)
                    {
                        string t = knarr[j];
                        int o = rnd.Next(knarr.Length);
                        knarr[j] = knarr[o];
                        knarr[o] = t;
                    }

                    fillings.Add(Fill(document, fabTable, knarr, i));
                    Console.WriteLine(fillings.Last());
                }
            }

            string filling = string.Join("", fillings);

            int fi = 0;

            using (System.IO.StreamWriter writer = new System.IO.StreamWriter(args[1]))
            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0]))
            {
                while (!reader.EndOfStream)
                {
                    int ci = reader.Read();
                    if (ci == -1)
                        break;

                    char c = (char)ci;
                    c = c == '#' ? filling[fi++] : c;

                    writer.Write(c);
                    Console.Write(c);
                }
            }

//            using (System.IO.StreamWriter writer = new System.IO.StreamWriter(args[1], false, System.Text.Encoding.UTF8))
//            using (System.IO.StreamReader reader = new System.IO.StreamReader(args[0]))
//            {
//                foreach (char cc in reader.ReadToEnd())
//                {
//                    char c = cc;
//                    c = c == '#' ? filling[fi++] : c;
//                    
//                    writer.Write(c);
//                    Console.Write(c);
//                }
//            }

            if (args[0] == "test.txt")
                Console.ReadKey(true);
        }

        private static string Fill(List<DocToken> document, Dictionary<string, Dictionary<string, double>> fabTable, string[] knowns, int unknownIndex)
        {
            HashSet<char> breaking = new HashSet<char>(Breaking);
            HashSet<char> punctuation = new HashSet<char>(Punctuation);

            Heap<State> due = new Heap<Program.State>(knowns.Length);

            Func<string, string, double> fabness = (prev, next) =>
            {
                Dictionary<string, double> table;
                if (!fabTable.TryGetValue(prev, out table))
                    return 0; // not fab
                double fab;
                if (!table.TryGetValue(next, out fab))
                    return 0; // not fab
                return fab; // yes fab
            };

            DocToken mostLeft = unknownIndex > 2 ? document[unknownIndex - 2] : null;
            DocToken left = unknownIndex > 1 ? document[unknownIndex - 1] : null;
            DocToken unknown = document[unknownIndex];
            DocToken right = unknownIndex < document.Count - 2 ? document[unknownIndex + 1] : null;
            DocToken mostRight = unknownIndex < document.Count - 3 ? document[unknownIndex + 2] : null;

            // sum of empty space and partials' lengths
            int spaceSize = document[unknownIndex].Length
                + (left != null && left.TokenType == DocTokenTypes.LeftPartial ? left.Length : 0)
                + (right != null && right.TokenType == DocTokenTypes.RightPartial ? right.Length : 0);

            int l = left != null && left.TokenType == DocTokenTypes.LeftPartial ? left.Length : 0;
            int r = l + unknown.Length;

            string defaultPrev =
                left != null && left.TokenType == DocTokenTypes.Known ? left.StringPart :
                mostLeft != null && mostLeft.TokenType == DocTokenTypes.Known ? mostLeft.StringPart :
                "";

            string defaultLast =
                right != null && right.TokenType == DocTokenTypes.Known ? right.StringPart :
                mostRight != null && mostRight.TokenType == DocTokenTypes.Known ? mostRight.StringPart :
                "";

            Func<string, string> topAndTail = str =>
            {
                return str.Substring(l, r - l);
            };

            Func<State, string, double, bool> tryMove = (State prev, string token, double specialFabness) => 
            {
                bool isPunctionuation = token.Length == 1 && punctuation.Contains(token[0]);
                string addStr = isPunctionuation || prev == null ? token : " " + token;
                int addLen = addStr.Length;

                int newRemaining = prev != null ? prev.Remaining - addLen : spaceSize - addLen;
                int oldPosition = prev != null ? prev.Position : 0;
                int newPosition = oldPosition + addLen;

                // check length
                if (newRemaining < 0)
                    return false;

                // check start
                if (oldPosition < l) // implies left is LeftPartial
                {
                    int s = oldPosition;
                    int e = newPosition > l ? l : newPosition;
                    int len = e - s;
                    if (addStr.Substring(0, len) != left.StringPart.Substring(s, len))
                        return false; // doesn't match LeftPartial
                }

                // check end
                if (newPosition > r) // implies right is RightPartial
                {
                    int s = oldPosition > r ? oldPosition : r;
                    int e = newPosition;
                    int len = e - s;
                    if (addStr.Substring(s - oldPosition, len) != right.StringPart.Substring(s - r, len))
                        return false; // doesn't match RightPartial
                }

                if (newRemaining == 0)
                {
                    // could try to do something here (need to change H)
                }

                string prevToken = prev != null ? prev.Token : defaultPrev;
                bool isLastunctionuation = prevToken.Length == 1 && punctuation.Contains(prevToken[0]);

                if (isLastunctionuation && isPunctionuation) // I hate this check, it's too aggresive to be realistic
                    specialFabness -= 50;

                double fab = fabness(prevToken, token);

                if (fab < 1 && (token == prevToken))
                    fab = -1; // bias against unrecognised repeats

                double newFabness = (prev != null ? prev.Fabness : 0.0)
                    - specialFabness // ... whatever this is
                    - fab * addLen; // how probabilistic

                double h = newFabness; // no h for now

                State newState = new Program.State(newRemaining, newPosition, prev, newFabness, h, token, addStr);

//                Console.WriteLine((prev != null ? prev.Fabness : 0) + "\t" + specialFabness);
//                Console.WriteLine(newFabness + "\t" + h + "\t" + due.Count + "\t" + fab + "*" + addLen + "\t" + newState.FullFilling);

                due.Add(newState);
                return true;
            };

            // just try everything everything
            foreach (string t in knowns)
                tryMove(null, t, 0);

            if (left != null && left.TokenType == DocTokenTypes.LeftPartial)
                tryMove(null, left.StringPart, -1);

            while (!due.Empty)
            {
                State next = due.RemoveMin();

                if (next.Remaining == 0)
                {
                    // we have a winner!!
                    return topAndTail(next.FullFilling);
                }

                // just try everything
                foreach (string t in knowns)
                    tryMove(next, t, 0);
                if (right != null && right.TokenType == DocTokenTypes.RightPartial)
                    tryMove(next, right.StringPart, -5); // big bias
            }

            // make this a tad less stupid, non?
            return Filler.FirstOrDefault(f => f.Length == unknown.Length) ?? new String(' ', unknown.Length); // oh dear...
        }
    }

    //
    // Ultilities
    //

    public class Heap<T> : System.Collections.IEnumerable where T : IComparable<T>
    {
        // arr is treated as offset by 1, all idxes stored need to be -1'd to get index in arr
        private T[] arr;
        private int end = 0;

        private void s(int idx, T val)
        {
            arr[idx - 1] = val;
        }

        private T g(int idx)
        {
            return arr[idx - 1];
        }

        public Heap(int isize)
        {
            if (isize < 1)
                throw new ArgumentException("Cannot be less than 1", "isize");

            arr = new T[isize];
        }

        private int up(int idx)
        {
            return idx / 2;
        }

        private int downLeft(int idx)
        {
            return idx * 2;
        }

        private int downRight(int idx)
        {
            return idx * 2 + 1;
        }

        private void swap(int a, int b)
        {
            T t = g(a);
            s(a, g(b));
            s(b, t);
        }

        private void moveUp(int idx, T t)
        {
        again:
            if (idx == 1)
            {
                s(1, t);
                return; // at end
            }

            int nextUp = up(idx);
            T n = g(nextUp);
            if (n.CompareTo(t) > 0)
            {
                s(idx, n);
                idx = nextUp;
                goto again;
            }
            else
            {
                s(idx, t);
            }
        }

        private void moveDown(int idx, T t)
        {
        again:
            int nextLeft = downLeft(idx);
            int nextRight = downRight(idx);

            if (nextLeft > end)
            {
                s(idx, t);
                return; // at end
            }
            else if (nextLeft == end)
            { // only need to check left
                T l = g(nextLeft);

                if (l.CompareTo(t) < 0)
                {
                    s(idx, l);
                    idx = nextLeft;
                    goto again;
                }
                else
                {
                    s(idx, t);
                }
            }
            else
            { // check both
                T l = g(nextLeft);
                T r = g(nextRight);

                if (l.CompareTo(r) < 0)
                { // left smaller (favour going right if we can)
                    if (l.CompareTo(t) < 0)
                    {
                        s(idx, l);
                        idx = nextLeft;
                        goto again;
                    }
                    else
                    {
                        s(idx, t);
                    }
                }
                else
                { // right smaller or same
                    if (r.CompareTo(t) < 0)
                    {
                        s(idx, r);
                        idx = nextRight;
                        goto again;
                    }
                    else
                    {
                        s(idx, t);
                    }
                }
            }
        }

        public void Clear()
        {
            end = 0;
        }

        public void Trim()
        {
            if (end == 0)
                arr = new T[1]; // don't /ever/ make arr len 0
            else
            {
                T[] narr = new T[end];
                for (int i = 0; i < end; i++)
                    narr[i] = arr[i];
                arr = narr;
            }
        }

        private void doubleSize()
        {
            T[] narr = new T[arr.Length * 2];
            for (int i = 0; i < end; i++)
                narr[i] = arr[i];
            arr = narr;
        }

        public void Add(T item)
        {
            if (end == arr.Length)
            {
                // resize
                doubleSize();
            }

            end++;
            moveUp(end, item);
        }

        public T RemoveMin()
        {
            if (end < 1)
                throw new Exception("No items, mate.");

            T min = g(1);

            end--;
            if (end > 0)
                moveDown(1, g(end + 1));

            return min;
        }

        public bool Empty
        {
            get
            {
                return end == 0;
            }
        }

        public int Count
        {
            get
            {
                return end;
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator()
        {
            return (IEnumerator<T>)arr.GetEnumerator();
        }
    }
}

VisualMelon

Posted 2016-07-29T12:52:31.387

Reputation: 3 810