Put together cut strips of paper to get original message

4

This question is from a programming challenge (not DARPA) which has ended. A sheet of paper consists of message, which has been shredded into vertical strips. Given the vertical strips and a dictionary of valid words, the goal is to put together the original message.

The vertical strips contain only one character from a line. The input to the program is the number of vertical strips (N), number of lines in the original message (K) and a string containing the characters in each vertical strip. Then the number of words in the dictionary and the list of valid words.

EXAMPLE INPUT

10   
3  
"MrM"  
"ak "  
"Gaa"  
"oM "  
" d "  
"t 3"  
" aP"  
"nt "  
"ont"  
"ie "  
15  
1  
2  
3  
4  
5  
AND  
AM  
AT  
GO  
IN  
Main  
Market  
ON  
PM  
TO  

OUTPUT

Go to Main  
and Market  
at 3 PM  

Description of the input above
10 - number of vertical strips
3 - number of lines in the original message corresponds to number of characters in each string of the veritcal strips
{actual strings} - pay careful attention to spaces in strings
15 - number of words in the dictionary
{valid words}

I was able to create a map of characters -> count of each character, for each line from the vertical strips. Then parsing the dictionary and looking at the character map, you can recreate words in each line. Looks like finding the order of words is the difficult problem. What is the best approach to finding the correct message?

cppcoder

Posted 2014-08-28T13:27:33.837

Reputation: 143

Dupe? – Beta Decay – 2015-08-15T23:57:08.170

You'll need to specify an objective winning criterion, from your question you seem to be looking for fastest-algorithm, but I would suggest code-golf. – overactor – 2014-08-28T14:20:00.657

I would suggest to remove the codegolf tag if you don't want brute-force answers. Also you should format the input and output example using code block to align characters. – Ray – 2014-08-28T20:12:23.007

Answers

5

Python 3 - Trie and DFS

Trie is a tree shape data structure. If we build a trie from a collection of words, then each path from the root to a leaf node will represent a word from the collection.

To solve this problem, first we build a trie using the valid words. Suppose the original text has n rows and m columns, then we will have m strips to put together. We try to reconstruct the original text from left to right, column by column, using a recursive depth-first search. To perform branch elimination, we also maintain a list of heads on the trie, name it heads. Each head points to a node on the trie and point to the root at first. We will need n heads, each correspond to a row. Each time after a column(strip) is appended, heads[i] will be moved according to the character strip[i]. If we found any head that cannot be moved, then we know the column just appended is incorrect. After putting m columns, we get the original text.

I implemented this algorithm using Python. It is fast enough to solve a 40x80 case with 10000 valid word in instant.

I use this python 3 script to generate test cases. Run it with:

python3 make_script.py {dict path} {n rows} {n cols} > intput.txt

The following is a 40 x 80 case. The dictionary file can be found here, which has 10000 words.

80
40
c el efp e otl  desaiweseeme  jomp ebc g
synwds  irnsatseityl rkutpsiysei ssraat 
gvmjewrell fclpadrll nbsvwiwteqdeniditba
uet criderndm eesih dalbrcra d iacnnoi e
lbipoeivedrwddivensne aeihrpaa eicangpni
 gontgtulvcbatdo glunsniiguoleu eigtbelh
xmsyyu ereio edvrttdonure tpselriniipnxs
ssdieop bgkg  wu  eloyr dllrhza t hic af
rttruminetnt rcu rasa ogosrywrnietvldgod
lpe aritour alolt yeomlpimmxryftenen  sn
cneregsc igodbsigeca r kbhiiisoss eagglp
crm nhdreh raokdtr rcfiotpsfgtjanotaaiaf
s pntenpnmv plemiaaxeeittgtg ncnaooaou a
srrganeu ut toaantetedccnr seiise eur ni
 xpndg psafl ooemlusof nba pauosina esnb
ge du  tbfe  spuoeuesen o s um ldsted  w
eto rs l nhuc caexr bsoptwcrrnaennrasary
e  maldneg nep nmaid isie ko ctedh necaa
esoeisi u ifran  rgirrts ntipuneccnvftrp
gui oeriwraisc og vsepharn latotts aenra
eenylorobneyrem nwtefrasdeoeet ot  osdsr
abia e i iwrcneaaiaeeu lie oihotgdicntdi
euugmjsdfcamazndei e yoagaserxatrultoaio
tamdfesgtitc xseuegs  fene thatfueo l ut
ocn sdrtn mai aerwmimoss te  oceorrit el
netbu wymsrfobelltemetrisioao uri  l ce 
deaa ee e aadrgcoln cnto semroi  ofdnanu
aeaunrmvonirieaxapicnaioxaoromeviiuaeoou
 oaodtb hpretdarie n e recei etunsrsedal
ejht n jy ees tnr gsegr s  e jhb ayegup 
ohids bi g iesiogs rytimtorrsab pnirwlc 
ieuereaklieapeatlan sdeeg e lasedaob  be
rihbegaisvdtrprnatservaraco na imtcgplna
denrdnnelymisenimi gnfw bssalupcnansu et
itsrwnbeclwfmstasg craaim ei  atrifvpr  
ubksogxernu auletinieo krte cgmltiliioir
atnhndcutigiasuir ntsaiauem usiireg idms
yniis ssle aisicsngqseazdifiertercaauvoc
ds prcatnoiaafsiopiratdrrmksnfidakr  gon
p ngaoolyeolg nep oae dnsnln srfcontsrve
boges massscgss arsobod ierseh rnivtagsp
nlstadhndnroieecntc slhoucbrcouo uohtiis
ltainnpif nfrn tinrdoi ninsanteu n   aeb
oun htoioaftupedsmowtmeatanyi sltnh ftto
a emgrasancass oelrefddy alacinsut n s  
kgttnltednrmea t yhug welentn tu fsneeah
yia bilgwybnisgfrmidlh t rssnaiultgagk a
  iaeeorega sdbhc otadiresdagcipcuet rde
tygui artnsot le wtes etrracm pde oev e 
cvrpsabemddamylpeiscdmktpnnolbsesardlwro
rnirmestiwntlrfresosedlvshydrsrrucnntr a
nans hinslmoxenfeolo cxeeneosgrphantiuyg
m dans llnrtlav ensptun acfaev s gtesfad
litl gonungirgk pioeutvinsstgintpsmbrdsu
altpjretooode rtnseeyepdrotllxric s rice
aikaunnskdoh agtnl cuegoacwhnsx esrurgg 
ef drant  rsaaeymreleiiessmniesdoeeroeer
eereozurmoicueeegtcolbeioteodocrsnaeaatt
 h cre g  snsihssatlunisi odeelcpminoeba
oisstnirhpmnaybaertcnenhdnondsunaaprlipr
vetou egriemerpicacb  ygmiaw rrrocnisecm
ibs tara  neeiaqlbbtsgegcs sooeoeuel ten
blrkqaiacortqmn ien ylrrrnecieircs qeind
en s e ieu cnaiscikuirocnr hlnpfjooimohs
dctlehtr ntlgpddeoipmog elllrnncnkgcseeu
o  aaet henyaue mbntgeeselsws   mmawnwve
lns e xoorail yn  eesnee uctaud nrasenha
swydiecacaefolsenmttnsoer ohmahttpt radc
yvltaxin pgrauluktystkmot mrf aluuiepl  
b sea usksoim hr idhtttd e owrn cnferevc
lstsibaocesfsmbejsdaellmepmbdedoffdbrtmq
axou nlnae ise gn esse  ihbstgsfmeteestf
yeaasra oopaeaildkiynteteidilnuenec vral
msylcdecsy  lkgr  icdkxrmie strle fei sz
 rdae gde al ngswsc soaru  tlsrtg durfne
 rrpodl i asg  essvuo aeaaheenaloe  geru
grus nbrls eqme ydioradhsoolmdiouoggnssv
 tfirotrkdnnnrtmwenan   abs cfcigscmpsed
 a soteiicdero nsaeuisrmisntau iirammlyi
anroiim tsiiaarrhoyaimdo oera dblogysmr 

Solved:

lacerated deliberateness canvas gould scold sexually piggyback obey agrimony moe
seventy except fixates warble shrubbery instrumenting beveling convolute hairs j
tarring aptness homotopy misted unkindness frustrations marketing latitudinary h
superhumanly redbud bends atomics sprawls piggy oiling djakarta estop reads plat
insomnia delaware fruition auger hooded terramycin statuesqueness adjourns boca 
brazed leghorn agnes seethed roentgen sen conjuring goa wrangled extremes hidden
ambuscade tribunals winced heap boxing fixates impostor rainstorm iberia billet 
overturn protestingly pairings grieved protruded insulate ascertain tinkling c j
committees bock sat muncie drab lorelei honk fretfully block dense howell swishy
endowing annuls veins machining sandy reproducers neve flooding supportingly ye 
sidings afterword thrived wrecks furman maintaining cone programs groaner mba ne
fractionally fisticuff ferromagnet wilsonian modifiable fathom accreditations ye
simulated grammars corporacies squads atalanta marriage ceq designates plexiglas
meyers propels apex ballooners impudently froze angst islamabad saud creases ku 
baleful godmother scenes keep wheelings bystander kidnappings basilar caving get
experienced laryngeal mendacious devise animadvert coequal tithe surtout offer n
jaeger moments manuel instance systemwide ownerships plodding cracking legers mr
spits wallow girt extramarital administer petitioning berkeley writes ransoms b 
discontinuity desegregate accretions cysteine thyroglobulin chomsky evans living
accosted speechless mixture bellowing lacerated adequately cautiousness products
endless comforters berenices fourteens insane odiousness nyu gambit yeasty lodge
lambda informative stressful dynamo forwent dynamites gentler doorkeep dutch keg
likeliest galatia fortiori hydride awakened could vandenberg wisdom phoenix axer
motivation spider epistemology shake rusher carbonizing strokers cordage meters 
exposure bedim saints tritium distributed ranger indiscoverable interrogate ames
panther salem eschewing special oaths penumbra consigns winchester cons conrail 
moneymake lome mob cottons ballooners smocks stressful sidewinder met refreshes 
brood complexion straightforwardly patients separations wichita shrilly arose we
doldrum rarer winthrop magic chemically dancers angel outlining elf lawlessness 
emboss county reagan unauthorized gauss suffixed tiresomeness cohn extravagant j
description fans staunch journalism prejudicial denture quixotic patrons briar h
override scott differentiators collection distribute folder superfluities pull b
fissured intercommunicating outputting managerial precedences conjuncted phloem 
fiance honk nineteen coproduct monica sparks unconsciousness furious stagnate ma
durango fag effector notation highlands parceling magnetic reservoirs voting fay
baden end converge alva machining insurers mutiny battled quantities alberta ewe
relativeness propels formants configurable propos rubs diverge tampered swigging
toward cased reels actualities estop facings animadvert trigger golding fluke wu
mort meanness venturer dyadic abstinent phoenix resolve bangladesh acrobacy rsvp
quotas auburn crafty pacifism favorite grandiose buchenwald phelps leaded gauze 

Code

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for c in word:
            if c in node:
                node1 = node[c]
            else:
                node1 = node[c] = {}
            node = node1
    return root

def search_origins(root, strips):
    n_rows = len(strips[0])
    n_cols = len(strips)
    ordered = [None] * n_cols
    heads = [root] * n_rows
    used = set()

    def get_next(node, c):
        if c == ' ':
            return root
        return node.get(c, None)

    def search(col_id, heads):
        if col_id == n_cols:
            yield transpose(ordered)
            return
        for i in range(n_cols):
            if i in used:
                continue
            heads1 = [get_next(head, c) for c, head in zip(strips[i].lower(), heads)]
            if any(x is None for x in heads1):
                continue
            used.add(i)
            ordered[col_id] = strips[i]
            yield from search(col_id + 1, heads1)
            used.remove(i)
    yield from search(0, heads)

def transpose(cols):
    return '\n'.join(map(''.join, zip(*cols)))

def parse():
    m = int(input())
    n = int(input())
    strips = []
    for i in range(m):
        strip = input()
        if strip[0] == '"':
            strip = strip[1:1 + n]
        strips.append(strip)
    n_words = input()
    if n_words:
        words = [input() for _ in range(int(n_words))]
    else:
        dict_path = input()
        words = list(map(str.rstrip, open(dict_path)))
    return words, strips

def main():
    words, strips = parse()
    words = list(map(str.lower, words))
    # print(words, strips)
    root = build_trie(words)
    for i, result in enumerate(search_origins(root, strips)):
        print('#{}{}'.format(i, '-' * 80))
        print(result)
        # Uncomment next line if you want only one result
        # break

if __name__ == '__main__':
    main()

Ray

Posted 2014-08-28T13:27:33.837

Reputation: 1 946

0

Groovy 311

i=(args[0]as File).readLines()
(a,b)=i[0..1]*.toShort()
j=2
s=[]
a.times{s<<i[j++]-'"'-'"'}
d=[]
i[j++].toLong().times{d<<i[j++].toLowerCase()}
f={(0..<b).collect{c->it.collect{it[c]}.join()}.join("\n")}
o=0;l=0
z=s.permutations().each{x=f(it);c=d.count{x.toLowerCase().contains(it)};if(c>l){l=c;o=x}}
print o

let's start the bidding and just go brute force; return the first result with the most matches against the dictionary by checking all permutations

Here is a shorter example helloworld.txt

5
2
"HW"
"eo"
"lr"
"ll"
"od"
2
hello
world

run as

% groovy destriper.groovy helloworld.txt

cfrick

Posted 2014-08-28T13:27:33.837

Reputation: 313