Python 3 - Trie and DFS
Trie is a tree shape data structure. If we build a trie from a collection of words, then each path from the root to a leaf node will represent a word from the collection.
To solve this problem, first we build a trie using the valid words. Suppose the original text has n
rows and m
columns, then we will have m
strips to put together. We try to reconstruct the original text from left to right, column by column, using a recursive depth-first search. To perform branch elimination, we also maintain a list of heads on the trie, name it heads
. Each head points to a node on the trie and point to the root at first. We will need n
heads, each correspond to a row. Each time after a column(strip
) is appended, heads[i]
will be moved according to the character strip[i]
. If we found any head that cannot be moved, then we know the column just appended is incorrect. After putting m
columns, we get the original text.
I implemented this algorithm using Python. It is fast enough to solve a 40x80 case with 10000 valid word in instant.
I use this python 3 script to generate test cases. Run it with:
python3 make_script.py {dict path} {n rows} {n cols} > intput.txt
The following is a 40 x 80 case. The dictionary file can be found here, which has 10000 words.
80
40
c el efp e otl desaiweseeme jomp ebc g
synwds irnsatseityl rkutpsiysei ssraat
gvmjewrell fclpadrll nbsvwiwteqdeniditba
uet criderndm eesih dalbrcra d iacnnoi e
lbipoeivedrwddivensne aeihrpaa eicangpni
gontgtulvcbatdo glunsniiguoleu eigtbelh
xmsyyu ereio edvrttdonure tpselriniipnxs
ssdieop bgkg wu eloyr dllrhza t hic af
rttruminetnt rcu rasa ogosrywrnietvldgod
lpe aritour alolt yeomlpimmxryftenen sn
cneregsc igodbsigeca r kbhiiisoss eagglp
crm nhdreh raokdtr rcfiotpsfgtjanotaaiaf
s pntenpnmv plemiaaxeeittgtg ncnaooaou a
srrganeu ut toaantetedccnr seiise eur ni
xpndg psafl ooemlusof nba pauosina esnb
ge du tbfe spuoeuesen o s um ldsted w
eto rs l nhuc caexr bsoptwcrrnaennrasary
e maldneg nep nmaid isie ko ctedh necaa
esoeisi u ifran rgirrts ntipuneccnvftrp
gui oeriwraisc og vsepharn latotts aenra
eenylorobneyrem nwtefrasdeoeet ot osdsr
abia e i iwrcneaaiaeeu lie oihotgdicntdi
euugmjsdfcamazndei e yoagaserxatrultoaio
tamdfesgtitc xseuegs fene thatfueo l ut
ocn sdrtn mai aerwmimoss te oceorrit el
netbu wymsrfobelltemetrisioao uri l ce
deaa ee e aadrgcoln cnto semroi ofdnanu
aeaunrmvonirieaxapicnaioxaoromeviiuaeoou
oaodtb hpretdarie n e recei etunsrsedal
ejht n jy ees tnr gsegr s e jhb ayegup
ohids bi g iesiogs rytimtorrsab pnirwlc
ieuereaklieapeatlan sdeeg e lasedaob be
rihbegaisvdtrprnatservaraco na imtcgplna
denrdnnelymisenimi gnfw bssalupcnansu et
itsrwnbeclwfmstasg craaim ei atrifvpr
ubksogxernu auletinieo krte cgmltiliioir
atnhndcutigiasuir ntsaiauem usiireg idms
yniis ssle aisicsngqseazdifiertercaauvoc
ds prcatnoiaafsiopiratdrrmksnfidakr gon
p ngaoolyeolg nep oae dnsnln srfcontsrve
boges massscgss arsobod ierseh rnivtagsp
nlstadhndnroieecntc slhoucbrcouo uohtiis
ltainnpif nfrn tinrdoi ninsanteu n aeb
oun htoioaftupedsmowtmeatanyi sltnh ftto
a emgrasancass oelrefddy alacinsut n s
kgttnltednrmea t yhug welentn tu fsneeah
yia bilgwybnisgfrmidlh t rssnaiultgagk a
iaeeorega sdbhc otadiresdagcipcuet rde
tygui artnsot le wtes etrracm pde oev e
cvrpsabemddamylpeiscdmktpnnolbsesardlwro
rnirmestiwntlrfresosedlvshydrsrrucnntr a
nans hinslmoxenfeolo cxeeneosgrphantiuyg
m dans llnrtlav ensptun acfaev s gtesfad
litl gonungirgk pioeutvinsstgintpsmbrdsu
altpjretooode rtnseeyepdrotllxric s rice
aikaunnskdoh agtnl cuegoacwhnsx esrurgg
ef drant rsaaeymreleiiessmniesdoeeroeer
eereozurmoicueeegtcolbeioteodocrsnaeaatt
h cre g snsihssatlunisi odeelcpminoeba
oisstnirhpmnaybaertcnenhdnondsunaaprlipr
vetou egriemerpicacb ygmiaw rrrocnisecm
ibs tara neeiaqlbbtsgegcs sooeoeuel ten
blrkqaiacortqmn ien ylrrrnecieircs qeind
en s e ieu cnaiscikuirocnr hlnpfjooimohs
dctlehtr ntlgpddeoipmog elllrnncnkgcseeu
o aaet henyaue mbntgeeselsws mmawnwve
lns e xoorail yn eesnee uctaud nrasenha
swydiecacaefolsenmttnsoer ohmahttpt radc
yvltaxin pgrauluktystkmot mrf aluuiepl
b sea usksoim hr idhtttd e owrn cnferevc
lstsibaocesfsmbejsdaellmepmbdedoffdbrtmq
axou nlnae ise gn esse ihbstgsfmeteestf
yeaasra oopaeaildkiynteteidilnuenec vral
msylcdecsy lkgr icdkxrmie strle fei sz
rdae gde al ngswsc soaru tlsrtg durfne
rrpodl i asg essvuo aeaaheenaloe geru
grus nbrls eqme ydioradhsoolmdiouoggnssv
tfirotrkdnnnrtmwenan abs cfcigscmpsed
a soteiicdero nsaeuisrmisntau iirammlyi
anroiim tsiiaarrhoyaimdo oera dblogysmr
Solved:
lacerated deliberateness canvas gould scold sexually piggyback obey agrimony moe
seventy except fixates warble shrubbery instrumenting beveling convolute hairs j
tarring aptness homotopy misted unkindness frustrations marketing latitudinary h
superhumanly redbud bends atomics sprawls piggy oiling djakarta estop reads plat
insomnia delaware fruition auger hooded terramycin statuesqueness adjourns boca
brazed leghorn agnes seethed roentgen sen conjuring goa wrangled extremes hidden
ambuscade tribunals winced heap boxing fixates impostor rainstorm iberia billet
overturn protestingly pairings grieved protruded insulate ascertain tinkling c j
committees bock sat muncie drab lorelei honk fretfully block dense howell swishy
endowing annuls veins machining sandy reproducers neve flooding supportingly ye
sidings afterword thrived wrecks furman maintaining cone programs groaner mba ne
fractionally fisticuff ferromagnet wilsonian modifiable fathom accreditations ye
simulated grammars corporacies squads atalanta marriage ceq designates plexiglas
meyers propels apex ballooners impudently froze angst islamabad saud creases ku
baleful godmother scenes keep wheelings bystander kidnappings basilar caving get
experienced laryngeal mendacious devise animadvert coequal tithe surtout offer n
jaeger moments manuel instance systemwide ownerships plodding cracking legers mr
spits wallow girt extramarital administer petitioning berkeley writes ransoms b
discontinuity desegregate accretions cysteine thyroglobulin chomsky evans living
accosted speechless mixture bellowing lacerated adequately cautiousness products
endless comforters berenices fourteens insane odiousness nyu gambit yeasty lodge
lambda informative stressful dynamo forwent dynamites gentler doorkeep dutch keg
likeliest galatia fortiori hydride awakened could vandenberg wisdom phoenix axer
motivation spider epistemology shake rusher carbonizing strokers cordage meters
exposure bedim saints tritium distributed ranger indiscoverable interrogate ames
panther salem eschewing special oaths penumbra consigns winchester cons conrail
moneymake lome mob cottons ballooners smocks stressful sidewinder met refreshes
brood complexion straightforwardly patients separations wichita shrilly arose we
doldrum rarer winthrop magic chemically dancers angel outlining elf lawlessness
emboss county reagan unauthorized gauss suffixed tiresomeness cohn extravagant j
description fans staunch journalism prejudicial denture quixotic patrons briar h
override scott differentiators collection distribute folder superfluities pull b
fissured intercommunicating outputting managerial precedences conjuncted phloem
fiance honk nineteen coproduct monica sparks unconsciousness furious stagnate ma
durango fag effector notation highlands parceling magnetic reservoirs voting fay
baden end converge alva machining insurers mutiny battled quantities alberta ewe
relativeness propels formants configurable propos rubs diverge tampered swigging
toward cased reels actualities estop facings animadvert trigger golding fluke wu
mort meanness venturer dyadic abstinent phoenix resolve bangladesh acrobacy rsvp
quotas auburn crafty pacifism favorite grandiose buchenwald phelps leaded gauze
Code
def build_trie(words):
root = {}
for word in words:
node = root
for c in word:
if c in node:
node1 = node[c]
else:
node1 = node[c] = {}
node = node1
return root
def search_origins(root, strips):
n_rows = len(strips[0])
n_cols = len(strips)
ordered = [None] * n_cols
heads = [root] * n_rows
used = set()
def get_next(node, c):
if c == ' ':
return root
return node.get(c, None)
def search(col_id, heads):
if col_id == n_cols:
yield transpose(ordered)
return
for i in range(n_cols):
if i in used:
continue
heads1 = [get_next(head, c) for c, head in zip(strips[i].lower(), heads)]
if any(x is None for x in heads1):
continue
used.add(i)
ordered[col_id] = strips[i]
yield from search(col_id + 1, heads1)
used.remove(i)
yield from search(0, heads)
def transpose(cols):
return '\n'.join(map(''.join, zip(*cols)))
def parse():
m = int(input())
n = int(input())
strips = []
for i in range(m):
strip = input()
if strip[0] == '"':
strip = strip[1:1 + n]
strips.append(strip)
n_words = input()
if n_words:
words = [input() for _ in range(int(n_words))]
else:
dict_path = input()
words = list(map(str.rstrip, open(dict_path)))
return words, strips
def main():
words, strips = parse()
words = list(map(str.lower, words))
# print(words, strips)
root = build_trie(words)
for i, result in enumerate(search_origins(root, strips)):
print('#{}{}'.format(i, '-' * 80))
print(result)
# Uncomment next line if you want only one result
# break
if __name__ == '__main__':
main()
Dupe? – Beta Decay – 2015-08-15T23:57:08.170
You'll need to specify an objective winning criterion, from your question you seem to be looking for fastest-algorithm, but I would suggest code-golf. – overactor – 2014-08-28T14:20:00.657
I would suggest to remove the codegolf tag if you don't want brute-force answers. Also you should format the input and output example using code block to align characters. – Ray – 2014-08-28T20:12:23.007