Find the minimum edit distance between two strings

13

4

Explanation

The edit distance between two strings is a function of the minimum possible number of insertions, deletions, or substitutions to convert one word into another word.

Insertions and deletions cost 1, and substitutions cost 2.

For example, the distance between AB and A is 1, because deletions cost 1 and the only edit needed is the deletion of the B character.

The distance between CAR and FAR is 2, because substitutions cost 2. Another way to look at this is one deletion and one insertion.

Rules

Given two input strings (supplied however is convenient in your language), your program must find the minimum edit distance between the two strings.

You may assume that the strings only contain the characters A-Z and have fewer than 100 characters and more than 0 characters.

This is code golf, so the shortest solution wins.

Sample Test Cases

ISLANDER, SLANDER
> 1
MART, KARMA
> 5
KITTEN, SITTING
> 5
INTENTION, EXECUTION
> 8

Peter Olson

Posted 2012-03-16T15:11:03.147

Reputation: 7 412

I did this in excel in my algorithms class once. Was kinda fun to turn the project in as an .xls document. It actually worked pretty well. I'll see if I can find it. – captncraig – 2012-03-16T17:32:54.153

PHP should win this easily. – st0le – 2012-03-19T12:13:58.900

@st0le - The built in levenshtein function treats substitutions as one edit (substitute), not two (delete + insert). – Mr. Llama – 2012-03-20T17:25:15.720

Answers

7

brainfuck, 2680

+>>>>>>>>>>>>>>>>>>>>>>++++++>,<[->-----<]>--[+++++++++++++++++++++
+++++++++++>>[->>+<<]>>+<<,--------------------------------]>>[-<<<
+>>>]<<<<[>[-<<+>>]<<<]>[-<<<<<+>>>>>]>[>>],----------[++++++++++>>
[->>+<<]>>+<<,----------]>>[-<<<+>>>]<<<<[>[-<<+>>]<<<]>[-<<+>>]<<[
-<+>>+<]<<<[->+<]>[-<+>>>>+<<<]>>>[-<+>]<[->+>+<<]<[-<+>]<[->+>+<<]
<[-<+>>+<]<[->+<]>>>>>>[-[->>+<<]<<[->>+<<]<<[->>+<<]>>>>>>]<<[->>+
>+<<<]>>>[-<<<+>>>]<[->>+[->+>+<<]<<[->>+<<]>>]>>[-]<[<<]<<<+[->>>+
<<<]>>>[-<+<<+>>>]<<<<<[->>[->>>>[->>+<<]<<[->>+<<]<<[->>+<<]<<[->>
+<<]>>>>]>>[->>>>+<<<<]>>>>[-<<<<+<<+>>>>>>]<<+[->>+<<]>>[-<<+<+>>>
]<<<<<<<<]>>[-]>>[-]>>[-]-[+<<-]+>>>>>>>>>>>>>>>>>[-<+>]<[->+<<<<+>
>>]<<<[->>>>>>[-<+>]<[->+<<<<+>>>]<<<<<<<[-]<<+>>>>>>[-<<<<+>>>>>>>
>>>[-<+>]<[->+>+<<]<<<<<<<<<[->+<]>[->>>>>>>>+<<<<<<<<<+>]<<<[->+<]
>[->>>>>>>>+<<<<<<<<<+>]>>>>>>>>>[-<<<<<+>>>>>]<<<<<[->>+>>>+<<<<<]
>>>>>>>>[-[->>+<<]<<[->>+<<]<<[->>+<<]<<[->>+<<]>>>>>>>>]<<<<<<+>>-
[-<<[-<<<<+>>>>]<<<<[->>+>>+<<<<]>>[->>>>>>[->>+<<]<<[->>+<<]<<[->>
+<<]<<[->>+<<]>>]>>>>]>>[->>+<<]>>[-<<+>>>>+<<]->>-[-[->>+<<]>>]>[-
>+<]>[-<+>>>+<<]<<<[->+<]>[-<+>>>+<<]+[->>[-<<+>>]>>[-<<+>>]<<<<<<+
]<<<<<<[->>>>>>+<<<<<<]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<[->>>>+<<<<]
-<<[-]>>>>>>>>[-<<<<<<<<+>>>>>>>>]<<<<[->>[->>+<<]<<[->>+<<]>>]>>-[
-[->>+<<]>>]<[->+<]>[-<+>>>+<<]+[->>[-<<+>>]<<<<+]>>[-<<+>>]<<<<<<<
<-[+>>[-<<+>>]>>[-<<+>>]>>[-<<+>>]<<<<<<<<-]+>>>[-]<[->+<]>>>[-]<[-
>+<]>>>[-]<[->+<]>>>[->+<]>[-<+>>>>>>>>>>>>>+<<<<<<<<<<<<]>>>>>>>>>
>>>-[-[->>+<<]>>]>[->+<]>[-<+<+>>]<<<<-[+>>[-<<+>>]<<<<-]+>>[-<+>]>
>>>>>>>>[->+<]>[-<+>>>>>>>>>>>+<<<<<<<<<<]>>>>>[->+<]>[-<+>>>>>+<<<
<]>>>>-[-[->>+<<]>>]>[->+<]>[-<+<+>>]<<<<-[+>>[-<<+>>]<<<<-]+>[->-<
]>[-<[-]+>]<[->>>+<<<]>>>[-<<+<+>>>]<<-[+>[->+<]<]<[->>+<-[[-]>[->+
<]>[-<+>>>+<<]+>>[-<<[-]>>]<[->+<]>[-<+>>>+<<]+>>[-<<[-]>>]<[->+<]>
[-<+>>>+<<]+>>[-<<[-]>>]<<[-<<[-]+>>]<<[-<<[-]+>>]<<[-<<[-]+>>]<<<+
>->->>->>-<<<<<]<[->+<]]>[-<+>]>>[-<<<+>>>]<<<[->>>>>>>>>>>>>+<<<<<
<<<<<<<<]>>>>>>>>[->+<]>[-<+>>>>>>>>>+<<<<<<<<]>[->+<]>[-<+>>>>>>>>
>+<<<<<<<<]>>>>>>>>>[->>>+<<<]>>>[-<<+<+>>>]<<<<<[->>>>>+<<<<<]>>>>
>[-<<<<<+<<<+>>>>>>>>]<<<<<<<<+>>>>>>[-[->>+<<]<<[->>+<<]<<[->>+<<]
<<[->>+<<]<<[->>+<<]>>>>>>>>>>]<<<<[-<<[-<<<<<<+>>>>>>]<<<<<<[->>+>
>>>+<<<<<<]>>[->>>>>>>>[->>+<<]<<[->>+<<]<<[->>+<<]<<[->>+<<]<<[->>
+<<]>>]>>>>>>]<<[-]<<[->>>>+<<<<]>>>>>>[-[->>+<<]<<[->>+<<]>>>>]<<[
->>>>>+<<<<<]-[+<<-]+>>>>>>>>>>>>>>>]<<]>>>>>>>>[->+<]<<[->+<]<<[->
+<]>>>>>[-[->>+<<]<<[->>+<<]<<[->>+<<]>>>>>>]<<<<[->>[->>>>+<<<<]>>
>>[-<<+<<+>>>>]<<+[-[->>+<<]<<[->>+<<]<<[->>+<<]>>>>>>]<<<<]>>[-[->
>+<<]>>]>>>>++++++++++<[->-[>+>>]>[+[-<+>]>+>>]<<<<<]>>>[->-[>+>>]>
[+[-<+>]>+>>]<<<<<]>[-]<++++++[->++++++++[->+>+<<<<+>>]<]>>>.<.<<<.

Using dynamic programming, of course.

Input strings should be separated by a space, and followed by a new line. For example:

INTENTION EXECUTION
008

cardboard_box

Posted 2012-03-16T15:11:03.147

Reputation: 5 150

1Neatly aligned and very readable - I like it, +1! – Thomas – 2012-11-18T11:45:53.613

Is this a computer language? :O – None – 2016-04-04T09:38:09.737

This is the most confusing submission to this question... +1 just because it's BF. – HyperNeutrino – 2017-02-24T01:50:07.930

5

Python, 138 148 152 characters

Ok, I knew the principle but had never implemented edit distance before, so here goes:

x,y=raw_input().split()
d=range(99)
for a in x:
 c=d;d=[d[0]+1]
 for b,p,q in zip(y,c[1:],c):d+=[min(d[-1]+1,p+1,q+2*(a!=b))]
print d[-1]

han

Posted 2012-03-16T15:11:03.147

Reputation: 1 226

4

PHP, 40

Follows the rules as stated, but not the spirit of the rules:

<?=levenshtein($argv[1],$argv[2],1,2,1);

Takes it's two parameters as arguments. Example calling: php edit_dist.php word1 word2.
Normally levenshtein() considers a substitution as one operation, but it has an overloaded form where the insert/sub/delete weights can be specified. In this case, its 1/2/1.

Mr. Llama

Posted 2012-03-16T15:11:03.147

Reputation: 2 387

3

APL (90 characters)

⊃⌽({h←1+c←⊃⍵⋄d←h,1↓⍵⋄h,(⊂⍵){y←⍵+1⋄d[y]←⍺{h⌷b=⍵⌷a:⍵⌷⍺⋄1+d[⍵]⌊y⌷⍺}⍵}¨⍳x}⍣(⊃⍴b←,⍞))0,⍳x←⍴a←,⍞

I'm using Dyalog APL as my interpreter, with ⎕IO set to 1 and ⎕ML set to 0. The direct function ({ ... }) computes, given a row as input, the next row in the distance matrix. (It is started with the initial row: 0,⍳x.) The power operator () is used to nest the function once for each character in the second string (⊃⍴b). After all of that, the final row is reversed (), and its first element is taken ().

Example:

      ⊃⌽({h←1+c←⊃⍵⋄d←h,1↓⍵⋄h,(⊂⍵){y←⍵+1⋄d[y]←⍺{h⌷b=⍵⌷a:⍵⌷⍺⋄1+d[⍵]⌊y⌷⍺}⍵}¨⍳x}⍣(⊃⍴b←,⍞))0,⍳x←⍴a←,⍞
LOCKSMITH
BLACKSMITH
3
      ⊃⌽({h←1+c←⊃⍵⋄d←h,1↓⍵⋄h,(⊂⍵){y←⍵+1⋄d[y]←⍺{h⌷b=⍵⌷a:⍵⌷⍺⋄1+d[⍵]⌊y⌷⍺}⍵}¨⍳x}⍣(⊃⍴b←,⍞))0,⍳x←⍴a←,⍞
GATTTGTGACGCACCCTCAGAACTGCAGTAATGGTCCAGCTGCTTCACAGGCAACTGGTAACCACCTCATTTGGGGATGTTTCTGCCTTGCTAGCAGTGCCAGAGAGAACTTCATCATTGTCACCTCATCAAAGACTACTTTTTCAGACATCTCCTGTAG
AAGTACTGAACTCCTAATATCACCAATTCTTCTAAGTTCCTGGACATTGATCCGGAGGAGGATTCGCAGTTCAACATCAAGGTAAGGAAGGATACAGCATTGTTATCGTTGTTGAGATATTAGTAAGAAATACGCCTTTCCCCATGTTGTAAACGGGCTA
118

Dillon Cower

Posted 2012-03-16T15:11:03.147

Reputation: 2 192

3

R 51

This reads in two lines from the user (which are the input) and then just uses the adist function to calculate the distance. Since substitutions cost 2 for this problem we need to add a named vector for the costs parameter for adist. Since there is also a counts parameter for adist we can only shorten costs to cos so we still have a unique match for the parameter names.

x=readLines(n=2);adist(x[1],x[2],cos=c(i=1,d=1,s=2))

Example use

> x=readLines(n=2);adist(x[1],x[2],cos=c(i=1,d=1,s=2))
MART
KARMA
     [,1]
[1,]    5

Dason

Posted 2012-03-16T15:11:03.147

Reputation: 260

0

Smalltalk (Smalltalk/X), 34

I am lucky - the standard "String" class already contains levenshtein:

input a, b:

a levenshteinTo:b s:2 k:2 c:1 i:1 d:1

(the additional parameters allow for different weights on substitution, deletion, etc.)

Test run:

#( 'ISLANDER'  'SLANDER' 
   'MART'      'KARMA'   
   'KITTEN'    'SITTING' 
   'INTENTION' 'EXECUTION'  
) pairWiseDo:[:a :b|
    a print. ' ' print. b print. ' ' print.
    (a levenshteinTo:b
            s:2 k:2 c:1 i:1 d:1) printCR.
].

->

ISLANDER SLANDER 1
MART KARMA 5
KITTEN SITTING 5
INTENTION EXECUTION 8

blabla999

Posted 2012-03-16T15:11:03.147

Reputation: 1 869

0

Ruby 1.9, 124

l=->a,b{a[0]?b[0]?[(a[0]==b[0]?-1:1)+l[a[1..-1],b[1..-1]],l[a[1..-1],b],l[a,b[1..-1]]].min+1: a.size: b.size}
puts l[*ARGV]

Golfed version of the slow, recursive Ruby method here, modified to double the weight for substitutions. The fact that it takes 7 characters to remove the first character of a string really hurts, and it'd be cool to replace (a[0]==b[0]?-1:1) with something like -1**a[0]<=>b[0] but the order of operations is not my friend.

histocrat

Posted 2012-03-16T15:11:03.147

Reputation: 20 600