Input:

Two strings without newlines or whitespaces.

Output:

Both input strings on separate lines, with spaces where necessary^† for one of the two strings. And a third line with the characters A, R, M and , representing added, removed, modified, and unchanged.

^† We add spaces to either the top or bottom input string (if we have to). The goal of this challenge is to output with the least amount of changes (ARM) possible, also known as the Levenshtein distance.

Example:

Let's say the input strings are ABCDEF and AFBECD, then the output would be this:

A B CDEF
AFBECD  
 A A  RR

Here are some other possible invalid outputs as example (and there are a lot more):

ABCDEF
AFBECD
 MMMMM

A BCDEF
AFBECD 
 A MMMR

AB CDEF
AFBECD 
 MAMMMR

ABC DEF
AFBECD 
 MMAMMR

ABC  DEF
AFBECD  
 MMAA RR

ABCDEF 
AFB ECD
 MMR MA

 AB CDEF   // This doesn't make much sense,
AFBECD     // but it's to show leading spaces are also allowed
AM A  RR

None of these have only four changes however, so only A B CDEF\nAFBECD \n A A RR is a valid output for this challenge.

Challenge rules:

You can assume the input strings won't contain any new-lines or spaces.
The two input strings can be of different lengths.
One of the two input strings should remain as is, except for optional leading/trailing spaces.
If your languages doesn't support anything besides ASCII, you can assume the input will only contain printable ASCII characters.
The input and output format are flexible. You can have three separate Strings, a String array, a single String with new-lines, 2D character array, etc.
You are allowed to use something else instead of ARM, but state what you've used (i.e. 123, or abc., etc.)
If more than one valid output is possible with the same amount of changes (ARM), you can choose whether to output one of the possible outputs or all of them.

Leading and trailing spaces are optional:

A B CDEF
AFBECD
 A A  RR

"A B CDEF\nAFBECD\n A A  RR"
                 ^
                 Note there are no spaces here

General rules:

This is code-golf, so shortest answer in bytes wins.
Don't let code-golf languages discourage you from posting answers with non-codegolfing languages. Try to come up with an as short as possible answer for 'any' programming language.
Standard rules apply for your answer, so you are allowed to use STDIN/STDOUT, functions/method with the proper parameters, full programs. Your call.
Default Loopholes are forbidden.
If possible, please add a link with a test for your code.
Also, please add an explanation if necessary.

Test cases:

In: "ABCDEF" & "AFBECD"

Output (4 changes):
A B CDEF
AFBECD  
 A A  RR                  

In: "This_is_an_example_text" & "This_is_a_test_as_example"

Possible output (13 changes):
This_is_an       _example_text
This_is_a_test_as_example     
         MAAAAAAA        RRRRR

In: "AaAaABBbBBcCcCc" & "abcABCabcABC"

Possible output (10 changes):
AaAaABBbBBcCcCc
 abcABCab cABC 
R MM  MMMR MM R

In: "intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;}" & "intf(){intr=(int)(Math.random()*10);returnr>0?r%2:2;}"

Possible output (60 changes):
intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;}
intf(){i ntr=(      i    n      t)(M  ath.r   andom        ()*         10          );returnr>0?r%2:2;}
       MR M  MRRRRRR RRRR RRRRRR MMMRR MMMMRRR     RRRRRRRR  MRRRRRRRRR  RRRRRRRRRR 

In: "ABCDEF" & "XABCDF"

Output (2 changes):
 ABCDEF
XABCD F 
A    R 

In: "abC" & "ABC"

Output (2 changes):
abC
ABC
MM

Kevin Cruijssen

Posted 2017-10-05T12:19:33.243

Reputation: 67 575

Related – Kevin Cruijssen – 2017-10-05T12:19:44.083

If there are multiple arrangements that are the same distance, is it OK to only output one of them? – AdmBorkBork – 2017-10-05T12:28:47.883

@AdmBorkBork Yes, just one of the possible outputs is indeed the intended output (although outputting all available options is also fine). I'll clarify this in the challenge rules. – Kevin Cruijssen – 2017-10-05T12:30:59.413

@Arnauld I've removed the rule about leading spaces, so leading and trailing spaces are both optional and valid on the unmodified line. (Which means the last test case in your answer is completely valid.) – Kevin Cruijssen – 2017-10-05T14:32:18.967

Can we use lowercase instead of uppercase? – Stan Strum – 2017-10-05T18:17:13.090

Do we need to compare byte, characters, code points, or graphene clusters? (when dealing with a language that supports unicode) – Ferrybig – 2017-10-06T06:34:09.563

@StanStrum "You are allowed to use something else instead of ARM, but state what you've used (i.e. 123, or abc., etc.)" – Kevin Cruijssen – 2017-10-06T06:43:43.170

@Laikoni "The input and output format are flexible. You can have three separate Strings, a String array, a single String with new-lines, 2D character array, etc." – Kevin Cruijssen – 2017-10-06T06:43:56.703

@Ferrybig Hmm.. characters I guess. Aren't these all unique when you use a single encoding? Could you give an example of where a character could be different in character, but still might be equal in code points, graphene clusters, and/or bytes in the same encoding? – Kevin Cruijssen – 2017-10-06T06:46:22.697

@KevinCruijssen स्ते this string is 1 graphene cluster, consisting of 4 code points and 4 characters, and is 12 bytes when represented on disk using UTF-8, or 10 bytes on disk using UTF-16 . Unicode support is harder than most people realise, and while you probably meant to support graphene clusters for this challenge, most answers probably won't support it. You might be interesting to read the exact difference here: http://utf8everywhere.org/#characters (no affiliation)

– Ferrybig – 2017-10-06T06:53:03.173

1@Ferrybig Ah ok, thanks for the explanation. But as for this challenge, just supporting printable ASCII is actually already enough. If you want to support more, be my guest. But as long as it works for the test cases given I'm ok with undefined behavior for graphene clusters consisting of more than 1 character like that. :) – Kevin Cruijssen – 2017-10-06T07:09:43.350

Answers

Haskell, 192 181 174 161 158 150 147 143 158¹ bytes

e@(a:r)&f@(b:s)=snd$maximum[([1|' '<-s!!2],s)|s<-z(z(:))[a:" R",' ':b:"A",a:b:last("M":[" "|a==b])][r&f,e&s,r&s]]
x&y=[x,y,"RA"!!(0^length x)<$x++y]
z=zipWith

Try it online! Example usage: "ABCDEF" & "AFBECD". Returns a list of three strings. This is an extension of my recursive solution to the ordinary Levenshtein distance question.

Explanation:

To compute the minimal modifications to get from "xyz" to "yw", we focus on the first character of both strings. There are three possibilities:

Remove: Drop x from the first string and recursively compute the modifications to get from "yz" to "yw". This yields the three lines ["yz","yw"," M"]. Add x to the first one, a space to the second one and R to the third one. We get
```
xyz
yw
R M
```
Add: Drop y from the second string and compute "xyz" & "w", which returns the result ["xyz","w","MRR"]. We need to add a space on the first line, y to the second and A to the third line:
```
 xyz
yw
AMRR
```
Modified/Unchanged: We can combine those two cases because both require to drop the first character of both strings and compute the minimal modifications between the remaining strings: "yz" & "w". To the result ["yz","w","MR"], we add x on he first and y on the second line. Only for the last line we need to differentiate whether the initial characters are the same. If they are the same, a space is added to the third line, otherwise (as in this case because x \= y) an M is added:
```
xyz
yw
MMR
```

From those three candidates, we need to find the one with the fewest modifications. This is equivalent to having the most spaces on the third line. Therefore we convert each candidate s (a list of three strings) to a tuple ([1|' '<-s!!2],s), where s appears as second component and the first component is a list with as many elements as there are spaces in the third line of s (s!!2 because of 0-indexing). As list element 1 is used, but the actual element is irrelevant as long as it is the same for all candidates.

Altogether, this yields the list of tuples

[([1],["xyz"," yw","R M"]),([],[" xyz","yw","AMRR"]),([],["xyz","yw","MMR"])]

The build-in maximum selects the largest element of this list, where tuples are compared lexicographically, that is component-wise from left to right. As [1] is larger than [], the first tuple is selected, and snd returns the second of component, that is the list of lines, of the tuple.

_{¹ +15 bytes to fix a bug where A-changes at the end of a string would be displayed as R-changes}

Laikoni

Posted 2017-10-05T12:19:33.243

Reputation: 23 676

lol this makes the userscript think that this is 1 byte – HyperNeutrino – 2017-10-10T15:12:25.050

JavaScript (ES6), 413 ... 343 342 bytes

Saved 5 bytes by tweaking the loop indices, as suggested by @KevinCruijssen

Takes input as 2 strings in currying syntax. Returns an array of 3 strings.

b=>a=>{m=[];x=a.length;y=b.length;for(i=j=0,c=d='';i<=y;c+=R='R')m[i]=[[c,i++]];for(;j++<x;)m[i=0][j]=[d+=A='A',j];for(;i<y;i++)for(j=0;j<x;)[C]=m[[X,S]=m[i][j],[Y,I]=m[i+1][j],[Z,D]=m[i][++j],Q=[Z+R,D+1],i+1][j]=b[i]==a[j-1]?[X+' ',S]:S<I?D<S?Q:[X+'M',S+1]:D<I?Q:[Y+A,I+1];return[(g=s=>C.replace(/./g,c=>c==s?' ':b[i++],i=0))(A),g(R,b=a),C]}

Test cases

let f =

b=>a=>{m=[];x=a.length;y=b.length;for(i=j=0,c=d='';i<=y;c+=R='R')m[i]=[[c,i++]];for(;j++<x;)m[i=0][j]=[d+=A='A',j];for(;i<y;i++)for(j=0;j<x;)[C]=m[[X,S]=m[i][j],[Y,I]=m[i+1][j],[Z,D]=m[i][++j],Q=[Z+R,D+1],i+1][j]=b[i]==a[j-1]?[X+' ',S]:S<I?D<S?Q:[X+'M',S+1]:D<I?Q:[Y+A,I+1];return[(g=s=>C.replace(/./g,c=>c==s?' ':b[i++],i=0))(A),g(R,b=a),C]}

console.log(f('ABCDEF')('AFBECD').join`\n`);
console.log(f('This_is_an       _example_text')('This_is_a_test_as_example').join`\n`);
console.log(f('AaAaABBbBBcCcCc')('abcABCabcABC').join`\n`);
console.log(f("intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;}")("intf(){intr=(int)(Math.random()*10);returnr>0?r%2:2;}").join`\n`);
console.log(f('ABCDEF')('XABCDF').join`\n`);

Less golfed

b => a => {
  m = []; x = a.length; y = b.length;

  // initialize the leftmost column and the topmost row
  for(i = j = 0, c = d = ''; i <= y; c += R = 'R')
    m[i] = [[c, i++]];
  for(; j++ < x;)
    m[i = 0][j] = [d += A = 'A', j];

  // walk through the Levenshtein matrix
  for(; i < y; i++)
    for(j = 0; j < x;)
      [C] = m[                                // C = current string, once updated
        [X, S] = m[i][j],                     // substitution
        [Y, I] = m[i + 1][j],                 // insertion
        [Z, D] = m[i][++j],                   // deletion
        Q = [Z + R, D + 1],                   // precomputed update for deletion
        i + 1
      ][j] =
        b[i] == a[j - 1] ?
          [X + ' ', S]                        // unchanged character
        :
          S < I ?
            D < S ? Q : [X + 'M', S + 1]      // deletion or substitution
          :
            D < I ? Q : [Y + A, I + 1];       // deletion or insertion

  return [
    // g = helper function to format the output strings by inserting spaces
    (g = s => C.replace(/./g, c => c == s ? ' ' : b[i++], i = 0))(A),
    g(R, b = a),

    // final modification string, picked from the last visited cell
    C
  ]
}

Example

Below is the initial matrix for b = "foo" and a = "ok":

//                     'o'           'k'
[ [ [ '',    0 ], [ 'A',   1 ], [ 'AA',  2 ] ],
  [ [ 'R',   1 ],  (undefined),  (undefined) ],  // 'f'
  [ [ 'RR',  2 ],  (undefined),  (undefined) ],  // 'o'
  [ [ 'RRR', 3 ],  (undefined),  (undefined) ] ] // 'o'

and here is the final matrix after all iterations:

//                     'o'           'k'
[ [ [ '',    0 ], [ 'A',   1 ], [ 'AA',  2 ] ],
  [ [ 'R',   1 ], [ 'M',   1 ], [ 'MA',  2 ] ],  // 'f'
  [ [ 'RR',  2 ], [ 'R ',  1 ], [ 'R A', 2 ] ],  // 'o'
  [ [ 'RRR', 3 ], [ 'RR ', 2 ], [ 'R M', 2 ] ] ] // 'o'

The final modification string along with the Levenshtein distance are stored in the bottom-right cell.

Arnauld

Posted 2017-10-05T12:19:33.243

Reputation: 111 334

Same change I suggested to save 1 byte regarding -1/+1 of j and x still applies to your latest edit: b=>a=>{m=[];x=a.length;y=b.length+1;for(i=y;i--;)m[i]=[[i,'R'.repeat(i)]];for(j=x+1;j--;)m[i=0][j]=[j,'A'.repeat(j)];for(;++i<y;)for(j=-1;++j<x;)R=m[S=(X=m[i-1][j])[0],I=(Y=m[i][j])[0],D=(Z=m[i-1][j+1])[0],Q=[D+1,Z[1]+'R'],i][j+1]=b[i-1]==a[j]?[S,X[1]+' ']:S<I?D<S?Q:[S+1,X[1]+'M']:D<I?Q:[I+1,Y[1]+'A'];return[(g=s=>R[1].replace(/./g,c=>c==s?' ':b[i++],i=0))('A'),g('R',b=a),R[1]]} :) – Kevin Cruijssen – 2017-10-05T15:21:15.993

1@KevinCruijssen I saved 5 bytes by taking your idea a step further. Thanks! – Arnauld – 2017-10-05T15:43:07.023

Python 2, 548 536 484 500¹ 488 447 381² 373 371 357 350 bytes

s,t=input()
n,m=len(s),len(t)
r=range
L=[[(j,'RA'[i<1]*j)for j in r(i,m-~i)]for i in r(n+1)]
for i in r(n):
 for j in r(m):M,R=L[i][j:j+2];A=L[i+1][j];v,V=min(A,R,M);x=('AR'[v in R],'M_'[s[i]==t[j]])[v in M];_=M;X=eval(x)[1]+x;L[i+1][j+1]=v+(x<'_'),X
for i in r(len(X)):s=s[:i]+' '*('B'>X[i])+s[i:];t=t[:i]+' '*('R'==X[i])+t[i:]
print s+'\n'+t+'\n'+X

Try it online!

Uses 'ARM_' instead of 'ARM '

Works by building up the Levenshtein matrix, ~~and then traversing backwards to find the operations~~. Now builds the operation-string at the same time as the Levenshtein matrix, as in Arnauld's JS answer

_{1: More bytes, as it did not work if the first string was a single character.}

_{2: Switched to building the combinations in the Levenshtein matrix.}

TFeld

Posted 2017-10-05T12:19:33.243

Reputation: 19 246

+1 for taking less than 60 seconds for 6 character words like my initial (failed) attempt lol – HyperNeutrino – 2017-10-05T14:00:48.737

Nice answer! +1 from me. Since I never program in Python I can't really help you with the golfing, except for one thing: m+i+1 can be m-~i. – Kevin Cruijssen – 2017-10-05T14:13:15.480

You can use a tab instead of double spaces on line 7. – Stephen – 2017-10-05T14:16:11.457

You can get to 463 bytes by reducing the wile loop to one line: while i+j<n+m:v,A=(L[i]+[m,m])[j:j+2];R,M=(L[i+1]+[m,m])[j:j+2];d=min(A,R,M);q=M==d or(R==d)*2;r+=' '*(d==v==M)or'AMR'[q];i+=q>0;j+=q<2

– ovs – 2017-10-05T18:42:14.590

Mathematica, 250 256 259 384 bytes

~0.00035 seconds for the java code case.

(i=o=p="";a=Array;r=StringLength;If[Length@#>0,g=#&@@#;h=#[[2]];u=r@g;v=r@h;i=i<>g;o=o<>h;w=Abs[v-u];k=a[" "&,w];If[u<v,i=i<>k,o=o<>k];p=p<>a["M"&,u~Min~v];p=p<>a[If[u>v,"R","A"]&,w],i=i<>#;o=o<>#;p=p<>a[" "&,r@#]]&/@SequenceAlignment[#,#2];{i,o,p})&

Usage: f["ABCDE", "ABCDF"]

Try it online!

The code is based on a fact that SequenceAlignment works by default on

With the default setting SimilarityRules->Automatic, each match between two elements contributes 1 to the total similarity score, while each mismatch, insertion, or deletion contributes -1.

Namely, the scoring is calculated by M, A and R, accordingly.

Example:

Keyu Gan

Posted 2017-10-05T12:19:33.243

Reputation: 2 028

2Hmm, I've never programmed in Mathemetica, but isn't it possible to change i="";o="";p=""; to i="";o=i;p=i; to reduce 2 bytes? – Kevin Cruijssen – 2017-10-06T06:50:32.730

2What about i=o=p=""? – DavidC – 2017-10-06T12:26:07.113

@DavidC Yes I had realized that and changed it already. thank you anyway – Keyu Gan – 2017-10-06T19:31:28.337

Python 2, 310 bytes

from difflib import*
a,b=input()
U,j=[],' '
for g,s,t,x,y in SequenceMatcher(0,a,b).get_opcodes():
	g,Y,T=g[0],y-x,t-s
	z,A,B=Y-T,a[s:t],b[x:y]
	if'q'<g:U+=[A+z*j,B+j*-z,'M'*min(T,Y)+'A'*z+'R'*-z]
	if'e'==g:U+=[A,B,j*Y]
	if'i'==g:U+=[j*Y,B,'A'*Y]
	if'e'>g:U+=[A,j*T,'R'*T]
for e in[0,1,2]:print''.join(U[e::3])

Try it online!

Using difflib.SequenceMatcher that computes an alignment between two strings

mdahmoune

Posted 2017-10-05T12:19:33.243

Reputation: 2 605

This seems to give some incorrect results for some of the other test cases. More particular: "This_is_an_example_text","This_is_a_test_as_example"

– Kevin Cruijssen – 2017-10-06T16:10:39.153

@KevinCruijssen thanx ,I just fixed it ^_^ – mdahmoune – 2017-10-06T16:25:24.597

Nice, gj! But umm.. the third test case (and fourth as well) is also incorrect unfortunately. One of the two lines should be unmodified (except for leading/trailing spaces). Both lines currently contain spaces in the middle.

– Kevin Cruijssen – 2017-10-06T17:55:33.330

@KevinCruijssen thanx again, I’m fixing it – mdahmoune – 2017-10-06T17:58:31.263

D, 351 345 bytes

-6 bytes thanx to KevinCruijssen

string f(string a,string b){import std.algorithm;string x,y,z;size_t i,j,k;foreach(c;levenshteinDistanceAndPath(a,b)[1]){final switch(c)with(EditOp){case none:x~=a[i++];y~=b[j++];z~=" ";break;case substitute:x~=a[i++];y~=b[j++];z~="M";break;case insert:x~=" ";y~=b[j++];z~="A";break;case remove:x~=a[i++];y~=" ";z~="R";}}return x~"\n"~y~"\n"~z;}

Try it online!

mdahmoune

Posted 2017-10-05T12:19:33.243

Reputation: 2 605

You can golf six bytes by removing the last break;. +1 though, first time I'm seeing the programming language D. – Kevin Cruijssen – 2017-10-10T13:00:06.263

I don't like change!

Input:

Output:

Example:

Challenge rules:

General rules:

Test cases:

Answers

Haskell, 192 181 174 161 158 150 147 143 1581 bytes

Explanation:

JavaScript (ES6), 413 ... 343 342 bytes

Test cases

Less golfed

Example

Python 2, 548 536 484 5001 488 447 3812 373 371 357 350 bytes

Mathematica, 250 256 259 384 bytes

Python 2, 310 bytes

D, 351 345 bytes

Haskell, 192 181 174 161 158 150 147 143 158¹ bytes

Python 2, 548 536 484 500¹ 488 447 381² 373 371 357 350 bytes