How can I diff two XML files?

78

33

On Linux, how could I generate a diff between two XML files?

Ideally, I would like to be able to configure it to be strict about some things and lenient about others, such as whitespace or attribute order.

I often care only that the files are functionally the same, but diff by itself would be annoying to use, especially if the XML file doesn't have many line breaks.

For example, the following two documents should be treated as equal:

<tag att1="one" att2="two">
  content
</tag>

<tag att2="two" att1="one">
  content
</tag>

qedi

Posted 2009-12-07T16:27:57.780

Reputation: 1 291

Answers

92

One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.

$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Or as a one-liner:

$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
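Where xmllint is not available, the same canonicalize-then-diff idea can be sketched in Python. This is only an illustration, assuming Python 3.8 or later: xml.etree.ElementTree.canonicalize implements C14N 2.0, not the versions that xmllint writes, and the helper names canonical_lines and xml_diff are made up for this example.

```python
import difflib
from xml.etree import ElementTree as ET


def canonical_lines(xml_text):
    """Return the C14N 2.0 form of xml_text, split into lines."""
    # strip_text=True also trims whitespace around text content, which is
    # looser than plain canonicalization; drop it for a stricter comparison.
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    return canonical.splitlines()


def xml_diff(a_text, b_text):
    """Unified diff of the canonical forms; an empty list means no difference."""
    return list(difflib.unified_diff(
        canonical_lines(a_text), canonical_lines(b_text),
        fromfile="one.xml", tofile="two.xml", lineterm=""))


# The two documents from the question: same content, different attribute order.
one = '<tag att1="one" att2="two">\n  content\n</tag>'
two = '<tag att2="two" att1="one">\n  content\n</tag>'
print(xml_diff(one, two))
```

Canonicalization writes attributes in a fixed order, so for the two documents from the question the diff is empty and the script prints [].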

Jukka Matilainen

Posted 2009-12-07T16:27:57.780

Reputation: 2 304

1 Also, xmllint ships with OS X. – ClintM – 2016-09-20T16:17:39.230

11 In case it wasn't obvious, c14n is an abbreviation for canonicalization. – Brandin – 2016-11-08T17:53:30.263

4 It is better to add one more step before diff: formatting both XML files (xmllint --format). I've noticed that without this step diff shows more differences than necessary. – ka3ak – 2016-12-09T12:07:03.260

Another run with xmllint --format may be helpful (see other answers). – törzsmókus – 2019-06-03T06:43:22.690

18 You can do it in one line too: vimdiff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml) – Nathan Villaescusa – 2013-03-03T01:53:44.037

1 Never knew about the --c14n switch in xmllint. That's handy. – qedi – 2009-12-10T20:21:51.837

26

Jukka's answer did not work for me, but it did point me to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch did. --exc-c14n is not listed in the man page, but it is described on the command line as "W3C exclusive canonical format".

$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

$ xmllint | grep c14
    --c14n : save in W3C canonical format v1.0 (with comments)
    --c14n11 : save in W3C canonical format v1.1 (with comments)
    --exc-c14n : save in W3C exclusive canonical format (with comments)

$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686

$ cat /etc/system-release
CentOS release 6.5 (Final)

Warning: --exc-c14n strips out the XML header, whereas --c14n prepends the XML header if it is not there.

rjt

Posted 2009-12-07T16:27:57.780

Reputation: 878

19

I tried to use @Jukka Matilainen's answer but had problems with whitespace (one of the files was a huge one-liner). Using --format helps to avoid whitespace differences.

xmllint --format one.xml > 1.xml
xmllint --format two.xml > 2.xml
diff 1.xml 2.xml

Note: Use the vimdiff command for a side-by-side comparison of the XML files.
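The reformat-then-diff idea can also be sketched in Python, as a rough stand-in for the xmllint --format step rather than a reproduction of it. It assumes Python 3.9 or later for xml.etree.ElementTree.indent; the helper name formatted_lines and the sample documents are invented for illustration.

```python
import difflib
from xml.etree import ElementTree as ET


def formatted_lines(xml_text):
    """Parse xml_text and return it re-indented, one element per line."""
    root = ET.fromstring(xml_text)
    # ET.indent (Python 3.9+) rewrites whitespace-only text between elements,
    # so a one-line file and a hand-indented file end up with the same layout.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode").splitlines()


# Hypothetical inputs: a "huge one-liner" and an indented file with one change.
one_liner = "<root><item id='1'>a</item><item id='2'>b</item></root>"
indented = """<root>
    <item id='1'>a</item>
    <item id='2'>c</item>
</root>"""

for line in difflib.unified_diff(formatted_lines(one_liner),
                                 formatted_lines(indented),
                                 fromfile="one.xml", tofile="two.xml",
                                 lineterm=""):
    print(line)
```

Because both inputs are laid out the same way before the comparison, the diff should contain only the item whose text actually changed, not the whole one-line file.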

GuruM

Posted 2009-12-07T16:27:57.780

Reputation: 291

1 This was the option I needed. Supposedly the most canonical version can be obtained by combining --format with --exc-c14n, though that will probably be slower to process :( – ᴠɪɴᴄᴇɴᴛ – 2014-11-27T14:05:57.713

It's been quite some time since I wrote the answer, but I faintly remember using the --exc-c14n flag. However, diffing the output with and without the flag showed no differences, so I just stopped using it. Dropping unnecessary or unused flags might make the process faster. – GuruM – 2014-12-21T06:49:09.670

6 The --exc-c14n option specifies sorting of the attributes. In your specific files the attributes probably were already sorted, but the general advice would be to use the combination --format --exc-c14n. – ᴠɪɴᴄᴇɴᴛ – 2014-12-22T14:33:59.180

In my case two.xml was generated from one.xml by a script. So I just needed to check what was added/removed by the script. – GuruM – 2012-08-08T10:36:29.170

7

Diffxml gets the basic functionality correct, though it doesn't seem to offer many options for configuration.

Edit: The Diffxml project has been hosted on GitHub since 2013.

dsolimano

Posted 2009-12-07T16:27:57.780

Reputation: 2 778

Not useful for large files though: it died after eating 40GB (RAM + swap) when comparing two files of ~20k lines each. – Grzegorz – 2017-10-26T06:27:49.177

Note that the project appears to be dead, with its last update in 2013. – reducing activity – 2018-10-09T07:53:54.137

It's not quite there yet, but it looks promising at least. – qedi – 2009-12-07T17:02:45.667

5

If you also wish to ignore the order of child elements, I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs
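To illustrate what ignoring element and attribute order can mean, here is a minimal Python sketch. It is not the code of xmldiffs, just one possible way to do an order-independent comparison; the function names normalized and same_ignoring_order are hypothetical.

```python
from xml.etree import ElementTree as ET


def normalized(elem):
    """Return a sortable, order-independent description of elem's subtree."""
    children = sorted(normalized(child) for child in elem)
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),  # attribute order is ignored
        (elem.text or "").strip(),           # surrounding whitespace is ignored
        tuple(children),                     # child element order is ignored
    )
    # Tail text (text following a child's end tag) is not compared here.


def same_ignoring_order(a_text, b_text):
    """True if both documents match once element and attribute order are ignored."""
    return normalized(ET.fromstring(a_text)) == normalized(ET.fromstring(b_text))


print(same_ignoring_order(
    '<r><a x="1" y="2"/><b>t</b></r>',
    '<r>\n  <b>t</b>\n  <a y="2" x="1"/>\n</r>'))
```

This only answers whether the two documents match; producing a readable diff would mean serializing the sorted trees and passing them to diff.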

joh

Posted 2009-12-07T16:27:57.780

Reputation: 1 295

1

My Python script xdiff.py for comparing XML files ignores differences in whitespace and attribute order, but not in element order.

In order to compare two files 1.xml and 2.xml, you would run the script as follows:

xdiff.py 1.xml 2.xml

In the OP's example, it would output nothing and return exit status 0 (for no structural or textual differences).

In cases where 1.xml and 2.xml differ structurally, it mimics the unified output of GNU diff and returns exit status 1. There are various options for controlling the output, such as -a for outputting all context, -n for outputting no context, and -q for suppressing output altogether (while still returning the exit status).

Andreas Nolda

Posted 2009-12-07T16:27:57.780

Reputation: 11

0

I use Beyond Compare to compare all types of text-based files. Versions are available for Windows and Linux.

Alan

Posted 2009-12-07T16:27:57.780

Reputation: 191

2 Beyond Compare really sucks for this. It seems to just not be aware of XML elements and do mostly just text comparison. – Rob K – 2016-05-23T17:54:54.433

Beyond Compare has an XML plugin but I was never able to install it properly, so... Nyeah... I came to this page and got wiser... – Erk – 2019-03-14T15:02:56.440

1 Plain text comparisons would say the two lines differed, whereas the OP wants them to be reported as the same. – ChrisF – 2009-12-07T16:33:06.867

4 i.e. Canonically compare the XML. – Chris W. Rea – 2009-12-09T20:08:35.690

-1

I'm not sure whether depending on an online tool counts as a solution but, for what it's worth, I got good results with this online XML comparison tool. It simply works.

RayLuo

Posted 2009-12-07T16:27:57.780

Reputation: 131

-1

Our SD Smart Differencer compares documents based on structure as opposed to actual layout.

There's an XML Smart Differencer. For XML, that means matching order of tags and content. It should note that the text string in the specific fragment you indicated was different. It presently doesn't understand the XML notion of tag attributes indicating whether whitespace is normalized vs. significant.

Ira Baxter

Posted 2009-12-07T16:27:57.780

Reputation: 499

1 In your SO profile you provide full disclosure about your employer; I'd have preferred a short disclaimer inside your answer as well :) BTW, I tried to download an evaluation copy, but the request form is 'smart' (via JS) enough to disable the combination XML with Smart Differencer (also the latter in combination with Python, although possible according to the SD product page)? – ᴠɪɴᴄᴇɴᴛ – 2014-11-27T14:03:25.433

1 Ah. Thanks for the reminder. This is an answer from a time before there was a clear SO policy on this. I'm revising the answer to signal the relationship in a way that complies with SO policy. – Ira Baxter – 2014-11-27T15:52:16.833

I'll check the download page; not all of our live products make it into that list. Yes, these exist. – Ira Baxter – 2014-11-27T15:53:53.157

I checked the download page. Yes, the XML smart differencer is not there. I'll have the back-room guys work on fixing that; it should be there in 1-2 weeks at most (they have a backlog, don't we all?) In the meantime, if you want to try it, send email (see bio). – Ira Baxter – 2015-01-26T20:16:04.860

1 Linked page has no word "XML" in it. – reducing activity – 2018-10-09T07:52:58.220

@MateuszKonieczny: oops, somehow that didn't get fixed, sorry. I just fixed it; it will be on the web in a few days when we do our next web deploy. – Ira Baxter – 2018-10-09T16:02:41.390