Grep tool for XML

22

8

I am looking for a good tool to perform grep-like operations on XML - for example, extract certain attributes only.

Grep itself can't handle it: any DFA-equivalent tool can match only non-recursive patterns, and my matches may be recursive (nested elements).

I have tried xgrep, but it is quite unstable, and I want a stable and reliable tool.

Any recommendations?

EDIT: I prefer open source tools that work well under Linux.

Adam Matan

Posted 2009-08-05T12:59:04.247

Reputation: 5 930

Question was closed 2015-02-14T03:46:48.657

Answers

21

XMLStarlet (Wikipedia) is a command-line tool that comes close to grep. It is open source software (MIT license) and works well on Linux and Windows.

The XMLStarlet website describes it as follows.

XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.

The Debian/Ubuntu package is named xmlstarlet. But beware: Contrary to what the manpage says, the binary is named xmlstarlet in Debian/Ubuntu and not xml.

There are also Windows binaries on SourceForge.

For a nice little introduction, see IBM's Start working with XMLStarlet.

Ludwig Weinzierl

Posted 2009-08-05T12:59:04.247

Reputation: 7 695

I tried to clone it, but it seems that the repository is broken. – Hola Soy Edu Feliz Navidad – 2018-05-04T15:35:30.653

Remove the trailing slash from the first link. – Bkkbrad – 2009-08-23T20:27:25.780

I can't get it to work... It never matches on any xpath except '/' (the whole document), which is pretty worthless :( – Hendy Irawan – 2012-01-13T16:29:07.797

@HendyIrawan - Are you sure it's not how you're trying to use xpath? (Like your XML has a default namespace that you're not accounting for?) – Daniel Haley – 2012-03-02T16:15:19.053
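To illustrate the default-namespace pitfall raised in the comment above, here is a minimal sketch. It uses Python's standard-library xml.etree.ElementTree rather than XMLStarlet, and the sample document and namespace URI are made up for the example, so treat it as an illustration of the general XPath behaviour and not of any particular tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace (the URI is invented).
DOC = """<catalog xmlns="http://example.com/ns">
  <book id="b1"/>
  <book id="b2"/>
</catalog>"""

root = ET.fromstring(DOC)

# A path without a namespace matches nothing: the elements are really
# named {http://example.com/ns}book, not book.
plain = root.findall(".//book")

# Binding a prefix (any name will do) to the namespace URI makes the
# same path match.
ns = {"c": "http://example.com/ns"}
qualified = root.findall(".//c:book", ns)

print(len(plain), len(qualified))  # prints: 0 2
```

If a path matches only at the document root, checking the root element for an xmlns attribute is a reasonable first step.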

5

A tool that works under Linux is xml_grep. It fully understands XML and is not a line-by-line tool.

xml_grep is included as a stand-alone tool in the XML::Twig package. The grepping functionality is quite powerful as it supports XPath specifications.

Sample command line (extracting posts edited after the middle of February from the trilogy Data Dump):

xml_grep -p --cond="row[@LastEditDate>'2010-02-14']"  posts.xml  > lateEditedPosts.xml
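For readers who want to see what that condition selects, here is a rough equivalent in Python's standard library. It is only a sketch of the filter's intent, not of how xml_grep works internally: the sample rows are invented, the function name late_edited is hypothetical, and it assumes the dates are ISO 8601 strings, which sort correctly as plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like posts.xml: <row> elements carrying attributes.
DOC = """<posts>
  <row Id="1" LastEditDate="2010-01-30T08:00:00.000"/>
  <row Id="2" LastEditDate="2010-03-01T12:30:00.000"/>
  <row Id="3"/>
</posts>"""


def late_edited(root, cutoff):
    """Return <row> elements whose LastEditDate sorts after cutoff as a string."""
    # Rows without the attribute get "" and therefore never pass the test.
    return [row for row in root.iter("row")
            if row.get("LastEditDate", "") > cutoff]


root = ET.fromstring(DOC)
late_ids = [row.get("Id") for row in late_edited(root, "2010-02-14")]
print(late_ids)  # prints: ['2']
```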

Installation is easy. Either

  • sudo cpan -i "XML::Twig", as described in the xml_grep cookbook referenced below.

or


More information:

The best introduction I have found for xml_grep is the xml_grep cookbook, which is about two pages long. Other:

Peter Mortensen

Posted 2009-08-05T12:59:04.247

Reputation: 10 992

I have fixed a broken link, but the trilogy Data Dump link is also broken. I will see what I can do. – Peter Mortensen – 2016-12-14T20:52:51.347

5

The XPath syntax available in various languages is the best way to find things in XML. In fact, one of the tools recommended by the makers of xgrep is basically a Perl XML parser that accepts XPath input.
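As a minimal sketch of the idea, assuming Python is available: the standard-library xml.etree.ElementTree module accepts a limited XPath subset, which is enough to pull one attribute out of arbitrarily nested elements. The sample document is made up, and this is not the Perl tool mentioned above. ElementTree cannot select attributes directly (a path such as //@name is not supported), so the sketch selects the elements and then reads the attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with recursively nested <item> elements.
DOC = """<menu name="root">
  <item name="file">
    <item name="open"/>
    <item name="save"/>
  </item>
  <item name="help"/>
</menu>"""

root = ET.fromstring(DOC)

# ".//item[@name]" matches <item> elements at any depth that have a
# name attribute; results come back in document order.
names = [el.get("name") for el in root.findall(".//item[@name]")]

print(names)  # prints: ['file', 'open', 'save', 'help']
```

The nested items are found without any special handling, which is exactly the recursive case a regular-expression tool cannot cover.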

jweede

Posted 2009-08-05T12:59:04.247

Reputation: 6 325

0

I would advise NOT using a grep-like tool on XML; use a library to parse the XML instead.
What exactly do you need it for? Any programming language? If you're willing to write a program for it, I think the built-in .NET XML parser would do the job easily.

Update: for Linux, a well-known XML parser library is libxml2.

fretje

Posted 2009-08-05T12:59:04.247

Reputation: 10 524

0

XMLSpy is an amazing tool, if a bit spendy.

JP Alioto

Posted 2009-08-05T12:59:04.247

Reputation: 6 278