Scripting: what is the easiest way to extract a value from a tag in an XML file?

14

7

I want to read a pom.xml ('Project Object Model' of Maven) and extract the version information. Here is an example:

<?xml version="1.0" encoding="UTF-8"?><project 
xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mycompany</groupId>
    <artifactId>project-parent</artifactId>
    <name>project-parent</name>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
        <groupId>com.sybase.jconnect</groupId>
        <artifactId>jconnect</artifactId>
        <version>6.05-26023</version>
    </dependency>
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>com.sun.jdmk</groupId>
        <artifactId>jmxtools</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.easymock</groupId>
        <artifactId>easymock</artifactId>
        <version>2.4</version>
    </dependency>       
</dependencies>
</project>

How can I extract the version '1.0.74-SNAPSHOT' from above?

I would love to be able to do this with simple bash scripting, sed or awk. Otherwise, a simple Python solution is preferred.

EDIT

  1. Constraint

    The Linux box is in a corporate environment, so I can only use tools that are already installed (I could request a utility such as xml2, but I would have to go through a lot of red tape). Some of the solutions are very good (I have learned a few new tricks already), but they may not be applicable because of the restricted environment.

  2. Updated XML listing

    I added the dependencies tag to the original listing. This shows that some hacky solutions may not work in this case.

  3. Distro

    The distro I am using is RHEL4

Anthony Kong

Posted 2011-12-20T22:01:47.093

Reputation: 3 117

http://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash – Ciro Santilli 新疆改造中心法轮功六四事件 – 2015-10-07T10:57:49.950

Is this http://stackoverflow.com/questions/29004/parsing-xml-using-unix-terminal sufficient?

– bbaja42 – 2011-12-20T22:08:43.480

Not really. There are a lot of version tags in the XML (e.g. under the dependencies tag). I only want '/project/version'. – Anthony Kong – 2011-12-20T22:20:40.603

Which XML-related tools and libraries are available? Are JVM-based solutions OK? – Vi. – 2011-12-20T23:22:14.773

As far as I can tell, xml2, xmlgrep and the Perl XML modules are not present. Most Unix command-line utilities are present. The distro is Red Hat EL 4. – Anthony Kong – 2011-12-20T23:38:52.677

(I couldn't add a comment, so I have to reply as an answer, which is somewhat overkill.) Some great answers can be found here: http://stackoverflow.com/questions/2735548/xslt-document-function-returns-empty-result-on-maven-pom/2737427#2737427

– JStrahl – 2013-01-18T10:12:10.267

Answers

17

xml2 can convert XML to and from a line-oriented format:

xml2 < pom.xml  | grep /project/version= | sed 's/.*=//'
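
For readers who cannot install xml2, here is a minimal Python sketch of the same idea: flatten the document into one "path=value" line per element, then filter on the exact path. It only approximates xml2's output (attributes and mixed content are ignored), and the trimmed POM and the helper names `local_name` and `flatten` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample POM (an assumption; the real file has more elements).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <artifactId>joda-time</artifactId>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Yield one "/path/to/element=text" line per element that has text."""
    path = prefix + "/" + local_name(element.tag)
    text = (element.text or "").strip()
    if text:
        yield path + "=" + text
    for child in element:
        yield from flatten(child, path)


lines = list(flatten(ET.fromstring(POM)))
# Same filter as the grep/sed stage of the pipeline above.
versions = [line.split("=", 1)[1] for line in lines
            if line.startswith("/project/version=")]
print(versions[0])
```

Because the full path is part of every line, the dependency versions (under /project/dependencies/dependency/version) are not picked up by the filter.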

Vi.

Posted 2011-12-20T22:01:47.093

Reputation: 13 705

6

Another way: xmlgrep and XPath:

xmlgrep --text_only '/project/version' pom.xml

Disadvantage: slow

Vi.

Posted 2011-12-20T22:01:47.093

Reputation: 13 705

command updated to xml_grep – GAD3R – 2019-05-01T11:05:26.140

6

Using python

$ python -c 'from xml.etree.ElementTree import ElementTree; print(ElementTree(file="pom.xml").findtext("{http://maven.apache.org/POM/4.0.0}version"))'
1.0.74-SNAPSHOT

Using xmlstarlet

$ xml sel -N x="http://maven.apache.org/POM/4.0.0" -t -m 'x:project/x:version' -v . pom.xml
1.0.74-SNAPSHOT

Using xmllint

$ echo -e 'setns x=http://maven.apache.org/POM/4.0.0\ncat /x:project/x:version/text()' | xmllint --shell pom.xml | grep -v /
1.0.74-SNAPSHOT
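
All three commands have to name the POM namespace explicitly. As a rough sketch of why, the script below does the same lookup with a prefix map and then repeats it with the bare tag name. The `pom` prefix, the trimmed sample document and the function name `project_version` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this sketch; any prefix works as long as the URI matches.
POM_NS = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Trimmed-down sample POM (an assumption; the real file has more elements).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def project_version(root):
    """Return the text of /project/version, or None if it is absent."""
    # "pom:version" is a relative path, so only direct children of root match.
    return root.findtext("pom:version", namespaces=POM_NS)


root = ET.fromstring(POM)
print(project_version(root))     # namespace-aware lookup
print(root.findtext("version"))  # None: the bare name does not match a namespaced tag
```

For a real file, `ET.parse("pom.xml").getroot()` would replace the `ET.fromstring(POM)` call.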

kev

Posted 2011-12-20T22:01:47.093

Reputation: 9 972

cat (//x:version)[1]/text() when using xmllint also works! – kev – 2011-12-21T05:50:51.520

5

Clojure way. Requires only a JVM and the Clojure jar file:

java -cp clojure.jar clojure.main -e "(use 'clojure.xml) (->> (java.io.File. \"pom.xml\") (clojure.xml/parse) (:content) (filter #(= (:tag %) :version)) (first) (:content) (first) (println))"

Scala way:

java -Xbootclasspath/a:scala-library.jar -cp scala-compiler.jar scala.tools.nsc.MainGenericRunner -e 'import scala.xml._; println((XML.load(new java.io.FileInputStream("pom.xml")) match { case <project>{children @ _*}</project> => for (i <- children if (i  match { case <version>{children @ _*}</version> => true; case _ => false;  }))  yield i })(0) match { case <version>{Text(x)}</version> => x })'

Groovy way:

java -classpath groovy-all.jar groovy.ui.GroovyMain -e 'println (new XmlParser().parse(new File("pom.xml")).value().findAll({ it.name().getLocalPart()=="version" }).first().value().first())'

Vi.

Posted 2011-12-20T22:01:47.093

Reputation: 13 705

This is awesome! Great idea! – Anthony Kong – 2011-12-21T00:06:12.337

4

Here's an alternative in Perl

$ perl -MXML::Simple -e'print XMLin("pom.xml")->{version}."\n"'
1.0.74-SNAPSHOT

It works with the revised/extended example in the question, which has multiple "version" elements at different depths.

RedGrittyBrick

Posted 2011-12-20T22:01:47.093

Reputation: 70 632

Slow (although faster than xmlgrep) – Vi. – 2011-12-20T22:58:55.407

3

Hacky way:

perl -e '$_ = join "", <>; m!<project[^>]*>.*\n(?:    |\t)<version[^>]*>\s*([^<]+?)\s*</version>.*</project>!s and print "$1\n"' pom.xml

Relies on correct indentation of the required <version> element.

Vi.

Posted 2011-12-20T22:01:47.093

Reputation: 13 705

Thanks for the suggestion, but unfortunately it will not return what I want. Please see the updated pom model. – Anthony Kong – 2011-12-20T23:14:24.230

It returns "1.0.74-SNAPSHOT". Note that I changed the script after reading about the multiple <version> elements. – Vi. – 2011-12-20T23:17:58.000

Note: this solution is provided "just for fun" and is not intended to be used in an actual product. It is better to use the xml2, xmlgrep or XML::Simple solution. – Vi. – 2011-12-20T23:18:37.690

Thanks! Even though it is 'just for fun', it is probably the 'most suitable' solution so far because it has the fewest dependencies: it only requires perl ;-) – Anthony Kong – 2011-12-20T23:22:31.470

What about doing it from Java? Using pom files implies having a JVM installed. – Vi. – 2011-12-20T23:25:18.737

The background is that I am building a SIT (system integration test) script around the existing maven process. Part of it requires knowing the version of the maven project. I really want to keep it simple and scripting is the way to go. – Anthony Kong – 2011-12-20T23:36:32.790

There is also a Python script in another answer (plus my improved version of it). Maybe check it? – Vi. – 2011-12-20T23:40:35.777

3

I worked out a very clumsy one-liner solution:

python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([n for n in dom.getElementsByTagName('version') if n.parentNode == dom.childNodes[0]][0].toxml())" | sed -e "s/.*>\(.*\)<.*/\1/g"

The sed at the end is very ugly, but I was not able to print out the text of the node with minidom alone.

Update from _Vi:

Less hacky Python version:

python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([i.childNodes.item(0).nodeValue for i in dom.firstChild.childNodes if i.nodeName == 'version'].pop())"

Update from me

Another version:

    python -c "from xml.dom.minidom import parse;dom = parse('pom.xml');print([n.firstChild.data for n in dom.childNodes[0].childNodes if n.firstChild and n.tagName == 'version'])"

Anthony Kong

Posted 2011-12-20T22:01:47.093

Reputation: 3 117

2

XSLT way:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>

        <xsl:template match="/">
                <xsl:for-each select="*[local-name()='project']">
                    <xsl:for-each select="*[local-name()='version']">
                        <xsl:value-of select="text()"/>
                    </xsl:for-each>
                </xsl:for-each>
        </xsl:template>
</xsl:stylesheet>

Save the stylesheet as x.xsl and run:

xalan -xsl x.xsl -in pom.xml
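
The stylesheet matches on local-name(), so it works whether or not the POM declares a namespace. A hedged Python sketch of the same matching rule follows; the helper names `local_name` and `top_level_text` and the two sample documents are assumptions, not part of the stylesheet.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def top_level_text(xml_text, wanted):
    """Text of every direct child of <project> whose local name is `wanted`."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "project":
        return []
    return [child.text for child in root if local_name(child.tag) == wanted]


# Two hypothetical inputs: one with the Maven namespace, one without.
WITH_NS = ('<project xmlns="http://maven.apache.org/POM/4.0.0">'
           '<version>1.0.74-SNAPSHOT</version>'
           '<dependencies><dependency><version>1.5.2</version></dependency></dependencies>'
           '</project>')
NO_NS = '<project><version>1.0.0</version></project>'

print(top_level_text(WITH_NS, "version"))
print(top_level_text(NO_NS, "version"))
```

Only direct children of the root are inspected, which mirrors the two nested for-each loops: the dependency versions sit deeper in the tree and are never visited.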

Vi.

Posted 2011-12-20T22:01:47.093

Reputation: 13 705

If xsltproc is on your system, and it probably is since libxslt is on RHEL4, then you can use it with the above stylesheet to output the tag's value, i.e. xsltproc x.xsl pom.xml. – fpmurphy – 2011-12-21T05:12:05.910

2

If "There are a lot of version tag in the xml", then you had better forget about doing it with "simple tools" and regexps; that won't do.

Try this Python script (no dependencies):

from xml.dom.minidom import parse

dom = parse('pom.xml')
project = dom.getElementsByTagName('project')[0]
for node in project.childNodes:
    if node.nodeType == node.ELEMENT_NODE and node.tagName == 'version':
        print(node.firstChild.nodeValue)

Samus_

Posted 2011-12-20T22:01:47.093

Reputation: 176

What exactly does this script do? – Simon Sheehan – 2011-12-22T01:41:29.883

It loads the XML as a DOM structure using Python's minidom implementation: http://docs.python.org/library/xml.dom.minidom.html. The idea is to grab the <project> tag, which is unique, and then iterate over its child nodes (direct children only) to find the <version> tag that we're looking for, and not other tags with the same name in other places.

– Samus_ – 2011-12-22T15:17:40.800

1

awk works fine without using any extra tools.

cat pod.xml

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.networks.app</groupId>
  <artifactId>operation-platform</artifactId>
  <version>1.0.0</version>
  <packaging>tar.xz</packaging>
  <description>POM was created by Sonatype Nexus</description>
</project>

A simple and legible way to get the value of the <packaging> tag:

cat pod.xml | awk -F'[<>]' '/packaging/{print $3}'

user5723841

Posted 2011-12-20T22:01:47.093

Reputation: 121

This does appear to work, but beware: what it does is set the field separator (FS) to the set of characters < and >; then it finds all lines with the word "packaging" in them and gives you the third field. – SMerrill8 – 2020-02-11T23:44:42.287
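
To make that caveat concrete, here is a small Python sketch that mimics the awk command's field splitting. The function name `awk_like` and the sample lines are assumptions for illustration. It shows that the same trick applied to "version" returns every matching line, not just the project's own version.

```python
import re

# Hypothetical sample: the simple POM plus one dependency.
SAMPLE = """\
<project>
  <modelVersion>4.0.0</modelVersion>
  <version>1.0.74-SNAPSHOT</version>
  <packaging>tar.xz</packaging>
  <dependencies>
    <dependency>
      <version>1.5.2</version>
    </dependency>
  </dependencies>
</project>
"""


def awk_like(lines, pattern, field):
    """Mimic awk -F'[<>]' '/pattern/{print $field}' on a list of lines."""
    out = []
    for line in lines:
        if re.search(pattern, line):
            fields = re.split(r"[<>]", line)
            # awk fields are 1-based; fields past the end are empty strings.
            out.append(fields[field - 1] if field <= len(fields) else "")
    return out


print(awk_like(SAMPLE.splitlines(), "packaging", 3))
print(awk_like(SAMPLE.splitlines(), "version", 3))
```

The first call yields a single value because only one line mentions "packaging". The second yields both the project version and the dependency version, since the pattern knows nothing about nesting.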

1

Here is a one-liner using sed:

sed '/<dependencies>/,/<\/dependencies>/d;/<version>/!d;s/ *<\/\?version> *//g' pom.xml

chickenkiller

Posted 2011-12-20T22:01:47.093

Reputation: 261

Relies on the absence of attributes in the elements and on extra <version> elements appearing only inside dependencies. – Vi. – 2011-12-21T16:33:55.993
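
The sketch below re-creates the three sed steps in Python (delete the dependencies range, keep only lines containing <version>, strip the tags) so the limitation is easy to see. The function name `sed_like` and the sample, including its <parent> block, are assumptions added for illustration.

```python
import re

# Hypothetical sample with a <version> outside <dependencies>, under <parent>.
SAMPLE = """\
<project>
    <parent>
        <version>9.9</version>
    </parent>
    <version>1.0.74-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <version>1.5.2</version>
        </dependency>
    </dependencies>
</project>
"""


def sed_like(lines):
    """Mimic the three sed commands of the one-liner, line by line."""
    out = []
    in_range = False
    for line in lines:
        if in_range:
            # The end address is tested from the line after the start line;
            # the closing line itself is deleted as well.
            if "</dependencies>" in line:
                in_range = False
            continue
        if "<dependencies>" in line:
            in_range = True
            continue
        if "<version>" not in line:
            continue  # /<version>/!d
        out.append(re.sub(r" *</?version> *", "", line))
    return out


print(sed_like(SAMPLE.splitlines()))
```

With this input the result contains both 9.9 and 1.0.74-SNAPSHOT: the dependency version is filtered out, but the one under <parent> slips through. A tag written with an attribute would be missed entirely, because the literal text <version> would not appear on that line.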

0

Try this:

Return_text_val=$(xmllint --xpath "//*[local-name()='$TagElmnt']" "$FILE")

where:

$TagElmnt - name of the tag to extract
$FILE - XML file to parse

Vijayababu

Posted 2011-12-20T22:01:47.093

Reputation: 1

0

I know your question says Linux, but if you need to do this on Windows without any third-party tools, so that you can put it in a batch file, PowerShell can extract any node from your pom.xml file like so:

powershell -Command "& {select-xml //pom:project/pom:properties/pom:mypluginversion -path pom.xml -Namespace  @{pom='http://maven.apache.org/POM/4.0.0'} | foreach {$_.Node.Innerxml}}" > myPluginVersion.txt

Peter Lubczynski

Posted 2011-12-20T22:01:47.093

Reputation: 11

PowerShell is now open source and runs on Linux and other platforms. We use it for building in preference to bash, cygwin and mingw64. – Charlweed – 2019-08-01T21:54:42.170

0

sed -n "/<name>project-parent/{n;s/.*>\(.*\)<.*/\1/p;q}" pom.xml

The -n option suppresses the automatic printing of every line; the first match (/.../) is on the line before the one with the wanted text; the n command skips to the next line, where s extracts the relevant info through a capturing group (\(...\)) and a backreference (\1). p prints the result, and q quits.
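
The same sequence of steps, written out as a rough Python sketch. The function name `version_after_name` and the sample lines are assumptions. It also shows the main fragility: the wanted tag must sit on the line directly after the <name> line.

```python
import re


def version_after_name(lines, marker="<name>project-parent"):
    """Find the first line containing `marker`, then extract the text
    between '>' and '<' on the line that follows it."""
    it = iter(lines)
    for line in it:
        if marker in line:
            following = next(it, None)       # sed's n command
            if following is None:
                return None                  # marker was on the last line
            match = re.search(r".*>(.*)<.*", following)  # s/.*>\(.*\)<.*/\1/
            return match.group(1) if match else None     # p, then q
    return None


# Hypothetical inputs: the layout from the question, and a reordered one.
QUESTION_ORDER = [
    "    <name>project-parent</name>",
    "    <version>1.0.74-SNAPSHOT</version>",
]
REORDERED = [
    "    <name>project-parent</name>",
    "    <description>demo</description>",
    "    <version>1.0.74-SNAPSHOT</version>",
]

print(version_after_name(QUESTION_ORDER))
print(version_after_name(REORDERED))
```

With the question's layout this yields the version. With the reordered lines it yields the description text instead, because the approach is positional and never checks the tag name on the following line.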

SΛLVΘ

Posted 2011-12-20T22:01:47.093

Reputation: 1 157

Can you expand your answer to explain this? Thanks. – fixer1234 – 2015-10-27T01:42:03.270