Extract version string from filename

3

We have a list of filenames in the file a.txt, each with a version string embedded in it. For example:

gson-2.1
xmlParserAPIs-2.4.0
acrobat-1.1
orai18n-mapping-12.1.0.2
jdbc-se2.0
eclipse-core-runtime-20070801
trove-2.0.1
antisamy-1.3
javax.annotation
dojo-4342
org.json-0.0.1
castor-1.2-jdo

We tried cat a.txt | tr -d "[:alpha:]-_" | less, but the output doesn't look right.

For example:
2.1
2.4.0
1.1
1812.1.0.2   <--- wrong
2.0
20070801
2.0.1
1.3
.
4342
.0.0.1
1.        <--- wrong

Any help is appreciated.

chz

Posted 2015-06-09T02:34:45.673

Reputation: 339

Answers

3

It isn't possible to match all of those strings perfectly as you've listed them, because there's no reliable way to tell the difference between digits that belong to the name, as in "orai18n-", and digits that belong to the version, as in "-se2.0". If you create a regex that looks for a string of digits and dots that begins right after a dash, you'll match all but the "jdbc-se2.0" string pretty well:

sed 's/.*-\([0-9.][0-9.]*\).*/\1/' a.txt

(With GNU sed you can use sed -r, or sed -E, which BSD sed also accepts, to enable extended regular expressions and write [0-9.]+ instead.)

This prints the extracted version for every line where one is recognised, and the whole line unchanged where nothing suitable is found:

2.1
2.4.0
1.1
12.1.0.2
jdbc-se2.0
20070801
2.0.1
1.3
javax.annotation
4342
0.0.1
1.2
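
As a minimal sketch of the extended-regex form mentioned above, assuming a sed that accepts -E: the helper name extract_versions is made up for illustration, and the pattern is tightened slightly so that the version must start with a digit. Adding -n and the p flag prints only the lines where a version was found, instead of echoing unmatched lines unchanged.

```shell
# Hypothetical helper: print the version that follows the last "-<digit>" on each line.
# -n together with the p flag suppresses lines that do not match.
extract_versions() {
  sed -nE 's/.*-([0-9][0-9.]*).*/\1/p'
}

# Sample names taken from the question; the last two produce no output.
printf '%s\n' gson-2.1 castor-1.2-jdo javax.annotation jdbc-se2.0 | extract_versions
```

Lines such as jdbc-se2.0 are still skipped, since the version there does not directly follow a dash.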

Smiling Dragon

Posted 2015-06-09T02:34:45.673

Reputation: 421

3

You can use grep:

grep -oP '(?<=-)([0-9]+\.?)+' a.txt

That extracts every version number that immediately follows a dash. If a line contains no such version (for example javax.annotation), nothing is printed for it.

The regex:

  • (?<=-): a lookbehind; the match must be preceded by a dash (-), but the dash is not part of the match
  • [0-9]+: one or more digits
  • ([0-9]+\.?)+: each group of digits may be followed by a dot (.), and the whole group must occur at least once.
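
Because the dot is optional after every group of digits, a match can also end in a dot (for example 4.2.0. from a name like example-4.2.0.). The following is a sketch of a stricter variant, not the answer's own command: it only accepts a dot when more digits follow. It uses -E and a sed step instead of the -P lookbehind, on the assumption that PCRE support may not be available in every grep; the helper name and the example- filename are hypothetical.

```shell
# Hypothetical helper: match a dash followed by dot-separated groups of digits,
# then drop the leading dash. A dot only counts when more digits follow it,
# so "4.2.0." yields "4.2.0".
# Where grep supports -P, grep -oP '(?<=-)[0-9]+(\.[0-9]+)*' should behave the same.
extract_versions_strict() {
  grep -oE -e '-[0-9]+(\.[0-9]+)*' | sed 's/^-//'
}

printf '%s\n' xmlParserAPIs-2.4.0 example-4.2.0. javax.annotation | extract_versions_strict
```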

chaos

Posted 2015-06-09T02:34:45.673

Reputation: 3 704

Hi Chaos, thanks for responding. There's a slight problem. – chz – 2015-06-09T15:30:34.090

There are a few instances where the version string has a trailing ".", e.g. "4.2.0."; we piped it to sed 's/.$//g' to get 4.2.0 – chz – 2015-06-09T15:40:29.990

Hi chaos, how do we grep the filename without the file version in each string ? – chz – 2015-06-09T16:40:33.657
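
One possible way to handle both follow-up points, offered as a sketch rather than as anything posted in the thread: the helper names are hypothetical, and a sed that accepts -E is assumed. Note that 's/.$//' removes whatever the last character is, so escaping the dot restricts it to a real trailing dot. The second helper mirrors the sed approach from the first answer, but keeps the name instead of the version.

```shell
# Hypothetical helpers, not from the thread.

# Remove one trailing dot, and only a dot.
strip_trailing_dot() {
  sed 's/\.$//'
}

# Keep the part of each name before the last "-<digit>";
# names without such a suffix pass through unchanged.
name_without_version() {
  sed -E 's/(.*)-[0-9][0-9.]*.*/\1/'
}

printf '%s\n' 4.2.0. 2.1 | strip_trailing_dot
printf '%s\n' gson-2.1 castor-1.2-jdo javax.annotation | name_without_version
```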

0

You can try the small script below:

cat a.txt | sed 's/[-/a-zA-Z]//g'

Gautam Jose

Posted 2015-06-09T02:34:45.673

Reputation: 128

5

Your post needs to be expanded. A good answer includes specific instructions (not just links to them) and an explanation as to how or why the answer addresses the OP's question. Please edit your post to add detail explaining how your solution addresses the OP's question.

– I say Reinstate Monica – 2015-06-09T13:13:33.557