How can I match and remove the first and second patterns inside XML tags using sed or awk?
Here is an example:
<data>A78-1-1134-HI-1</data>
<data>T78-12-1346-AG-2</data>
<data>G78-4-2156-Ag-6</data>
<data>A78-10-1971-Hh-10</data>
This is the result I am trying to get:
<data>1134</data>
<data>1346</data>
<data>2156</data>
<data>1971</data>
Can it be done in one line? This is what I tried:
sed 's/^.*<data>[[:alnum:]]-[0-9]-/<data>/g;s/-[a-Z].*<\/data>$//g'
Removing just the first pattern works when I have sed print only the matching lines:
sed -n 's/^.*<data>.*[[:alnum:]]-[0-9]-/<data>/p' file.xml | grep data
But this command, without -n and p, does not work:
sed 's/^.*<data>.*[[:alnum:]]-[0-9]-/<data>/' file.xml
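For reference, here is a minimal sketch of a single sed command that keeps only the third dash-separated field. It assumes every value inside <data> has the shape field-field-field-rest, as in the example above. The sample file, its indentation, and the <other> tag are made up for illustration. Note that [0-9] in the attempts above matches exactly one digit, so values such as 12 or 10 in the second field cannot match.

```shell
# Hypothetical sample file: the <root> and <other> tags and the leading
# spaces are assumptions, added to mimic an XML file with other content.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<root>
  <other>X1-2-3-YY-4</other>
  <data>A78-1-1134-HI-1</data>
  <data>T78-12-1346-AG-2</data>
  <data>G78-4-2156-Ag-6</data>
  <data>A78-10-1971-Hh-10</data>
</root>
EOF

# \1 = opening tag, \2 = third dash-separated field, \3 = closing tag.
# [^-<]* matches one field without crossing a dash or a tag boundary.
# Lines without a matching <data>...</data> pair are printed unchanged.
result=$(sed -E 's|(<data>)[^-<]*-[^-<]*-([^-<]*)-[^<]*(</data>)|\1\2\3|' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

The command writes to standard output and leaves the file itself untouched. With GNU sed, adding -i edits the file in place; BSD/macOS sed expects -i '' instead. Because the expression is not anchored to the start of the line, leading spaces and other tags are preserved.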
I am getting correctly printed data with the perl command from your solution #3. How do I change the command so that it writes the change to the XML file? The <data> tags are not the only tags in the XML, and there are spaces in front of each <data> tag. – milan_K – 2013-04-20T19:03:56.120
@user2302372, see updated answer. – terdon – 2013-04-20T19:36:40.197
Perfect! That was exactly what I need. – milan_K – 2013-04-20T19:45:40.547