Merge only new entries from one xml file to another

2

I have two XML files. The second file contains some new entries as well as entries that already exist in the first file. Examples of the two files are given below.

File 1

<SERVERNAME_ONE>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.0.231</ServerIP>
</SERVERNAME_ONE>

File 2

<SERVERNAME_ONE>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.1.21</ServerIP>
</SERVERNAME_ONE>
<SERVERNAME_TWO>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.13.231</ServerIP>
</SERVERNAME_TWO>

After merge

<SERVERNAME_ONE>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.0.231</ServerIP>
</SERVERNAME_ONE>
<SERVERNAME_TWO>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.13.231</ServerIP>
</SERVERNAME_TWO>

When I merge the second file into the first, only the entries that are new in the second file should be added; entries that already exist in the first file must remain as they are, even if their contents differ in the second file. The sdiff command can merge interactively, but I want to automate the merge process. How do I merge these files?

Mathew

Posted 2016-03-15T07:49:08.050

Reputation: 79

Answers

0

XML can be, and usually is, tricky to handle with the good old shell tools; normally one has to use an XML parser and look for nodes. However, if and only if the format of your files is really as simple as shown (the line breaks are always there, the important tags are not nested inside other tags, and each of them starts its own line), then it can be done with start-tag-to-end-tag pattern matching.

$ cat mergexml.awk

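# Count the input files: fcnt is 1 while reading the first file, 2 for the second.
FILENAME != fn { ++fcnt; fn = FILENAME }

# First file: print every line and remember the name of each top-level tag.
fcnt == 1 {
   print
   str = $0
   if ( inside ) {
      # The entry ends at its closing tag on a line of its own.
      # The parentheses make the precedence of the concatenated pattern explicit.
      if ( str ~ ("^ *</ *" tag " *> *$") ) {
         inside = 0
      }
   } else {
      # Strip the leading "<" and trailing ">" so a lone start tag becomes its name.
      gsub( /^ *< *| *> *$/, "", str)
      if ( str ~ /^[[:alnum:]_]+$/) {
         tag = str
         f1tags[tag] = ""
         inside = 1
      }
   }
}

# Second file: print only the entries whose tag was not seen in the first file.
fcnt == 2 {
   str = $0
   if ( inside ) {
      print
      if ( str ~ ("^ *</ *" tag " *> *$") ) {
         inside = 0
      }
   } else {
      gsub( /^ *< *| *> *$/, "", str)
      if ( str ~ /^[[:alnum:]_]+$/) {
         tag = str
         # Start printing only when the tag is new.
         if ( ! (tag in f1tags)) {
            inside = 1
            print
         }
      }
   }
}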

$ awk -f mergexml.awk file1 file2
<SERVERNAME_ONE>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.0.231</ServerIP>
</SERVERNAME_ONE>
<SERVERNAME_TWO>
        <Protocol>FTP</Protocol>
        <ServerIP>192.168.13.231</ServerIP>
</SERVERNAME_TWO>

The order of the files in the command line is important.
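
For files that do not meet the simple-layout conditions above, the parser-based route mentioned at the start could look roughly like the following. This is only a minimal sketch in Python using the standard xml.etree.ElementTree module, not part of the awk solution. It assumes that entries are identified by their tag name alone, that the files carry no XML declaration, and that (as in the samples) they have several top-level elements, which is why the text is wrapped in a made-up <root> element before parsing. The function names parse_fragment and merge_new_entries are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_fragment(text):
    """Parse XML text that may contain several top-level elements.

    The sample files have no single root element, so the text is wrapped
    in a hypothetical <root> element (an assumption of this sketch).
    """
    return ET.fromstring("<root>" + text + "</root>")


def merge_new_entries(text1, text2):
    """Return the entries of text1, followed by those entries of text2
    whose tag name does not already occur at the top level of text1."""
    merged = parse_fragment(text1)
    seen = {child.tag for child in merged}
    for child in parse_fragment(text2):
        if child.tag not in seen:
            seen.add(child.tag)
            merged.append(child)
    # Serialize each top-level entry on its own, which drops the wrapper.
    return "".join(ET.tostring(child, encoding="unicode") for child in merged)
```

Reading file1 and file2 into strings and passing them as merge_new_entries(text1, text2) keeps every entry of the first file untouched and appends only the entries whose tag is new; as with the awk script, the order of the two arguments matters.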

Gombai Sándor

Posted 2016-03-15T07:49:08.050

Reputation: 3 325

@Sándor, when I run the above code I get the following errors: "awk: syntax error near line 7", "awk: illegal statement near line 7", "awk: syntax error near line 10", "awk: bailing out near line 10". – Mathew – 2016-03-16T05:28:51.073

I tested this using GNU awk, but I don't think there's anything in it that nawk or mawk would not know. Just to be sure, you can try calling it this way: gawk -f mergexml.awk file1 file2. On the other hand, if you put the files you tried it with in some public place, I can also check whether there's something in them we did not expect. – Gombai Sándor – 2016-03-16T08:37:24.243