Replace a string within a string, but not when it is between double square brackets

0

I have a long string from a MySQL field containing the text of a MediaWiki page. I need to replace a string in that page, but not when the string is inside a MediaWiki link. A MediaWiki link is delimited by double square brackets.

e.g. Replace "Term" in "Here is the Term that has to be replaced" but do not replace it in "Here [[is the Term that]] must not be replaced"

The solution can use MySQL, sed, awk, or whatever else works. Please help. Thank you.

BSDGuy

Posted 2014-02-20T09:36:04.247

Reputation: 1

Is the exact term filling the entire space between the square brackets []? – Wally – 2014-02-20T15:27:52.093

No, it is not. There can be whitespace or other characters too. – BSDGuy – 2014-02-21T10:30:56.857

Thank you for your great solutions. I will try them and report back. – BSDGuy – 2014-02-21T10:32:13.957

Answers

0

I prefer to address this sort of thing by using a tool that can split the string using a multi-character delimiter. You can use your "excluded pattern" as the delimiter and then do replacements on elements which are not the delimiter. Because I like perl, I'll do a perl one-liner here. :)

First, because "perl" wasn't one of your suggested solutions, I'm guessing perl isn't something you're strong in. So I'll start with some things you need to know about perl to understand how this works:

If you put parens around the split pattern in the perl split function, the separator is retained as an additional element in the array returned by split. Using \[\[.*?\]\] gets us the smallest string contained between [[ and ]], so in the returned array, we can select elements which don't start with [[ and do the replacement on only those elements.

With foreach and map, $_ is an alias for the array element, so changes to $_ change the array elements. Thus, after changing the array, we can just join the potentially-modified elements and delimiters - still in the right order - back together with the empty string.

Also, I like using unless() while other people prefer if(!) (same with my preference of using q{} rather than '', because '' looks kinda like " and "" looks like '''' ;)). This isn't code golf, and I think it's more readable my way. :)
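For readers who don't use perl, the same split-and-rejoin idea can be sketched in Python, whose re.split also keeps the delimiter when the pattern is wrapped in a capturing group. This is only an illustration under assumptions: the function name replace_outside_links is made up here, the term is treated as a literal string rather than a regex, and links are assumed not to nest or span lines.

```python
import re

# Non-greedy match of one [[...]] link. The capturing group makes
# re.split return the links themselves as list elements.
LINK = re.compile(r"(\[\[.*?\]\])")


def replace_outside_links(text, old, new):
    """Replace the literal string `old` with `new`, except inside [[...]].

    Hypothetical helper for illustration; not part of any library.
    """
    parts = LINK.split(text)
    # With one capturing group the result alternates:
    # even index = text outside links, odd index = a link.
    fixed = [
        part.replace(old, new) if i % 2 == 0 else part
        for i, part in enumerate(parts)
    ]
    return "".join(fixed)


print(replace_outside_links("Here is the Term that has to be replaced", "Term", "New term"))
print(replace_outside_links("Here [[is the Term that]] must not be replaced", "Term", "New term"))
```

Selecting by position in the list rather than by checking for a leading [[ means a stray, unclosed [[ in the text is not mistaken for a link.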

Oh, and just in case this is also new: perl -lne - the -l transparently handles newlines, which I guess we really don't care about here, but it's habit. The -n puts the code inside of a while(<>){}.

With all that said, here's a working (but nonsensical) example replacing every non-link "a" with "pie":

danny@host [/home/danny]
$ cat testfile
a b c d [[a]] b c d [[ moo a moo]] a
I like to eat [[meat]] on a plate
danny@host [/home/danny]
$ perl -nle'@l=split(/(\[\[.*?\]\])/); foreach (@l){s/a/pie/g unless(/^\[\[/)};
print join(q{}, @l)' testfile
pie b c d [[a]] b c d [[ moo a moo]] pie
I like to epiet [[meat]] on pie plpiete
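A different way to get the same result, sketched here in Python rather than perl, is a single substitution whose pattern matches either a whole link or the search term, and which puts links back unchanged. This is an alternative technique, not the one-liner above; the function name replace_a_outside_links is hypothetical, and the sample lines are the two lines of testfile.

```python
import re

# Either a whole [[...]] link (captured in group 1) or a bare "a".
# The link alternative comes first, so an "a" inside a link is
# consumed as part of the link and never matched on its own.
PATTERN = re.compile(r"(\[\[.*?\]\])|a")


def replace_a_outside_links(line):
    """Replace every "a" outside [[...]] with "pie" (hypothetical helper)."""
    def substitute(match):
        link = match.group(1)
        # Group 1 is set only when the match is a link: keep it as-is.
        return link if link is not None else "pie"

    return PATTERN.sub(substitute, line)


for line in ["a b c d [[a]] b c d [[ moo a moo]] a",
             "I like to eat [[meat]] on a plate"]:
    print(replace_a_outside_links(line))
```

On the two sample lines this prints the same output as the perl one-liner.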

dannysauer

Posted 2014-02-20T09:36:04.247

Reputation: 837

0

Just use Pywikibot's replace.py.

replace.py -exceptinside:link -regex "Term" "New term"

The other answers are both wrong (unreliable) and unnecessarily complicated.

Nemo

Posted 2014-02-20T09:36:04.247

Reputation: 1 050

-1

Here's some code that should work for you. Tested on bash on AIX:

#!/usr/bin/bash
#filename: test2.sh
searchandreplace() {
   thisline=$1
   echo $thisline | awk '
BEGIN { FS= "[" }
/\[/ {sub(/Term/,"foobar");print}
!/\[/ {print}
'
}

infile=test.in
cat $infile | while read line
do
   searchandreplace "$line"
done

Here's test.in:

"Here is the Term that has to be replaced" "Here [[is the Term that]] must not be replaced"
third line

Example when run: screenshot showing the script working

Wally

Posted 2014-02-20T09:36:04.247

Reputation: 499

I don’t understand how this can be considered to work correctly.  What it does is change (only) the first Term on any line that contains a [, and leaves lines that don’t contain a [ alone.  So Here is the Term that has to be replaced is left untouched, but Here [[is the Term that]] must not be replaced is changed to Here [[is the foobar that]] must not be replaced — exactly backwards!  [[Term1]] Term2 becomes [[foobar1]] Term2 — again, backwards.  Term Term Term [other becomes foobar Term Term [other.  … (Cont’d) – G-Man Says 'Reinstate Monica' – 2015-04-06T02:39:15.923

(Cont’d) …  Also (1) the FS="[" seems to have no effect, (2) there’s no reason to process the file one line at a time, and (3) this is a Useless Use of cat. – G-Man Says 'Reinstate Monica' – 2015-04-06T02:40:32.290

This will not replace Term if it appears anywhere after a link on the line, and does not differentiate between a [ or [[. Of course, although it doesn't work in the general case, it may solve this specific problem. :) – dannysauer – 2014-02-20T16:34:41.883

@dannysauer as I understand how awk FS works, any new link starting with [[ will be a new "field" so it'll then search it again. Well, the OP can take it or leave it. Your idea did look shorter than mine, so I'd try it first myself if I were him. – Wally – 2014-02-20T16:58:25.413

Lots of people don't like perl, and it's always good to have multiple possible solutions. Especially solutions which don't contain a zillion slashes like my perl regexp. :) – dannysauer – 2014-02-20T20:56:16.043