Multi line find and replace

1

I recently moved about 30k images from Picasa to imgur. I need to replace all those links in my WordPress blog. I have exported all the posts to an XML file (it is 96 MB). I tried this PowerShell command (taken from one of SU's questions) to test whether it works, and it took about 5 minutes to replace just one URL.

(Get-Content test.txt) | ForEach-Object { $_ -replace "foo", "bar" } | Set-Content test2.txt

Is there any other way to replace thousands of URLs quickly? Platform: Windows 7. I can install any software needed.

Renuka

Posted 2014-11-14T11:22:31.320

Reputation: 151

You could use sed.exe: there are a number of Windows ports of this Unix utility, including in Microsoft's Services for Unix. To produce an edited copy of test.txt in test2.txt, use sed <test.txt "s/foo/bar/g" >test2.txt. It should be pretty quick, as it simply reads each line and writes the edited version all in a single pass, but I don't know how it will perform if there are no newlines in the source file, which would cause a problem for any text editor. – AFH – 2014-11-14T11:58:00.283

I have this port: http://unxutils.sourceforge.net/. I tried the command from this question: http://stackoverflow.com/questions/7555707/find-and-replace-a-url-with-grep-sed-awk. It didn't work for me. Your command and the one in the question look different.

– Renuka – 2014-11-14T12:04:13.927

That command is basically the same as mine, except that I used standard input instead of passing the file name as a parameter. The syntax of the edit strings may be quite complex, depending on how many special characters (like .) are used which are of special meaning in regular expressions. It might also be better to use a delimiter character other than /, as this will occur quite often in URLs (eg "s:foo:bar:g"). – AFH – 2014-11-14T12:18:33.007
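To illustrate the point about special characters: in a regular expression an unescaped . matches any character, so a pattern built from a raw URL can match more than intended. The following is a minimal sketch only; it uses Python's re module as a stand-in for sed, and the URLs are the placeholder examples from this thread.

```python
import re

old_url = "http://www.picasa.com/dogs.png"
new_url = "http://i.imgur.com/blabla.jpg"

# Unescaped, each "." matches any character, so this look-alike also matches.
lookalike = "http://wwwXpicasa.com/dogs.png"
unescaped_matches = re.search(old_url, lookalike) is not None

# re.escape() backslash-escapes regex metacharacters so the URL is literal.
escaped_pattern = re.escape(old_url)
escaped_matches = re.search(escaped_pattern, lookalike) is not None

# A function as the replacement avoids any backslash processing
# of the replacement text.
text = 'see <img src="http://www.picasa.com/dogs.png">'
result = re.sub(escaped_pattern, lambda m: new_url, text)

print(unescaped_matches, escaped_matches)
print(result)
```

The same concern applies to a sed pattern: the dots in the old URL need a backslash if they are meant literally.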

I changed the ' in that question's answer to " (like in your command) and it worked for a single replacement. How do I make it work for multiple lines? – Renuka – 2014-11-14T12:21:19.560

The reason for the different quotes is that single-quote in Unix has a special meaning which it doesn't have in Windows. You will need to give some examples of what you want to replace by what in order to see what is going on. – AFH – 2014-11-14T12:29:02.243

I tried a single command like this

sed "s/http:\/\/www.picasa.com\/dogs.png/http:\/\/i.imgur.com\/blabla.jpg/g" x.xml > y.xml

And it worked. For 30k URLs I created a batch script like this:

echo sed "s/http://www.picasa.com/dogs.png/http://i.imgur.com/blabla.jpg/g" x.xml > y.xml sed "s/http://www.picasa.com/cats.jpg/http://i.imgur.com/haha.jpg/g" x.xml > y.xml pause

This gives me the error "sed is not recognized as internal or external command". – Renuka – 2014-11-14T12:34:49.507

I can't make out what your command is meant to do, perhaps because of the format restrictions in comments. But your single command will replace every instance of http://www.picasa.com/dogs.png in x.xml (though the . should be \.). You can move the discussion to chat if you want, but I am going out shortly and won't be back for around 7 hours. – AFH – 2014-11-14T13:03:28.750
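One observation on the batch approach, hedged because the script above is hard to read: if every line runs sed on x.xml and redirects to y.xml, each run overwrites the previous output and re-reads the whole file. An alternative is to put all substitutions into one sed script file and make a single pass. The sketch below generates such a script in Python; the function names, the | delimiter, and the file name replace.sed are assumptions for illustration, not something from this thread.

```python
import re


def sed_escape_pattern(text, delim="|"):
    # Backslash-escape characters that are special in a basic regular
    # expression (\ . * [ ^ $) plus the chosen delimiter.
    return re.sub(r"([\\.*\[^$" + re.escape(delim) + r"])", r"\\\1", text)


def sed_escape_replacement(text, delim="|"):
    # In the replacement part, \ and & are special, as is the delimiter.
    return re.sub(r"([\\&" + re.escape(delim) + r"])", r"\\\1", text)


def build_sed_script(pairs, delim="|"):
    # One "s|old|new|g" command per (old, new) pair, one command per line.
    lines = []
    for old, new in pairs:
        lines.append("s{d}{o}{d}{n}{d}g".format(
            d=delim,
            o=sed_escape_pattern(old, delim),
            n=sed_escape_replacement(new, delim),
        ))
    return "\n".join(lines) + "\n"


pairs = [
    ("http://www.picasa.com/dogs.png", "http://i.imgur.com/blabla.jpg"),
    ("http://www.picasa.com/cats.jpg", "http://i.imgur.com/haha.jpg"),
]
script = build_sed_script(pairs)
print(script, end="")
```

Saved to a file such as replace.sed, the script could then be applied with something like sed -f replace.sed x.xml > y.xml, assuming the installed sed port supports the -f option. With 30k commands, sed still tries every substitution on every line, so this may not be fast.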

Let us continue this discussion in chat.

– Renuka – 2014-11-14T13:07:08.360

Answers

1

I think Notepad++ can open a file that large without issues. If it can, I would just do a find/replace all. You can use a regex in Notepad++ if necessary. If that does not work, just write a find/replace script in your language of choice. I have a nice one I made in Python. I can share it if necessary.

ubiquibacon

Posted 2014-11-14T11:22:31.320

Reputation: 7 287

How? Npp doesn't support multi-line search and replace. I tried a plugin, NppToolBucket, but it takes only 200 lines each time in my case. Yes, please share it. I can install Python. – Renuka – 2014-11-14T11:38:32.807

NPP does support multi-line find/replace via their "Extended" mode and via regular expressions. Line ending characters are \n and \r. Depending on your file format your line endings could have both characters at the same time like \r\n. Here is my script if you want to try it out: find_replace.py. Read the comment at the top of the script.

– ubiquibacon – 2014-11-14T13:10:14.483

I renamed the file to fr.py. I get an error when I run the file: http://i.imgur.com/sKEc034.jpg

– Renuka – 2014-11-14T13:36:56.557

The print statement changed to a function in Python 3. Use Python 2.7.X or update the script, whichever is easiest for you. – ubiquibacon – 2014-11-14T13:45:47.593
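As a small illustration of that change (a sketch, not a patch for the script in question): in Python 2, print "text" is a statement, while Python 3 only accepts the function form print("text"). Adding the __future__ import below makes the function form work on Python 2.7 as well, so the same file runs on both.

```python
from __future__ import print_function  # has no effect on Python 3

import io

# print() as a function accepts keyword arguments such as file= and end=.
buffer = io.StringIO()
print(u"searched", 1, u"files", file=buffer)
output = buffer.getvalue()

print(output, end="")
```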

I ran the file. It gave me options, something like grep.py ...... I have a CSV file where the cell in A1 is to be replaced with B1. I tried this command: grep.py -c file.csv inputs xmlfile.xml. It gave me an error that grep.py is not a recognized command. – Renuka – 2014-11-14T14:02:19.280

I used to have the file named grep.py and that is still in the help menu. You can name the file whatever you want, but whatever it is named, that is what you have to call from the command line. Try running something like the command seen below. Put the script in the same folder as your XML file to make it easier on yourself so you don't have to type a bunch of paths out. You also don't have to use a CSV file; you can hard-code it in the script's STR_DICT variable: python find_replace.py -csv "map.csv" "your_file.xml". – ubiquibacon – 2014-11-14T14:38:01.863
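For readers without access to find_replace.py, here is a minimal sketch of the general idea of a CSV-driven replacement. It is not the script mentioned above; every name in it (load_mapping, replace_urls, URL_PATTERN) is hypothetical. It assumes the CSV has the old URL in the first column and the new URL in the second, and that URLs in the XML end at whitespace, a quote, or an angle bracket. Instead of running one search per URL, it finds each URL-shaped token once and looks it up in a dictionary.

```python
import csv
import io
import re

# Assumption: a URL runs until whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def load_mapping(csv_file):
    """Read old -> new pairs from the first two columns of a CSV file object."""
    mapping = {}
    for row in csv.reader(csv_file):
        if len(row) >= 2 and row[0]:
            mapping[row[0]] = row[1]
    return mapping


def replace_urls(lines, mapping):
    """Yield each line with mapped URLs swapped; unknown URLs are left alone."""
    def swap(match):
        url = match.group(0)
        return mapping.get(url, url)

    for line in lines:
        yield URL_PATTERN.sub(swap, line)


# Small in-memory demonstration using the placeholder URLs from this thread.
csv_text = (
    u"http://www.picasa.com/dogs.png,http://i.imgur.com/blabla.jpg\n"
    u"http://www.picasa.com/cats.jpg,http://i.imgur.com/haha.jpg\n"
)
mapping = load_mapping(io.StringIO(csv_text))
xml_lines = [
    u'<img src="http://www.picasa.com/dogs.png"/>\n',
    u'<a href="http://example.com/">unchanged</a>\n',
]
result = list(replace_urls(xml_lines, mapping))
for line in result:
    print(line, end="")
```

For a real export, the lines would come from the opened XML file and be written to a new file one at a time, so the 96 MB file never has to be held in memory. The dictionary lookup keeps the cost per line roughly independent of the number of URL pairs.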

I tried this: fr.py -csv csvfile.csv xmlfile.xml. It gave an error: "sv doesnt exist ...aborting". I removed "sv" and used just -c. It looked like it was working, then gave these errors: http://i.imgur.com/lnbUQ1i.jpg

– Renuka – 2014-11-14T14:53:37.530

Let us continue this discussion in chat.

– ubiquibacon – 2014-11-14T15:11:55.407

If this answer was helpful please mark it as such. – ubiquibacon – 2014-11-16T01:34:43.843

Sorry, I was not online. It worked. The log file was 10777 lines; searched 1 files in 1:21:06.506. Thanks again :) – Renuka – 2014-11-16T10:44:00.090