This is mostly off-topic, but you could use
find -maxdepth 1 -type f -name '*.txt' | xargs python -c '
import fileinput
# rewrite each named file in place, swapping "blah" for "blee" on every line
for line in fileinput.input(inplace=True):
    print line.replace("blah", "blee"),
'
The main benefit here (over the ... xargs ... -I {} ... sed ... approach) is speed: you avoid invoking sed 10 million times. It would be faster still if you could avoid using Python (the interpreter is relatively slow to start and run), so perl might be a better choice for this task. I'm not sure how to do the equivalent conveniently with perl.
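For reference, the perl idiom usually suggested for this kind of in-place edit is a one-liner along these lines (a sketch, not benchmarked here; it assumes perl's -i in-place editing is acceptable):
# perl reads each file, applies the substitution to every line, and rewrites the file
find -maxdepth 1 -type f -name '*.txt' | xargs perl -p -i -e 's/blah/blee/g'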
The way this works is that xargs will invoke Python with as many arguments as it can fit on a single command line, and keep doing that until it runs out of arguments (which are being supplied by the find command). The number of arguments to each invocation will depend on the length of the filenames and on the system's limit on the total size of the argument list (see the edit below). The fileinput.input function yields successive lines from the files named in each invocation's arguments, and the inplace option tells it to "catch" the standard output and use it to replace each line in the file.
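One caveat not in the original answer: if any filename contains whitespace, the plain pipe will split it incorrectly. Assuming GNU find and xargs, a null-delimited variant avoids that:
find -maxdepth 1 -type f -name '*.txt' -print0 | xargs -0 python -c '
import fileinput
for line in fileinput.input(inplace=True):
    print line.replace("blah", "blee"),
'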
Note that Python's string replace method doesn't use regexps; if you need those, you have to import re and use print re.sub("blah", "blee", line), (the pattern and replacement come before the line, and the trailing comma suppresses the extra newline). Python's regexps are Perl-style, which are sort of heavily fortified versions of the ones you get with sed -r.
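In context, only the embedded script changes; a minimal sketch in the same Python 2 style as above:
import fileinput, re
for line in fileinput.input(inplace=True):
    print re.sub("blah", "blee", line),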
edit
As akira mentions in the comments, the original version using a glob (ls -f *.txt) in place of the find command wouldn't work because globs are processed by the shell (bash) itself. This means that before the command is even run, 10 million filenames will be substituted into the command line. This is pretty much guaranteed to exceed the maximum size of a command's argument list. You can use xargs --show-limits for system-specific info on this.
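For example (these assume GNU xargs and a Linux system; getconf ARG_MAX reports the kernel's raw limit on argument list plus environment):
xargs --show-limits < /dev/null
getconf ARG_MAX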
The maximum size of the argument list is also taken into account by xargs, which limits the number of arguments it passes to each invocation of python accordingly. Since xargs will still have to invoke python quite a few times, akira's suggestion to use os.path.walk to get the file listing will probably save you some time.
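A rough sketch of that single-process idea, using os.listdir instead of os.path.walk since only the top-level directory is involved (the pattern and names are illustrative):
import fileinput, fnmatch, os

# build the file list inside Python so the interpreter starts only once
names = [n for n in fnmatch.filter(os.listdir("."), "*.txt") if os.path.isfile(n)]
for line in fileinput.input(names, inplace=True):
    print line.replace("blah", "blee"),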
It would be faster if you can avoid invoking sed for each file. I'm not sure if there's a way to open, edit, save, and close a series of files in sed; if speed is essential you may want to use a different program, perhaps perl or python. – intuited – 2011-03-14T05:40:56.193
@intuited: it would be even faster to not do anything to the files at all ... seriously? if you want to change a pattern in a set of files you have to look into each file to see if there is the pattern. if you know in advance that you can skip 'some' files, then it's obviously faster to not even touch the files. and the startup time for sed is probably faster than launching python or perl as well, except if you do everything in that interpreter. – akira – 2011-03-14T09:41:16.067
@akira: Are you saying that launching perl or python once for as many files as will fit on a command line is more expensive than launching sed once for each of those files? I would be really surprised if that were the case. I guess you didn't understand that my suggestion is to invoke (start) the editing program once (or at least fewer times; see my answer), and have it open, modify and resave each of the files in turn, rather than invoking the editing program separately for each of those files. – intuited – 2011-03-14T17:21:37.440
your first comment does not reflect what you really wanted to say: "replace sed by python/perl" .. by just doing that and looking @ the commandline OP has given, an innocent reader could assume that "find . -exec python" is faster than "find . -exec sed" .. which is obviously not the case. in your own answer you call python much more often than it is actually needed. – akira – 2011-03-14T20:26:01.807
I think that akira misinterpreted your (intuited) suggestion. I believe that you were suggesting to bunch files together. I tried that with my xargs attempt, time to try it again :) – Sandro – 2011-03-14T20:47:02.767
@Sandro: your 'xargs -0 sed -i' calls sed already on nr_x of files and is not launched for each file. i find @intuited's first comment just misleading because he provides only half of what he has in mind. and his answer left out the interesting part (for others) as well. – akira – 2011-03-14T21:58:51.350
Sandro: Crazy! I think for the benefit of the community, you should explain how you ended up in this situation. How big is the directory entry itself? Probably several hundred megs. What filesystem are you using? The xargs option might work if you use -n to limit the number of args per sed run. – deltaray – 2011-03-15T00:41:14.613
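To make that -n suggestion concrete, the batched sed variant might look like this (a sketch; the batch size is arbitrary, and GNU find, xargs, and sed are assumed):
# run sed on batches of at most 1000 files per invocation
find -maxdepth 1 -type f -name '*.txt' -print0 | xargs -0 -n 1000 sed -i 's/blah/blee/g'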