Converting tabs to spaces in many files

11

4

I have a lot of files with tabs littered throughout, and I'd like to convert them all into spaces. I know about the expand command, but unfortunately I would have to type out every single file using it. Is there any easier way to do this on Linux?

person

Posted 2010-07-02T04:51:36.617

Reputation: 213

Answers

12

Try the following:

find ./ -type f -exec sed -i 's/\t/ /g' {} \;

If you want four spaces, try:

find ./ -type f -exec sed -i 's/\t/    /g' {} \;

Nicolas Raoul

Posted 2010-07-02T04:51:36.617

Reputation: 7 766

That will replace each tab by a single space. Since person mentioned using expand, I assume s/he wants the alignment of the text preserved. – garyjohn – 2010-07-02T05:27:28.953

You need to have 's/\t/ /g' to replace more than just one tab per line. – Daniel Andersson – 2012-03-28T20:49:21.013

With many files, a substantial speedup is to run "find ./ -type f -exec sed -i 's/\t/ /g' {} +" (that is, "+" instead of "\;"), if the find version supports it. I haven't personally met any version that doesn't; the "+" form is specified by POSIX, though a very old implementation might lack it. See "-exec command {} +" in the manual. Instead of launching one instance of sed for every file, this builds an argument list with as many file name arguments as the system supports (getconf ARG_MAX reports 2097152 on my system), just like xargs, and thus launches far fewer sed processes. – Daniel Andersson – 2012-03-29T06:43:13.673

Note to any Mac users who find this: OS X's version of sed doesn't understand the \t tab escape sequence. You can replace it with a literal tab character, which you can enter in the shell by pressing [Ctrl]+V, then [Tab]. – Jeremy Banks – 2012-12-17T20:44:55.710

expand is probably better than sed for this, as explained in: http://stackoverflow.com/a/11094620/131824 – David Weinraub – 2013-11-16T16:09:22.117
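To illustrate the alignment point raised in the comments above, here is a minimal sketch comparing the two tools on a throwaway file. The file name sample.txt and the variable names are made up for the example, and a tab width of 4 is assumed. The tab is passed to sed as a literal character, so the same lines should also work where sed lacks the \t escape.

```shell
#!/bin/sh
# Hypothetical sample file: one tab per line, at different columns.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'a\tb\nabc\td\n' > "$sample"

# A literal tab character, so sed does not need to understand \t.
tab=$(printf '\t')

# sed: every tab becomes exactly four spaces, so the columns drift apart.
sed_out=$(sed "s/$tab/    /g" "$sample")

# expand: every tab is padded up to the next multiple of 4 columns.
expand_out=$(expand -t 4 "$sample")

printf '%s\n' "sed:" "$sed_out" "expand:" "$expand_out"
```

With sed, "b" and "d" end up in different columns; with expand, both start at column 4, which is how the tabs were displayed in the first place.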

6

There are lots of ways to do this. There are also lots of ways to shoot yourself in the foot while doing this if you're not careful or if you're new to Linux as you appear to be. Assuming that you can create a list of files that you want to convert, either by using something like find or manually with an editor, just pipe that list into the following.

while IFS= read -r file
do
   expand "$file" > /tmp/expandtmp
   mv /tmp/expandtmp "$file"
done

One way you can shoot yourself in the foot with that is to make a typo so that you wind up mv'ing an empty file to all of the file names you specify, thereby deleting the contents of all your files. So be careful and test whatever you do first on a small set of files that you have backed up.

garyjohn

Posted 2010-07-02T04:51:36.617

Reputation: 29 085

Don't forget expand -t 4 to expand tabs to 4 spaces. Also, this method can create trailing newlines. But otherwise it works. – mgold – 2014-09-13T19:02:48.023

Make the mv conditional on the success of expand: expand ... && mv ... – Paused until further notice. – 2010-07-02T10:04:04.450
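Putting the suggestions from these comments together, a more defensive version of the loop might look like the sketch below. The function name convert_tabs is hypothetical, a tab width of 4 is assumed, and the list must still contain one path per line (so names containing newlines are not handled). It uses mktemp for a unique temporary file and only writes back when expand succeeded.

```shell
#!/bin/sh
# convert_tabs is a hypothetical helper; it reads one file path per line on stdin.
convert_tabs() {
    while IFS= read -r file; do
        # A unique temporary file per input, so parallel runs cannot collide.
        tmp=$(mktemp) || return 1
        if expand -t 4 "$file" > "$tmp"; then
            # Copy the contents back rather than mv, so the original file
            # keeps its owner and permissions.
            cat "$tmp" > "$file"
        else
            echo "skipped: $file" >&2
        fi
        rm -f "$tmp"
    done
}
```

It would be used as, for example, find . -type f -name '*.txt' | convert_tabs. Testing on backed-up files first is still a good idea.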

3

find . -type f -iname "*.js" -print0 | xargs -0 -I foo tab2space foo foo

-I foo creates a template variable foo for each input line, so you can refer to the input more than once.

-print0 and -0 tell both commands to use \0 (the NUL character) as the separator between file names instead of whitespace, so this command works for paths with spaces.
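The effect of -print0 and -0 can be checked with a small experiment. This is only a sketch: the directory and file names are made up, and it assumes the temporary directory path itself contains no whitespace or quotes. It counts how many arguments xargs sees for a single file whose name contains a space.

```shell
#!/bin/sh
# Hypothetical test directory holding one file with a space in its name.
dir=$(mktemp -d)
printf 'a\tb\n' > "$dir/my file.js"

# Default behaviour: xargs splits its input on whitespace,
# so the single path is passed on as two separate arguments.
count_plain=$(find "$dir" -type f -name '*.js' | xargs -n 1 printf '%s\n' | wc -l)

# NUL-separated: the path arrives as exactly one argument.
count_nul=$(find "$dir" -type f -name '*.js' -print0 | xargs -0 -n 1 printf '%s\n' | wc -l)

echo "plain: $count_plain argument(s), NUL-separated: $count_nul argument(s)"
```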

Dustin Getz

Posted 2010-07-02T04:51:36.617

Reputation: 263

1

find -name \*.js -exec bash -c 'expand -t 4 "$0" | tee "$0"' {} \;

Cons:
files larger than the pipe buffer size (64KB) get truncated, because tee truncates the file on opening it, before expand has finished reading it

Pros:
no temp files

raylu

Posted 2010-07-02T04:51:36.617

Reputation: 485

0

I gave this problem a shot with the following requirements in mind:

  • Filter the files based on their names, for instance to process only .cpp or .json files
  • Support parallel processing. In case there are many files, this can provide a huge speed-up
  • The solution should fit in one line for easy use

The last requirement was the most difficult to fulfil because "expand" cannot modify files in place.

I came up with the following solution:

find . -type f -regextype egrep -regex '.*\.(c|cpp|h|hpp)'  -print0 | xargs -0 -n 1 -P 10 -IFILE bash -c ' ( echo "Processing FILE..." && expand -t 4 "FILE" > /tmp/expand.$$ && mv /tmp/expand.$$ "FILE" ) || exit 255'

Here is some explanation:

  • "find" finds the files to process. "-regextype egrep" selects the "egrep" regular expression syntax for filtering them by name
  • the "-type f" parameter makes sure that we match only regular files, not directories or other special files
  • the "-regex" parameter is the regular expression itself, which in this case matches any file that ends with .c, .cpp, .h or .hpp (the whole name has to match, so "file.c2" wouldn't, which is what we want)
  • "-print0" instructs "find" to print the file paths on its standard output with a NUL character (byte 0) at the end of each path. Together with the "-0" option of "xargs", this makes it possible to pass names containing newlines from one tool to the other (even if that's a pretty rare situation...)
  • xargs starts a new process for each path ("-n 1"), but may run as many as 10 processes in parallel ("-P 10")
  • xargs uses the placeholder "FILE" to pass each file path to the command, which is a bash script
  • the bash script calls "expand" and saves the result in a temporary file whose name contains the current process ID ($$), so that all processes running in parallel at a given time use different temporary files
  • the whole command uses the pattern ( command1 && command2 && command3 ) so that the process stops if any subcommand returns an error
  • if there is any error in the "&&" chain, the bash script returns exit code 255, which causes xargs to stop immediately
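As a variation on the steps above, the sketch below passes each path to the inner shell as a positional parameter ("$1") instead of substituting it into the script text, so file names containing quotes or "$" are not interpreted as shell code. This is not the command from the answer: the function name expand_tree is hypothetical, "-name" tests replace the GNU-only "-regextype", mktemp replaces the $$-based temporary name, and 4 parallel processes are an arbitrary choice. "-print0", "-0" and "-P" are extensions found in GNU and BSD tools rather than POSIX options.

```shell
#!/bin/sh
# expand_tree DIR: hypothetical helper that expands tabs (width 4)
# in all C/C++ sources and headers below DIR.
expand_tree() {
    find "$1" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' -o -name '*.hpp' \) -print0 |
        xargs -0 -n 1 -P 4 sh -c '
            # xargs appends one path per invocation; it arrives here as $1.
            [ -n "$1" ] || exit 0
            tmp=$(mktemp) || exit 255
            if expand -t 4 "$1" > "$tmp" && cat "$tmp" > "$1"; then
                rm -f "$tmp"
            else
                rm -f "$tmp"
                exit 255    # exit status 255 makes xargs stop immediately
            fi
        ' sh
}
```

It would be called as expand_tree . from the top of the source tree. Writing the result back with cat rather than mv keeps the permissions of the original file.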

ocroquette

Posted 2010-07-02T04:51:36.617

Reputation: 101

0

This is better:

find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;

oDarek

Posted 2010-07-02T04:51:36.617

Reputation: 1

Why is this better? It's not a great idea to use /tmp/e, because anything else using that file will clash with this, for example if two users run this at the same time. – Kevin Panko – 2014-03-08T17:35:25.830