sed replace all tabs and spaces with a single space

Question

I got a string like the following:

test.de.          1547    IN      SOA     ns1.test.de. dnsmaster.test.de. 2012090701 900 1000 6000 600

Now I want to replace all the tabs/spaces between the records with just a single space so I can easily use it with cut -d " ".

I tried the following:

sed "s/[\t[:space:]]+/[:space:]/g"

and various variations, but couldn't get it working. Any ideas?

Does your `cut` support the `-w` option? – Kondybas Aug 17 '14 at 09:10

score 53 · Accepted Answer · edited Aug 17 '14 at 08:51

Use sed -e "s/[[:space:]]\+/ /g"

Here's an explanation:

[   # start of character class

  [:space:]  # The POSIX character class for whitespace characters. It's
             # functionally identical to [ \t\r\n\v\f] which matches a space,
             # tab, carriage return, newline, vertical tab, or form feed. See
             # https://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes

]   # end of character class

\+  # one or more of the previous item (anything matched in the brackets).

For your replacement, you only want to insert a space. [:space:] won't work there since that's an abbreviation for a character class and the regex engine wouldn't know what character to put there.
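A minimal sketch of that point, using a made-up two-field sample (the variable names are only for illustration). It uses the POSIX interval \{1,\} in place of \+ so that it should behave the same on GNU and BSD sed:

```shell
# Hypothetical sample: two fields separated by a run of spaces.
sample='a    b'

# A POSIX class name in the replacement is not expanded; it is copied literally.
literal=$(printf '%s\n' "$sample" | sed 's/[[:space:]]\{1,\}/[:space:]/g')

# A plain space in the replacement gives the intended result.
fixed=$(printf '%s\n' "$sample" | sed 's/[[:space:]]\{1,\}/ /g')

printf '%s\n' "$literal" "$fixed"
```

The first line printed should be a[:space:]b and the second a b.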

The + must be escaped in the regex because with sed's regex engine + is a normal character whereas \+ is a metacharacter for 'one or more'. On page 86 of Mastering Regular Expressions, Jeffrey Friedl mentions in a footnote that ed and grep used escaped parentheses because "Ken Thompson felt regular expressions would be used to work primarily with C code, where needing to match raw parentheses would be more common than backreferencing." I assume that he felt the same way about the plus sign, hence the need to escape it to use it as a metacharacter. It's easy to get tripped up by this.

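In sed's default (basic regex) mode you'll need to escape +, ?, |, (, and ) to use them as metacharacters. Alternatively, use -r (or -E, which BSD/macOS sed also accepts) to switch to extended regex, where they are written unescaped. Then it looks like sed -r -e "s/[[:space:]]+/ /g" or sed -re "s/[[:space:]]+/ /g".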
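As a rough side-by-side of the two regex flavours, here is a sketch on a shortened, made-up record that mixes tabs and spaces. The variable names are hypothetical, and the basic-regex form uses the POSIX interval \{1,\} rather than \+ as an assumption about portability:

```shell
# Hypothetical record using a mix of tabs and spaces between fields.
record=$(printf 'test.de.\t\t1547    IN \t SOA\tns1.test.de.')

# Basic regex (the default): the interval \{1,\} means "one or more".
bre=$(printf '%s\n' "$record" | sed 's/[[:space:]]\{1,\}/ /g')

# Extended regex: + is a metacharacter and is written unescaped.
ere=$(printf '%s\n' "$record" | sed -E 's/[[:space:]]+/ /g')

# With single spaces between fields, cut can select a field by number.
rtype=$(printf '%s\n' "$ere" | cut -d ' ' -f 4)

printf '%s\n' "$bre" "$ere" "$rtype"
```

Both sed calls should print test.de. 1547 IN SOA ns1.test.de., and the cut call should print SOA.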

answered Sep 23 '12 at 18:24 by Starfish · edited Aug 17 '14 at 08:51 by Community

Does this remove tabs too? Can you explain why you use `\+` instead of just `+`? – Zulakis Sep 23 '12 at 18:27
Okay, I understand. [[:space:]] is equal to [ \t\r\n\v\f]. But can you please explain why you use `\+`? – Zulakis Sep 23 '12 at 18:47
[[:space:]] is equivalent to '\s', so the shorter version is "s/\s\+/ /g" – 3molo Sep 23 '12 at 18:47
Basic regular expressions use a backslash prior to a plus sign when used to mean “one or more of the previous character or group”, source https://developer.apple.com/library/mac/#documentation/opensource/conceptual/shellscripting/RegularExpressionsUnfettered/RegularExpressionsUnfettered.html. – 3molo Sep 23 '12 at 18:51
Ahh, I understand! I did not know that there were different regex versions. Thanks – Zulakis Sep 23 '12 at 18:54
This seems wrong. At least on OSX I have to use `sed -E "s/[[:space:]]+/ /g"`; with extended regex you no longer need to escape the + sign. – UpAndAdam Dec 05 '19 at 23:01

Benjamin W. · Answer 2 · 2018-10-26T17:55:27.293

8

You can use the -s ("squeeze") option of tr:

$ tr -s '[:blank:]' <<< 'test.de.          1547    IN      SOA     ns1.test.de. dnsmaster.test.de. 2012090701 900 1000 6000 600'
test.de. 1547 IN SOA ns1.test.de. dnsmaster.test.de. 2012090701 900 1000 6000 600

The [:blank:] character class comprises both spaces and tabs.
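One caveat worth checking: with a single set, tr -s only squeezes repeats of the same character, so tabs stay tabs and a mixed run of tabs and spaces is not reduced to one space. A sketch of a variant that also translates blanks to spaces follows. The input is made up, and it assumes the common GNU/BSD behaviour of padding the shorter second set with its last character:

```shell
# Hypothetical input mixing tabs and spaces between fields.
mixed=$(printf 'a \t b\t\tc')

# Squeeze only: repeated identical characters collapse, but tabs stay tabs.
squeezed=$(printf '%s\n' "$mixed" | tr -s '[:blank:]')

# Translate every blank to a space, then squeeze the resulting runs.
normalized=$(printf '%s\n' "$mixed" | tr -s '[:blank:]' ' ')

printf '%s\n' "$squeezed" | od -c
printf '%s\n' "$normalized"
```

The od -c dump should still show \t characters in the squeeze-only result, while the translated version should print a b c.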

answered Feb 01 '18 at 04:09 · edited Oct 26 '18 at 17:55

cognativeorc · Answer 3 · 2020-07-23T15:08:10.487

Here are some interesting methods I found via experiments (using xxd to see tabs).

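# Reset the terminal, then build a test string with tab-separated fields.
echo -e \\033c
s=$(echo -e "a\t\tb\t\tc\t\td\t\te\tf")
# Save the string so the read examples below have a file.in to read
# (assumption: the original file.in held this same string).
echo "$s" > file.in

echo 'original string with tabs:'
echo "$s"
echo "$s" | xxd

# tr: translate tabs to spaces and squeeze each run to a single space.
echo -e '\nusing: \techo "$s" | tr -s \\\\t " "'
echo "$s" | tr -s \\t " "
echo "$s" | tr -s \\t " " | xxd

# sed: replace each run of tabs with one space (\t and \+ need GNU sed).
echo -e '\nusing: \techo "$s" | sed '"'s/\\\\t\\\\+/ /g'"
echo "$s" | sed 's/\t\+/ /g'
echo "$s" | sed 's/\t\+/ /g' | xxd

# Unquoted expansion: the result is word-split, which collapses the tabs.
echo -e '\nusing: \techo ${s/ / }'
echo ${s/ / }
echo ${s/ / } | xxd

# Unquoted echo inside a command substitution has the same effect.
z=$(echo $s)
echo -e '\nusing: \tz=$(echo $s); echo "$z"'
echo "$z"
echo "$z" | xxd

# read keeps the inner tabs; the unquoted echo $s is what collapses them.
echo -e '\nusing: \tread s < file.in; echo $s'
read s < file.in
echo $s
echo $s | xxd

# The same idea applied to every line of a file.
echo -e '\nusing: \twhile read s; do echo $s; done'
while read s;
do
  echo $s
done < file.in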

score -2 · Answer 4 · edited Sep 30 '15 at 16:58

I like using the following alias for bash. Building on what others wrote, it uses sed to replace multiple spaces with a single space, which helps get consistent results from cut. At the end, I run the output through sed one more time to change each space to a tab so that it's easier to read.

alias ll='ls -lh | sed "s/ \+/ /g" | cut -f5,9 -d" " | sed "s/ /\t/g"'

answered Sep 29 '15 at 20:40 by CNS Security miked · edited Sep 30 '15 at 16:58 by RolandoMySQLDBA

How does this answer the question? – Læti Apr 15 '17 at 19:18
