Replace leading tabs and spaces with sed


I want to replace leading tabs and spaces with markers like <TAB> and <SPACE>, respectively. But I couldn't figure out how to do it in a single pass of sed: tabs and spaces in the original file can be intermixed, so simply doing one replacement and then another doesn't work.

Input example (tabs shown as ^):

^^line with tabs
  line with spaces
^ ^intermixed

Desired output:

<TAB><TAB>line with tabs
<SPACE><SPACE>line with spaces
<TAB><SPACE><TAB>intermixed
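To try the answers below, the example input needs real tab characters, which are easy to lose when copying from a web page. A minimal sketch, using the file name input.txt that the first answer assumes, writes the file with printf and then uses sed's POSIX `l` command to make the invisible characters visible (tabs appear as \t, line ends as $):

```shell
# Write the three example lines with real tabs (\t) and spaces.
printf '\t\tline with tabs\n  line with spaces\n\t \tintermixed\n' > input.txt

# sed's "l" command prints each line unambiguously: tabs as \t, end of line as $.
sed -n l input.txt
```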

Amomum

Posted 2019-02-12T21:41:42.583

Reputation: 113

Answers


I know you said you want to use sed, which is often a wonderful tool. But where there are choices and loops, I find that awk outshines it.

#!/usr/bin/gawk -f
{ while (/^\s/) {
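    if (sub(/^ /,"")) printf "<SPACE>";    # strip one leading space, print its marker
    if (sub(/^\t/,"")) printf "<TAB>";     # strip one leading tab, print its marker
    }
  print;                                   # print what is left of the line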
}

If we create a file input.txt containing the input example and save the script as replace, we can make it executable and run it as follows, which produces the desired output.

chmod +x replace
./replace input.txt

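UPDATE: Oops. There's an infinite loop in that code. In gawk, the sequence \s matches any whitespace character ([ \t\n\r\f\v]), so if a line starts with, say, a stray form feed, the loop condition stays true while neither sub() matches, and it spins forever. But [:blank:] matches just space and tab, so the second line should be this: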

{ while (/^[[:blank:]]/) {
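Building on the update, here is a minimal sketch of the same idea that should also run in POSIX awk implementations other than gawk (\s is a gawk extension). Instead of looping with sub(), it measures the run of leading spaces and tabs once with match() and RLENGTH, so there is no loop that could spin on other whitespace. The shell function name mark_leading_blanks is hypothetical, chosen only for this example.

```shell
# mark_leading_blanks: hypothetical helper name for this sketch.
# Reads the given files (or stdin) and replaces each leading tab with <TAB>
# and each leading space with <SPACE>, leaving the rest of the line alone.
mark_leading_blanks() {
  awk '{
    # Length of the run of leading spaces/tabs (0 if the line has none).
    n = match($0, /^[ \t]+/) ? RLENGTH : 0
    out = ""
    for (i = 1; i <= n; i++)
      out = out (substr($0, i, 1) == "\t" ? "<TAB>" : "<SPACE>")
    # Append the untouched remainder of the line.
    print out substr($0, n + 1)
  }' "$@"
}
```

Usage: `mark_leading_blanks input.txt`. A line that starts with some other whitespace character, such as a form feed, is printed unchanged.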

Ken Jackson

Posted 2019-02-12T21:41:42.583

Reputation: 281

This is a really elegant solution. – zx485 – 2019-02-13T01:26:20.393

That is a very readable solution! I guess there is no point in using sed when the awk solution looks simpler. Thank you! – Amomum – 2019-02-13T20:21:20.770


Here is one solution with sed. It splits each line into the tabs and spaces at the start and the rest of the line, so that tabs and spaces inside the text are not replaced.

echo -e '\t\tline with\ttabs
  line with spaces
\t \tintermixed' | sed -r '

    # On the lines that start with tab or space.
    /^[\t ]/ {

        # Put the whole line in the hold space.
        h

        # Delete all tabs and spaces at the start of line.
        s/^[\t ]+//

        # Exchange pattern and hold spaces.
        # This saves the text part to the hold space and
        # brings back the original line to the pattern space.
        x

        # Now keep only the leading tabs and spaces in the
        # pattern space (the rest is in the hold space).
        s/^([\t ]+).*/\1/

        # Finally, make the substitutions.
        s/\t/<TAB>/g
        s/ /<SPACE>/g

        # Append a \n (newline) to the pattern space, then
        # append the content of the hold space after it.
        G

        # Delete the extra \n added above.
        s/\n//
    }'
Output (the wide gap in the first line is the tab inside the text, which is left untouched):
<TAB><TAB>line with     tabs
<SPACE><SPACE>line with spaces
<TAB><SPACE><TAB>intermixed
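A shorter single-pass variant, sketched here under the assumption of GNU sed (for -E and for \t in regexes), loops with a label instead of using the hold space. Each pass converts one more leading tab or space that follows the markers already written, and t jumps back as long as a substitution succeeded. The shell function name mark_leading_blanks_sed is made up for illustration.

```shell
# mark_leading_blanks_sed: hypothetical helper name; assumes GNU sed.
mark_leading_blanks_sed() {
  sed -E '
    :a
    # Convert the first tab that directly follows the markers written so far.
    s/^((<TAB>|<SPACE>)*)\t/\1<TAB>/
    ta
    # Same for a space; loop back while either substitution succeeds.
    s/^((<TAB>|<SPACE>)*) /\1<SPACE>/
    ta
  ' "$@"
}
```

Usage: `mark_leading_blanks_sed input.txt`. One caveat of this design: a line whose text itself begins with a literal <TAB> or <SPACE> followed by a blank would be treated as if those markers had been generated; the hold-space version above does not have that problem.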

Paulo

Posted 2019-02-12T21:41:42.583

Reputation: 606