How to replace the first n occurrences of a string using sed and/or awk?

1

0

I have a file named alphabet in which a occurs several times, sometimes more than once on a line.

$ cat alphabet
a b c d e f g 
h i j k a a l
m n a p q r a
s t u v w a x
y z a k l q z

where

$ cat alphabet | grep -o a | wc -l
7

How can I replace only the first 3 occurrences of a with Z, so that the file looks like this?

Z b c d e f g 
h i j k Z Z l
m n a p q r a
s t u v w a x
y z a k l q z

Neel

Posted 2015-12-11T21:46:17.033

Reputation: 234

Answers

2

awk '{
    for (i=1; i<=NF; i++) 
        if ($i == "a" && n < 3) {
            n++
            $i = "Z"
        }
    print
}' alphabet

Or, as a one-liner:

awk '{for (i=1;i<=NF;i++) if ($i=="a" && n++<3) $i="Z"; print}' alphabet

glenn jackman

Posted 2015-12-11T21:46:17.033

Reputation: 18 546

While this may answer the question, it would be a better answer if you could provide some explanation why it does so. – DavidPostill – 2015-12-12T00:22:29.100

It's not very complicated code. Do you have any particular questions? – glenn jackman – 2015-12-12T04:50:51.807

No, but future readers of the answer might ... – DavidPostill – 2015-12-12T08:16:44.540

Note that this will find only occurrences of a that are separate words. While this is true of the example data, it is not specified as being true of the real data. Also, if this changes a line that contains strings of multiple spaces (e.g., c   a   t        d o g), it will cause those spaces to collapse to single ones (c Z t d o g). – G-Man Says 'Reinstate Monica' – 2015-12-12T21:56:50.737
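The whitespace point in the comment above is easy to check. The following is a minimal sketch that feeds the comment's own example line to both awk approaches from this thread. The variable names line, by_field and by_sub are only illustrative.

```shell
# Example line taken from the comment above (runs of several spaces).
line='c   a   t        d o g'

# Field-based loop: assigning to $i makes awk rebuild the record,
# joining the fields with the output field separator (a single space).
by_field=$(printf '%s\n' "$line" |
    awk '{for (i=1;i<=NF;i++) if ($i=="a" && n++<3) $i="Z"; print}')

# sub() edits the text of the record directly, so the spacing is kept.
by_sub=$(printf '%s\n' "$line" |
    awk '{while (c < 3 && sub("a","Z") > 0) c++; print}')

printf '%s\n' "$by_field"   # c Z t d o g
printf '%s\n' "$by_sub"     # c   Z   t        d o g
```

If the spacing of the file matters, the sub()-based form is probably the safer choice.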

4

Perl to the rescue:

perl -pe '$c++ while $c < 3 && s/a/Z/' alphabet

choroba

Posted 2015-12-11T21:46:17.033

Reputation: 14 741

-1 He said using sed and/or awk, and you didn't even mention them – barlop – 2015-12-11T22:16:37.917

Compact and beautiful. Besides, in most cases where you have Bash, you also have Perl. – SΛLVΘ – 2015-12-12T05:41:29.060

2

Here, the sed way

sed -E ':a;N;$!ba;s#a#Z#;s#a#Z#;s#a#Z#' alphabet

Since sed normally works on lines, any command given to sed acts on only one line at a time. To replace only the first 3 occurrences, we first need to make the whole file a single selection and then do our 3 replacements on it. Otherwise, we would do 3 replacements on each line.

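  • :a creates a label named a
  • N appends the next line of input to the pattern space
  • $! restricts the command that follows it to every line except the last
  • ba branches to label a, so $!ba repeats the loop until the last line has been read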

The whole file is now in the pattern space, so the commands act on it as a whole instead of on one line at a time. The three s#a#Z# commands then each replace one "a" with "Z".

The above command will only work on GNU sed. A more general, but slightly uglier, version that should work on non-GNU sed:

sed -e ':a' -e 'N' -e '$!ba' -e 's#a#Z#' -e 's#a#Z#' -e 's#a#Z#' alphabet
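To see why joining the lines matters, here is a small sketch that runs the same three substitutions with and without the :a / N / $!ba loop. The two-line input and the variable names input, per_line and whole are made up for the illustration.

```shell
# Two lines with four "a"s each (made-up test input).
input=$(printf 'a a a a\na a a a')

# Without the loop, each s#a#Z# runs once per line,
# so every line gets three replacements.
per_line=$(printf '%s\n' "$input" |
    sed -e 's#a#Z#' -e 's#a#Z#' -e 's#a#Z#')

# With the loop, the whole input is in the pattern space first,
# so the three substitutions run only once in total.
whole=$(printf '%s\n' "$input" |
    sed -e ':a' -e 'N' -e '$!ba' -e 's#a#Z#' -e 's#a#Z#' -e 's#a#Z#')

printf '%s\n' "$per_line"   # Z Z Z a  on both lines
printf '%s\n' "$whole"      # Z Z Z a  then  a a a a
```

One edge case worth checking on your own sed: if the input has only one line, N finds no next line and sed stops running the script there, so the substitutions never happen.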

EDIT: As suggested in the comments, here is a version that uses the g flag to first replace all occurrences of 'a' with 'Z', and then replaces every occurrence of 'Z' from the 4th onward with 'a' again, which effectively replaces only the first 3 occurrences of 'a'. This way you can change the last number to reflect the number of substitutions you need (it has to be that number plus 1).

sed -e ':a;N;$!ba;s#a#Z#g;s#Z#a#g4' alphabet
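As a sketch of how the count could be made a parameter, the number in the second substitution can be computed by the shell. This assumes GNU sed, since combining a number with the g flag is a GNU extension. The variable n and the temporary directory are only for illustration, and the sample file is the one from the question.

```shell
n=3   # how many occurrences to replace (illustrative variable)

# Recreate the sample file from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/alphabet" <<'EOF'
a b c d e f g
h i j k a a l
m n a p q r a
s t u v w a x
y z a k l q z
EOF

# Replace every a with Z, then turn the (n+1)th and later Z back into a.
result=$(sed -e ':a;N;$!ba;s#a#Z#g;s#Z#a#'"$((n + 1))"'g' "$dir/alphabet")
printf '%s\n' "$result"

rm -rf "$dir"
```

Note that this round trip is only safe when the input contains no Z to begin with. Any Z already in the file is counted in the second step and may be turned into a.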

EvilTorbalan

Posted 2015-12-11T21:46:17.033

Reputation: 254

In order to have a more general solution, I'd set number of chars (+1) to be replaced: ':a;N;$!ba;s#\n#<NL>#g;s#a#Z#g;s#Z#a#4g;s#<NL>#\n#g' i.e. replace all occurrences with "Z", then put back "a" in place starting from the 4th occurrence of "Z". – SΛLVΘ – 2015-12-12T07:54:22.173

Actually I think s#\n#<NL>#g / s#<NL>#\n#g is not needed... – EvilTorbalan – 2015-12-12T10:09:35.317

@SalvoF I have considered this but in my tests it was not working in some versions of GNU sed when using gN > 4, probably a bug not sure... – EvilTorbalan – 2015-12-12T10:23:19.140

Which versions? I tried three different ones, one of which running on Vista (where I had to use double quotes instead of single, and to double newlines' backslash) – SΛLVΘ – 2015-12-12T11:12:04.750

1

The awk solution that has been posted assumes that all the occurrences of a are separate words.  While this is true of the example data, it is not specified as being true of the real data.  The following awk solution is more in the spirit of the perl solution that has been posted:

awk '{ while (changes < 3  &&  sub("a", "Z") > 0) changes++; print }' alphabet

This replaces (substitutes) occurrences of a with Z until the changes counter reaches 3.  Of course, to actually change the file, you will need to do something like

awk '{while (c < 3 && sub("a","Z")>0) c++; print}' alphabet > t && cp t alphabet && rm t

where t is a temporary file.
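The same idea can be wrapped in a small shell function so that the strings and the count are not hard-coded. This is only a sketch: replace_first_n is a hypothetical helper name, mktemp is assumed to be available, and the sample file is the one from the question.

```shell
# replace_first_n OLD NEW COUNT FILE   (hypothetical helper)
# Rewrites FILE with the first COUNT matches of OLD replaced by NEW.
replace_first_n() {
    old=$1 new=$2 max=$3 file=$4
    tmp=$(mktemp) || return 1
    # Only overwrite the original if awk finished without an error.
    if awk -v old="$old" -v new="$new" -v max="$max" '
        { while (changes + 0 < max + 0 && sub(old, new) > 0) changes++; print }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# Try it on a scratch copy of the sample file from the question.
dir=$(mktemp -d)
cat > "$dir/alphabet" <<'EOF'
a b c d e f g
h i j k a a l
m n a p q r a
s t u v w a x
y z a k l q z
EOF

replace_first_n a Z 3 "$dir/alphabet"
result=$(cat "$dir/alphabet")
printf '%s\n' "$result"
rm -rf "$dir"
```

Keep in mind that sub() treats its first argument as a regular expression and gives & a special meaning in the replacement, so strings containing such characters would need escaping. Also, if the replacement itself contains the search string, the repeated sub() calls keep matching at the same place.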

G-Man Says 'Reinstate Monica'

Posted 2015-12-11T21:46:17.033

Reputation: 6 509

With GNU awk 4.1.0 and later, you can edit files in place (gawk -i inplace). – SΛLVΘ – 2015-12-12T08:12:18.673