How can I substitute multiple instances of a character with the same number of instances of a different character in Linux sed?

2

I need to substitute a repeating set of characters (2 or more) with the exact number of replacement characters. I need to do this either with sed, or within vi.

Examples

"abc,,,def" becomes "abc|||def"
"1245d,,,,,22" becomes "1245d|||||22"

Thanks

xman

Posted 2017-06-02T06:07:19.707

Reputation: 21

So you need a global replacement of a character with another one, in a file? – Alex – 2017-06-02T06:16:40.757

Yes, but only where there are 2+ repetitions of it. – xman – 2017-06-02T06:52:39.483

This can be easily achieved with perl: perl -lape 's/,{2,}/"|" x length($&)/ge'. Too bad you can't use it (why?). It can be called from within vi, don't know whether that's admissible in your situation. – simlev – 2017-09-19T15:00:13.947

Answers

1

Pipe through sed, like

echo "abc,,,def" | sed 's/,/|/g'

but I would recommend using

tr ',' '|'

in this case.

jvb

Posted 2017-06-02T06:07:19.707

Reputation: 1 697

This would not work. As mentioned in the question, it needs to match 2 or more reps of the character. – xman – 2017-06-02T06:51:43.123

thanks for clarification, my fault. So you want to replace only the parts in brackets: echo "a,bc,,,def" | sed -E 's/(,,+)/(\1)/g' -> a,bc(,,,)def - researching now... – jvb – 2017-06-02T07:06:24.317

partial "solution": echo "a,b,,c,,,c" | sed 's/,,,,/||||/g; s/,,,/|||/g; s/,,/||/g' (expand the pattern if needed). Not really a nice solution, but might be of use if the maximum number of repetitions is known. – jvb – 2017-06-02T08:57:08.650

True... your approach would work if we know the max number of reps. Would be good if there is a generic solution though. – xman – 2017-06-02T09:51:29.207

What about echo "a,b,,c,,,c" | sed -E 's/([^,]),([^,])/\1#\2/g; s/,/|/g; s/#/,/g'? This will convert every single "," to "#", then all remaining (=multiple) "," to "|", and then all saved "#" to ",". But it needs a character which is unused in the input stream (# here). – jvb – 2017-06-02T10:20:54.420

this would work. You are right in that it needs an unused character... which fortunately is available. – xman – 2017-06-09T03:22:37.023
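
The placeholder idea from the comments above can be sketched as a small shell function. This is only a sketch under the same assumption made there: "#" must not occur anywhere in the input. The function name replace_runs_sed is made up for illustration. The single-comma step is run twice because a one-pass /g substitution consumes the character after a comma and so skips every second comma in input like a,b,c; the (^|...) and (...|$) alternatives are added so that single commas at the start or end of a line are protected as well.

```shell
# Hypothetical helper: replace runs of 2+ commas with the same number of "|".
# Assumption: "#" never occurs in the input (it is used as a placeholder).
replace_runs_sed() {
  sed -E '
    # Protect single commas by turning them into "#". Done twice, because
    # adjacent single commas (a,b,c) overlap and one /g pass misses some.
    s/(^|[^,]),([^,]|$)/\1#\2/g
    s/(^|[^,]),([^,]|$)/\1#\2/g
    # Every comma still left belongs to a run of two or more.
    s/,/|/g
    # Restore the protected single commas.
    s/#/,/g
  '
}
```

For example, printf '%s\n' 'abc,,,def,g' | replace_runs_sed prints abc|||def,g, and a,b,c passes through unchanged.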

1

Excuse me for not commenting, I don't have 50 reputation.

The solution below will fail if a line also contains single commas, like abc,,,def,g, because it replaces every comma on any line that contains a repeated one.

sed -n 's/[^,],,/&/;tsubs;p;d;:subs s/,/|/g;p' <<<'abc,,,def
abc,,def
abc,,,def,g
abc,def'

Paulo

Posted 2017-06-02T06:07:19.707

Reputation: 606

0

This is virtually impossible with a single substitution, because the replacement text in sed cannot repeat a character once per matched character. As jvb already pointed out, a solution is simple (though not necessarily short) if the maximum number of consecutive source characters is known. If not, it is possible to change all source characters first, and then change the single characters back in a second step. However, this only works if the target character does not occur in the input stream, or if you can use a character known not to occur in the input stream as an intermediate target.

Furthermore, you need to take into account the corner cases of a single source character occurring at the beginning or the end of the line. Thus:

tr ',' '|' < file | sed 's/\([^|]\)|\([^|]\)/\1,\2/g;s/\([^|]\)|\([^|]\)/\1,\2/g;s/^|\([^|]\)/,\1/;s/\([^|]\)|$/\1,/'

or

sed 's/,/|/g;s/\([^|]\)|\([^|]\)/\1,\2/g;s/\([^|]\)|\([^|]\)/\1,\2/g;s/^|\([^|]\)/,\1/;s/\([^|]\)|$/\1,/' file

A solution using a language that has a notion of string length would be more robust.
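
As one possible illustration of that last point, here is a minimal awk sketch wrapped in a shell function. The function name replace_runs is hypothetical, and awk is swapped in here only as an example of such a language; it is not something the question asked for. It uses awk's match() with RSTART and RLENGTH to locate each run of two or more commas, and converts just that run, so it needs no placeholder character and should also cope with input that already contains "|".

```shell
# Hypothetical helper: replace each run of 2+ commas with as many "|".
replace_runs() {
  awk '{
    out = ""
    # Find the next run of two or more commas in what is left of the line.
    while (match($0, /,,+/)) {
      run = substr($0, RSTART, RLENGTH)
      gsub(/,/, "|", run)                      # same length, new character
      out = out substr($0, 1, RSTART - 1) run  # text before the run + run
      $0 = substr($0, RSTART + RLENGTH)        # keep scanning the remainder
    }
    print out $0
  }'
}
```

For example, printf '%s\n' '1245d,,,,,22' | replace_runs prints 1245d|||||22, while single commas such as the one in abc,,,def,g are left alone.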

Michael Vehrs

Posted 2017-06-02T06:07:19.707

Reputation: 255