Replace every 6th pipe in PowerShell

I realize I'm asking a question similar to one that was already asked and answered, but I wasn't able to adapt that answer because the regex and the regex engine are different enough. I have hardware asset management logs that are pipe-delimited but have no separate delimiter between endpoints. The logs look like this:

|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3

What I would like to do is replace every 6th | with a carriage return to look like this:

|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1
|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2
|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3

The closest I've gotten selects each endpoint, but I'm not quite sure how to use it in PowerShell:

[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*
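As an aside, the pattern above already isolates one endpoint per match, so one way to use it in PowerShell is to collect the matches with the .NET [regex]::Matches method and join them with newlines. This is a minimal sketch, assuming each log record is a single string; the variable names are hypothetical:

```powershell
# Sample record from the question (one string, no trailing pipe).
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# The pattern from the question: an optional leading field, then five |-prefixed fields.
$pattern = '[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*\|[^\|]*'

# Each match is one endpoint; join the matched text with LF characters.
$endpoints = [regex]::Matches($line, $pattern) | ForEach-Object { $_.Value }
$result = $endpoints -join "`n"
$result
```

Each match stops just before the next endpoint's leading |, so the following match picks up that | as its first character.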

I'm familiar with the replace command in PS and I'm imagining the end result would be something to this effect:

$hosts = $hosts -replace "<highspeed_low_drag_velcro_snap_regex_here>","\r\n"

Thanks in advance!

Tensore

Posted 2018-01-03T02:23:30.723

Reputation: 63

@JakeGould I would consider this a bit of a special case, because the OP is specifically seeking a PowerShell solution. Not only is it a different regex engine (.NET vs the PCRE that Notepad++ uses), but the replacement text is actually specified differently too. – Bob – 2018-01-03T03:36:39.093

Answers

Ok, so this one's actually a little tricky. Arguably, regex isn't the best tool for the job, but it can do it.

-replace "(?<=^((\|[^|]*){5})+)\|","`n|"
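Here is the one-liner above in a self-contained form, run against the sample record from the question. It assumes each record is one string, for example one element of Get-Content output, because -replace is applied to each array element separately:

```powershell
# Sample record from the question (a single line, no trailing pipe).
$hosts = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# Replace a | only when a nonzero multiple of five |-prefixed fields precedes it,
# counting from the start of the string.
$result = $hosts -replace "(?<=^((\|[^|]*){5})+)\|", "`n|"

# Prints one endpoint per line.
$result
```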

I'll try to walk you through it:

  • Your text has a section you want to match and a section you want to replace. Traditionally, a regex replace overwrites the entire match, so you would use a capture group to mark the part of the match that should be copied into the replacement output. Another way is to use a lookaround, which is what I've done here. PowerShell (.NET) is one of the few regex flavors that support variable-length lookbehinds, so we're in luck.
  • The (?<=...) section is a lookbehind. Everything between the = and the closing ) must match the text immediately before the current position, but it is not consumed: it isn't part of the match, so it isn't replaced. So ^((\|[^|]*){5})+ is used as a condition - the replacement will only happen if this bit matches the text before the intended replacement.
  • The ^((\|[^|]*){5})+ section can be summed up as "from the start of the line (^), match one or more sets of five |s, each followed by the text up to the next |".
    • The start of the line ^ is important - otherwise it can match anywhere in the line and there's no guarantee of how many |s came before.
    • Because | has a special meaning in regex (alternation), it needs to be escaped as \| outside a character class. Inside a character class ([]) it is literal and needs no escaping.
    • [^|]* means "text up to the next |". More precisely, it means "repeat the [^|] character class as many times as possible", where [^|] matches any single character other than |.
    • * means "zero or more repetitions of the previous token, as many as possible". Here the previous token is the [^|] character class.
    • So (\|[^|]*) means: match a | followed by as many characters as possible up to the next |. This will match |text
    • {5} means repeat the previous token exactly 5 times. It's exactly equivalent to copy-pasting the preceding token 5 times. So this will match |text|text|text|text|text
    • ((\|[^|]*){5})+ is one or more repetitions of that entire group. So it can match |text|text|text|text|text, |text|text|text|text|text|text|text|text|text|text, etc. - in multiples of 5. The reason we use + instead of * is that we don't want to allow zero repetitions, which would match the empty text before the very first | and replace it.
    • That makes up the entire lookbehind: a | is only replaced if a nonzero multiple of 5 |s (5, 10, 15, ...) precede it, counting from the start of the line.
  • The lookbehind is followed by \|, the actual text to replace, so each match is a single | that satisfies the lookbehind condition.
  • Taking your example |STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3, it will match the two |s marked in bold:

    |STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1**|**STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2**|**STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3
    

You'll notice here (if you haven't already) that you're actually trying to replace every 5th | except the first, not every 6th. But the lookbehind method handles the "except the first" case fairly cleanly.
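To see the "every 5th except the first" behavior concretely, this sketch lists the ordinal position of each | that the lookbehind pattern matches, using [regex]::Matches and each match's Index property on the sample record from the question:

```powershell
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'
$pattern = '(?<=^((\|[^|]*){5})+)\|'

$ordinals = foreach ($m in [regex]::Matches($line, $pattern)) {
    # Count the pipes strictly before the match, then add one for the matched pipe itself.
    [regex]::Matches($line.Substring(0, $m.Index), '\|').Count + 1
}

$ordinals -join ', '   # 6, 11
```

The 1st pipe is never matched because the + requires at least one full set of five before it.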


And now the replacement string.

  • Because this is PowerShell, when we want a newline we write `n rather than \n: the PowerShell escape character is `, and "`n" inside a double-quoted string produces an actual newline character. This matters in the replacement string because .NET does not interpret escapes like \n there (only $ substitutions such as $1 are special). In the regex pattern itself you can still write \n, since the regex engine interprets that sequence. If you need Windows CRLF line endings, use "`r`n|" instead.
  • And because you have a leading | on every line, we need to add a new | after the newline. This works out because your original lines do not end with a |: there is nothing to replace at the end of a line, so we don't end up with an extra newline or a trailing |.
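To make the escaping difference concrete, this sketch compares three replacement strings against the same pattern. The two-endpoint sample line with fields A1 through E2 is made up for illustration:

```powershell
$line = '|A1|B1|C1|D1|E1|A2|B2|C2|D2|E2'
$pattern = '(?<=^((\|[^|]*){5})+)\|'

# Single quotes: PowerShell passes \n through unchanged, and the .NET
# replacement syntax does not interpret it, so a literal backslash-n is inserted.
$literal = $line -replace $pattern, '\n|'

# Double quotes: PowerShell turns `n into a real LF character.
$lf = $line -replace $pattern, "`n|"

# Double quotes with `r`n: a CRLF line break, for consumers that expect one.
$crlf = $line -replace $pattern, "`r`n|"

$literal
$lf
$crlf
```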

If you prefer the more traditional capture group method:

-replace "((?:[^|]+\|){4}[^|]+)\|","`$1`n|"

Figuring out how this works is left as an exercise to the reader ;) Tip: the $1 substitution has to be escaped (with `) because otherwise PowerShell treats $1 as a PowerShell variable and expands it inside the double-quoted string before the regex engine ever sees it.
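One way to check the capture-group method is to run it next to the lookbehind method, as in the sketch below. It also shows a difference worth knowing if any field can be empty: [^|]+ requires at least one character per field, while [^|]* does not. The record with an empty second field is hypothetical:

```powershell
# Sample record from the question.
$line = '|STATUS1|HOSTNAME1|IP1|MAC1|IS_WIRED1|STATUS2|HOSTNAME2|IP2|MAC2|IS_WIRED2|STATUS3|HOSTNAME3|IP3|MAC3|IS_WIRED3'

# Capture-group method: group 1 holds four "field|" pairs plus a fifth field;
# the | after it is consumed and re-emitted after the newline.
$capturePattern = '((?:[^|]+\|){4}[^|]+)\|'
# Lookbehind method from the top of this answer.
$lookbehindPattern = '(?<=^((\|[^|]*){5})+)\|'

$viaCapture    = $line -replace $capturePattern, "`$1`n|"
$viaLookbehind = $line -replace $lookbehindPattern, "`n|"
$viaCapture -ceq $viaLookbehind   # True for this sample

# Hypothetical record whose second field is empty ("||").
$sparse = '|S1||IP1|MAC1|W1|S2|H2|IP2|MAC2|W2'
# [^|]+ cannot match the empty field, so the split lands in the wrong place.
$sparseCapture = $sparse -replace $capturePattern, "`$1`n|"
# [^|]* accepts empty fields, so this still splits after the 5th field.
$sparseLookbehind = $sparse -replace $lookbehindPattern, "`n|"
```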

Bob

Posted 2018-01-03T02:23:30.723

Reputation: 51 526

Worked like a glove! You also answered a few other questions about PS I had, you're a scholar and a gentleman! Thanks! – Tensore – 2018-01-03T13:18:21.440