
I have various bash scripts that scan log files for people who try to make mischief, spam us, and so on.

I have been struggling for some days trying to figure this out.

I have a text file with a list of IPs.

I use sed to scan the list and remove IPs such as our own IP and other known IPs and IP ranges that get added to this list, perhaps because someone made a mistake.

For example:

In the first line I am trying to match 45.182.32.165 and any IP beginning with 45.

Ideally I would like to remove everything in 45.0.0.0/8, and/or any of the netmasks up to /24.

sed -i '' '/^45.*.*.*/d'  /directory/blocked_subnets/somelist
sed -i '' '/^50.81.238.*/d'  /directory/blocked_subnets/somelist
sed -i '' '/^50.84..*/d'  /directory/blocked_subnets/somelist

These lines do work, but sometimes not as intended.

I have tried various regexes I found on the net, but they don't seem to work.

I was hoping someone with more experience in this can help me refine this sed -i to work properly.

The '' is there because I am doing this on a FreeBSD machine.

Thank you in advance.

alexander.polomodov

3 Answers

1

Thank you for your help.

Alas none of the above suggestions would work for me.

After much reading and experimentation, I found I had to add the -r (to activate the regex) before the -i, and this is the format I used for the regex, which seems to work:

sed -r -i '' '/^120[.]152[.][0-9]{1,3}[.][0-9]{1,3}/d' /path/to/some/file

This removes the IP 120.152.35.192 from the file "file".

I tested the regex in "The Regex Coach" and it seems valid.

However I would welcome any additional input and suggestions to refine the above.

Regards

  • `-r` doesn't really "activate" the regex, it's indicating that you're using Extended Regular Expressions (as opposed to Basic Regular Expressions). This mostly affects what has to be escaped: quantifiers in BRE are `\{1,3\}`; in ERE, it's just `{1,3}`. The characters with changing escaping behaviour between BRE and ERE are `|`,`{}`,`()`, and `+`. For GNU sed, using `-r` (or the equivalent `-E`) or not using it only changes what has to be escaped and what not. – Benjamin W. Nov 05 '18 at 20:19
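To make the BRE/ERE point above concrete, here is a minimal sketch that runs the same match both ways. The sample addresses and the temporary file are made up for illustration, and the trailing $ anchor is my addition so that the whole line has to be an address. It prints to stdout instead of using -i so it behaves the same on FreeBSD and GNU sed.

```shell
# Hypothetical sample data, for illustration only.
list=$(mktemp)
printf '%s\n' 120.152.35.192 120.15.2.7 10.0.0.1 > "$list"

# BRE (no flag): the interval braces must be escaped.
bre_out=$(sed '/^120[.]152[.][0-9]\{1,3\}[.][0-9]\{1,3\}$/d' "$list")

# ERE (-E): the same interval is written without backslashes.
ere_out=$(sed -E '/^120[.]152[.][0-9]{1,3}[.][0-9]{1,3}$/d' "$list")

# Both variables should hold the same remaining lines.
printf '%s\n' "$ere_out"
rm -f "$list"
```

Only 120.152.35.192 should be deleted in both cases. 120.15.2.7 stays because [.] matches a literal dot only.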
0

BTW: This works with BusyBox versions of the commands too.

For anybody else looking for an answer to this, you can grep -n the range, cut the line number, then pass that to the sed command to delete the line.

There are a couple of ways to do this (a single line, or one step at a time with variables), but here is the one-liner:

grep -nm1 "^192.168.254.0/24" ./blocked_subnets.txt | cut -d \: -f 1 | xargs -n 1 -I {} sed -i "{}d" ./blocked_subnets.txt

If you use variables, you don't need to use xargs for the sed step:

N=$(grep -nm1 "^192.168.254.0/24" ./blocked_subnets.txt | cut -d \: -f 1)
sed -i "${N}d" ./blocked_subnets.txt

You need the starts-with (caret) character to catch some corner cases. You also need the ${} form of the variable in the second example because the next character is a letter ([a-zA-Z]), which means "$Nd" fails: the shell reads it as a variable named Nd. The -m1 stops grep after the first match.

EDIT:

Yes, the dot is a wildcard character, but it also matches the . in an IP address, so we don't have to escape it. There is no pre-processing of the IP range (or IP address) needed, just verification, and grep will handle that too (you might need to 2>/dev/null the grep though).
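One caveat worth checking on your own list: with unescaped dots and no end anchor, a pattern can occasionally match more than the entry you meant. The sketch below uses made-up addresses to show such a case, and then uses grep's -F (fixed string) and -x (whole line) options as one possible way to match an entry exactly. The target value and the temporary file are assumptions for the example.

```shell
# Hypothetical sample data, for illustration only.
list=$(mktemp)
printf '%s\n' 10.1.1.1 10.101.1.5 10.1.1.12 192.168.254.0/24 > "$list"

target='10.1.1.1'   # hypothetical entry we want to remove

# Regex match with unescaped dots: counts every line that starts with
# something matching the pattern, which here is more than one line.
loose=$(grep -c "^$target" "$list")

# -F treats the pattern as a fixed string, -x requires a whole-line match.
exact=$(grep -cxF -- "$target" "$list")

# Keep everything except the exact entry.
remaining=$(grep -vxF -- "$target" "$list")
printf 'loose=%s exact=%s\n' "$loose" "$exact"
rm -f "$list"
```

Here the loose pattern also hits 10.101.1.5 (a dot in the pattern matches the 0) and 10.1.1.12 (prefix match), while the -xF version hits only the one line.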


EXTRA:

If you are going to do the removal from ip directly (or a log that contains ip output when it was added), you need to change the caret (^) character to a space ( ) character in the grep match (to catch those same corner cases).

0

I think your problem is that the . is working as a wildcard matching any character. Instead, use \. which matches a literal dot.

Unfortunately I don't know the differences with FreeBSD, but as this is a somewhat simple regex I imagine it should work. I would use as a generic regex (very simplified) for any IP the following:

'/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/d'
  • ^: Starts with
  • [0-9]: Any digit
  • +: The preceding match happens 1 or more times
  • \.: A literal dot

So now if I wanted to match what you required I would change each octet to match the network. Note that + is only a quantifier in extended regular expressions, so -E is needed:

sed -E -i '' '/^45\.[0-9]+\.[0-9]+\.[0-9]+/d' /directory/blocked_subnets/somelist
sed -E -i '' '/^50\.81\.238\.[0-9]+/d' /directory/blocked_subnets/somelist
sed -E -i '' '/^50\.84\.[0-9]+\.[0-9]+/d' /directory/blocked_subnets/somelist

Also, consider that this will only work if each line of the list starts with the IP (not even spaces before the IP). If this is not the case, just remove the ^.
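If you have several networks to strip out, writing each pattern by hand gets error-prone. As a rough sketch, the per-octet idea above could be wrapped in a small helper that turns a /8, /16 or /24 network into an anchored extended regex. cidr_to_ere is a name I made up, not a standard tool, and it deliberately rejects masks that don't fall on an octet boundary. The sample list is also made up. I use grep -vE to stdout rather than sed -i here because the generated pattern contains a / (to also catch entries written with a mask), which would clash with sed's /.../ delimiters.

```shell
# cidr_to_ere: hypothetical helper, handles only /8, /16 and /24.
cidr_to_ere() {
    addr=${1%/*}    # part before the slash, e.g. 45.0.0.0
    bits=${1#*/}    # part after the slash, e.g. 8
    case $bits in
        8)  keep=1 ;;
        16) keep=2 ;;
        24) keep=3 ;;
        *)  echo "unsupported mask: /$bits" >&2; return 1 ;;
    esac
    # Keep the leading octets and turn each dot into a literal [.]
    prefix=$(printf '%s' "$addr" | cut -d . -f "1-$keep" | sed 's/[.]/[.]/g')
    # Each remaining octet is any 1-3 digit number; a /mask suffix is optional.
    rest=$(( 4 - keep ))
    printf '^%s([.][0-9]{1,3}){%s}(/[0-9]+)?$\n' "$prefix" "$rest"
}

# Hypothetical sample list, for illustration only.
list=$(mktemp)
printf '%s\n' 45.182.32.165 45.0.0.0/8 145.2.3.4 50.81.238.9 50.84.1.1 8.8.8.8 > "$list"

# Drop everything in 45.0.0.0/8 and 50.81.238.0/24, keep the rest.
filtered=$(grep -vE -e "$(cidr_to_ere 45.0.0.0/8)" \
                    -e "$(cidr_to_ere 50.81.238.0/24)" "$list")
printf '%s\n' "$filtered"
rm -f "$list"
```

Note that this only checks the text of each line. It does not validate that octets are in the 0-255 range, and it assumes one address or network per line with no leading spaces.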

See: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html

Jorge Valentini