23

I've come across a situation where a client needs to blacklist a set of just under 1 million individual IP addresses (no subnets), and network performance is a concern. While I would conjecture that IPTables rules would have less of a performance impact than routes, that's just conjecture.

Does anyone have any solid evidence or other justification for favoring either IPTables or null routing as solution for blacklisting long lists of IP addresses? In this case everything is automated, so ease-of-use isn't really a concern.

EDIT 26-Nov-11

After some testing and development, it appears that none of these options are workable. It appears that both route lookups and iptables do linear searches through the ruleset, and take simply too long to process this many rules. On modern hardware, putting 1M items in an iptables blacklist slows the server down to about 2 dozen packets per second. So IPTables and null routes are out.

ipset, as recommended by Jimmy Hedman, would be great, except that it doesn't allow you to track more than 65536 addresses in a set, so I can't even try to use it unless someone has any ideas.

Apparently the only solution for blocking this many IPs is doing an indexed lookup in the application layer. Is that not so?


More Information:

The usage case in this instance is blocking a "known offenders" list of IP addresses from accessing static content on a web server. FWIW, doing blocking through Apache's Deny from is equally slow (if not more so) as it also does a linear scan.


FYI: Final working solution was to use apache's mod_rewrite in conjunction with a berkeley DB map to do lookups against the blacklist. The indexed nature of berkeley DBs allowed the list to scale with O(log N) performance.

tylerl
  • 14,885
  • 7
  • 49
  • 71
  • 5
    Isn't ipset (http://ipset.netfilter.org/) more less designed to handle this type of problem? – Jimmy Hedman Nov 25 '11 at 22:49
  • @JimmyHedman: You should make that an answer. And then add a suggestion to benchmark doing it all 3 ways :) – MikeyB Nov 26 '11 at 00:15
  • I'm curious if you can give a little more information about what problem you're trying to solve? Perhaps blocking 1M IP addresses is not the way to fix the problem? – SpacemanSpiff Nov 26 '11 at 22:00
  • It would help a lot to know why you want to block this many addresses, and wether you want to filter INPUT or FORWARD traffic. – Juliano Nov 26 '11 at 22:02
  • Here you can see how ipset make iptables rules about 11x faster than regular iptables rules. http://daemonkeeper.net/781/mass-blocking-ip-addresses-with-ipset/ Hope this help. – Alien Torres Nov 09 '15 at 16:45
  • Ipset is only limited by the amount of system memory you have available, the 65536 ip limit is only the default max, you can use the `maxelem` configuration to increase it, I am able to quickly and efficiently block over 170000 ip address *ranges*. – user3338098 Nov 20 '18 at 18:04

6 Answers6

15

try using iptables and building multi-level tree to decrease number of lookups.

iptables -N rules_0_0_0_0_2
iptables -N rules_64_0_0_0_2
iptables -N rules_128_0_0_0_2
iptables -N rules_192_0_0_0_2
iptables -N rules_0_0_0_0_4
iptables -N rules_16_0_0_0_4

iptables -A INPUT -p tcp --dport 80 -s 0.0.0.0/2 -j rules_0_0_0_0_2
iptables -A INPUT -p tcp --dport 80 -s 64.0.0.0/2 -j rules_64_0_0_0_2
iptables -A INPUT -p tcp --dport 80 -s 128.0.0.0/4 -j rules_128_0_0_0_2
iptables -A INPUT -p tcp --dport 80 -s 192.0.0.0/4 -j rules_192_0_0_0_2

iptables -A rules_0_0_0_0_2 -s 0.0.0.0/4 -j rules_0_0_0_0_4
iptables -A rules_0_0_0_0_2 -s 16.0.0.0/4 -j rules_16_0_0_0_4

and so on - adding nesting levels; obviously you'll need an automatic way of building the rules and you should have chains just for the networks where you have one or more offenders - in this way you can reduce number of lookups that have to be done quite significantly and i think it might actually work.

pQd
  • 29,561
  • 5
  • 64
  • 106
  • This does sound as if it could work, effectively reducing the time complexity of the firewall rule lookup from O(n) to O(log n), effectively building a search tree in iptables. – Per von Zweigbergk Nov 26 '11 at 22:44
  • If you do decide to go this route, it would probably be useful to use some algorithm to build a balanced binary search tree from the list of IP addresses, and then simply dumping the structure of that search tree as a set of IPtables rules. – Per von Zweigbergk Nov 26 '11 at 22:52
  • @Per von Zweigbergk - indeed... or just prune the tree early where there is no need to make deeper lookups. although loading that ungodly amount of rules will take a lot of time. – pQd Nov 27 '11 at 11:01
  • This is a very good approach. Obviously it requires a bit of processing to maintain, but it's the right idea, I think. – tylerl Nov 28 '11 at 05:41
  • @tylerl - i suggest you use iptables-save on some computer to see the format and your mechanism uses iptables-restore to load the rules - it's way way way quicker than invoking iptables for every single rule. – pQd Nov 28 '11 at 10:18
  • @pQd - Yes, and not even by a small margin. Adding 1M rules 1 at a time takes about 6 hours, while iptables-restore is almost immediate. – tylerl Nov 28 '11 at 18:50
  • @tylerl - out of curiosity - can you time loading the rules and post here the information? also can you give an idea what cpu do you have there. once you implement new firewall - can you tell us what's the load avg / rate of traffic / number of new connections per sec? thx! – pQd Nov 28 '11 at 23:24
  • 3
    @pQd after previous failures with iptables et al., I implemented a solution in apache using a [RewriteMap](http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritemap) with Berkeley database. But my my fastest mechanism using iptables in a loop loaded about 100 rules per second, while iptables-restore did the whole set in about 4 seconds. This is on a top-of-the-line desktop computer. The RewriteMap mechanism has an undetectably low impact on performance. – tylerl Nov 29 '11 at 02:52
11

This is exactly what ipset is for.

From its website http://ipset.netfilter.org/:

If you want to

  • store multiple IP addresses or port numbers and match against the collection by iptables at one swoop;
  • dynamically update iptables rules against IP addresses or ports without performance penalty;
  • express complex IP address and ports based rulesets with one single iptables rule and benefit from the speed of IP sets

then ipset may be the proper tool for you.

It is written by a netfilter core team member Jozsef Kadlecsik (who also wrote the REJECT target) so this is the best choice I can think of.

It is even included in the recent kernels.

cstamas
  • 6,607
  • 24
  • 42
  • As far as I've seen, ipset tops out at 65k IP addresses. Is there any set type that can handle 1M entries? – tylerl Nov 26 '11 at 17:57
  • 7
    If you use the hash:ip set type you can have very large number of addresses. I tried 1000000 and the performance was quite good. If you setup a -m state --state ESTABLISHED rule before checking your set you can ensure you only check the set on new connections which would increase performance when dealing with a large number of packets. – Matthew Ife Nov 27 '11 at 00:44
  • Its worth mentioning though that the memory usage of an ipset with 1M addresses starts to get large. According to [this post on the netfilter mailing list](http://marc.info/?l=netfilter&m=144061333701525&w=3) the current 2015 formula for ipset hash size is `H * 40byte + (N/4 + N%4) * 4 * element size` which comes to about 64MB for 1M addresses in a 1.5m slot hash. Using the apache/berkdb solution stores the data on disk and only loads the pages of the active addresses. – M Conrad Dec 22 '15 at 01:52
5

I have not tested this myself, but when I heard your problem description I instantly thought "pf" (as from OpenBSD).

pf has the concept of address tables which may just be what you're looking for.

According to some very cursory research I did, it would seem that this has the potential to scale better than ipset. According to the PF FAQ's chapter on Runtime Options, out of the box without tuning, pf supports a total of 1,000 tables, with a total of 200,000 entries across all tables by default. (100,000 if the system has <100MB physical memory). This leads me to believe that it's at least worth considering trying to test this to see if it works on any kind of useful level.

Of course, I'm assuming you're running your servers on Linux, so you'd have to have a seperate firewall box running some OS with pf (like OpenBSD or FreeBSD). You might also be able to improve throughput by doing away with any kind of stateful packet filtering at all.

Per von Zweigbergk
  • 2,615
  • 2
  • 17
  • 27
  • Converting the client's server architecture to BSD isn't really an option. Thinking out of the box at least, though. – tylerl Nov 28 '11 at 05:35
  • 2
    You wouldn't have to convert the entire server architecture to BSD, it would be sufficient to build a firewall to put in front of the real server. (Easy enough to do in a VM.) – Per von Zweigbergk Nov 28 '11 at 05:53
2

Have you investigated using a FIB_TRIE instead of FIB_HASH.

FIB_TRIE should scale much better for your prefix counts. (/32s null routes are still prefixes, just very specific)

You might need to compile your own kernel to use it, but it help.

FIB_TRIE Notes

Joel K
  • 5,765
  • 2
  • 29
  • 34
2

For posterity: according to ipset docs the default size of a set is 65536, this can be changed by options.

maxelem This parameter is valid for the create command of all hash type sets. It does define the maximal number of elements which can be stored in the set, default 65536. Example: ipset create test hash:ip maxelem 2048.

I put this here since I can't comment yet.

plitter
  • 131
  • 2
1

Some helpful notes for anyone who stumbles across this problem in the future:

First of all, don't analyze any traffic that you don't need to. If you're blocking TCP traffic, for example, only filter the SYN packets, that way you only hit the filter once per connection. You can use -m state if you want, but connection tracking has its own overhead that you may want to avoid if performance is an issue.

Second, putting a million rules into iptables takes a long time: several minutes. If you need to track that many entities, you'd probably best keep it out of netfliter. The sheer size of the ruleset does make a difference.

Third, the goal is to avoid linear scans; but unfortunately, both iptables and iproute2 are inherently linear. You can divide your rules out binary-tree-style into a large number of chains which limits the number of rules you have to consult, but even still, iptables is not well suited for this size of problem. It will work, but it's a waste of valuable resources.

Fourth, and most importantly, pushing your workload to userspace is not a bad idea. This allows you to write your own tight code or use an off-the-shelf solution that is tuned to your problem set. My own solution to this problem, as mentioned, was to use BDB lookups triggered through apache's mod_rewrite system. This had the added benefit of triggering only one lookup per session, and only after a valid request had been submitted. In this case, performance was extremely fast and the cost of the blocklist was almost negligible.

You can do similar userspace manipulation with iptables by using the -j QUEUE target in conjunction with libnetfilter_queue. This tool is powerful, simple, and poorly documented. I would recommend reading up as much as possible from as many sources as you can find, as there's a lot of interesting material scattered across the web that is not part of any official documentation.

tylerl
  • 14,885
  • 7
  • 49
  • 71