
I reroute certain websites through a proxy with a proxy.pac file.

It basically looks like this:

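function FindProxyForURL(url, host) {
    if (shExpMatch(host, "www.youtube.com"))
        { return "PROXY proxy.domain.tld:8080; DIRECT" }
    if (shExpMatch(host, "youtube.com"))
        { return "PROXY proxy.domain.tld:8080; DIRECT" }
    // ...more sites...
    return "DIRECT"; // everything else bypasses the proxy
}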

At the moment about 125 sites are rerouted using this method. However, I plan on adding quite a few more domains to it, and I'm guessing it will eventually be a list of 500-1000 domains.

It's important to not reroute all traffic through the proxy.

What's the best way to keep this file optimized, performance-wise?

Thanks

Tuinslak


If you are just checking for equality, use '==' for comparison. The shExpMatch function accepts shell expressions (* and ? in their DOS shell meanings), so its second argument has to be parsed. The script runs in the browser once (or less) per request, so performance is not really an issue, but the code is clearer if you write what you mean.

I would also use a variable to hold the proxy expression. It probably won't save run-time storage, since the repeated literal is probably reused anyway, but it will make the code easier to read.
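
A minimal sketch of both suggestions, assuming every entry in the list is an exact host name. The variable name PROXY_ROUTE is hypothetical, and proxy.domain.tld:8080 is the placeholder from the question:

```javascript
// Hypothetical variable holding the proxy expression, written once.
var PROXY_ROUTE = "PROXY proxy.domain.tld:8080; DIRECT";

function FindProxyForURL(url, host) {
    // Plain string equality: nothing to parse, unlike shExpMatch() patterns.
    if (host == "www.youtube.com" || host == "youtube.com")
        return PROXY_ROUTE;

    // Hosts that are not listed bypass the proxy.
    return "DIRECT";
}
```

If you later need wildcards for a few entries, shExpMatch() can still be used for just those lines.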

J Austin
  • I realize your answer is a few years old, but "runs in the browser" is ambiguous for Android. The .pac file can be configured for a Wi-Fi network, and theoretically all mobile apps are affected by it. – Fuhrmanator Sep 25 '18 at 19:34

As usual: hashes or trees.

I'd use hashing: extract the first character (or first few characters) of the requested domain name, after stripping "www.", and use it to select the corresponding pattern list.

poige
  • Could you give me an example? – Tuinslak Mar 10 '11 at 11:32
  • The most straightforward one: host[0] gives you the very first character. You can then use it as a `switch` key: switch (host[ 0 ]) { case 'g': if ... else if () and so on (the ifs are from your example). – poige Mar 10 '11 at 13:05
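
poige's first-character idea could look roughly like this. It is a sketch, not a tested production file: the table name PROXIED_BY_FIRST_CHAR, the variable PROXY_ROUTE, and the listed domains are hypothetical placeholders, and it assumes every entry is an exact host name stored without "www.". A plain object serves as the hash here instead of a switch statement:

```javascript
// Hypothetical proxy expression (placeholder host from the question).
var PROXY_ROUTE = "PROXY proxy.domain.tld:8080; DIRECT";

// Hypothetical domain list, bucketed by first character, stored without "www.".
var PROXIED_BY_FIRST_CHAR = {
    "y": ["youtube.com", "ytimg.com"],
    "g": ["googlevideo.com"]
};

function FindProxyForURL(url, host) {
    host = host.toLowerCase();

    // Strip a leading "www." so "www.youtube.com" and "youtube.com" share a bucket.
    if (host.substring(0, 4) == "www.")
        host = host.substring(4);

    // Pick the bucket for the first character; no bucket means no match.
    var bucket = PROXIED_BY_FIRST_CHAR[host.charAt(0)];
    if (!bucket)
        return "DIRECT";

    // Only domains starting with the same character are compared.
    for (var i = 0; i < bucket.length; i++) {
        if (host == bucket[i])
            return PROXY_ROUTE;
    }
    return "DIRECT";
}
```

Taken one step further, the object could be keyed by the full domain name (for example {"youtube.com": true}, checked with hasOwnProperty), so each request costs a single property lookup no matter how long the list gets.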

I think performance depends on what browser/program is using the .pac file, but you can find some best practices here that include:

  • The speed of file execution depends on the way arguments are constructed in the PAC file, not on the total length of the file. PAC files execute commands serially. Therefore:
    • Do not use excessive exclusion functions, as this may cause slowness.
    • Place arguments or exceptions that have a high probability of being executed at the beginning of the file. For example, private IP address lookups should be at the beginning.
  • Avoid using complex regular expressions to make a PAC file smaller. This can make it less efficient.
  • Check simple rule exceptions first.
  • Place high-probability checks near the top.
  • Minimize the use of regular expressions.
  • Use efficient regular expressions, and avoid capturing matches that will not be used.
  • Because return is immediate, avoid using else with if statements.
  • Single-line if() statements do not require opening { and closing } braces.
  • Carefully consider the use (overuse) of isResolvable(), dnsResolve(), and isInNet() due to potential DNS performance issues.
  • Try to group similar exceptions into a bigger if block. For example, instead of checking against 10 hosts under xyz.google.com in a big OR statement, wrap them in an outer if statement that applies only if the host matches *.google.com, and then test against the 10 hosts inside it.
  • Check for IP addresses in a separate if block.
  • Every open curly bracket must have a corresponding close curly bracket, and every open parenthesis must have a corresponding close parenthesis. One of the most common mistakes in building PAC files is losing count of opening and closing parentheses and brackets.
  • Avoid using external or global variables and functions.
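
A rough sketch applying several of these tips: cheap exceptions first, IP literals in their own block, grouping under an outer check, no else after return, and no global variables. The host names and proxy are placeholders, and plain string comparisons stand in for the PAC built-ins isPlainHostName(), dnsDomainIs() and shExpMatch() so the sketch also runs outside a PAC engine; in a real PAC file you could use those built-ins instead:

```javascript
function FindProxyForURL(url, host) {
    // Local rather than global, per the last tip.
    var proxy = "PROXY proxy.domain.tld:8080; DIRECT";

    // Cheap, high-probability exception first: dotless intranet names.
    if (host.indexOf(".") == -1)
        return "DIRECT";

    // IPv4 literals in their own block; a regex test triggers no DNS lookup.
    // None of them are on the domain list, so they all go direct.
    if (/^\d+\.\d+\.\d+\.\d+$/.test(host))
        return "DIRECT";

    // One outer check for the whole google.com group instead of a long OR chain.
    if (host == "google.com" || host.slice(-11) == ".google.com") {
        if (host == "mail.google.com" || host == "docs.google.com")
            return proxy;
        return "DIRECT";
    }

    // Remaining individual sites; no else needed because return is immediate.
    if (host == "youtube.com" || host == "www.youtube.com")
        return proxy;

    return "DIRECT";
}
```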

Also, there's a tool, pacparser (http://pacparser.manugarg.com/), that can be used to validate .pac files.

Fuhrmanator