
I was planning to comment on the question Finding phishing sites to a certain domain and suggest searching Google's blacklist for keywords or substrings of the domain name in question.

But then I found https://stackoverflow.com/questions/44663025/how-to-test-if-a-url-is-a-phishing-in-command-line-using-google-safe-browsing which says the blacklist is hashed, which makes it useless for the first question above. There's a link to https://developers.google.com/safe-browsing/v4/urls-hashing which explains the hashing, but doesn't explain why the list is hashed.

Why hashed? What security problem does that solve? It seems all hashing does is cripple the list's potential usefulness.

Sad IT admin
  • According to https://developers.google.com/safe-browsing/v4/, talking about non-hashed: "Drawbacks: Privacy: URLs are not hashed, so the server knows which URLs you look up." – dandavis Jul 24 '18 at 17:10
  • The options I'm contrasting are downloading a hashed vs. non-hashed list, not sending a hashed vs. non-hashed URL to Google for an online lookup. – Sad IT admin Jul 24 '18 at 20:12
  • It's possible that domains that were not supposed to be public (or are no longer supposed to be public, having had the indexed URL pulled down via a request to Google) could be included on the list, which is a potential legal liability and privacy invasion for those who expose it. It also prevents potentially malicious code in subdomains from gumming up the gears of the machine crunching the list, no matter the OS/language/context. – dandavis Jul 24 '18 at 20:49

2 Answers


Why hashed? What security problem does that solve?

I could imagine several reasons for this decision:

  • Using the shortened hashes saves a lot of space.
  • While it is easy to look up a specific URL, it is practically impossible to enumerate the whole list and then reverse engineer how Google creates it in the first place. This makes it harder to work around the blacklist in a generic way, i.e. to craft URLs that have the best chance of not being detected.
  • It takes effort to create this list, and Google might simply want to protect its intellectual property.
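The asymmetry in the second point can be illustrated with a short sketch. This is not Google's implementation: a real Safe Browsing client must first canonicalize the URL and generate several host/path expressions per the v4 spec, which is skipped here. The URLs and the local prefix set are hypothetical; only the general mechanism (SHA-256 full hash, truncated to a 4-byte prefix) matches the documented scheme.

```python
import hashlib

def hash_prefix(url_expression: str, n: int = 4) -> bytes:
    """SHA-256 of a (pre-canonicalized) URL expression, truncated
    to an n-byte prefix, as in the Safe Browsing v4 lists."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:n]

# Hypothetical local prefix list, as a client would download it.
blocked_prefixes = {hash_prefix("evil.example/phish.html")}

# Forward lookup of a known URL is trivial:
print(hash_prefix("evil.example/phish.html") in blocked_prefixes)  # True

# ...but going the other way, from a prefix back to a URL, requires
# guessing and hashing candidate strings, so the downloaded list
# cannot simply be "read" as a list of domains.
```

A match on a 4-byte prefix is only a candidate hit; the real protocol then asks the server for the full 32-byte hashes behind that prefix before flagging anything, which also limits what the server learns about the URLs you visit.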

It seems all hashing does is cripple the list's potential usefulness.

Usefulness for whom? Note that this blacklist is intellectual property of Google and is obviously not intended to be open source. If that were Google's intention, they could simply publish the whole list and provide the tools needed to build the hashed version from it. In the same way, you could ask why Cisco lets you query OpenDNS for free but does not offer for download the list of all domains OpenDNS considers malicious.

Steffen Ullrich
  • Usefulness for the asker of the first question, for example. Does nobody bother maintaining a worthwhile open blacklist, because Google's closed blacklist is the de facto standard? – Sad IT admin Jul 24 '18 at 17:03
  • @SadITadmin: There are several closed blacklists, and for most of them you have to pay somehow. Creating a high-quality blacklist and keeping it current is not simple; a lot of money and knowledge is involved. The issue is more that nobody gets paid to make such a list freely available to all, because who would make that payment? – Steffen Ullrich Jul 24 '18 at 17:14

There are a number of reasons for this. Most of them have already been mentioned, but here's one more:

Phishing sites are often hosted on compromised websites or servers. Making their domains or URLs publicly visible (by storing them in a blacklist in cleartext) would put the publishers of the blacklist in the position of calling attention to these compromised sites, inviting other attackers to target them.