What are the heuristics for a malicious URL?

Question

I know that there is no such thing as a safe website: any website that is safe today could be hacked tomorrow and serve its visitors the latest malware.

What I'm curious about is what checks we can do on a URL to judge (or guess) that it is malicious using only pencil and paper: no third-party help (such as SiteAdvisor, VirusTotal, or DomainTools), no peeking at the website's content, and no browsers.

Just looking at the URL and guessing. (A short URL could be expanded first and then examined.)

So, we have to guess at whether a URL is evil without connecting to it OR using any third party tools? Can we even refer to blacklists? You can't do any sort of analysis (that I'm aware of, at least) without information. — KnightOfNi, Apr 07 '15 at 11:11
So you are asking how we can use our opinion to decide whether or not a URL is safe, based on existing knowledge and experience? The question seems fairly irrelevant and unhelpful. — Aaron Dobbing, Apr 07 '15 at 11:20
@KnightOfNi Blacklists are nothing but URLs someone has analyzed beforehand, so they don't fit my criteria. These heuristic checks cannot be used alone to determine maliciousness, but they could be used along with the other techniques mentioned above ... — rebel87, Apr 07 '15 at 11:21
Again, won't this still come down to personal opinion? E.g. "This doesn't fit the style of URL segments I am used to from this domain", "this subdomain 'asdliuoyasd.example.com' looks odd", etc. — Aaron Dobbing, Apr 07 '15 at 11:25
@AaronDobbing Thanks for clearing up my doubts. So this means that there can't be any heuristics? — rebel87, Apr 07 '15 at 11:32
I don't see how there can be a valid usable answer to this question other than common sense. That is all. — Aaron Dobbing, Apr 07 '15 at 11:37

score 4 · Accepted Answer · answered Apr 07 '15 at 13:26

I worked on this problem for an email scanning system, and can say that the lexical properties of a URL carry very little signal about maliciousness, especially with the constraints you are imposing.

It's true that malicious URLs often "look random", but that impression depends on familiarity: your experience has turned "imgur.com/gallery/lBKRZ" into "harmless image server gallery", while "is1.ecds.girfc.com/ljbm17vkel" looks scarily nonsensical... until you learn that it is Image Server 1 on the East Coast Data Store for Getty Images Royalty Free Collection.
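One common way to put a number on "looks random" is character-level Shannon entropy. The sketch below is only an illustration of that idea, not what our scanner did; the function name `shannon_entropy` is my own. Run on the two path segments above, it gives you a way to check whether the benign strings really are distinguishable from "random" ones by this measure.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return bits per character, using the empirical character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum p * log2(1/p) over every distinct character in the string.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Path segments taken from the two harmless URLs discussed above.
    for segment in ("lBKRZ", "ljbm17vkel"):
        print(segment, round(shannon_entropy(segment), 2))
```

A short string with no repeated characters already sits at the maximum entropy for its length, so this measure on its own cannot separate a legitimate gallery ID from attacker-generated noise.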

It is possible to assign heuristic responses based solely on the value of the URL, but in practice the weighting of the URL value tends to be so small that it fades into inconsequentiality when compared to content heuristics. For instance, take this URL:
super-zakonym.ru

What's the alarming part of this URL? The mix of English and Russian? The fact that it translates to "Super Legit"? The fact that the Russian is misspelled?

Or is it simply that it uses the .ru TLD?
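To make the weighting point concrete, here is a minimal sketch of a purely lexical scorer. Everything in it is an assumption made up for illustration: the feature names, the `WEIGHTS` values, and the one-entry `WATCHLISTED_TLDS` set are hypothetical and not tuned on any data.

```python
from urllib.parse import urlsplit

# Hypothetical weights, chosen only for illustration (not tuned on real data).
WEIGHTS = {
    "watchlisted_tld": 0.10,
    "has_hyphen": 0.05,
    "has_digit": 0.05,
    "many_labels": 0.05,
}
WATCHLISTED_TLDS = {"ru"}  # assumption for this example only


def lexical_features(url: str) -> dict:
    """Extract a few yes/no features from the hostname alone."""
    # urlsplit only fills in the hostname when a scheme or leading "//" is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return {
        "watchlisted_tld": labels[-1] in WATCHLISTED_TLDS,
        "has_hyphen": "-" in host,
        "has_digit": any(ch.isdigit() for ch in host),
        "many_labels": len(labels) > 3,
    }


def lexical_score(url: str) -> float:
    """Sum the weights of every feature that fired."""
    features = lexical_features(url)
    return sum(WEIGHTS[name] for name, hit in features.items() if hit)


if __name__ == "__main__":
    for url in ("imgur.com/gallery/lBKRZ",
                "is1.ecds.girfc.com/ljbm17vkel",
                "super-zakonym.ru"):
        print(url, lexical_features(url), lexical_score(url))
```

Note that with these made-up weights the harmless Getty image server still picks up a nonzero score (digits, many labels), which is exactly why such lexical scores have to stay small next to content heuristics.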

Thanks @JKimball, my doubts are clear now. URL heuristics alone can't work; content scanning and other techniques are required... — rebel87, Apr 08 '15 at 06:09
