Finding/Predicting BotNet Generated Domains Without DGA Knowledge

Question

I am asking from an academic perspective, but I welcome any observation. A botnet generates thousands of domains per day, and the attacker registers only a few of them, each of which is used with a certain probability. Given a dataset of, say, a million domains, and without knowledge of the DGA (Domain Generation Algorithm), is there any technique or research that predicts the likelihood of a domain being bot-generated? I have a few ideas of my own for building one:

Exclude Dictionary Word Domains
Include Seemingly Random/Garbage/Arbitrarily Large Domain Names
Cross Check DNS information for possible registration information etc.
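
To make the first two ideas concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny `WORDS` set stands in for a real dictionary, the `min_len` and `min_entropy` thresholds are arbitrary, and the input is assumed to be a bare registered domain (no subdomains), so the leftmost label is the one being scored.

```python
import math
from collections import Counter

# Hypothetical stand-in for a real dictionary word list.
WORDS = {"mail", "book", "shop", "cloud", "news", "face", "bank"}


def shannon_entropy(label: str) -> float:
    """Entropy of the label's character distribution, in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def contains_word(label: str) -> bool:
    """True if any word from the (toy) dictionary occurs inside the label."""
    return any(word in label for word in WORDS)


def looks_random(domain: str, min_len: int = 10, min_entropy: float = 3.0) -> bool:
    """Flag long, high-entropy labels that contain no dictionary word.

    The thresholds are illustrative guesses, not tuned values.
    """
    label = domain.lower().split(".")[0]
    return (
        len(label) >= min_len
        and shannon_entropy(label) >= min_entropy
        and not contains_word(label)
    )
```

A call such as `looks_random("xkqjzvwpyrtmh.net")` flags the name, while `looks_random("facebook.com")` does not. This is only a filter on surface features; it says nothing about who registered the domain.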

Sadly, I couldn't find any standard research or text in this area. Any information would be helpful.

The lazy approach would be to train a classifier on lists of known legitimate and fake domains. But there are plenty of legitimate uses for seemingly random domain names, and a bot can just as well use a dictionary-based approach, so you will get a lot of false positives and false negatives. – Hector Nov 15 '17 at 14:34

score 3 · Accepted Answer · edited Jun 16 '20 at 09:49

None of your three ideas is applicable:

Exclude Dictionary Word Domains

There are DGAs that use dictionary words, for example Matsnu.

Include Seemingly Random/Garbage/Arbitrarily Large Domain Names

There are perfectly legitimate domain names that were generated by a DGA, especially in this era of "serverless" hosting with ephemeral domain names.

Cross Check DNS information for possible registration information etc.

There is no way to tell legitimate customers from illegitimate ones at domain registrars that offer anonymity.

There is, however, a GitHub repository that collects the output of various DGAs so that machine learning can be used to recognize domains that were plausibly or probably generated by a malicious DGA.
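
As a rough illustration of what such a machine-learning approach could look like, here is a minimal sketch of a character-bigram likelihood-ratio scorer using only the Python standard library. It is not the method used by that repository. The `BENIGN` and `DGA_LIKE` lists are tiny, hand-made placeholders (the "DGA" strings are invented, not real DGA output), and the names `BigramModel` and `dga_score` are hypothetical.

```python
import math
from collections import Counter

# Toy training data, invented for illustration only.
BENIGN = ["google", "wikipedia", "amazon", "stackexchange", "github", "mozilla"]
DGA_LIKE = ["xkqjzvwpyr", "qzwxvkjhpt", "vbnmqwrtzx", "jkxqzpwvty", "zqxjkvwphg"]

# a-z, 0-9, '-', plus the two boundary markers '^' and '$'.
ALPHABET_SIZE = 39


def bigrams(label: str) -> list[str]:
    """Character bigrams of the label, with start and end markers."""
    padded = f"^{label}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramModel:
    """Bigram frequency model with add-one smoothing."""

    def __init__(self, labels: list[str]) -> None:
        self.counts = Counter(bg for label in labels for bg in bigrams(label))
        self.total = sum(self.counts.values())
        self.vocab = ALPHABET_SIZE ** 2

    def log_prob(self, label: str) -> float:
        """Sum of smoothed log-probabilities of the label's bigrams."""
        return sum(
            math.log((self.counts[bg] + 1) / (self.total + self.vocab))
            for bg in bigrams(label)
        )


benign_model = BigramModel(BENIGN)
dga_model = BigramModel(DGA_LIKE)


def dga_score(label: str) -> float:
    """Positive: closer to the DGA samples. Negative: closer to the benign ones."""
    return dga_model.log_prob(label) - benign_model.log_prob(label)
```

With this toy data, `dga_score("qzxkvjwp")` comes out positive and `dga_score("googlemail")` negative. A dictionary-based DGA such as Matsnu would defeat this kind of character-level model, which is the same false-negative problem raised in the comments.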

answered Nov 15 '17 at 14:33

Tobi Nary


So apart from that repository, technically there is no research in this area? – Jishan Nov 15 '17 at 14:41
There is probably a lot of research in the area. – Tobi Nary Nov 15 '17 at 14:49
