
I will pose my question from an academic perspective, but I welcome any and every observation. A botnet may generate thousands of domains per day, of which the attacker actually registers only a few, and each of those is then used with a certain probability. The point is: given a dataset of, say, a million domains and no knowledge of the DGA (Domain Generation Algorithm), is there any technique or research that predicts the likelihood of a domain being bot-generated? I have a few ideas of my own for building one:

  1. Exclude Dictionary Word Domains
  2. Include Seemingly Random/Garbage/Arbitrarily Long Domain Names
  3. Cross-Check DNS Information for Possible Registration Information, etc.
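
Ideas 1 and 2 boil down to hand-crafted lexical features of the domain name. A minimal sketch in Python, assuming plain "label.tld" style names: the feature set (length, Shannon entropy, digit ratio, vowel ratio, longest consonant run) and the helper names are illustrative choices, not a standard, and any thresholds on them would have to be learned from labelled data.

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def registered_label(domain: str) -> str:
    """Label just left of the TLD, e.g. 'example' in 'www.example.com'.
    Naive: ignores multi-part suffixes such as 'co.uk' (a public-suffix list would fix that)."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]

def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def longest_consonant_run(s: str) -> int:
    """Longest run of consecutive letters other than a, e, i, o, u (digits reset the run)."""
    best = run = 0
    for ch in s:
        run = run + 1 if ch.isalpha() and ch not in VOWELS else 0
        best = max(best, run)
    return best

def lexical_features(domain: str) -> dict:
    label = registered_label(domain)
    n = max(len(label), 1)  # avoid division by zero on empty labels
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "max_consonant_run": longest_consonant_run(label),
    }

for d in ["www.wikipedia.org", "xjw3kq9zpt7vbn.com"]:
    print(d, lexical_features(d))
```

High entropy and long consonant runs are what idea 2 is reaching for; idea 1 corresponds to a low score on these features.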

Sadly, I couldn't find any standard research or texts on this topic. Any information would be helpful.

Jishan
    The lazy approach would be to train a classifier on lists of known legitimate and fake domains. But there are plenty of legitimate uses for seemingly random domain names, and plenty of room for a bot to use a dictionary-based approach. You will get a lot of false positives and negatives. – Hector Nov 15 '17 at 14:34
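
The "lazy approach" from the comment can be sketched with only the Python standard library as a multinomial naive Bayes classifier over character bigrams. Assumptions: the two training lists are tiny hypothetical examples (the "dga" entries are made-up keyboard-mash strings, not output of any real DGA), only the left-most label is used, and smoothing is plain add-one. A real experiment would need large labelled lists, and the comment's caveat stands: a dictionary-based DGA would look "benign" to this model.

```python
import math
from collections import Counter

def bigrams(label: str) -> list:
    """Character bigrams of a label, padded with '^' (start) and '$' (end) markers."""
    padded = "^" + label.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def first_label(domain: str) -> str:
    # Naive: uses the left-most label, so 'example.com' -> 'example'.
    return domain.lower().split(".")[0]

class BigramNaiveBayes:
    """Multinomial naive Bayes over character bigrams with add-one (Laplace) smoothing."""

    def fit(self, domains, labels):
        self.class_counts = Counter(labels)  # number of training domains per class
        self.bigram_counts = {}              # class -> Counter of bigrams
        for domain, y in zip(domains, labels):
            self.bigram_counts.setdefault(y, Counter()).update(bigrams(first_label(domain)))
        self.vocab_size = len(set().union(*self.bigram_counts.values()))
        self.totals = {y: sum(c.values()) for y, c in self.bigram_counts.items()}
        return self

    def log_scores(self, domain):
        n = sum(self.class_counts.values())
        scores = {}
        for y, counts in self.bigram_counts.items():
            score = math.log(self.class_counts[y] / n)  # log prior
            for bg in bigrams(first_label(domain)):
                score += math.log((counts[bg] + 1) / (self.totals[y] + self.vocab_size))
            scores[y] = score
        return scores

    def predict(self, domain):
        scores = self.log_scores(domain)
        return max(scores, key=scores.get)

# Hypothetical toy training data, for illustration only.
benign = ["google.com", "wikipedia.org", "amazon.com", "facebook.com",
          "microsoft.com", "weather.com", "openstreetmap.org", "stackexchange.com"]
dga = ["xjw3kq9zpt7vbn.com", "qzkxvbwp0rt.net", "kq7zxwvbnm2p.info",
       "zxqvkjwp9tlm.biz", "vbq8xkzrtwpl.org", "pqxz4kjvwbt.com"]
clf = BigramNaiveBayes().fit(benign + dga, ["benign"] * len(benign) + ["dga"] * len(dga))
print(clf.predict("bookstore.com"), clf.predict("zkxvqwbt.com"))
```

Bigrams are used rather than single characters because character pairs capture pronounceability, which is roughly what a human means by "looks random".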

1 Answer


None of your three ideas is applicable:

  1. Exclude Dictionary Word Domains

    There are DGAs that build domains from dictionary words, for example Matsnu.

  2. Include Seemingly Random/Garbage/Arbitrarily Long Domain Names

    There are perfectly legitimate domain names that have been generated by a DGA, especially now, with "serverless" hosting and its ephemeral domain names.

  3. Cross-Check DNS Information for Possible Registration Information, etc.

    There is no way to tell legitimate from illegitimate customers of domain registrars that offer anonymity.
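
Point 1 can be made concrete with a toy sketch of two hypothetical DGAs that share the same date-based seed: one maps hash output to letters and looks random, the other picks words from a list. Neither is Matsnu's real algorithm; SHA-256, the 12-character label length, and the word list are arbitrary choices for illustration. Both are deterministic for a given day, which is what lets the operator register only a few of the day's domains in advance while every infected host computes the same list.

```python
import hashlib
from datetime import date

# Hypothetical word list; a real dictionary-based DGA would ship a much larger one.
WORDS = ["river", "stone", "light", "paper", "garden", "silver", "window", "market"]

def hash_dga(day: date, count: int, tld: str = ".com") -> list:
    """Random-looking domains: SHA-256 of (date, index), hex digits mapped to letters a-p."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}:{i}".encode()).hexdigest()
        # Each hex digit (0-15) becomes a letter 'a'..'p'; keep 12 characters.
        label = "".join(chr(ord("a") + int(h, 16)) for h in digest[:12])
        domains.append(label + tld)
    return domains

def wordlist_dga(day: date, count: int, tld: str = ".com") -> list:
    """Dictionary-based domains: two words chosen by the first two bytes of the same hash."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{day.isoformat()}:{i}".encode()).digest()
        first = WORDS[digest[0] % len(WORDS)]
        second = WORDS[digest[1] % len(WORDS)]
        domains.append(first + second + tld)
    return domains

day = date(2017, 11, 15)
print(hash_dga(day, 3))
print(wordlist_dga(day, 3))
```

Idea 1 would exclude every name produced by `wordlist_dga` from suspicion, even though each one is machine-generated.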

There is, however, a GitHub repository that collects the output of various DGAs, with the aim of using machine learning to recognize plausible and probably malicious DGA-generated domains.

Tobi Nary