I am asking this from an academic perspective, but I welcome any and every observation. A botnet generates thousands of candidate domains per day, and the actual attacker registers only a few of them, which will be used with a certain probability. The point is: given a dataset of, say, a million domains and no knowledge of the DGA (Domain Generation Algorithm), is there any technique or research that predicts the likelihood that a domain is bot-generated? I have a few ideas of my own for building one:
- Exclude domains made up of dictionary words
- Include seemingly random, garbage, or arbitrarily long domain names
- Cross-check DNS data for possible registration information, etc.
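To make the first two ideas concrete, here is a rough sketch of what I have in mind. It is only an illustration under several assumptions: `WORDS` is a tiny hypothetical stand-in for a real dictionary, the weights and the length cap of 30 in `dga_score` are arbitrary choices of mine, and the label extraction naively ignores multi-part suffixes such as `co.uk`. It does not cover the DNS/registration cross-check.

```python
import math
from collections import Counter

# Hypothetical stand-in for a real dictionary; a real word list would be far larger.
WORDS = {"mail", "shop", "news", "book", "cloud", "bank", "face"}


def shannon_entropy(label: str) -> float:
    """Entropy (bits per character) of the label's character distribution."""
    if not label:
        return 0.0
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def dictionary_coverage(label: str, words=WORDS) -> float:
    """Fraction of characters covered by dictionary words (greedy longest match)."""
    if not label:
        return 0.0
    covered, i = 0, 0
    while i < len(label):
        match = 0
        # Try the longest substring starting at i first.
        for j in range(len(label), i, -1):
            if label[i:j] in words:
                match = j - i
                break
        if match:
            covered += match
            i += match
        else:
            i += 1
    return covered / len(label)


def second_level_label(domain: str) -> str:
    """Naive extraction: the label left of the last dot (ignores suffixes like co.uk)."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def dga_score(domain: str) -> float:
    """Heuristic score in [0, 1]; higher means more random-looking.

    The 0.7 / 0.3 weights and the length cap of 30 are arbitrary assumptions.
    """
    label = second_level_label(domain)
    if not label:
        return 0.0
    # Normalise by the maximum entropy over a 36-symbol alphabet (a-z, 0-9).
    entropy = min(shannon_entropy(label) / math.log2(36), 1.0)
    length = min(len(label), 30) / 30
    # Labels fully explained by dictionary words score 0.
    return (1.0 - dictionary_coverage(label)) * (0.7 * entropy + 0.3 * length)


if __name__ == "__main__":
    for d in ["facebook.com", "xkqzjvhwplrtnm.net"]:
        print(d, round(dga_score(d), 3))
```

Ranking the million domains by such a score would give a crude ordering at best; it would obviously miss DGAs that concatenate dictionary words, which is partly why I am asking about more principled approaches.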
Sadly, I couldn't find any standard research or text in this area. Any information would be helpful.