6

For anyone who wants to study malware, a few websites publish malware samples and/or URL feeds that can be freely retrieved and studied.

Rather than analysing only the files themselves, I want to perform analyses that take features of the URL and the referring page into account. This requires me to crawl individual URLs until something interesting is found, such as a document or an executable file.
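To make "features of the URL and the referring page" concrete, here is a minimal Python sketch. The feature names and the choice of features are illustrative assumptions, not a fixed scheme; only the standard library is used.

```python
import ipaddress
import posixpath
from urllib.parse import parse_qs, urlsplit


def is_ip_literal(host: str) -> bool:
    """Return True if the host is a raw IPv4/IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str, referrer: str = "") -> dict:
    """Extract a few simple lexical features from a URL and its referrer.

    The feature set is hypothetical and only meant as a starting point.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    ref_host = urlsplit(referrer).hostname or ""
    return {
        "scheme": parts.scheme,
        "host": host,
        "host_is_ip": is_ip_literal(host),
        "host_label_count": len(host.split(".")) if host else 0,
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "extension": posixpath.splitext(parts.path)[1].lower(),
        "query_param_count": len(parse_qs(parts.query)),
        "url_length": len(url),
        # True when the referring page lives on a different host.
        "cross_site_referrer": bool(ref_host) and ref_host != host,
    }
```

Calling `url_features(url, referrer)` on every hop of a crawl would give one feature row per URL, which can then be stored next to the downloaded sample.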

However, as an individual without any access that might help with this (such as access to an organization's email inboxes, or links to security researchers and/or companies), how can I crawl the internet to find malware?

  • Does it have to be crawling or are you okay with the malware being delivered to you (by compromised machines trying to get more computers into their botnet)? Also what kind of malware are you looking for? Windows? Web (PHP)? Linux rootkits? – André Borie Mar 31 '17 at 11:41
  • @AndréBorie I'm actually interested in working with the features of the URL, so any method that gives me those URLs would work. (Once I get the URL, I can crawl it for analysis.) As for the types of malware, I'm interested in all forms of it. –  Mar 31 '17 at 11:45
  • My suggestion was more about setting up a honeypot, such as something that mimics a WordPress vulnerability (and waits for malicious PHP files to be submitted to it) or an SMTP server that accepts all mail without rejecting even the obvious spam (bad reverse DNS, no DKIM, etc). But I'm afraid I can't help you much if you're actually looking for URLs pointing to malware. – André Borie Mar 31 '17 at 11:47
  • @AndréBorie you can still post an answer involving those things; for example, many professional researchers have had success with email setups just like the one you describe. –  Mar 31 '17 at 14:47
  • I think you need to take the honeypot route that @AndréBorie mentioned. Crawling the web isn't difficult, but how do you plan to know that the files or links you're looking at are infected? Either register with sites that already provide access to archives of malware, or set up a honeypot to potentially catch new ones. – Stephan Apr 03 '17 at 18:39

3 Answers

4

Try starting with public spam email messages.

For example, some accounts on mailinator.com receive a lot of spam. There is also a lot of public spam available on untroubled.org.
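Once you have raw spam messages saved as text, pulling the URLs out of them can be sketched as follows. This is a rough sketch using only Python's standard `email` module and a deliberately simple regular expression; the function name `extract_urls` is an assumption for illustration, and real spam will need more careful handling (HTML parsing, obfuscated links, attachments).

```python
import re
from email import message_from_string

# Deliberately simple pattern: stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(raw_email: str) -> list[str]:
    """Return the unique http(s) URLs found in the text parts of an email."""
    msg = message_from_string(raw_email)
    urls: list[str] = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label; fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;)")  # drop trailing sentence punctuation
            if url not in urls:
                urls.append(url)
    return urls
```

The resulting list can be fed directly into a crawler, keeping the message itself as context for each URL.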

Sjoerd
2

There are a number of lists available that contain links to URLs that hosted malware in the recent past. Some of these lists are public while other data sources are only available to selected researchers or partners.

For a start you can have a look at the public data that can be found e.g. at https://isc.sans.edu/suspicious_domains.html. To quote from the website:

There are many suspicious domains on the internet. In an effort to identify them, as well as false positives, we have assembled weighted lists based on tracking and malware lists from different sources. ISC is collecting and categorizing various lists associated with a certain level of sensitivity.

This is followed by a list of links that point to different websites which I will include for reference:
Malware Domain List.com
Domain Blocklist From Malwaredomains
Abuse.ch Ransomware Domain Blocklist
Threatexpert.com Malicious URLs
Zeus Command And Control Server from Abuse.ch

With the domains you get from these lists you can then start your analysis for malicious files.

If you are looking for the URLs from which the malicious files are served, have a look at the first linked website: https://www.malwaredomainlist.com/mdl.php
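Each of these lists has its own export format, so the following is only a sketch under an assumed format: one entry per line, `#` comments, and either a bare domain, a full URL, or a hosts-file style `<ip> <domain>` pair. The function name `parse_feed` is hypothetical; check the actual format of whichever list you download.

```python
from urllib.parse import urlsplit


def parse_feed(text: str) -> list[str]:
    """Turn a plain-text blocklist into a list of crawlable URLs.

    Assumed (not guaranteed) format: '#' starts a comment, and the last
    whitespace-separated field on a line is the domain or URL.
    """
    entries: list[str] = []
    for line in text.splitlines():
        # Strip comments; this also drops URL fragments, which are never
        # sent to the server anyway.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        target = line.split()[-1]
        if "://" not in target:
            target = "http://" + target  # assume plain HTTP for bare domains
        if urlsplit(target).hostname:
            entries.append(target)
    return entries
```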

Denis
  • Domains don't directly serve malware on visiting their homepages, so this isn't too useful. However what I do want to ask: how do these sources get lists of domains? If I can replicate their setups, it could be helpful for my project? –  Apr 04 '17 at 10:27
  • The first link to MalwareDomainList.com contains specific links. From what you said, which was "I want to perform analyses which take features of the URL and the referring page into account. This requires me to crawl individual URLs till something interesting is found, such as a document or an executable file.", I understood that you don't want a direct link to the served file but a list of domains that somewhere/somehow host malware. If you want neither the domain nor the link to the files, could you clarify what exactly you are looking for so I can modify my answer? – Denis Apr 04 '17 at 10:34
  • I'm looking for full URLs as I need _both_ the sample and its URL. The lists of domains are useful for other things, but unfortunately they're not a great help for what I'm doing. –  Apr 04 '17 at 10:39
  • Please have a look at my edit at the end of the answer and especially the first linked domain (www.malwaredomainlist.com). Is that the kind of list you are looking for? – Denis Apr 04 '17 at 10:41
  • yes, that's exactly what I need! –  Apr 04 '17 at 10:49
  • @user2064000 "I want to perform analyses which take features of the URL and the referring page into account." sounds interesting. Could you provide some details on how you want to achieve this? – cyzczy Apr 04 '17 at 11:28
0

Having a list of malicious domains (e.g. malwaredomainlist.com) is not enough to ensure that you will get the full payload from each site.

Many sites, including malicious ones, send different responses depending on your user agent. In some cases, the malicious site will look for a DirectX runtime to exploit. A quick and safe solution might be to create Windows and Android VMs with VirtualBox. To save time, be sure to save a copy (or snapshot) of each VM after a clean install.
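To check whether a given URL behaves differently per user agent, one rough approach is to request it with several `User-Agent` headers and compare hashes of the response bodies. The sketch below uses only Python's standard library; the user-agent strings and function names are illustrative assumptions, and it should only be run from inside an isolated VM. Note that dynamic pages can differ between requests for harmless reasons, so a mismatch is a hint, not proof.

```python
import hashlib
import urllib.request
from typing import Callable

# Example user-agent strings; substitute whatever clients you want to imitate.
USER_AGENTS = {
    "windows_ie": "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko",
    "android": "Mozilla/5.0 (Linux; Android 7.0) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/58.0 Mobile Safari/537.36",
}


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> bytes:
    """Download the body of `url` while presenting the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def body_digests(url: str, fetcher: Callable[[str, str], bytes] = fetch) -> dict:
    """Map each user-agent label to the SHA-256 of the body it received."""
    return {
        label: hashlib.sha256(fetcher(url, ua)).hexdigest()
        for label, ua in USER_AGENTS.items()
    }


def serves_different_content(digests: dict) -> bool:
    """True when at least two user agents received different bodies."""
    return len(set(digests.values())) > 1
```

The `fetcher` parameter exists so the comparison logic can be exercised with a stub instead of live network access.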

In the Windows VM, AutoHotkey is a great solution for analyzing window titles and contents and for simulating human responses. You could write an IE/Edge scraper with just AutoHotkey (be sure to enable DirectX). In the Android VM, any scripting language (Kivy/python-for-android with the Twisted library comes to mind) would let you hunt for redirects to APK files and download them without installing them (which may be beneficial).
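The "hunt for redirects to APK files" step can be sketched independently of the scripting framework. The following Python sketch follows a redirect chain and stops when a hop looks like an APK, judged by the `.apk` extension or the standard APK media type. It is not tied to Kivy or Twisted; `find_apk` and its `head` callback (which returns a status code and headers for a URL) are assumptions for illustration.

```python
from typing import Callable, Mapping, Optional, Tuple
from urllib.parse import urljoin, urlsplit

APK_MIME = "application/vnd.android.package-archive"
REDIRECT_CODES = {301, 302, 303, 307, 308}

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def looks_like_apk(url: str, headers: Mapping[str, str]) -> bool:
    """Guess whether a response is an APK from its media type or extension."""
    lowered = {k.lower(): v for k, v in headers.items()}
    mime = lowered.get("content-type", "").split(";")[0].strip().lower()
    return mime == APK_MIME or urlsplit(url).path.lower().endswith(".apk")


def find_apk(start_url: str, head: Callable[[str], Response],
             max_hops: int = 10) -> Optional[str]:
    """Follow redirects from start_url; return the first APK-like URL, if any."""
    url = start_url
    for _ in range(max_hops):
        status, headers = head(url)
        if looks_like_apk(url, headers):
            return url
        location = {k.lower(): v for k, v in headers.items()}.get("location")
        if status not in REDIRECT_CODES or not location:
            return None
        url = urljoin(url, location)  # resolve relative redirects
    return None  # gave up: too many hops or a redirect loop
```

Because only headers are inspected, the APK is identified without being installed; downloading the body can then be a separate, deliberate step.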

brirus