What is a possible attack?
Hacking works by exploiting an attack vector, such as a PHP script that lets you write a file to the server and then execute it.
Say you have a website which allows for uploading files to:
/var/www/my-app/files/here.php
And you did not protect that folder, meaning that anyone can execute that file:
http://www.example.com/files/here.php
At that point, the hacker can run arbitrary code on your machine. This here.php script probably lets him upload more PHP, do a git clone super-bad-stuff.git, etc. and possibly find ways to gain root access on the machine.
Three Main Steps
What the hacker has to do is check your web app and see whether a known attack vector exists.
1. Get Info
At first the hacker wants to learn more about your server and web app. With knowledge about your app, the hacker can decide which script to run to gain access. So determining the names and, if possible, the versions of your web server and web app is important.
The Apache server on its own may be a vector of attack, although in most cases it's going to be an Apache module or a web app such as Wordpress. This is because Apache itself is much less likely to have a vector of attack (although I've seen one where you could crash the server, and there was a small window of time during which you could connect in such a way that you gained access rights not otherwise available).
So... the first script tries to connect to port 80 and/or 443. If that fails, maybe try a few more like 8080 or 8443...
Once connected, the script sends a first request, in most cases a GET. Just a simple GET of the home page is likely to indicate a lot of things. The HTTP Server: ... header, as you mentioned, is likely going to tell us the name of the server and, if not changed, the version and OS it is running on:
Server: Apache/2.4.18 (Ubuntu)
Note: Here I show the default for Ubuntu 16.04. If you check on a different Ubuntu version, the Apache server will have a different version. So even though the Server: ... header did not directly reveal the version of Ubuntu you are running, the hacker can already infer it from this. Also, if you are running Apache, you are likely to have Linux. If you run ASP, you most likely have MS-Windows.
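To make the header discussion concrete, here is a minimal sketch of how a Server: value could be split into its parts. The names ServerInfo and parse_server_header are my own, made up for illustration, and the regular expression only covers the simple product/version (comment) shape shown above; real headers can list several products.

```python
import re
from typing import NamedTuple, Optional


class ServerInfo(NamedTuple):
    product: str
    version: Optional[str]
    os_hint: Optional[str]


# product[/version], optionally followed by a parenthesised comment.
# This is an assumption about the common shape, not a full header grammar.
_SERVER_RE = re.compile(r"^\s*([^/\s()]+)(?:/([^\s()]+))?(?:\s*\(([^)]*)\))?")


def parse_server_header(value: str) -> Optional[ServerInfo]:
    """Split a Server header value into product, version and OS hint."""
    match = _SERVER_RE.match(value)
    if match is None:
        return None
    product, version, os_hint = match.groups()
    return ServerInfo(product, version, os_hint)


print(parse_server_header("Apache/2.4.18 (Ubuntu)"))
```

The same function is handy on the defensive side: run it against your own server's response to see what you are giving away.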
The HTML header will also often have information about the web app. For example, you often get a meta tag describing the service:
<meta name="generator" content="wordpress 1.2.3"/>
Now you have yet another piece of information! Beyond that, many of the tags in a page will hint at which version of the web app you are running. Nothing too complicated, just time consuming to get it all right.
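As an illustration, the generator tag can be read with nothing more than Python's standard html.parser module. GeneratorFinder, find_generator and split_generator are hypothetical names for this sketch, and the "name version" split is an assumption about how the content string is laid out.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def find_generator(html: str) -> Optional[str]:
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generator


def split_generator(generator: str) -> Tuple[str, Optional[str]]:
    """Assume a 'name version' layout; the version may be missing."""
    name, _, version = generator.strip().partition(" ")
    return name, (version.strip() or None)


PAGE = '<html><head><meta name="generator" content="wordpress 1.2.3"/></head></html>'
print(split_generator(find_generator(PAGE)))
```

If find_generator returns something for your own pages, that is one more piece of information you may want to remove or blank out.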
2. Headers to Vector of Attack
With all of this data, the hacker is likely to be able to pinpoint one or two possible vectors of attack out of his database of thousands of possible attacks. If you keep your software up to date, run a good firewall, etc., there are probably zero known vectors of attack at the moment.
This work is done using another script run on the hacker's backend. This script does a form of lookup in a database. How that works probably varies greatly between the various hacker scripts available. It may even be a manual task for some.
If a vector of attack exists, then the database query returns the name of the next script to run against your server. This one is the one which will penetrate your server using the known vector of attack.
Note: when the query script fails to find an attack vector, either everything stops or some of the existing scripts are still run against your machine to see whether it was patched or not. After all, you may be lying about the version you are running...
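The lookup itself can be pictured as a plain key/value search. The sketch below is an assumption on my part about how such a database could be organized; KNOWN_VECTORS, its entries and lookup_vectors are all made up for illustration and do not describe any real tool.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: (product, version) -> names of matching entries.
KNOWN_VECTORS: Dict[Tuple[str, str], List[str]] = {
    ("wordpress", "1.2.3"): ["hypothetical-vector-a"],
    ("apache", "2.4.18"): [],  # known product and version, nothing on file
}


def lookup_vectors(fingerprint: Dict[str, str]) -> List[str]:
    """Return the catalogue entries matching a {product: version} fingerprint."""
    matches: List[str] = []
    for product, version in fingerprint.items():
        matches.extend(KNOWN_VECTORS.get((product.lower(), version), []))
    return sorted(matches)


print(lookup_vectors({"Apache": "2.4.18", "wordpress": "1.2.3"}))
print(lookup_vectors({"wordpress": "1.2.4"}))
```

An empty result corresponds to the "zero known vectors" case, which is where an up to date installation should land.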
2.1. More data necessary?
At times, the database search may ask for more data. For example, the hacker now knows you are running Wordpress, but nothing tells him the version. To determine the version, he needs to access another page. If that page exists, then we know it's at least version 1.2. Another request to another page/data file may tell us that it's at least version 1.2.5 (e.g. a theme added a PNG image in that version). The possibilities are many. If a certain theme does not exist, it may mean you have a newer version. If a certain script returns 2 instead of 3 when accessed in this or that way, then it's version x.y.z, etc.
If you ask me, for most CMSes, it's pretty much impossible to hide the version. If you do a diff between two versions, you are very likely to find a way to write a query such that you'll get the version information. It's work for the hacker, but it's not that complicated.
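The narrowing logic can be sketched without touching the network at all. In the example below the table INTRODUCED_IN (file path to the version that first shipped it) is entirely hypothetical, as are the paths in it; the function only covers the "file added in version X" kind of hint, not removed files or changed return values.

```python
from typing import Dict, Iterable, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.2.5' into (1, 2, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical data: path -> version in which the file first appeared.
INTRODUCED_IN: Dict[str, str] = {
    "/readme-1.2.html": "1.2",
    "/themes/default/logo.png": "1.2.5",
    "/themes/default/banner.png": "1.3",
}


def narrow_version(
    existing_paths: Iterable[str],
) -> Tuple[Optional[Version], Optional[Version]]:
    """Return (at least this version, older than this version)."""
    existing = set(existing_paths)
    lower: Optional[Version] = None
    upper: Optional[Version] = None
    for path, introduced in INTRODUCED_IN.items():
        version = parse_version(introduced)
        if path in existing:
            if lower is None or version > lower:
                lower = version  # file present: at least this version
        elif upper is None or version < upper:
            upper = version  # file absent: older than this version
    return lower, upper


print(narrow_version({"/readme-1.2.html", "/themes/default/logo.png"}))
```

With the two files present and the third missing, the bounds come out as at least 1.2.5 and older than 1.3, which is the kind of narrowing described above.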
3. Running the Attack
With the name of the script to be run, now you are ready to penetrate the service you found. The complexity of this script may vary greatly. The example of attack on Apache I mentioned above can take a very long time. You have to purposefully crash the server and reconnect at the right time. I've seen such an attack on one of my servers; it lasted weeks (I did not see it immediately...) and the script never worked. Some other scripts will gain access in well under 1 second.
Once the penetration has occurred, the hacker installs his own software and takes over the computer. From there on, it's very much like the hacker has gained full access to your machine (as if he owned it).
How are the best hackers organized?
The best hackers will have a server where they run a genuine spider. This spider will check millions of websites in an attempt to determine the software they are running and save that information in their database.
This is Step 1. above.
They may check just the home page or all the pages they can find, as Google does; it will vary greatly. They may also try to access pages whose presence means certain tools are installed on your system (e.g. myPhpAdmin.php), but in most cases this first step has to look clean, so they should limit themselves to a regular spidering scheme.
This is very effective because you could have a heterogeneous website with a static home page, then Wordpress for the blog, Drupal for your book, phpBB for your forum, etc. All of that gets registered in that database.
Why do that leg work to just attack one website?
Well... Actually, it's much more than that. Remember Step 2. above: the search for a possible attack vector against a website. In most cases, assuming people keep their website up to date, that search should return zero results. However, a hacker may find a new way to penetrate a certain type of website (say Wordpress 1.2.3). Since the hacker already has thousands if not millions of websites in his database, he can do a cross product search between the new vector of attack and all the websites in that database. If he finds matches, he can start Step 3. on all of those. People who already upgraded to Wordpress 1.2.4 will be safe. People who are still on Wordpress 1.2.3 will have their website hacked.
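That cross product search is nothing more than a join between two tables. Here is a small sketch using an in-memory SQLite database; the table names, column names, URLs and the vector name are all assumptions for illustration, not a description of any real setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (url TEXT, product TEXT, version TEXT)")
conn.execute("CREATE TABLE vectors (name TEXT, product TEXT, version TEXT)")

# Hypothetical spider results gathered during Step 1.
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [
        ("http://a.example.com/", "wordpress", "1.2.3"),
        ("http://b.example.com/", "wordpress", "1.2.4"),
        ("http://c.example.com/", "drupal", "7.0"),
        ("http://d.example.com/", "wordpress", "1.2.3"),
    ],
)

# A newly catalogued vector (made-up name) for one product and version.
conn.execute("INSERT INTO vectors VALUES (?, ?, ?)", ("new-vector", "wordpress", "1.2.3"))


def affected_sites(connection: sqlite3.Connection):
    """Join vectors against sites on product and version."""
    rows = connection.execute(
        "SELECT v.name, s.url FROM vectors v "
        "JOIN sites s ON s.product = v.product AND s.version = v.version "
        "ORDER BY s.url"
    )
    return list(rows)


print(affected_sites(conn))
```

Only the two sites still on 1.2.3 match; the one already upgraded to 1.2.4 does not, which is the whole argument for upgrading quickly.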
Many of the servers that get hacked do get destroyed. But some hackers will use the hacked site to add a few pages, send spam emails, etc., doing rather nondestructive albeit unwanted work with your machine. The smartest will try to use a minimal amount of your CPU and bandwidth so they are not easily detected and can keep using your computers for months...