623

This is a Canonical Question about Server Security - Responding to Breach Events (Hacking)

Canonical Version
I suspect that one or more of my servers is compromised by a hacker, virus, or other mechanism:

  • What are my first steps? When I arrive on site should I disconnect the server, preserve "evidence", are there other initial considerations?
  • How do I go about getting services back online?
  • How do I prevent the same thing from happening immediately again?
  • Are there best practices or methodologies for learning from this incident?
  • If I wanted to put an Incident Response Plan together, where would I start? Should this be part of my Disaster Recovery or Business Continuity Planning?

Original Version

2011.01.02 - I'm on my way into work at 9.30 p.m. on a Sunday because our server has been compromised somehow, and this resulted in a DoS attack on our provider. The server's access to the Internet has been shut down, which means 500-600 of our clients' sites are now down. Now this could be an FTP hack, or some weakness in code somewhere. I'm not sure till I get there.

How can I track this down quickly? We're in for a whole lot of litigation if I don't get the server back up ASAP. Any help is appreciated. We are running openSUSE 11.0.


2011.01.03 - Thanks to everyone for your help. Luckily I WASN'T the only person responsible for this server, just the nearest. We managed to resolve this problem, although it may not apply to many others in a different situation. I'll detail what we did.

We unplugged the server from the net. It was performing (attempting to perform) a Denial Of Service attack on another server in Indonesia, and the guilty party was also based there.

We first tried to identify where on the server this was coming from. Considering we have over 500 sites on the server, we expected to be moonlighting for some time. However, since we still had SSH access, we ran a command to find all files edited or created around the time the attacks started. Luckily, the offending file was created over the winter holidays, which meant that not many other files were created on the server at that time.
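For anyone wanting to repeat the "find files changed around the time of the attack" step, here is a minimal Python sketch. The function name `files_modified_between` and the path and dates in the usage comment are assumptions for illustration; this is not the command that was actually run on the server. Bear in mind that an intruder with write access can forge modification times, so treat the results as a lead rather than proof.

```python
import os
import stat
from datetime import datetime


def files_modified_between(root, start, end):
    """Yield (path, mtime) for regular files under root modified in [start, end]."""
    start_ts, end_ts = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, devices
            if start_ts <= info.st_mtime <= end_ts:
                yield path, datetime.fromtimestamp(info.st_mtime)


# Hypothetical usage (path and dates are made up):
# for path, when in files_modified_between("/srv/www",
#                                          datetime(2010, 12, 24),
#                                          datetime(2011, 1, 2)):
#     print(when.isoformat(), path)
```

A narrow window over a quiet period (such as a holiday) keeps the list short enough to review by hand.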

We were then able to identify the offending file which was inside the uploaded images folder within a ZenCart website.

After a short cigarette break we concluded that, due to the file's location, it must have been uploaded via a file upload facility that was inadequately secured. After some googling, we found that there was a security vulnerability that allowed files to be uploaded, within the ZenCart admin panel, for a picture for a record company (a section that was never really even used). Posting this form just uploaded any file; it did not check the extension of the file, and didn't even check to see if the user was logged in.

This meant that any files could be uploaded, including a PHP file for the attack. We secured the vulnerability with ZenCart on the infected site, and removed the offending files.
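To make the two missing checks concrete (is the user logged in, and is the file extension acceptable), here is a small Python sketch. It is not ZenCart's code or its actual patch; `is_upload_allowed` and the extension whitelist are hypothetical names chosen for illustration. An extension check alone is not a complete defence, so the upload directory should also be configured so that nothing in it gets executed.

```python
import os

# Assumption: the form is only meant to accept these image types.
ALLOWED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def is_upload_allowed(filename, user_is_logged_in):
    """Return True only for an authenticated user uploading a plain image name."""
    if not user_is_logged_in:
        return False  # the check the vulnerable form skipped entirely
    # Reject anything that carries a directory component (../, C:\ and so on).
    base = os.path.basename(filename.replace("\\", "/"))
    if base != filename or base.startswith("."):
        return False
    # Conservative choice: exactly one dot, so "shell.php.jpg" is refused too.
    parts = base.lower().split(".")
    if len(parts) != 2:
        return False
    return "." + parts[1] in ALLOWED_IMAGE_EXTENSIONS
```

Refusing names with more than one dot will reject some legitimate files; that trade-off is deliberate in this sketch and may not suit every site.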

The job was done, and I was home for 2 a.m.


The Moral

  • Always apply security patches for ZenCart, or any other CMS for that matter, because when security updates are released, the whole world is made aware of the vulnerability.
  • Always do backups, and back up your backups.
  • Employ or arrange for someone who will be there in times like these, so that nobody has to rely on a panicky post on Server Fault.

gunwin
  • 7
    I know how you feel - we've been very fortunate with "helpful" hackers on this site, where they tell us what they've done! I'm looking forward to great answers to this question, just in case we get "not-so-helpful" guests in the future. – Jarrod Dixon May 08 '09 at 09:03
  • 190
    Call a professional to help out! – moinudin Jan 02 '11 at 21:32
  • 108
    I don't want to sound smart-assy or unsympathetic (I'm neither), and of course I am ignorant of the details of your situation, but if you are the only person responsible for a 500-600 site setup, there might be a fundamental flaw in how this server is run. Some companies employ a dedicated sysadmin who doesn't do anything else all day but maintain servers - a task which is *not* automatically within a programmer's scope, even though it may seem that way. Maybe that's something worth considering when the crisis is over. Anyway, right now, best of luck in getting the situation at hand solved. – Pekka Jan 02 '11 at 22:20
  • 2
    Don't necessarily assume you have a full blown kernel root kit and that your root password is compromised. It's possibly just a sneaky bash/perl script, and it is possible to clean it without formatting despite what the choir harps on about here... http://serverfault.com/questions/639699/how-can-i-find-out-more-about-this-perl-process – Hayden Thring Oct 26 '14 at 12:12
  • 1
    @HaydenThring - maybe it's not "a full blown kernel root kit" or similar. Better to *presume* it is, than *hope* it is not. – warren Apr 19 '21 at 18:45
  • I'm a Senior System Engineer with 12 years of experience and I always advise backing up the important data to a safe location and reinstalling the machine or server. There might be forensic researchers who can clean your machine, but you never know where the threat actor has placed entry points for himself that you don't know about. And harden the new machine properly; the CIS baselines are a good starting point. – Ace Jul 02 '22 at 17:03

13 Answers

1036

It's hard to give specific advice from what you've posted here but I do have some generic advice based on a post I wrote ages ago back when I could still be bothered to blog.

Don't Panic

First things first, there are no "quick fixes" other than restoring your system from a backup taken prior to the intrusion, and this has at least two problems.

  1. It's difficult to pinpoint when the intrusion happened.
  2. It doesn't help you close the "hole" that allowed them to break in last time, nor deal with the consequences of any "data theft" that may also have taken place.

This question keeps being asked repeatedly by the victims of hackers breaking into their web server. The answers very rarely change, but people keep asking the question. I'm not sure why. Perhaps people just don't like the answers they've seen when searching for help, or they can't find someone they trust to give them advice. Or perhaps people read an answer to this question and focus too much on the 5% of why their case is special and different from the answers they can find online and miss the 95% of the question and answer where their case is near enough the same as the one they read online.

That brings me to the first important nugget of information. I really do appreciate that you are a special unique snowflake. I appreciate that your website is too, as it's a reflection of you and your business or at the very least, your hard work on behalf of an employer. But to someone on the outside looking in, whether a computer security person looking at the problem to try and help you or even the attacker himself, it is very likely that your problem will be at least 95% identical to every other case they've ever looked at.

Don't take the attack personally, and don't take the recommendations that follow here or that you get from other people personally. If you are reading this after just becoming the victim of a website hack then I really am sorry, and I really hope you can find something helpful here, but this is not the time to let your ego get in the way of what you need to do.

You have just found out that your server(s) got hacked. Now what?

Do not panic. Absolutely do not act in haste, and absolutely do not try and pretend things never happened and not act at all.

First: understand that the disaster has already happened. This is not the time for denial; it is the time to accept what has happened, to be realistic about it, and to take steps to manage the consequences of the impact.

Some of these steps are going to hurt, and (unless your website holds a copy of my details) I really don't care if you ignore all or some of these steps, that's up to you. But following them properly will make things better in the end. The medicine might taste awful but sometimes you have to overlook that if you really want the cure to work.

Stop the problem from becoming worse than it already is:

  1. The first thing you should do is disconnect the affected systems from the Internet. Whatever other problems you have, leaving the system connected to the web will only allow the attack to continue. I mean this quite literally; get someone to physically visit the server and unplug network cables if that is what it takes, but disconnect the victim from its muggers before you try to do anything else.
  2. Change all your passwords for all accounts on all computers that are on the same network as the compromised systems. No really. All accounts. All computers. Yes, you're right, this might be overkill; on the other hand, it might not. You don't know either way, do you?
  3. Check your other systems. Pay special attention to other Internet facing services, and to those that hold financial or other commercially sensitive data.
  4. If the system holds anyone's personal data, immediately inform the person responsible for data protection (if that's not you) and URGE a full disclosure. I know this one is tough. I know this one is going to hurt. I know that many businesses want to sweep this kind of problem under the carpet but the business is going to have to deal with it - and needs to do so with an eye on any and all relevant privacy laws.

However annoyed your customers might be to have you tell them about a problem, they'll be far more annoyed if you don't tell them, and they only find out for themselves after someone charges $8,000 worth of goods using the credit card details they stole from your site.

Remember what I said previously? The bad thing has already happened. The only question now is how well you deal with it.

Understand the problem fully:

  1. Do NOT put the affected systems back online until this stage is fully complete, unless you want to be the person whose post was the tipping point for me actually deciding to write this article. I'm not going to link to that post so that people can get a cheap laugh, but the real tragedy is when people fail to learn from their mistakes.
  2. Examine the 'attacked' systems to understand how the attacks succeeded in compromising your security. Make every effort to find out where the attacks "came from", so that you understand what problems you have and need to address to make your system safe in the future.
  3. Examine the 'attacked' systems again, this time to understand where the attacks went, so that you understand what systems were compromised in the attack. Ensure you follow up any pointers that suggest compromised systems could become a springboard to attack your systems further.
  4. Ensure the "gateways" used in any and all attacks are fully understood, so that you may begin to close them properly. (e.g. if your systems were compromised by a SQL injection attack, then not only do you need to close the particular flawed line of code that they broke in by, you would want to audit all of your code to see if the same type of mistake was made elsewhere).
  5. Understand that attacks might succeed because of more than one flaw. Often, attacks succeed not through finding one major bug in a system but by stringing together several issues (sometimes minor and trivial by themselves) to compromise a system. For example, using SQL injection attacks to send commands to a database server, discovering the website/application you're attacking is running in the context of an administrative user and using the rights of that account as a stepping-stone to compromise other parts of a system. Or as hackers like to call it: "another day in the office taking advantage of common mistakes people make".
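Since SQL injection keeps coming up as the example, here is a minimal sketch of the difference a parameterized query makes. It uses Python's built-in `sqlite3` module purely so the example is self-contained; the `users` table and the function names are made up for illustration and say nothing about what your own application looks like.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)",
                     [("alice",), ("bob",)])
    conn.commit()
    return conn


def find_user(conn, username):
    """Look up a user with a bound parameter instead of string concatenation."""
    # The driver passes `username` as data, so quotes in it cannot alter the SQL.
    cur = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchall()
```

With this, a classic payload such as `' OR '1'='1` is simply treated as an odd username that matches nothing. Auditing a codebase then largely becomes a search for queries that are still built by gluing strings together.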

Why not just "repair" the exploit or rootkit you've detected and put the system back online?

In situations like this the problem is that you don't have control of that system any more. It's not your computer any more.

The only way to be certain that you've got control of the system is to rebuild the system. While there's a lot of value in finding and fixing the exploit used to break into the system, you can't be sure about what else has been done to the system once the intruders gained control (indeed, it's not unheard of for hackers that recruit systems into a botnet to patch the exploits they used themselves, to safeguard "their" new computer from other hackers, as well as installing their rootkit).

Make a plan for recovery and to bring your website back online and stick to it:

Nobody wants to be offline for longer than they have to be. That's a given. If this website is a revenue generating mechanism then the pressure to bring it back online quickly will be intense. Even if the only thing at stake is your / your company's reputation, this is still going to generate a lot of pressure to put things back up quickly.

However, don't give in to the temptation to go back online too quickly. Instead, move as fast as possible to understand what caused the problem and to solve it before you go back online, or else you will almost certainly fall victim to an intrusion once again. And remember, "to get hacked once can be classed as misfortune; to get hacked again straight afterward looks like carelessness" (with apologies to Oscar Wilde).

  1. I'm assuming you've understood all the issues that led to the successful intrusion in the first place before you even start this section. I don't want to overstate the case but if you haven't done that first then you really do need to. Sorry.
  2. Never pay blackmail / protection money. This is the sign of an easy mark and you don't want that phrase ever used to describe you.
  3. Don't be tempted to put the same server(s) back online without a full rebuild. It should be far quicker to build a new box or "nuke the server from orbit and do a clean install" on the old hardware than it would be to audit every single corner of the old system to make sure it is clean before putting it back online again. If you disagree with that then you probably don't know what it really means to ensure a system is fully cleaned, or your website deployment procedures are an unholy mess. You presumably have backups and test deployments of your site that you can just use to build the live site, and if you don't then being hacked is not your biggest problem.
  4. Be very careful about re-using data that was "live" on the system at the time of the hack. I won't say "never ever do it" because you'll just ignore me, but frankly I think you do need to consider the consequences of keeping data around when you know you cannot guarantee its integrity. Ideally, you should restore this from a backup made prior to the intrusion. If you cannot or will not do that, you should be very careful with that data because it's tainted. You should especially be aware of the consequences to others if this data belongs to customers or site visitors rather than directly to you.
  5. Monitor the system(s) carefully. You should resolve to do this as an ongoing process in the future (more below) but you should take extra pains to be vigilant during the period immediately following your site coming back online. The intruders will almost certainly be back, and if you can spot them trying to break in again you will certainly be able to see quickly if you really have closed all the holes they used before plus any they made for themselves, and you might gather useful information you can pass on to your local law enforcement.

Reducing the risk in the future.

The first thing you need to understand is that security is a process that you have to apply throughout the entire life-cycle of designing, deploying and maintaining an Internet-facing system, not something you can slap a few layers over your code afterwards like cheap paint. To be properly secure, a service and an application need to be designed from the start with this in mind as one of the major goals of the project. I realise that's boring and you've heard it all before and that I "just don't realise the pressure man" of getting your beta web2.0 (beta) service into beta status on the web, but the fact is that this keeps getting repeated because it was true the first time it was said and it hasn't yet become a lie.

You can't eliminate risk. You shouldn't even try to do that. What you should do however is to understand which security risks are important to you, and understand how to manage and reduce both the impact of the risk and the probability that the risk will occur.

What steps can you take to reduce the probability of an attack being successful?

For example:

  1. Was the flaw that allowed people to break into your site a known bug in vendor code, for which a patch was available? If so, do you need to re-think your approach to how you patch applications on your Internet-facing servers?
  2. Was the flaw that allowed people to break into your site an unknown bug in vendor code, for which a patch was not available? I most certainly do not advocate changing suppliers whenever something like this bites you because they all have their problems and you'll run out of platforms in a year at the most if you take this approach. However, if a system constantly lets you down then you should either migrate to something more robust or at the very least, re-architect your system so that vulnerable components stay wrapped up in cotton wool and as far away as possible from hostile eyes.
  3. Was the flaw a bug in code developed by you (or a contractor working for you)? If so, do you need to re-think your approach to how you approve code for deployment to your live site? Could the bug have been caught with an improved test system, or with changes to your coding "standard" (for example, while technology is not a panacea, you can reduce the probability of a successful SQL injection attack by using well-documented coding techniques).
  4. Was the flaw due to a problem with how the server or application software was deployed? If so, are you using automated procedures to build and deploy servers where possible? These are a great help in maintaining a consistent "baseline" state on all your servers, minimising the amount of custom work that has to be done on each one and hence hopefully minimising the opportunity for a mistake to be made. Same goes with code deployment - if you require something "special" to be done to deploy the latest version of your web app then try hard to automate it and ensure it always is done in a consistent manner.
  5. Could the intrusion have been caught earlier with better monitoring of your systems? Of course, 24-hour monitoring or an "on call" system for your staff might not be cost effective, but there are companies out there who can monitor your web facing services for you and alert you in the event of a problem. You might decide you can't afford this or don't need it and that's just fine... just take it into consideration.
  6. Use tools such as tripwire and nessus where appropriate - but don't just use them blindly because I said so. Take the time to learn how to use a few good security tools that are appropriate to your environment, keep these tools updated and use them on a regular basis.
  7. Consider hiring security experts to 'audit' your website security on a regular basis. Again, you might decide you can't afford this or don't need it and that's just fine... just take it into consideration.
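To give a feel for what a file integrity checker does, here is a rough Python sketch of the underlying idea: record a hash of every file while the system is known to be good, then compare later. This is not how tripwire itself is implemented, and the function names are hypothetical. It also assumes the baseline is stored somewhere the attacker cannot reach (offline or read-only media), otherwise it proves nothing.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            baseline[os.path.relpath(path, root)] = sha256_of(path)
    return baseline


def compare_to_baseline(root, baseline):
    """Report files added, removed, or changed since the baseline was taken."""
    current = build_baseline(root)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(p for p in current
                          if p in baseline and current[p] != baseline[p]),
    }
```

Run regularly, the "added" list is where something like an unexpected PHP file in an images folder would show up.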

What steps can you take to reduce the consequences of a successful attack?

If you decide that the "risk" of the lower floor of your home flooding is high, but not high enough to warrant moving, you should at least move the irreplaceable family heirlooms upstairs. Right?

  1. Can you reduce the number of services directly exposed to the Internet? Can you maintain some kind of gap between your internal services and your Internet-facing services? This ensures that even if your external systems are compromised the chances of using this as a springboard to attack your internal systems are limited.
  2. Are you storing information you don't need to store? Are you storing such information "online" when it could be archived somewhere else? There are two points to this part; the obvious one is that people cannot steal information from you that you don't have, and the second point is that the less you store, the less you need to maintain and code for, and so there are fewer chances for bugs to slip into your code or systems design.
  3. Are you using "least access" principles for your web app? If users only need to read from a database, then make sure the account the web app uses to service this only has read access; don't allow it write access and certainly not system-level access.
  4. If you're not very experienced at something and it is not central to your business, consider outsourcing it. In other words, if you run a small website talking about writing desktop application code and decide to start selling small desktop applications from the site then consider "outsourcing" your credit card order system to someone like PayPal.
  5. If at all possible, make practicing recovery from compromised systems part of your Disaster Recovery plan. This is arguably just another "disaster scenario" that you could encounter, simply one with its own set of problems and issues that are distinct from the usual 'server room caught fire'/'was invaded by giant server eating furbies' kind of thing.
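As a small illustration of the "least access" point in the list above, the sketch below opens a database in read-only mode so that the code path serving visitors physically cannot write to it. It uses SQLite only to stay self-contained; on a real database server you would more likely achieve the same thing with a separate account that has nothing but read permissions. The function name `open_read_only` is hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite database so that any write attempt fails."""
    # SQLite URI filenames accept mode=ro, which refuses writes at the driver level.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection opened this way, a `SELECT` works as usual, while an `INSERT`, `UPDATE` or `DELETE` raises `sqlite3.OperationalError`. An injected or buggy statement then fails loudly instead of quietly defacing the site.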

... And finally

I've probably left out no end of stuff that others consider important, but the steps above should at least help you start sorting things out if you are unlucky enough to fall victim to hackers.

Above all: Don't panic. Think before you act. Act firmly once you've made a decision, and leave a comment below if you have something to add to my list of steps.

Rob Moir
  • 8
    +1 for an excellent post to have on hand to get people started in a direction. I know how common it is for amateur server admins to get into this panic mode the first time they have a 'hack' happen to them. It's a *huge* mistake to be in that spot, but it happens. The hope would be that this wouldn't happen to the same person, twice. – Andrew Barber Jan 02 '11 at 21:56
  • 3
    @Steven You wouldn't pull all of it off in an hour unless you had laid plans beforehand - and even then it's likely to be a struggle. Part of laying those plans is setting realistic goals; if the business can't bear to be offline for an hour you budget and plan for that - I have a lot of sympathy for people who are dumped in the sharp end of things but for the business - well if it was *that* important to the company then they would have considered this threat and mitigated it beforehand. – Rob Moir Jan 02 '11 at 23:24
  • @Steven, sorry about that and in that case, thank you. I wrote the first draft of this reply some time ago when I was a Microsoft security MVP. – Rob Moir Jan 02 '11 at 23:34
  • 35
    +1 "...but this is not the time to let your ego get in the way of what you need to do." This is important for Sys Admins to understand sometimes. No matter how knowledgeable you are, there are always those (sometimes malicious) who are more knowledgeable or clever than you. – Grahamux Jan 03 '11 at 01:39
  • 11
    Great answer. I'm not sure why everyone is treating the "call law enforcement" step as optional though. If you're responsible for other people's data (and worried about litigation) this should be one of the first things on your list of things to do. – wds Jan 03 '11 at 09:14
  • 1
    @wds - absolutely, if you're responsible for other people's data then you need to notify both them and law enforcement imho, to keep your liability to a minimum. – Rob Moir Jan 03 '11 at 10:51
  • 8
    Very good write up, just one gotcha - "make a full and frank disclosure to anyone potentially affected at once." Honourable, but not always correct. In responding to a compromise, you may need to cut some governance corners, and your company will generally cut you some slack, however... disclosure or not, specifically when there are Data Protection implications may well be a matter above your pay grade and could have legal implications. It may be better to suggest that you immediately inform the person responsible for data protection (if that's not you) and URGE a full disclosure. – TheoJones Jan 08 '13 at 14:44
  • Brilliant. What do I do if a virtual machine has been compromised and I don't have physical access to the box? It would be quite difficult to unplug from the network. What's the least damaging way of diagnosing how the box was hacked in that instance? – Giles Roberts Jan 20 '14 at 17:42
  • 5
    @GilesRoberts virtual machine hosts typically have a control panel that lets you manipulate the settings of their guests, and even remote control them without using RDP or SSH to actually log into the guest. You should be able to isolate the guest using the host's controls for doing so then use its remote viewing tools to investigate the guest at your leisure. – Rob Moir Jan 20 '14 at 17:49
  • `"Remember what I said previously? The bad thing has already happened. The only question now is how well you deal with it."` seems to be wise advice that is also applicable to "a life". Like a father/mother – Deckard Aug 13 '15 at 02:42
  • Sidenote: This is heavily focused on webservers with public-facing webapps. The response steps are very thorough and applicable for other kinds of "public-ish" servers, but not all of them apply, e.g. when you have a known time of infection through social engineering exploits to shared user systems – Vogel612 Jun 24 '16 at 19:40
209

It sounds like you are in slightly over your head; that's ok. Call your boss and start negotiating for an emergency security response budget. $10,000 might be a good place to start. Then you need to get somebody (a PFY, a coworker, a manager) to start calling companies that specialize in security incident response. Many can respond within 24 hours, and sometimes even faster if they have an office in your city.

You also need somebody to triage customers; doubtless, somebody already is. Somebody needs to be on the phone with them to explain what is going on, what is being done to handle the situation, and to answer their questions.

Then, you need to...

  1. Stay calm. If you are in charge of incident response, what you do now needs to demonstrate the utmost professionalism and leadership. Document everything you do, and keep your manager and executive team apprised of major actions you take; this includes working with a response team, disabling servers, backing up data, and bringing things online again. They don't need gory details, but they should hear from you every 30 minutes or so.

  2. Be realistic. You aren't a security professional, and there are things you don't know. That's ok. When logging in to servers and looking at data, you need to understand your limits. Tread gently. In the course of your investigation, make sure you don't stomp on vital information or change something that might be needed later. If you feel uncomfortable or that you are guessing, that's a good place to stop and get an experienced professional to take over.

  3. Get a clean USB stick and spare hard drives. You will collect evidence here. Make backups of everything you feel may be relevant: communication with your ISP, network dumps, etc. Even if law enforcement doesn't get involved, in case of a lawsuit you will want this evidence to prove that your company handled the security incident in a professional and appropriate manner.

  4. Most important is to stop loss. Identify and cut off access to compromised services, data, and machines. Preferably, you should pull their network cable; if you cannot, then pull the power.

  5. Next, you need to remove the attacker and close the hole(s). Presumably, the attacker no longer has interactive access because you pulled the network. You now need to identify, document (with backups, screenshots, and your own personal observational notes; or preferably even by removing the drives from the affected servers and making a full disk image copy), and then remove any code and processes he left behind. This next part will suck if you don't have backups; you can try to untangle the attacker from the system by hand, but you will never be sure that you got everything he left behind. Rootkits are vicious, and not all are detectable. The best response will be to identify the vulnerability he used to get in, make image copies of the affected disks, and then wipe the affected systems and reload from a known good backup. Don't blindly trust your backup; verify it! Repair or close the vulnerability before the new host goes on the network again, and then bring it online.

  6. Organize all of your data into a report. At this point the vulnerability is closed and you have some time to breathe. Don't be tempted to skip this step; it is even more important than the rest of the process. In the report, you need to identify what went wrong, how your team responded, and the steps you are taking to prevent this incident from occurring again. Be as detailed as you can; this isn't just for you, but for your management and as a defense in a potential lawsuit.
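"Document everything" and "collect evidence" are easier to stick to under pressure if the logging is trivial. Below is one possible sketch: an append-only incident log that timestamps each action in UTC and, when given a file, records its SHA-256 so you can later show the copy was not altered. The function names and the JSON-lines format are my own assumptions, not any standard; a paper notebook with times written down serves the same purpose.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_entry(log_path, actor, action, evidence_path=None):
    """Append one timestamped entry; optionally fingerprint an evidence file."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
    }
    if evidence_path is not None:
        entry["evidence"] = {
            "path": str(evidence_path),
            "sha256": sha256_of(evidence_path),
        }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_log(log_path):
    """Load every entry back, in the order it was written."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Keep the log on the clean USB stick rather than on the compromised machine, and it doubles as the skeleton of the final report.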

That's a sky-high review of what to do; most of the work is simply documentation and backup handling. Don't panic, you can do that stuff. I strongly recommend you get professional security help. Even if you can handle what's going on, their help will be invaluable and they usually come with equipment to make the process easier and faster. If your boss balks at the cost, remind him that it's very small when compared to handling a lawsuit.

You have my consolations for your situation. Good luck.

blueben
  • 20
    +1 Great answer. It sounds like the OP doesn't have a pre-defined "emergency response" and your post, among other good things, should point them towards getting that set up. – Rob Moir Jan 02 '11 at 22:29
  • For the curious, I guess PFY here is a BOFH reference. – tripleee Jun 28 '21 at 06:26
111

CERT has a document Steps for Recovering from a UNIX or NT System Compromise that is good. The specific technical details of this document are somewhat out of date, but a lot of the general advice still directly applies.

A quick summary of the basic steps is this.

  • Consult your security policy or management.
  • Get control (take the computer offline)
  • Analyze the intrusion, get logs, figure what went wrong
  • Repair stuff
    • Install a clean version of your operating system!!! If the system has been compromised you cannot trust it, period.
  • Update systems so this can't happen again
  • Resume operations
  • Update your policy for the future and document

I would like to specifically point you to section E.1.

E.1. Keep in mind that if a machine is compromised, anything on that system could have been modified, including the kernel, binaries, datafiles, running processes, and memory. In general, the only way to trust that a machine is free from backdoors and intruder modifications is to reinstall the operating

If you don't have a system already in place like tripwire there is no possible way for you to be 100% certain that you have cleaned up the system.

Zoredache
67
  1. Identify the problem. Read the logs.
  2. Contain. You've disconnected the server, so that's done.
  3. Eradicate. Reinstall the affected system, most likely. Don't erase the hard drive of the hacked one though, use a new one. It's safer, and you might need the old one to recover ugly hacks that weren't backed up, and to do forensics to find out what happened.
  4. Recover. Install whatever's needed and recover backups to get your clients online.
  5. Follow-up. Figure out what was the problem, and prevent it from happening again.
Jakob Borg
52

Robert's "bitter pill" answer is spot-on but completely generic (well, as was your question). It does sound like you have a management problem and are in dire need of a full-time sysadmin if you have one server and 600 clients, but that doesn't help you now.

I run a hosting company which provides a bit of hand-holding in this situation, so I deal with lots of compromised machines, but also deal in best practice for our own. We always tell our compromised clients to rebuild unless they're absolutely sure of the nature of a compromise. There is no other responsible route in the long term.

However, you are almost certainly just the victim of a script kiddie who wanted a launching pad for DoS attacks, or IRC bouncers, or something completely unrelated to your customers' sites and data. Therefore, as a temporary measure while you rebuild, you might consider bringing up a heavy outbound firewall on your box. If you can block all outbound UDP and TCP connections that aren't absolutely necessary for your sites' functioning, you can easily make your compromised box useless to whoever is borrowing it from you, and mitigate the effect on your provider's network to zero.
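As a rough illustration of that outbound lock-down, here is a minimal Python sketch that only prints a default-deny rule set for review; it does not apply anything. It assumes the box uses iptables with the conntrack match available (on older setups `-m state --state` is the equivalent). The function name and the choice to allow only outbound DNS are my own assumptions, not something from the original poster's setup.

```python
def outbound_lockdown_rules(allowed_tcp_ports=(), allowed_udp_ports=()):
    """Build iptables argument lists for a default-deny OUTPUT chain."""
    rules = [
        # Loopback traffic never leaves the machine.
        ["iptables", "-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"],
        # Replies on connections that clients opened to us (web, SSH, ...).
        ["iptables", "-A", "OUTPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    # New outbound connections are allowed only to explicitly listed ports.
    for port in allowed_tcp_ports:
        rules.append(["iptables", "-A", "OUTPUT", "-p", "tcp",
                      "--dport", str(port), "-j", "ACCEPT"])
    for port in allowed_udp_ports:
        rules.append(["iptables", "-A", "OUTPUT", "-p", "udp",
                      "--dport", str(port), "-j", "ACCEPT"])
    # Set the policy last so the accept rules exist before dropping starts.
    rules.append(["iptables", "-P", "OUTPUT", "DROP"])
    return rules


# Assumption for illustration: the sites only need outbound DNS (UDP 53).
for rule in outbound_lockdown_rules(allowed_udp_ports=[53]):
    print(" ".join(rule))
```

Review the printed commands before running any of them as root, ideally from a console session rather than over SSH, and remember that an attacker with root can undo rules set on the box itself.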

This process might take a few hours if you've not done it before, and have never considered a firewall, but might help you restore your clients' service at the risk of continuing to give the attacker access to your clients' data. Since you say that you have hundreds of clients on one machine, I'm guessing that you're hosting small brochure web sites for small businesses, and not 600 ecommerce systems full of credit card numbers. If that's the case this may be an acceptable risk for you, and get your system back online faster than auditing 600 sites for security bugs before you bring anything back. But you will know what data is there, and how comfortable you would be taking that decision.

This is absolutely not best practice, but if that's not what has been happening at your employer so far, wagging your finger at them and asking for tens of thousands of pounds for a SWAT team for something they might feel is your fault (however unjustified!) doesn't sound like the practical option.

Your ISP's help here is going to be pretty crucial - some ISPs provide a console server and network boot environment (plug, but at least you know what kind of facility to look for) which will let you administer the server while disconnected from the network. If this is at all an option, ask for it and use it.

But in the long term you should plan on a system rebuild based on Robert's post and an audit of each site and its setup. If you can't get a sysadmin added to your team, look for a managed hosting deal where you pay your ISP for sysadminning help and 24-hour response for this kind of thing. Good luck :)

Matthew Bloch
41

You need to re-install. Save what you really need. But keep in mind that all your runnable files might be infected and tampered with. I wrote the following in Python: http://frw.se/monty.py which creates MD5 sums of all your files in a given directory. The next time you run it, it checks whether anything has changed and then outputs which files changed and what changed in them.

This could be handy for you, to see if weird files are changed regularly.
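For readers who can't fetch that script, here is a minimal sketch of the same idea. It is not the author's monty.py: it uses SHA-256 instead of MD5, reports only which files were added, removed or changed (not what changed inside them), and all function names are my own. Store the manifest outside the directory being checked, preferably on media the attacker cannot write to.

```python
import hashlib
import json
import os


def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each regular file under root (by relative path) to its digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = hash_file(full)
    return result


def compare(old, new):
    """Return sorted lists of added, removed and changed relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed


def check(root, manifest_path):
    """Compare root against the saved manifest, then refresh the manifest."""
    current = snapshot(root)
    previous = {}
    if os.path.exists(manifest_path):
        with open(manifest_path) as handle:
            previous = json.load(handle)
    with open(manifest_path, "w") as handle:
        json.dump(current, handle, indent=2)
    return compare(previous, current)
```

The first call to `check` reports every file as added, since there is no baseline yet; later calls report the difference since the previous run.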

But the only thing you should be doing now is removing your computer from the internet.

Filip Ekberg
37

NOTE: This is not a recommendation. My specific Incident Response protocol probably does not apply unmodified to Grant Unwin's case.

In our academic facilities we have about 300 researchers who only do computation. You have 600 clients with websites so your protocol will probably be different.

The first steps in our When a Server Gets Compromised Protocol are:

  1. Identify that the attacker was able to gain root (elevated privileges)
  2. Unplug the affected server(s). Network or power? Please see a separate discussion.
  3. Check all other systems
  4. Boot the affected server(s) from a live cd
  5. (optional) Grab the images of all system drives with dd
  6. Start doing the post-mortem forensics. Look at logs, figure out the time of the attack, find files that were modified at that time. Try to answer the How? question.

    • In parallel, plan and execute your recovery.
    • Reset all root and user passwords before resuming the service
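The "find files that were modified at that time" part of step 6 can be sketched as below. This is only an illustration under assumptions: the function name is mine, and it trusts the modification time, which an attacker with write access can forge, so treat an empty result as inconclusive. Run it against a read-only mount of the imaged drive rather than the live system.

```python
import os
import stat
from datetime import datetime


def files_modified_between(root, start, end):
    """Return sorted (mtime, path) pairs for regular files with start <= mtime <= end."""
    start_ts, end_ts = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if not stat.S_ISREG(info.st_mode):
                continue
            if start_ts <= info.st_mtime <= end_ts:
                hits.append((info.st_mtime, path))
    return sorted(hits)
```

Typical use would be something like `files_modified_between("/mnt/evidence", window_start, window_end)`, where the mount point and the two `datetime` values are placeholders you fill in from your own logs.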

Even if "all backdoors and rootkits are cleaned-up", don't trust that system - re-install from scratch.

Aleksandr Levchuk
  • 25
    -1 Unplug the server from power? You have just lost half of your forensic data! – Josh Brower Jan 03 '11 at 12:51
  • @Josh, I adjusted my answer - now it's neutral on the What to Unplug question. – Aleksandr Levchuk Jan 03 '11 at 19:48
  • 5
    RAM forensics (e.g. /dev/shm) can be helpful. I prefer unplugging the power cable (but try to log-in and `rsync` /proc right before). We may also introduce frequent VM snapshots so RAM forensics would be possible. The reasons for going for the power cable are (1) When you do forensics in a hacked system, you are "stepping all over the crime scene"; (2) The root kit keeps running - not so hard for the malicious to execute something (e.g. system wipe-out) on **Network Link Down** event. Kyle Rankin in his nice Intro to Forensics talk (http://goo.gl/g21Ok) recommended pulling the power cable. – Aleksandr Levchuk Jan 03 '11 at 20:02
  • 4
    There is no one size fits all IR protocol--Some orgs may need to keep the compromised system online for a while longer, for whatever reason. (RAM & temp log forensics, interacting with the intruders, etc) My point is that it would be better to recommend a generic IR protocol (like Jakob Borg's above) rather than one that starts with "Pull the power plug of the compromised server." – Josh Brower Jan 03 '11 at 21:15
31

I'd say @Robert Moir, @Aleksandr Levchuk, @blueben and @Matthew Bloch are all pretty much spot-on in their responses.

However, the answers of different posters differ - some are more on a high-level and talk about what procedures you should have in place (in general).

I'd prefer to separate this out into several parts:

  1. Triage, AKA how to deal with the customers and the legal implications, and identify where to go from there (listed very well by Robert and @blueben)
  2. Mitigation of impact
  3. Incident response
  4. Post-mortem forensics
  5. Remediation items and architecture changes

(Insert boilerplate SANS GSC certified response statement here) Based on past experiences, I'd say the following:

Regardless of how you are handling the customer responses, notifications, legal, and future plans, I'd prefer to focus on the main issue at hand. The original question of the OP really only pertains directly to #2 and #3, basically, how to stop the attack, get customers back online ASAP in their original state, which has been well covered in responses.

The rest of the responses are great and cover a lot of identified best-practices and ways to both prevent it from happening in the future as well as better respond to it.

It really depends on the budget of the OP and what sector of industry they are in, what their desired solution is etc.

Maybe they need to hire a dedicated onsite SA. Maybe they need a security person. Or maybe they need a fully managed solution such as Firehost or Rackspace Managed, Softlayer, ServePath etc.

It really depends on what works for their business. Maybe their core competency isn't in server management and it doesn't make sense for them to try to develop that. Or, maybe they are a pretty technical organization already and can make the right hiring decisions and bring on a dedicated team fulltime.

Zachary Hanna
31

In my limited experience, system compromises on Linux tend to be more 'comprehensive' than they are on Windows. The root kits are much more likely to include replacing system binaries with customized code to hide the malware, and the barrier to hot-patching the kernel is a bit lower. Plus, it's the home OS for a lot of malware authors. The general guidance is always to rebuild the affected server from scratch, and it is the general guidance for a reason.

Format that puppy.

But, if you can't rebuild (or the-powers-that-be won't let you rebuild it against your strenuous insistence that it needs it), what do you look for?

Since it sounds like it has been a while since the intrusion was discovered, and a system restore has been done, it is very probable that the traces of how they got in have been stomped over in the stampede to restore service. Unfortunate.

Unusual network traffic is probably the easiest to find, as that doesn't involve running anything on the box and can be done while the server is up and doing whatever. Presuming, of course, your networking gear allows port-spanning. What you find may or may not be diagnostic, but at least it is information. Getting unusual traffic will be strong evidence that the system is still compromised and needs flattening. It might be good enough to convince TPTB that a reformat really, truly, is worth the downtime.

Failing that, take a dd-copy of your system partitions and mount 'em on another box. Start comparing contents with a server at the same patch-level as the compromised one. It should help you identify what looks different (those md5sums again) and may point to overlooked areas on the compromised server. This is a LOT of sifting through directories and binaries, and will be rather labor intensive. It may even be more labor intensive than a reformat/rebuild would be, and may be another thing to hit up TPTB about actually doing the reformat it really needs.
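A minimal sketch of that comparison, assuming the suspect partitions are mounted read-only on a trusted box next to a reference tree from a server at the same patch level. The function name is my own, and it uses a byte-for-byte comparison via `filecmp` in place of md5sums. It only looks at files present in the suspect tree, so files the intruder deleted will not show up.

```python
import filecmp
import os


def compare_trees(suspect_root, reference_root):
    """Compare regular files in suspect_root with the same paths in reference_root.

    Returns (differs, only_in_suspect) as sorted lists of relative paths.
    """
    differs, only_in_suspect = [], []
    for dirpath, _dirnames, filenames in os.walk(suspect_root):
        for name in filenames:
            suspect = os.path.join(dirpath, name)
            if os.path.islink(suspect) or not os.path.isfile(suspect):
                continue  # skip symlinks, devices, sockets and the like
            rel = os.path.relpath(suspect, suspect_root)
            reference = os.path.join(reference_root, rel)
            if not os.path.isfile(reference):
                only_in_suspect.append(rel)
            elif not filecmp.cmp(suspect, reference, shallow=False):
                # shallow=False forces a content comparison, not just stat data
                differs.append(rel)
    return sorted(differs), sorted(only_in_suspect)
```

Expect plenty of legitimate differences (configuration, logs, site content), so point it at system directories such as the binaries and libraries first.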

sysadmin1138
27

After getting to work and taking a look at the server we managed to figure out the problem. Luckily, the offending files were uploaded to the system on a Sunday, when the office is closed and no files should be created, apart from logs and cache files. With a simple shell command to find out which files have been created on that day we found them.

All the offending files seem to have been within the /images/ folder on some of our older zencart sites. It seems there was a security vulnerability that allowed (using curl) any idiot to upload non-images into the image upload section in the admin section. We deleted the offending .php files, and fixed the upload scripts to disallow any file uploads that aren't images.
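The kind of check described there can be sketched as follows. This is not the actual zencart fix (that code is PHP and isn't shown here), just an illustration in Python of the default-deny idea: accept an upload only if its extension is on a short allowlist and its leading bytes match that format. The names are my own. A matching header does not prove a file is harmless, so the upload directory should also be configured so that nothing in it is ever executed.

```python
import os

# Leading bytes ("magic numbers") for a few common image formats.
IMAGE_SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
}


def is_acceptable_image(filename, content):
    """Accept an upload only if its extension is allowed and its bytes match."""
    base = os.path.basename(filename)
    extension = os.path.splitext(base)[1].lower()
    signatures = IMAGE_SIGNATURES.get(extension)
    if signatures is None:
        return False  # default deny: .php, .phtml, no extension, ...
    return content.startswith(signatures)
```

With this check, `evil.php` is rejected on its extension, and a PHP script renamed to `shell.jpg` is rejected because its first bytes are not a JPEG header.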

In retrospect, it was quite simple and I raised this question on my iPhone on the way into work. Thank you for all your help guys.

For the reference of anyone that visits this post in the future. I would not recommend pulling the power plug.

gunwin
  • Grant, I'm glad it worked out very smoothly for you. It was something minor - much less serious than many of us assumed. This discussion taught me a lesson about communicating, gave many good tips and food for thought on incident responses. – Aleksandr Levchuk Jan 04 '11 at 07:50
  • There are cases where you would be better off unplugging the power cable, I know that from experience. – Aleksandr Levchuk Jan 04 '11 at 07:53
  • 3
    Thanks for coming back and letting us know how you got on - as you can see, your question generated quite a lot of discussion. I'm glad you don't seem to be too badly hit by this and that your solution was quite simple in the end. – Rob Moir Jan 04 '11 at 09:26
  • 5
    This should be a comment (or included as text in your question), not an answer to your question. – Techboy Jan 04 '11 at 12:12
  • 5
    @Techboy: It seems he's not yet associated his SO and SF accounts, so he cannot edit his question. @Grant: You can associate your accounts through the "Accounts" panel on your user page. – Hippo Jan 05 '11 at 17:54
  • yet again what starts as "***PANIC***" turns into "oh, was that all" :) – gbjbaanb Jan 09 '11 at 13:26
  • 1
    Without a baseline configuration, how do you know you're not running a rootkit? – The Unix Janitor Jan 10 '11 at 00:51
18

I have little to contribute to the extensive technical answers, but please also take note of some of these:

Non-technical actions:

  • Report the incident internally.
    If you don't have an incident response plan already, that may appear to be a CYA technique, but the IT department is not the only and often not even the best place to determine the business impact of a compromised server.
    Business requirements may trump your technical concerns. Don't say "I told you so" and that the priority of business concerns is the reason you're having this compromised server in the first place. (Leave that for the after-action report.)

  • Covering up a security incident is not an option.

  • Reporting to local authorities.
    ServerFault is NOT the place for legal advice, but this is something that should be included in an incident response plan.
    In some localities and/or regulated industries it is mandatory to report (certain) security incidents to local law enforcement or regulating bodies, or to inform affected customers/users.
    Regardless, neither the decision to report nor the actual report are made solely in the IT department. Expect involvement from management and the legal and corporate communications (marketing) departments.
    You should probably not expect too much, the internet is a big place where borders have little meaning, but the cyber crime departments that exist in many police departments do solve digital crimes and may bring the guilty to justice.

HBruijn
16

I think it all boils down to this:

If you value your job, you had better have a plan, and revise it regularly.

Failing to plan is planning to fail, and it's no truer anywhere else than in systems security. When <redacted> hits the fan, you'd better be ready to deal with it.

There's another (somewhat cliched) saying that applies here: Prevention is better than cure.

There have been a number of recommendations here to get experts in to audit your existing systems. I think this is asking the question at the wrong time. This question should have been asked when the system was put in place, and the answers documented. Also, the question shouldn't be "How can we stop people from breaking in?" It should be "Why should people be able to break in at all?" Auditing for a bunch of holes in your network will only work until new holes are found and exploited. On the other hand, networks that are designed from the ground up to only respond in certain ways to certain systems in a carefully choreographed dance will not benefit from an audit at all and the funds will be a waste.

Before putting a system on the internet, ask yourself - does this need to be 100% internet facing? If not, don't. Consider putting it behind a firewall where you can decide what the internet sees. Even better, if said firewall allows you to intercept the transmissions (via a reverse proxy or a pass-through filter of some kind) look at using it to allow only legitimate actions to occur.

This has been done - there is (or was) an internet banking setup somewhere which has a load-balancing proxy facing the internet that they were going to use to vector attacks away from their pool of servers. Security expert Marcus Ranum convinced them to take the opposite approach, by using the reverse proxy to allow only known valid URLs through and send everything else to a 404 server. It stood the test of time surprisingly well.

A system or network based around default permit is doomed to fail once an attack you didn't foresee happens. Default deny gives you far greater control over what gets in and what doesn't, because you're not going to let anything on the inside be seen from the outside unless it damn well needs to be.
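To make the default-deny idea concrete, here is a minimal sketch of the decision such a reverse proxy makes for each request. The allowlist entries and the `route` function are hypothetical examples of mine, not the banking setup described above: a request goes to the real servers only if its method and full path match a known-valid pattern, and everything else gets the 404.

```python
import re

# Hypothetical allowlist: each entry is (HTTP method, compiled path pattern).
ALLOWED_REQUESTS = [
    ("GET", re.compile(r"/")),
    ("GET", re.compile(r"/accounts/[0-9]+")),
    ("POST", re.compile(r"/login")),
]


def route(method, path):
    """Return 'backend' for allowlisted requests and '404' for everything else."""
    for allowed_method, pattern in ALLOWED_REQUESTS:
        # fullmatch: the whole path must match, so suffix tricks are rejected
        if method == allowed_method and pattern.fullmatch(path):
            return "backend"
    return "404"
```

The point of the design is that an attack you didn't foresee, such as a request for `/images/shell.php`, fails by default because nobody listed it, rather than succeeding because nobody thought to block it.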

That said, all of this is no reason to get complacent. You should still have a plan in place for the first few hours after a breach. No system is perfect and humans make mistakes.

Aaron Mason
16

A nice one-liner recently helped me find out how an attacker had compromised a system. Some crackers try to hide their traces by forging the modification time (mtime) on files. Changing the modification time, however, updates the change time (ctime). You can see the ctime with stat.

This one liner lists all files sorted by ctime:

find / -type f -print0 | xargs -0 stat --format '%Z :%z %n' | sort -nr > /root/all_files.txt

So if you roughly know the time of the compromise, you can see which files were changed or created.
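A related check, sketched in Python under my own assumptions: instead of sorting everything by ctime, flag files whose ctime is noticeably later than their mtime, which is what a backdated modification time looks like. The function name and the 60-second tolerance are mine. On Unix `st_ctime` is the inode change time; a chmod, a chown or unpacking an archive that preserves timestamps produces the same pattern, so expect many legitimate hits and use the list as a starting point only.

```python
import os
import stat


def backdated_files(root, tolerance_seconds=60):
    """Return (ctime, mtime, path) tuples where ctime exceeds mtime by more than the tolerance."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if not stat.S_ISREG(info.st_mode):
                continue
            if info.st_ctime - info.st_mtime > tolerance_seconds:
                hits.append((info.st_ctime, info.st_mtime, path))
    # Most recently changed inodes first, like the sort -nr in the one-liner.
    return sorted(hits, reverse=True)
```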


ah83