11

Recently (though it is also a recurring question) we saw 3 interesting threads about hacking and security:

How do I deal with a compromised server?
Finding how a hacked server was hacked
File permissions question

The last one isn't directly related, but it highlights how easy it is to mess up web server administration.

As there are several things that can be done before something bad happens, I'd like to have your suggestions on good practices to limit the side effects of an attack, and on how to react in the sad case that one happens.

It's not just a matter of securing the server and the code but also of auditing, logging and countermeasures.

Do you have any good practices list or do you prefer to rely on software or on experts that continuously analyze your web server(s) (or nothing at all)?

If yes, can you share your list and your ideas/opinions?

UPDATE

I received a lot of good and interesting feedback.

I'd like to have a simple list, so that it can be handy for IT security administrators but also for web factotum masters.

Even if everybody gave good and correct answers, at the moment I prefer Robert's, as it's the simplest, clearest and most concise, and sysadmin1138's, as it's the most complete and precise.

But nobody considered the user perspective and perception; I think it's the first thing that has to be considered.

What will users think when they visit my hacked site, all the more if you hold sensitive data about them? It's not just a matter of where to store data, but of how to calm angry users.

What about data, media, authorities and competitors?

tmow
  • 3
    Start with http://security.stackexchange.com/ . Though there are some good answers here already.... – AviD Jan 03 '11 at 21:58
  • Good point! I didn't know there was one; I thought the full list was in the footer of each Stack website. – tmow Jan 04 '11 at 08:24
  • I think beta sites don't appear on full-fledged sites. And full-fledged sites aren't on beta footers either :) – AviD Jan 06 '11 at 21:45

7 Answers

11

There are two big areas to focus on:

  1. Making it hard to get in.
  2. Creating policies and procedures to calmly and efficiently handle the event of someone getting in past point 1.

Making it hard to get in

This is a very complex topic, and a lot of it focuses around making sure you have enough information to figure out WTF happened after the fact. The abstract bullet points for simplicity:

  • Keep logs (see also, Security Information Event Management)
    • Any authorization attempts, both successful and failing, preferably with source information intact.
    • Firewall access logs (this may have to include per-server firewalls, if in use).
    • Webserver access logs
    • Database server authentication logs
    • Application-specific usage logs
    • If possible, the SIEM can throw alerts on suspicious patterns.
  • Enforce proper access controls
    • Ensure rights are set correctly everywhere, and avoid 'lazy-rights' ("oh just give everyone read") where possible.
    • Periodic audits of ACLs to ensure that procedures are actually being followed, and temporary troubleshooting steps ("give everyone read, see if it works then") have been correctly removed after troubleshooting has finished.
    • All firewall pass-through rules need to be justified, and audited periodically.
    • Webserver access controls need to be audited as well, both webserver and filesystem ACLs.
  • Enforce change-management
    • Any changes to the security environment need to be centrally tracked and reviewed by more than one person.
    • Patches should be included in this process.
    • Having a common OS build (template) will simplify the environment and make changes easier to track and apply.
  • Disable guest accounts.
  • Ensure all passwords are not set to defaults.
    • Off-the-shelf applications may set up users with predefined passwords. Change them.
    • A lot of IT appliances ship with user/password pairs that are very well known. Change those, even if you log into that thingy only once a year.
  • Practice least-privilege. Give users the access they actually need.
    • For Admin users, a two-account setup is wise. One regular account used for email and other office tasks, and a second for elevated-priv work. VMs make this easier to live with.
    • Do NOT encourage regular use of generic administrator/root accounts, it's hard to track who was doing what when.
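The periodic rights audit described above can be partly automated. As a minimal sketch, assuming a POSIX filesystem and treating "world-writable" as one easy-to-detect symptom of lazy rights, something like the following could list candidates for review. The function name `find_world_writable` is hypothetical and not from any tool mentioned in this thread.

```python
import os
import stat
from pathlib import Path


def find_world_writable(root):
    """Return paths under root whose mode grants write access to 'other'."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            try:
                # lstat() so that symlinks are inspected, not followed
                mode = path.lstat().st_mode
            except OSError:
                continue  # vanished or unreadable; skip rather than abort the audit
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful on most systems
            if mode & stat.S_IWOTH:
                findings.append(str(path))
    return sorted(findings)
```

Calling `find_world_writable("/var/www")` (the path is only an example) returns a sorted list of entries to review by hand. It does not look at ACLs or group rights, so it complements a real audit rather than replacing one.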

Creating policies and procedures to calmly and efficiently handle the event of someone getting in

A security-event policy is a must have for all organizations. It greatly reduces the "running around with our heads cut off" phase of response, as people tend to get irrational when faced with events such as these. Intrusions are big, scary affairs. Shame at suffering an intrusion can cause otherwise level-headed sysadmins to start reacting incorrectly.

All levels of the organization need to be aware of the policies. The larger the incident, the more likely upper management will get involved in some way, and having set procedures for handling things will greatly assist in fending off "help" from on high. It also gives a level of cover for the technicians directly involved in the incident response, in the form of procedures for middle-management to interface with the rest of the organization.

Ideally, your Disaster Recovery policy has already defined how long certain services may be unavailable before the DR policy kicks in. This will help incident response, as these kinds of events are disasters. If the event is of a type where the recovery window will NOT be met (example: a hot-backup DR site gets a realtime feed of changed data, and the intruders deleted a bunch of data that got replicated to the DR site before they were noticed. Therefore, cold recovery procedures will need to be used) then upper management will need to get involved for the risk-assessment talks.

Some components of any incident response plan:

  • Identify the compromised systems and exposed data.
  • Determine early on whether or not legal evidence will need to be retained for eventual prosecution.
    • If evidence is to be retained do not touch anything about that system unless absolutely required to. Do not log in to it. Do not sift through log-files. Do. Not. Touch.
    • If evidence is to be retained, the compromised systems need to be left online but disconnected until such time as a certified computer forensics expert can dissect the system in a way compatible with evidence handling rules.
      • Powering off a compromised system can taint the data.
      • If your storage system permits this (discrete SAN device) snapshot the affected LUNs before disconnection and flag them read-only.
    • Evidence handling rules are complex and oh so easy to screw up. Don't do it unless you've received training on them. Most general SysAdmins do NOT have this kind of training.
    • If evidence is being retained, treat the loss of service as a hardware-loss disaster and start recovery procedures with new hardware.
  • Pre-set rules for what kinds of disasters require what kinds of notice. Laws and regulations vary by locality.
    • Rules pertaining to 'exposure' and 'proven compromise' do vary.
    • Notification rules will require the Communications department to get involved.
    • If the required notice is big enough, top-level management will have to be involved.
  • Using DR data, determine how much "WTF just happened" time can be spent before getting the service back on line becomes a higher priority.
    • Service-recovery times may require the work of figuring out what happened to be subordinated. If so, then take a drive image of the affected device for dissection after services are restored (this is not an evidentiary copy, it's for the techs to reverse engineer).
    • Plan your service-recovery tasks to include a complete rebuild of the affected system, not just cleaning up the mess.
    • In some cases service-recovery times are tight enough that disk images need to be taken immediately after identifying that a compromise has occurred and deciding that legal evidence is not to be retained. Once the service is rebuilt, the work of figuring out what happened can start.
  • Sift through logfiles for information relating to how the attacker got in and what they may have done once in.
  • Sift through changed files for information relating to how they got in, and what they did once they got in.
  • Sift through firewall logs for information about where they came from, where they might have sent data to, and how much of it may have been sent.
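Sifting through logs for how an attacker got in usually starts with simple counting. The sketch below assumes sshd-style lines of the form "Failed password for USER from ADDRESS"; the pattern, the threshold and the function names are illustrative assumptions, so adjust them to whatever your own logs actually contain. Run it on a copy of the logs, not on a system being held as evidence.

```python
import re
from collections import Counter

# Assumed line shape: "... Failed password for [invalid user ]NAME from ADDRESS ..."
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failed_logins_by_source(lines):
    """Count failed login lines per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def noisy_sources(lines, threshold=3):
    """Return (source, count) pairs at or above threshold, busiest first."""
    counts = failed_logins_by_source(lines)
    return [(src, n) for src, n in counts.most_common() if n >= threshold]
```

For example, `noisy_sources(open("auth.log.copy"))` (a hypothetical file name) gives a short list of addresses worth correlating against the firewall and webserver logs.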

Having policies and procedures in place before a compromise, and well known by the people who will be implementing them in the event of a compromise, is something that just needs doing. It provides everyone with a response framework at a time when people won't be thinking straight. Upper management can thunder and boom about lawsuits and criminal charges, but actually bringing a case together is an expensive process and knowing that beforehand can help damp the fury.

I also note that these sorts of events do need to be factored into the overall Disaster Response plan. A compromise will be very likely to trigger the 'lost hardware' response policy and also likely to trigger the 'data loss' response. Knowing your service recovery times helps set expectations for how long the security response team can have for poring over the actual compromised system (if not keeping legal evidence) before it's needed in the service-recovery.

sysadmin1138
  • I've chosen your answer because it's the most complete, and it's what companies like the one we work for use and continuously improve, but I wonder how it can be simplified for normal webmasters, who have to find a solution ASAP, and without a huge amount of money. – tmow Jan 04 '11 at 08:50
  • Still not sure between your answer and Robert's. – tmow Jan 04 '11 at 09:39
  • This is a great answer, wish I could +2 instead of just +1 – Rob Moir Jan 04 '11 at 10:07
7

How proper helpdesk procedures can help

We need to consider how customers are dealt with here (this applies to both internal and external customers contacting a helpdesk).

First of all, communication is important; users will be angry about the disruption to business, and may also be concerned about the extent/consequences of any information breaches that may have taken place as part of an intrusion. Keeping these people informed will help manage their anger and concern, both from the point of view that sharing knowledge is good, and from the perhaps slightly less obvious point of view that one thing they will need to hear is that you are in control of the situation.

The helpdesk and IT management need to act as an "umbrella" at this point, sheltering the people doing the work to determine the extent of the intrusion and restore services from countless enquiries that disrupt that work.

  1. Try to post realistic updates to customers, and work with them to determine the urgency of bringing a service back online. Being aware of customer needs is important, but at the same time don't allow them to dictate an unworkable schedule to you.
  2. Make sure your helpdesk team know what information can and can not be released, and that they should not encourage rumours and speculation (and absolutely should not discuss anything that may prejudice any legal action your organisation might take or face).
  3. One positive thing the helpdesk should do is record all calls pertaining to the intrusion - this can help measure the disruption caused by both the intrusion itself and the processes that followed to deal with it. Putting both a time and a financial cost on the intrusion and the mitigation can be very helpful both with refining future strategies, and obviously might prove useful with any legal actions. ITIL incident vs. problem recording can help here - both the intrusion itself and the mitigation can be recorded as separate problems, and each caller tracked as an incident of one or both problems.

How deployment standards can help

Deploying to a set template (or at least a checklist) helps too, along with practising change control/management over any customisations/upgrades to your deployment template. You can have several templates to account for servers doing different jobs (e.g. a mail server template, a web server template, etc).

A template should work for both OS and apps, and include not just security but all settings you use, and should ideally be scripted (e.g. a template) rather than applied manually (e.g. a checklist) to eliminate human error as much as possible.

This helps in a number of ways:

  • Enables you to restore / rebuild quicker in the event that an intrusion does take place (note that you should not deploy from this template 'as is' because you know it is vulnerable, but it lets you get back to your "last known good configuration" which needs to undergo further hardening before live deployment... and don't forget to update your deployment template once you're sure it's properly locked down, either)
  • Gives you a "baseline" to compare a hacked server to
  • Reduces un-necessary errors that might lead to an intrusion in the first place
  • Helps with change and patch management, because when it becomes apparent that you need a patch/upgrade or procedural change for security (or any other reason, for that matter) it makes it easier to see what systems need the change, makes it easier to write tests to see if the change is applied correctly, etc.
  • If everything is as consistent as is possible and sensible it helps make unusual and suspicious events stick out that bit further.
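One way to make the "baseline" point concrete is a file manifest: record a hash for every file in a freshly deployed template, then compare a suspect server against it. This is only a sketch under assumptions; the function names `build_manifest` and `compare_manifests` are hypothetical, it reads each file fully into memory, and it ignores ownership and permissions.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(baseline, current):
    """Report files added, removed or changed relative to the baseline."""
    base_paths, cur_paths = set(baseline), set(current)
    return {
        "added": sorted(cur_paths - base_paths),
        "removed": sorted(base_paths - cur_paths),
        "changed": sorted(
            p for p in base_paths & cur_paths if baseline[p] != current[p]
        ),
    }
```

The baseline manifest should be kept somewhere other than the server it describes, and the comparison is best run against a disk image or from a trusted environment, since tools on a compromised host cannot be relied on.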
Rob Moir
  • 1
    +1. Yes, it's correct, but then, if anything happens it means that your template is not as safe as you thought, so you cannot use it to deploy a new website. You need at least a maintenance page notifying the customers about a temporary issue, and it is better to host it somewhere else (another server, another IP, and a redirection from the old one). I think we should always take the worst case into consideration. – tmow Jan 04 '11 at 08:39
  • 2
    @tmow - you're right but the template allows you to restore a system to your "known" configuration quickly, which you then need to modify before you deploy the server again. I will amend the answer to reflect that because it should have mentioned it, you're absolutely right there. – Rob Moir Jan 04 '11 at 09:19
  • 1
    thanks. Don't forget the user perspective and perception. – tmow Jan 04 '11 at 09:41
  • @tmow added a bit about users and putting the support desk to work helping with that end of things. – Rob Moir Jan 09 '11 at 11:24
4

For most of our servers we rely on host and network firewalls, anti-virus/spyware software, network IDS, and host IDS for the majority of our prevention. This goes along with all of the general guidelines such as minimum privs, uninstalling non-essential programs, updates, etc. From there we use products such as Nagios, Cacti, and a SIEM solution for various baselining and notifications of when events occur. Our HIDS (OSSEC) does a lot of SIEM-type logging as well, which is nice. We basically try to block stuff as much as possible, but then log centrally so if something does happen we can analyze and correlate it.

  • All correct, and I think nothing more is needed, but again: when it happens, because it does happen, what exactly do you do, and what do you need in order to react fast? Analyzing thousands of lines of logs, all the more in a stressful situation, will not provide a quick workaround or temporary solution to at least inform the users. – tmow Jan 04 '11 at 08:44
  • When something does occur, that is when you need procedures in place and an incident response team that has been trained and knows what to do. I know analyzing thousands of lines of logs is a daunting task, but with training and the correct tools you will be able to narrow this down quite a bit. It's still going to suck in the end, but might be the only solution. You also need to make sure you have a good understanding with management and how to control any announcements of incidents. Also, good backup procedures could minimize how long you are down if the system is completely unrecoverable. –  Jan 04 '11 at 14:47
  • I'm used to grinding through billions of lines of logs per day, and what I know is that before understanding what the heck happened, it is much more important to fix or work around the problem; that can even be a temporary server with just a static page explaining the situation to the users (blah, blah, ..., blah) and apologizing. This is the first step; then you think about what you can reestablish of the service (or part of it) and when, and finally you investigate and put countermeasures in place. – tmow Jan 06 '11 at 13:26
4

What you really want breaks down into 3 basic areas:

  1. Standard System Configuration
  2. System/Application Monitoring
  3. Incident Response

If you have any information (assurance|security) staff available, then you should definitely talk to them. While Incident Response is often the sole purview of said office, the rest should be a joint development effort across all affected parties.

At the risk of self-pimping, this answer to a related question should index a lot of useful resources for you: Tips for Securing a LAMP Server.

Ideally, you should have the smallest number of supported OSes, and build each one using a base image. You should only deviate from the base as much as is required to provide whatever services that server provides. The deviations should be documented; this may even be required if you have to meet PCI/HIPAA/etc. or other compliance regimes. Using deployment and configuration management systems can help out a lot in this respect. The specifics will depend a lot on your OS: cobbler/puppet/Altiris/DeployStudio/SCCM, etc.

You should definitely perform some kind of regular log review. Given the option a SIEM can be very helpful, but they also have the downside of being expensive both in purchase price and build-out costs. Check out this question from the IT Security SE site for some comments on log analysis: How do you handle log analysis? If this is still too heavy, even common tools such as LogWatch can provide some good context for what's going on. The important piece, though, is just taking the time to look at the logs at all. This will help you get acquainted with what constitutes normal behavior, so that you can recognize abnormal.
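Getting acquainted with "normal" can be as simple as tracking a daily count (failed logins, 404s, outbound connections) and noticing when today's number is far from the recent ones. The sketch below is one naive way to express that, not how LogWatch or any SIEM works; the function name `is_unusual`, the factor `k` and the floor of 1.0 on the spread are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_unusual(history, today, k=3.0):
    """Flag today's count if it is more than k standard deviations from the
    mean of previous daily counts."""
    if len(history) < 2:
        return False  # not enough data to know what normal looks like yet
    mu = mean(history)
    # Floor the spread so a perfectly flat history does not flag tiny changes
    sigma = max(stdev(history), 1.0)
    return abs(today - mu) > k * sigma
```

With a history of `[100, 110, 95, 105, 98]` failed logins per day, a day with 104 would pass unremarked while a day with 400 would be flagged for a closer look.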

In addition to log review, monitoring the state of the server is also important. Knowing when changes occur, whether planned or not, is crucial. Utilizing a local monitoring tool such as Tripwire can alert the admin to changes. Unfortunately, much like SIEMs and IDSes, it has the downside of being expensive to tune and/or purchase. Moreover, without good tuning, your alert volume will be so high that any good messages will be lost in the noise and become useless.

Scott Pack
  • I agree on almost everything, but this applies mainly to medium and large companies. Small companies won't need or want such an expensive structure. – tmow Jan 04 '11 at 08:29
3

Having a proper Security Information and Event Management (SIEM) policy in place will go a long way toward making your security life easier.

GregD
2

I'm not a security expert, so I mainly defer to them; but starting with the Principle of Least Privilege almost always makes their job significantly easier. Applying this like a healing salve works well for many aspects of security: file permissions, runtime users, firewall rules, etc. KISS never hurts either.

Chris S
2

Most of the solutions mentioned here are applicable at the host and network level, but we often forget insecure web applications. Web applications are the most commonly overlooked security hole. By way of a web application an attacker can gain access to your database or host. No firewall or IDS can protect you against those. OWASP maintains a list of the Top 10 most critical vulnerabilities and offers fixes for them.

http://www.scribd.com/doc/19982/OWASP-Web-Security-Guide
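To illustrate how a web application can hand an attacker the database, here is a small sketch of one vulnerability class that has long appeared on the OWASP list: SQL injection. The table, the function names and the use of Python's `sqlite3` module are assumptions picked for a self-contained example; the same idea applies to whatever language and database your application uses.

```python
import sqlite3


def find_user_unsafe(conn, username):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = "SELECT id, username FROM users WHERE username = '%s'" % username
    return conn.execute(query).fetchall()


def find_user(conn, username):
    """Safer: the value is passed as a bound parameter, never parsed as SQL."""
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


def make_demo_db():
    """Build a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn
```

Passing the input `' OR '1'='1` to `find_user_unsafe` returns every row in the table, while `find_user` treats the same string as an ordinary (non-existent) user name and returns nothing. No firewall rule distinguishes those two requests, which is why the fix has to live in the application code.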

Sameer