Incident response and recovery from a security breach with unknown attack vector

Question

Security breaches, hacks, “cyber” attacks or server compromises unfortunately happen quite frequently. Examples include Quora in December 2018, Facebook in September 2018, Equifax in September 2017, Exactis in June 2018, MyFitnessPal in March 2018, and thousands of others.

With regulations that dictate mandatory breach notifications, such as the GDPR, we can expect that the number of cases will continue to grow. Moreover, it’s likely that there are tons of cases that are simply unknown to the public, have been unreported, or even undetected by the companies themselves.

Yet to my surprise, what I never hear about is a company going offline for a while because they are still investigating and haven’t yet found out how the attackers got in, or a company falling victim to another attack again within a few days or weeks.

(That is, except for some rare cases, of course. Preventing another (spear) phishing attack, though, is not something you can just apply a patch for; it’s a process.)

There may be compromises following attack vectors which are hard to detect, e.g. Rowhammer, Spectre, certain zero-days, an infected package manager opening reverse shells, etc. If attackers got access to a company’s network by using credentials they obtained from a phishing attack against an employee, and they cleaned up their traces and deleted logs, it is conceivable that neither the company nor its forensic advisors would be able to determine how the attack succeeded.

Do NOT put the affected systems back online until this stage is fully complete [...] Examine the 'attacked' systems to understand how the attacks succeeded in compromising your security. Make every effort to find out where the attacks "came from" [...]

– Rob Moir

We should expect some companies to go offline for a few hours while they do their analysis and recovery, some for a few days, and some never coming back online, shouldn’t we?

Are all companies, even those with horrific vulnerabilities, excellent at intrusion detection and forensic analysis, or did I just miss something?

"what I never hear about is a company going offline for a while because they are still investigating and haven’t yet found out how the attackers got in, or a company falling victim to another attack again within a few days or weeks." - you have not been paying close enough attention to the news — schroeder, Jan 29 '19 at 17:08
Why would a system shut down if it got compromised? What would be the point? The data is already out there. And it is trivial enough to clone a system to run forensics on. Also, you have a logic error in your reasoning. By the time the hack is announced by the org, they have already performed their investigation. So your supposition that there would necessarily be downtime is flawed. — schroeder, Jan 29 '19 at 17:12
@schroeder Thanks! Regarding your first comment, I already assumed that there could have been news that I didn’t get. That’s why I ended the question with “or did I just miss something?”. Regarding your second comment, that’s precisely what the one quote in my question is about. It’s from a highly-upvoted answer on this network. Regarding “By the time the hack is announced by the org, they have already performed their investigation”: At least since we’ve had the GDPR (in Europe), that’s clearly wrong, unless you can always complete the entire investigation in a few days. — caw, Jan 29 '19 at 19:39

score 6 · Accepted Answer · answered Dec 04 '18 at 06:49

You are viewing this from a security perspective and (I assume) from the perspective of a security practitioner. Assuming the company is in the business of making money, it will shut down for the bare minimum time required, because downtime affects profits. IT will seldom have much input into whether systems can remain offline during root cause analysis.

Hell, in most of the breach investigations I've been involved in, the systems are simply restored from backups, no investigation is done into what happened until they've been hit for the umpteenth time in a few months, and we are given almost nothing to work with in finding the root cause. Management cares about dollars, and though processes may change with things like GDPR, I doubt this will change much, except that more IT staff will be fired after a breach.

As for your question about 'excellent intrusion detection' capabilities, these are generally lacking too. If you read through the Verizon Threat Report (http://www.verizonenterprise.com/industry/public_sector/docs/2018_dbir_public_sector.pdf), page 10 gives a breakdown of 'discovery', and it generally takes months to detect a compromise.

Thank you! My assumption was that either (a) *some* companies would have to discontinue their services temporarily (i.e. go offline) after an attack because they are *aware* of a breach (e.g. due to user data published online) but *don’t know* yet how it happened, or (b) *some* companies would have to suffer another data breach shortly afterwards again due to the same (undetected) vulnerability. I rarely see either happen. Regarding that interesting Verizon report, I can only see 8 pages (not 10), but on page 3 it does indeed say “68% of breaches took months or longer to discover”. — caw, Dec 04 '18 at 07:34
Management cares about money (or reputation when money is not a factor, like for governments), therefore it cares about security inasmuch as it affects the business. If they don't care enough, then their IT security teams are to blame for not raising the right level of awareness. With GDPR, management will care more because the high fines and reputational impact are bound to affect any type of business. — Enos D'Andrea, Jan 27 '19 at 11:51

Tom K. · Answer 2 · 2018-12-04T13:22:48.843

There is a misconception in your question regarding the attack techniques and exploits that are used in these spectacular and widely-known security incidents.

Exploits are usually categorized with metrics such as severity and complexity (among others). Usually(!) the more complex an attack, the harder the forensics needed to figure out what exactly happened. The attacks you mention are indeed hard to execute and therefore hard to investigate. But the more important point here is: these are - according to the affected companies - not the ones that were used by attackers in these popular incidents.

Take the Equifax breach for example. Quote from Wikipedia:

Equifax said the breach was facilitated using a flaw in Apache Struts (CVE-2017-5638). A patch for the vulnerability was released March 7, yet the company failed to apply the security updates before the attack occurred 2 months later. However, this was not the only point of failure: contributing factors included the insecure network design which lacked sufficient segmentation, potentially inadequate encryption of personally identifiable information (PII), and ineffective breach detection mechanisms.

These describe no fancy attack vectors at all. The Apache Struts flaw was pretty well known by then and Struts itself was (to my knowledge) a widely-used framework. Whenever a CVE like this gets published, it gets tested by attackers around the globe immediately. Equifax IT should've patched their systems as soon as possible, but they did not. On the other hand, patching your servers takes some time, but it doesn't take forever. So you can limit the availability of your service for a while and then slowly ramp it up again, once the servers are updated.
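
To illustrate how mechanical the check for such a well-known flaw can be, here is a minimal Python sketch that flags Struts versions affected by CVE-2017-5638. The affected ranges (2.3.5 to 2.3.31 and 2.5 to 2.5.10) are what I recall from the Apache advisory S2-045, so verify them before relying on this. The inventory and host names are made up for illustration.

```python
# Sketch: flag hosts running a Struts version affected by CVE-2017-5638.
# Affected ranges as recalled from Apache advisory S2-045 (verify first).
AFFECTED_RANGES = [
    ((2, 3, 5), (2, 3, 31)),
    ((2, 5), (2, 5, 10)),
]


def parse_version(text):
    """Turn '2.5.10.1' into (2, 5, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version_text):
    """Return True if the version falls inside one of the affected ranges."""
    version = parse_version(version_text)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)


def unpatched_hosts(inventory):
    """inventory maps host name -> Struts version string (hypothetical data)."""
    return sorted(host for host, ver in inventory.items() if is_affected(ver))


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {
        "portal-01": "2.3.31",
        "portal-02": "2.3.32",
        "api-01": "2.5.10",
        "api-02": "2.5.10.1",
    }
    print(unpatched_hosts(inventory))
```

Knowing which hosts fall into such a list is also what lets you limit availability selectively and ramp back up as each server is updated, rather than taking everything offline.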

In addition: if Equifax had had proper segmentation, encryption or detection - all three extremely well-known, must-have security-enhancing techniques - the breach would have been only half as bad. But they did not.

My point is: attackers don't need complex exploits chained together to create a new Stuxnet to hack large corporations like Equifax or Quora. A two-month-old RCE exploit is good enough.

To add a bit of speculation on my part, and to answer your question "Why can they react so fast" from a different angle: I guess most of these companies knew about these security holes. And knowing about the vulnerabilities in your network makes forensics a lot easier.

Thank you! My question was obviously not restricted to Spectre and rowhammer *only*, so there’s no misconception regarding the attack techniques here, I’d say. But what you describe – most breaches being a result of simple vulnerabilities which are easy to detect after the fact, e.g. well-known but unpatched CVEs or unchanged default passwords – is a good theory, I think. It still doesn’t explain *all* cases, of course. (But perhaps more than one might think.) Finally, as you say that Equifax should “have had proper segmentation, encryption or detection”, having them is not binary, is it? — caw, Dec 04 '18 at 19:54
1. With regard to the misconception: I think a lot of people have this misconception that hacking big companies relies on 0days and fancy exploits, so I think this is an important argument. 2. It is not binary, but any one of those techniques would have either slowed the attackers down or mitigated the attack altogether. — Tom K., Dec 04 '18 at 20:21
@caw As you now have put a bounty on this question: what is missing from this answer that makes it "not good enough" for you? ;) — Tom K., Jan 24 '19 at 14:52
As per the bounty’s caption, it’s just to “draw attention” to this question because I think there could be more evidence, best practices or real-world examples. — caw, Jan 24 '19 at 15:28
Just to underline this again, I think this answer and the arguments it contains are *one* very important perspective on the issue – just as I noted in my first comment. So one should keep in mind that, after all, most (successful) attacks are *not* as sophisticated as one might think, could have been prevented easily and are discoverable just as easily. — caw, Jan 30 '19 at 22:05

atdre · Answer 3 · 2020-04-23T15:39:21.277

Yes, Intrusion Detection and Digital Forensics have components that can be automated for quick triage on large-scale installations in very diverse technology infrastructures and complex global organizations.

Incident Response and Crisis Management are more difficult, and often include the onus to Pull The Plug, or go offline -- especially during a root-level intrusion (aka Administrator-level, often described as Domain takeover or Domain Administrator compromise). It used to be that going offline was more common, but the Digital Forensics and Incident Response (DFIR) platforms began changing around 2012.

What these new DFIR platforms added, in terms of disruptive technology, was the ability to do Live Response. Before Commercial platform support for Live Response (e.g., FireEye HX -- originally named MIR or Mandiant Incident Response, CbER, Crowdstrike Falcon, etc), most DFIR platforms were focused either on Digital Forensics (e.g., EnCase used by corporations, or FTK used by governments especially the FBI) or Incident Response (e.g., Belkasoft RAM Capturer or Sysinternals Autoruns) -- not both. One exception was the F-Response platform, which began shipping circa 2009 (an early adopter of these techniques). The term, DFIR, wasn't used or popularized until at least 2013 -- so this is all still a very new concept for most cybersecurity / Infosec / IT shops.

More recently, there are new commercial solutions (e.g., Velocidex) popping up around DFIR Live Response platforms (often based on free, open-source solutions such as what was formerly known as Google Rekall, a fork of the also-free/open-source Volatility Framework). However, there are also many solutions that are trying to indicate that they share similarities with these platforms, even though they are closer to classic Anti-Virus (AV) platforms. The official terminology is Endpoint Protection Platform (EPP), with solutions from Sophos, Symantec, and McAfee. However, some platforms that are clearly EPP (e.g., Cylance, SentinelOne) try to use the terminology NG-AV (for next-generation anti-virus) or, worse, Endpoint Detection and Response (EDR), which spoils what the DFIR Live Response platforms originally attempted to disrupt.

Commercial EDR platforms are often focused more on Detection than Response, meaning that they are closer to classic AV. A true Live Response platform will enable at least 2 primary capabilities:

1. Perform Host Isolation, meaning the ability for a system to go offline while still allowing the responders to access the host remotely.
2. Provide a full-system Memory Dump from an isolated host across a variety of Operating Systems without degradation of performance and while retaining superior stability. If a kernel panics (i.e., the whole operating system crashes), then it often completely ruins the ability to retain a memory capture. The memory dump must include higher-order bits that contain the system's MBR or GPT in order to detect and respond to potential rootkits in firmware such as BIOS or UEFI. Often, this means that the platform installs a driver, and drivers must be carefully coded in order to prevent system crashes.

Very few DFIR service providers retain the talent and automation pipeline necessary to perform quick triage in large-scale installations even when they have successfully rolled out an EDR or DFIR Live Response platform for their clients, enhanced them, integrated them -- during the incident or crisis (post-breach), or before (pre-breach). Some of them include The Cowen Group, FireEye's Mandiant, Crowdstrike, Verizon Business, and Stroz Friedberg. There are some new players, such as Endgame, and some tied to specific industries such as Trustwave in the payment card industry. You'll see their names come up in breaking news stories around major data breaches.

In many cases, the org that suffered the news-breaking major data breach already had one of these DFIR service providers (or a competitor) on retainer -- meaning that they've been paying them monthly or yearly just to keep the door open in case a crisis occurs. Sometimes you'll hear this specific offering referred to as a Compromise Assessment. These are definitely the cream of the crop in terms of speedy and high-quality intrusion detection and digital forensics analysis!

You'll see these DFIR service providers' tools (or portions thereof) and techniques in books, resources, and even open-source solutions. For example, The Cowen Group is related to TriForce (a patented digital forensics technique), FireEye has the FLARE VM, Crowdstrike has Falcon Orchestrator and VxStream Sandbox, Verizon Business released VERIS, and Stroz Friedberg has their own Github with lightgrep and acquired fsrip. Some of these were through acquisitions and others through spinoffs.

It's not just the private industry that has innovated -- clearly much of the work of MITRE, CIRCL, CERT-BDF, CERT-Tools, ANSSI-FR, and CSE-CST has been prescient to all of the above.

Others are just cool in their own right, such as Gransk, PUNCH-Cyber, and SkadiVM.

Thanks! Well, first and foremost, I see tons of product names. Is this just for large companies and the enterprise sector or is intrusion detection really something where it’s not about understanding (relatively simple) underlying concepts but shopping for the best commercial product that (magically?) does its job? — caw, Jan 24 '19 at 20:34
There's a lot of open-source solutions mentioned throughout the post (with links). Maybe what you need is a map of capabilities to both commercial and open-source? Or describe your use case and I can make some suggestions. Sounds like a new question -- perhaps post one here and point us towards it? — atdre, Jan 24 '19 at 20:56
Ideally, it’s a mix of free (but not open-source), open-source, and commercial tools (some of which are expensive and some of which are dirt cheap). Check out Eric Zimmerman’s new course on Battlefield Forensics — https://www.sans.org/course/battlefield-forensics-and-data-acquisition — where he lists his course outline detailing methods and tools. Eric publishes many, many open-source DFIR tools that are heralded as better than their commercial equivalents. — atdre, Jan 25 '19 at 15:14

score 1 · Answer 4 · answered Jan 29 '19 at 01:00

Consider this from a standpoint of possible permutations of scenarios:

a) compromised: compromise known. shut down.
b) compromised: compromise known. keep running.
c) compromised: compromise not known. shut down.
d) compromised: compromise not known. keep running.

e) not compromised: compromise not known. shut down.
f) not compromised: compromise not known. keep running.

Once you clean any machines you find infected, you may or may not still be compromised elsewhere, but you don't know, so 'a' and 'b' don't apply anymore.

At face value, 'c' and 'e' are ridiculous. Why would you shut down if you didn't know you were compromised? You can't prove a negative (i.e. you can't prove you're not compromised; you may just never do the right test, or not know about the vuln, etc etc etc). Or you may actually not be compromised!

You're essentially proposing this, though. "Shut down until you know what the vector is" sounds good on paper, but there's no guarantee you'll ever know, or that one vector you find is the only one used, or that no backdoors were opened in other areas...

There's an infinite number of rabbit holes you can go down in DFIR, but the key principle with any investigation is to only act on the evidence you have.
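
The permutations above can also be enumerated mechanically. The following is a small Python sketch (all names are mine, for illustration only) that crosses the three variables and, following the list above, drops the combinations where a compromise that never happened would somehow be "known". That leaves exactly the six cases a) to f).

```python
from itertools import product
from string import ascii_lowercase


def scenarios():
    """Yield (label, compromised, known, action) for the cases a) to f)."""
    combos = (
        (compromised, known, action)
        for compromised, known, action in product(
            (True, False), (True, False), ("shut down", "keep running")
        )
        # Assumption taken from the list above: a compromise that did not
        # happen cannot be "known", so those two combinations are dropped.
        if compromised or not known
    )
    for label, combo in zip(ascii_lowercase, combos):
        yield (label, *combo)


def describe(label, compromised, known, action):
    """Format one scenario in the same style as the list above."""
    state = "compromised" if compromised else "not compromised"
    knowledge = "compromise known" if known else "compromise not known"
    return f"{label}) {state}: {knowledge}. {action}."


if __name__ == "__main__":
    for case in scenarios():
        print(describe(*case))
```

The point of writing it out is that the decision maker only ever sees the "known" column. Cases d) and f) look identical from the inside, which is why acting on the evidence you have is the only workable rule.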

– Angelo Schilling

Thanks! That makes sense. I wasn’t suggesting to shut down in the “compromised: compromise not known” cases, though, but in the “compromised: compromise known: attack vector not known” cases, which *must* exist, right? – caw Jan 29 '19 at 12:49
Most larger online services operate with [high availability](https://en.wikipedia.org/wiki/High_availability) which would allow many systems to come offline while still providing services to users. This also allows for systems to be brought offline and new systems spun up to take their place. If we go to a big vendor, systems are coming down and spinning up based on demand without us ever knowing... – they Jan 29 '19 at 20:06
@caw the problem is that if the vector is unknown, it could be literally anything. And you don't want to act without evidence justifying the action. If a machine is known to be compromised, "nuke it from orbit"(and rebuild) is usually what's done, but you can't do that for the entire environment; it's just not worth the time and money. "But if they're still in the system it could cost you more in the future" is true, and that's why after breaches, most companies hire third party investigators, so there is an external entity validating your results. – Angelo Schilling Jan 29 '19 at 22:59
@AngeloSchilling Thanks, that’s reasonable, and I agree. So the point really was: What if “nuke it from orbit” (and rebuild) is not enough (because you don’t know why it happened)? If you don’t know the attack vector they used and thus couldn’t fix the vulnerability, even after “nuke it from orbit”, the attacker would be able to use the same technique again and get in. But it seems that doesn’t happen in practice, probably because of good third-party investigators. – caw Jan 30 '19 at 01:05
To be clear, it totally DOES happen that the attacker still has access. The 3rd party investigators are trying to find the source of the breach, certainly, but their primary purpose is to cover your ass from a liability standpoint as much as possible; you don't want someone to claim you didn't do your due diligence after a breach. – Angelo Schilling Jan 30 '19 at 18:04
@AngeloSchilling Thank you, due diligence and limiting liability are important points here. So even if what this question is about *does* happen, i.e. your services stay online or come back online while still having (some of) the vulnerabilities that had been exploited, the third-party investigators, if they didn’t discover these, will at least cover your ass (to some degree) and you will “only” lose (more) trust if it happens again. – caw Jan 30 '19 at 22:10

score 0 · Answer 5 · answered Jan 28 '19 at 22:12

Are all companies, even those with horrific vulnerabilities, excellent at intrusion detection and forensic analysis, or did I just miss something?

Huh?

Many companies with horrific breaches are not good at IDS. In many of those cases, the breach existed for many months prior to it being discovered.

Moreover, many studies exist which show while security is indeed on the radar of executives, what they are doing about it is a completely different story. So I'm not sure where you get this assumption that companies, in general, have robust IDS or even forensic capabilities (<- they don't; that's why they have to contract with consulting firms).

– MGoBlue93

Thanks! The assumption that “all companies […] [are] excellent at intrusion detection and forensic analysis” is not an assumption that I hold personally, but was the assumption for the argument, with the emphasis on “or did I just miss something”. The point is that we should have seen a lot more cases with multiple breaches at the same company in quick succession or cases where a company shuts down for weeks – *unless* they are really excellent with their IDS, forensics and recovery processes. – caw Jan 29 '19 at 12:47
"The point is that we should have seen a lot more cases with multiple breaches at the same company in quick succession or cases where a company shuts down for weeks" I'm not sure what the "argument" is anymore. You have too many assumptions clouding the discussion. Why *should* we have seen multiple breaches in quick succession? Not if an adversary has established persistence -- which is exactly what happened in many of the cases you cite. Moreover, a company isn't going to shut down. Sony, Target, Equifax certainly didn't. Where does a shutdown expectation come from??? – MGoBlue93 Jan 29 '19 at 16:45
Sorry for the confusion. The reasoning is that either companies are perfect in their execution of IDS, forensics and recovery, or some of them will ultimately have compromised services and data breaches without the company having a full understanding of how that happened. Now if they don’t have a full understanding of how that happened, they can either shut down for some time (until they have a better understanding), which we don’t really see anywhere, or they could stay online (or go online again immediately), in which case they would risk getting hacked again after a few months. – caw Jan 29 '19 at 19:44
There's no confusion... "The reasoning is that either companies are perfect in their execution of IDS, forensics and recovery". What org has a perfect IDS solution? – MGoBlue93 Jan 30 '19 at 18:50
"some of them will ultimately have compromised services and data breaches without the company having a full understanding of how that happened." Yes. As the other poster tried to explain to you, there's no truth test for proving what you don't know. – MGoBlue93 Jan 30 '19 at 18:51
"they can either shut down for some time" You're going around in circles here. There's no expectation of shutting down. They may take a server offline, redeploy stuff, etc., but you've mentioned "shutting down" multiple times in here without explaining why that should actually be an expectation. – MGoBlue93 Jan 30 '19 at 18:53
`many studies exist which show while security is indeed on the radar of executives, what they are doing about it is a completely different story` - can you show some? And how does that answer the question? – Tom K. Feb 04 '19 at 09:10
@Tom K., it was an attempt to ferret out of the OP where this shutdown expectation comes from. By way of comparison I work for a well-known consulting firm. I'm in a different office every couple of months in front of a lot of different leaders -- I've never heard of a company shutting down due to an incident but this poster has a death grip on shutting down. – MGoBlue93 Feb 06 '19 at 01:07
@Tom K., regarding your research comment, it's not that hard of a find... www.ey.com/Publication/vwLUAssets/Global_Information_Security_Survey_2016/$FILE/REPORT%20-%20EY's%2019th%20Global%20Information%20Security%20Survey.pdf, https://assets.kpmg/content/dam/kpmg/pdf/2016/04/11491-CEO-Cyber-Report-web-FINAL.pdf – MGoBlue93 Feb 06 '19 at 01:11
