
In a seminar, one of the authors of *Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks* said that this system could prevent actions like the ones Snowden carried out.

From the article's conclusion:

> Beehive improves on signature-based approaches to detecting security incidents. Instead, it flags suspected security incidents in hosts based on behavioral analysis. In our evaluation, Beehive detected malware infections and policy violations that went otherwise unnoticed by existing, state-of-the-art security tools and personnel.

Can Beehive or a similar system prevent a Snowden-type action?

Johnny
kelalaka
  • 47
    Simple answer: No, most certainly not. Snowden was someone who had privileged access and had the authority and reason to mass-download content (he was a sysadmin). – forest Nov 07 '18 at 11:29
  • 3
    But in the training case, they model everybody according to their behavior. So, after the training, a mass download will be a behavioral change that will produce an alert signal. – kelalaka Nov 07 '18 at 11:31
  • 1
    @kelalaka Not if mass-downloading was taking place during the training. – TripeHound Nov 07 '18 at 11:36
  • 9
    Unless mass-downloading is 1) not common and 2) it's not possible to just throttle the download. – forest Nov 07 '18 at 11:43
  • 16
    Why "mass download" is even considered suspicious. there are will be some sorts of constant "mass" downloads during everyday usage, was my first thought. What is mass download? 1 MB? 500 MB ? 5 GB? 500 GB? ... – Croll Nov 07 '18 at 12:48
  • 8
    @Croll If your organisation has one million files, any one person probably doesn't need to access anywhere close to that many in order to do their job (most files won't be related to their work). If somebody starts trying to download all one million over a day or two, that's suspicious. Even a small percentage of that one million could be suspicious. 1% of one million is 10,000 files. How many people working for your organisation need to access 10,000 files over the span of 48 hours to do their job? Very few (if any). – Anthony Grist Nov 07 '18 at 16:56
  • 3
    At my job as a programmer, I have a little "award" for touching 0.0001% of our code over my three years here, which is more than many employees. – Mooing Duck Nov 07 '18 at 23:01
  • @forest But the data ended up on an insecure drive. Would it be possible to see where the mass download was going (or the mass copy-paste)? – Carl Nov 08 '18 at 12:18
  • 3
@AnthonyGrist - I regularly deal with large numbers of files - e.g. 79,000 or thereabouts for one batch process today. And yes, I have to move them around the system, zip-archive them, etc. – Charemer Nov 08 '18 at 13:52
  • 1
    @forest I don't see your point. Because he was a sysadmin **any** access to confidential files should have triggered an alert! His job is not reading the documents, but making the system work. Surely the only situation in which that kind of behaviour would happen legitimately is if they have to transfer a data storage to a different system, in which case the SOC is notified and they can dismiss the huge red flags triggered during those operations. – Bakuriu Nov 08 '18 at 21:24
  • 3
    There's a conflict between the title and body of this question. The title says "Can Beehive detect a Snowden-like actor?" while the question at the end says "Can Beehive or a similar system prevent Snowden type action?" These are two different questions, as detecting something means it has happened, and preventing something means it never happens in the first place. – barbecue Nov 09 '18 at 02:06

5 Answers

143

A backup operator will have the permissions and behavioral markers of someone who moves lots of data around. So will any sysadmin in a place with no dedicated backup operator.

Snowden was a sysadmin. He would have known all the protection protocols in place. He could simply impersonate anyone, from any area, download things, impersonate the next person, and keep doing that.

If it's common knowledge that there's no bulletproof protection against a dedicated attacker, imagine a trusted internal dedicated attacker with sysadmin privileges.

ThoriumBR
  • 182
    TL;dr: you can't protect yourself against yourself. – Braiam Nov 07 '18 at 13:14
  • 1
    Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/85510/discussion-on-answer-by-thoriumbr-can-beehive-detect-a-snowden-like-actor). – Jeff Ferland Nov 08 '18 at 19:59
23

Anomaly detection systems like Beehive make it easier than before to dig through lots of data and detect suspicious behavior. An analyst can focus on the more relevant data, process more data in less time, and use more detailed input for the analysis. This raises the chance that somebody detects unwanted behavior.

The Beehive paper claims (and I have no reason to doubt it) that the system can detect more incidents than the commonly used systems - but it does not claim that the system can detect every incident, or even say what fraction of all incidents it could detect. Thus, it might be that other systems detect only 10% of all incidents and Beehive detects 20%, which is better but not really satisfactory.

Could such a system detect somebody like Snowden? This depends heavily on how much data is collected for analysis, of what kind and in what detail; on how strict the existing security policies are in the first place, so that policy violations get logged at all; and on how much Snowden's illegal (as seen by the NSA) activities differed from his usual work activity. The more they differ, the more likely an anomaly detection system can catch them. But the more similar illegal and legal activities look in the logged data, the less likely it is that illegal activities will be reported as an anomaly.
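To make the "differs from usual activity" point concrete, here is a minimal sketch of the per-user baselining an anomaly detector performs. The feature (daily download volume) and all numbers are hypothetical; a system like Beehive combines many such features:

```python
import statistics

def anomaly_score(history, today):
    """Z-score of today's value against this user's own history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0 if today == mean else float("inf")
    return abs(today - mean) / stdev

# Hypothetical daily download volumes (MB) for one user during training.
baseline = [120, 95, 130, 110, 105, 125, 98]

print(anomaly_score(baseline, 115) > 3)     # False: ordinary day, no alert
print(anomaly_score(baseline, 50_000) > 3)  # True: sudden bulk download, alert
```

The same score stays quiet whenever the attacker keeps the observed values inside the historical range, which is exactly the weakness discussed in the comments above.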

In other words: it could help to detect some Snowden-type actions, but it will not detect all of them. And preventing such actions would be even more difficult; the more realistic outcome is earlier detection after some harm has already been done, thus limiting the impact.

Steffen Ullrich
  • 3
    And the false positives... Wow, imagine you got promoted to a System Admin position and then suddenly you have federal agents show up at your door... – Nelson Nov 08 '18 at 08:18
  • 8
    @Nelson Federal agents will be at your door long before that if you're in the running for a sysadmin position. Get ready for looooads of profiling and interviews. – Lightness Races in Orbit Nov 08 '18 at 11:03
18

Snowden's intent was data exfiltration, and he was also a system admin. So he had access to large amounts of data normal users didn't, and would have had a different pattern of interaction with the network. If Beehive had been in place, it might have logged that he was doing something, but anyone intent on data exfiltration would have known how to bypass alerting: make your pattern of data exfiltration "normal" from the time the system started being trained, and it won't be flagged as anomalous activity. Snowden could have had a pattern of dumping 16 GB a day to a USB thumb drive, but as long as he made no sudden change in his techniques, Beehive wouldn't have flagged him.
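The evasion described above can be shown with a toy detector. This is a sketch, not Beehive's actual algorithm: the feature (daily USB-copy volume in GB) and all numbers are invented, but it illustrates why a baseline poisoned during training never flags steady theft:

```python
import statistics

def is_anomalous(history, observation, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1e-9  # avoid division by zero
    return abs(observation - mean) / stdev > threshold

# The attacker exfiltrates ~16 GB/day from day one, so the training
# baseline already treats that volume as "normal".
poisoned_training = [16.2, 15.8, 16.1, 16.0, 15.9, 16.3, 16.1]
print(is_anomalous(poisoned_training, 16.0))  # False: steady theft, no flag

# The same detector with a clean baseline flags the very same behavior.
clean_training = [0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.1]
print(is_anomalous(clean_training, 16.0))     # True
```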

I'm working on some custom ways at work to detect this kind of pattern. But, right now I don't know of anything automated that'll do a good job.

RG1
10

No, it can't.

And the quote you pulled already explains why not, and how people came to claim that it could: Beehive *flags* suspected incidents; it does not stop them.

What Beehive might be able to do is tell you that a Snowden-style attack has taken place (and even that much is doubtful, going by @ThoriumBR's answer).

What you (or that speaker) claim is that it could PREVENT such an attack, which is far, far different. Beehive crawls logs and (maybe - I didn't read too deeply) combines that with some advanced analysis. That means that even if your analysis-and-flagging system were running in real time, it would probably be too late.

Just imagine where Beehive comes in:

Suspicious action -> security program -> log -> Beehive extracts data -> Beehive analysis -> flag thrown -> intervention?

By the time a flag is thrown it is far too late (and that assumes the logs are evaluated in real time).

Logs are for retroactive investigation, not real-time intervention.

What you could do is produce a pseudo-log entry for every action, have that analysed by Beehive, and only execute the action once it has been greenlit. The enormous overhead and noticeable delay would make that approach a really hard sell to any manager, though. (Also, building the evaluation mechanisms into the platform itself, rather than going through logs, would be far better.)
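A sketch of that "greenlight" idea, with hypothetical names throughout (the policy, the delay, and the API are invented for illustration). Every action pays the analysis round trip before it may run, which is exactly the overhead objection:

```python
import time

def greenlight(pseudo_log_entry):
    """Stand-in for shipping the entry to an analysis engine and
    awaiting a verdict; a real round trip adds network and analysis time."""
    time.sleep(0.05)  # simulated per-action analysis delay
    return pseudo_log_entry["bytes"] < 1_000_000_000  # hypothetical policy

def guarded_copy(user, src, dst, size_bytes):
    entry = {"user": user, "action": "copy", "src": src,
             "dst": dst, "bytes": size_bytes}
    if not greenlight(entry):                 # block until approved
        raise PermissionError("action blocked pending review")
    # ... only now perform the actual copy ...
    return True

guarded_copy("alice", "/share/report.pdf", "/mnt/usb/report.pdf", 2_000_000)
```

Multiply that per-action delay by every file operation in an enterprise and the "hard sell" becomes obvious.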

Hobbamok
  • 7
    And the false positives. Job promotions will be a nightmare, as will department changes. – Nelson Nov 08 '18 at 08:20
  • As a sysadmin, could one simply alter the logs? – paulj Nov 08 '18 at 14:51
  • 1
    @paulj Not if the logs are sent to a remote server or forward-sealed, but that only applies to logs that were already generated. A sysadmin could, of course, forge any _subsequent_ logs. – forest Nov 09 '18 at 08:28
  • 1
    Incidentally (and unrelatedly), modern file systems do have [pseudo-logs](https://en.wikipedia.org/wiki/Journaling_file_system), which are finalized much more quickly than something like Beehive could match – jpaugh Nov 09 '18 at 17:50
6

First of all, there is a very important distinction between being able to detect a "Snowden-like" actor and being able to prevent one. As far as I have seen, Beehive makes no claims about preventing one, but rather seems to promise the ability to give you alerts that suspicious activity is happening in your network. Sure, not as good, but still considered a "holy grail" in some research communities.

With that said, I'm extremely doubtful that Beehive is able to meet those expectations. Machine learning can do quite well at extracting complex patterns from large piles of data with reliable labels. For example, differentiating between pictures of cats and dogs is extremely reliable; we can all do it 99+% of the time, yet if I had to write down the exact algorithm that takes in 100x100 pixels and determines cat vs. dog, I would have no idea how. But I can supply you with 100,000 such images and let ML methods figure out a rule that reliably differentiates between the two based on the values of those 100x100 pixels. If I do things right, the rules created by ML should even work on new images of cats and dogs, assuming no huge changes in the new data (i.e., if I only used labs and tabby cats in the training data, then try to get it to identify a terrier... good luck). That's ML's strength.

Determining "suspicious behavior" is a much more difficult issue. We don't have 100,000's of samples of confirmed bad behavior, and we don't even really have 100,000's of samples of confirmed good behavior! Worse yet, what was a good ML method that worked yesterday doesn't work today; unlike cats and dogs in photos, adversaries try really hard to trick you. Most people I know working on ML for cyber security have accepted that the idea of purely automated detection is beyond our grasp at the moment, but perhaps we can build tools to automate very specific repetitive tasks that a security analyst needs to do over and over, thus making them more efficient.

With that said, the authors of Beehive seem to have skipped that lesson and claim that they've solved this problem. I'm highly suspicious of the performance, especially given that the methods they suggest are the first ones an ML researcher might think to try, and have routinely been rejected as not useful. For example, they suggest using PCA to identify outliers in logs. This, and variations of it, have been tried hundreds of times, and the result is always that the security analyst shuts off the "automated detection" because it produces so many false positives that it costs far more time than it saves.
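For reference, a minimal sketch of the kind of PCA outlier rule being described, on toy data (the features and thresholds are invented; this is not the paper's pipeline). It also hints at the false-positive problem: the cutoff is arbitrary and shifts with whatever happens to be in the logs that day:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-host daily features: [logins, MB downloaded, hosts contacted]
normal = rng.normal(loc=[20.0, 500.0, 8.0], scale=[4.0, 90.0, 2.0], size=(500, 3))
outlier = np.array([[22.0, 50_000.0, 9.0]])   # one host with a huge download
X = np.vstack([normal, outlier])

# PCA via SVD on centered data; score each point on the first component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                            # projection onto first PC

# A common (and fragile) rule: flag anything beyond 3 standard deviations.
flagged = np.where(np.abs(scores) > 3 * scores.std())[0]
print(flagged)                                 # the injected outlier, index 500
```

With one gross outlier this works; with real, messy logs the same rule drowns the analyst in flags, which is the failure mode described above.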

Of course, in all these methods, the devil is in the details, and the details of these types of methods never really get exposed in published work ("we used PCA to look for outliers in server logs" is an extremely vague statement). It's always possible that they have some super clever way of preprocessing the data before applying their methods that didn't make it into the paper. But I'd be willing to bet my right arm that no user of Beehive will be able to reliably differentiate between "Snowden-like" behavior and non-adversarial real-world use of a network in real time.

Cliff AB