58

In 2013, a Citibank employee had a bad performance review that ticked him off. The results were devastating:

Specifically, at approximately 6:03 p.m. that evening, Brown knowingly transmitted a code and command to 10 core Citibank Global Control Center routers, and by transmitting that code, erased the running configuration files in nine of the routers, resulting in a loss of connectivity to approximately 90 percent of all Citibank networks across North America.

Now, there is a question about securing a network against attacks from the inside, but that question explicitly excludes insiders going rogue. There is also a question about protecting a database against insiders, but that one concerns high-tier problems.

I also read What is the procedure to follow against a security breach?, but most answers there assume the insider is an employee who has already been fired. I'm asking about someone who hasn't been fired yet: they might have had a poor performance review but not yet been terminated, they might be unhappy about something their partner did, or they might simply have gotten upset about something.

The problem I'm describing here is a large company where a user who is unhappy about their job snaps one day and issues system-breaking commands that they have full privileges to issue. Things like wiping machines, physically damaging essential infrastructure, and so on: purely technical interference, nothing like leaking emails or secrets. The aim is just to do as much damage as possible to the infrastructure and go out with a bang.

The article gives a few cursory mentions of things to do, but nothing really concrete. What things can be done to prevent sudden rogue insiders from negatively impacting essential infrastructure using techniques they're privileged to do?

Nzall
  • 7,313
  • 6
  • 29
  • 45
  • 102
    Treating your employees well might be one strategy. The money you'll spend on making them happy will be much less than the amount you'll spend on disaster recovery if they are unhappy. – André Borie Jul 28 '16 at 13:12
  • 1
    Relevant: http://serverfault.com/questions/753268/linux-productive-sysadmins-without-root-securing-intellectual-property/753419#753419 – Sobrique Jul 28 '16 at 15:00
  • Do you have a [big 5](https://www.youtube.com/watch?v=y4GB_NDU43Q)? – TRiG Jul 28 '16 at 15:16
  • 13
    Quis custodiet ipsos custodes? It can't be turtles all the way down. – Jared Smith Jul 28 '16 at 15:38
  • 6
    @JaredSmith The Two-Man Rule only requires one more turtle. Though it does require a *turtle*, and those may not move fast enough for your business – Kyeotic Jul 28 '16 at 20:28
  • 2
    @Tyrsius the real problem, as gowenfawr hinted, is that people who are smart enough to do technical damage are assumed to be smart enough not to commit easily traceable crimes that will get them sued and jailed, making the two-man rule unnecessarily onerous. And generally they are. But boy howdy the exceptions.... – Jared Smith Jul 28 '16 at 21:32
  • Indoctrination? – Hack-R Jul 28 '16 at 22:04
  • Poor guy... he couldn't tolerate his prison-like job, now he gets one which is *really* prison-like. – peterh Jul 31 '16 at 02:48
  • We had a similar experience with our first web host back in 2000. A disgruntled employee scrambled their switch routing tables (as I was told) and they were unable to resolve it for days. It put them out of business! We were only able to retrieve our files when I convinced the host to hook our server up to their ADSL line. – Quinn Comendant Jul 31 '16 at 09:45
  • 2
    One straightforward way is to do or design random chaos testing of your live systems. If you have a team spending time finding all the ways your system(s) aren't covering for catastrophic failure points, you will likely identify failure points such as this. Give some folks a, ah, mission to find ways the company could be shut down and I bet you'll find some good procedure/policy updates. – enderland Jul 31 '16 at 12:22
  • 3
    @AndréBorie I think that is a bit of a naive statement to make. Even if you "treat them well" in a company like Citibank that hires ~250k people you are bound to have lots of bad apples who will think they are mistreated no matter what. – David says Reinstate Monica Jul 31 '16 at 15:59
  • 1
    @DavidGrinberg true, but that doesn't mean you shouldn't treat them well to at least reduce the risk of someone ruining your business. Of course, technical measures to prevent rogue employees should also be used but in moderation, so they don't make it too hard for employees to do their legitimate work. – André Borie Jul 31 '16 at 18:36
  • I wouldn't call a partial network outage devastating. I would reserve that word for worse incidents such as cases where data was irrecoverably lost or corrupted. Given the rarity of such events, the significant cost of protecting against it, and the limited damage - I would say it is probably not worth protecting against. As long as you are protected against permanent loss (or corruption) of data, you have probably gone as far as can reasonably be expected. – kasperd Jul 31 '16 at 20:30
  • Find people with integrity. – Mark Buffalo Jul 31 '16 at 20:57
  • @AndréBorie I agree that employees treated well make fewer mistakes and could report if something is amiss, but is it sufficient protection against them going rogue? – Jesvin Jose Aug 01 '16 at 09:31
  • @aitchnyu whenever possible technical solutions should be used as well, but sometimes these solutions would end up reducing their productivity so much that it will cost more in lost time to have those policies in place rather than to take the risk (and maybe have such an incident every 10 years). – André Borie Aug 01 '16 at 15:25

9 Answers

41

What things can be done to prevent sudden rogue insiders from negatively impacting essential infrastructure using techniques they're privileged to do?

In practice, very little. But to explain why, let me talk about what you can do.

The issue here is that the user is "privileged" - they have been granted the power legitimately.

There are some things that can be done to limit the power given to legitimate users, even privileged administrators:

  • Control over available commands using something like sudo or PowerBroker (see the sketch below)
  • Dual control (the "two-man rule" @paj28 describes)
  • Workflow controls (which are often a form of dual control)

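For instance, a sudoers fragment along the following lines restricts what a privileged group can run; the group name and command list here are invented for illustration, not taken from any real policy:

```
# /etc/sudoers.d/netops -- hypothetical fragment; group and commands are examples only.
# Members of "netops" may restart network services and read logs, but get no root shell.
Cmnd_Alias NET_SAFE = /usr/sbin/service networking restart, /usr/bin/journalctl
%netops ALL=(root) NET_SAFE
```
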
Now, these controls are used far less than they could be. Why? Because privileged users are trusted by definition. So I say very little not because there are no controls, but because the cost-benefit of applying such controls to trusted personnel is rarely judged to be worth it.

Also note that the attack vector here was "in the plumbing" - if Citibank has dual controls, they're probably focused on things like funds transfers, whereas this attack came in at the knees and just took the underlying network down. These vital-but-quiet systems often have smaller circles of privileged users and less extensive controls.

The real failure here was not that there were not technical controls, but that the personnel controls failed miserably. It is standard practice to revoke access of privileged employees before they are terminated. Whoever decided that no such precaution was necessary when introducing conflict with a privileged employee used poor judgement.

(The company also employed punitive controls - the attacker is now sentenced to almost 2 years in prison and must pay nearly $80k. As the article points out, those things don't fix any of this.)

gowenfawr
  • 71,975
  • 17
  • 161
  • 198
  • 34
    I don't think the personnel controls failed here. Nowhere does it say that the employee was terminated. It was just a poor performance review. – Nzall Jul 28 '16 at 13:08
  • 4
    @Nzall personnel management should encompass behavior, reviews, reprimands, and terminations - it's a failure if it's only ever thought of with regards to terminations. Trust of people should not be a binary (yes-or-no) thing. – gowenfawr Jul 28 '16 at 13:10
  • 32
    I agree with personnel management being more than just terminations, but the person was still employed, so he needed those privileges to do his job. He wasn't terminated yet, so revoking his privileges could actually be seen as interfering with his ability to do his job. – Nzall Jul 28 '16 at 13:22
  • 3
    @Nzall temporarily revoking his privileges and sending him home for the weekend to consider his review would have given him the cooling down period to avoid what ended up happening... trust is relative; if you trust the guy with three bad reviews as much as the guy with three glowing reviews, then what's the point of having reviews? – gowenfawr Jul 28 '16 at 13:25
  • 26
    The guy with three glowing reviews can crack just as easily as the guy with three bad reviews. Making trust dependent on how someone performs sounds dangerous and might appear as nepotism in some cases. – Nzall Jul 28 '16 at 14:45
  • 6
    -1 For just pretending as if the employee was terminated. The employee was given a review only. I agree with @Nzall comments. Your second comment is one possible answer but then how do you decide when to revoke privileges? Do you do it at every negative review or at the second or the third? And how do you avoid the bitterness and the toxic environment the employer himself may be slowly creating? – Fixed Point Jul 28 '16 at 23:20
  • 4
    What happens if I am a perfectly good-intentioned employee who gets a bad review for some reason... and then my privileges are revoked because the employer trusted me before but not anymore? Without such a "demotion", I might have tried to figure out what was wrong with my performance and worked harder to fix it. But now I feel a little bitterness and resentment, and I may want to get back at the big evil corporation. "This could result in a self-fulfilling prophecy" is my point. – Fixed Point Jul 28 '16 at 23:25
  • 1
    This is the answer. There really isn't much you can do, no matter what at some point there is a person holding all the right permissions to make a move like this. There is no real way around it. – coteyr Jul 29 '16 at 02:58
  • Almost certainly the user in this case thought the system had failed him, thought he should have gotten a better review, and likely felt singled out. Perhaps oversight on reviews would have been a good idea: if a review needs to be explained to the manager's manager before affecting the employee, then singling someone out is harder and appears harder. This is assuming the employee was at least somewhat mentally stable; a medically paranoid person should probably not be managing all the networks. Background checks and health screenings should help there. – Sqeaky Jul 29 '16 at 15:06
  • @gowenfawr - What are you suggesting that they do? Black out everyone's permissions the day before releasing performance reviews? And leave it blacked out for how long? And with what to prevent an employee who received a poor review from simply waiting for the blackout to end before exacting their revenge? I think the bottom line is that if you don't trust your employee(s) to _not_ take revenge over a poor performance review, it's time to terminate; and if it's not time to terminate then it's also not time to arbitrarily revoke their privileges, temporarily or otherwise. – aroth Aug 01 '16 at 05:17
  • With regards to the whole "terminated employees should have access revoked [..] 'but he wasn't terminated'" discussion, the article notes that the guy said, "they was firing me." So perhaps one issue is the corporate culture that caused the employee to believe that they were going to be fired (eg, if the company had a reputation of firing those who had poor performance reviews). An alternative interpretation is that if the company was going to fire him, they should have done so immediately instead of giving him a warning that he perhaps couldn't recover from (guessing here). – Kat Aug 10 '16 at 22:25
33

Two-man rule - configure your systems so that all privileged access requires two people.

This could be a physical control - privileged access can only come from the NOC, and inside the NOC people physically enforce the rule.

More practical would be a scripting system. Sys-admins don't directly have root access, but they can submit scripts to be run as root. Scripts are only run after a separate person has reviewed and approved them. There would still need to be a method for SSH access in an emergency - and the two-man rule could be maintained in that case using physical controls.
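
As a rough illustration only (module layout and names are mine, not a description of any real deployment), such a submit-and-approve gate could start as small as this Python sketch:

```python
# Toy sketch of a two-man script gate: nothing runs as root until a second,
# different person has approved it. Not hardened; for illustration only.
import subprocess
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    submitter: str
    script_path: str
    approver: Optional[str] = None

def approve(job: Job, approver: str) -> None:
    if approver == job.submitter:
        raise PermissionError("submitter cannot approve their own script")
    job.approver = approver

def run(job: Job) -> None:
    if job.approver is None:
        raise PermissionError("script has not been reviewed by a second person")
    # A real system would run this under a locked-down service account and keep
    # the script immutable between review and execution.
    subprocess.run(["/bin/sh", job.script_path], check=True)

# Example: alice submits, bob approves, then the script may run.
job = Job("alice", "/srv/approved-jobs/rotate-keys.sh")   # hypothetical path
approve(job, "bob")
# run(job)  # would execute the reviewed script
```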

The NSA implemented this after the Snowden leaks. I have never seen a full two-man system in any of the commercial or government systems I have audited - although I have seen various partial attempts.

Update - there's more information on how to implement this on a separate question.

paj28
  • 32,736
  • 8
  • 92
  • 130
  • 15
    But bear in mind that (robust) 2 man control is EXTREMELY high overhead on productivity. We reckoned on about a factor of 10, given the respective interrupts a decent SA will be processing during a given day. – Sobrique Jul 28 '16 at 15:04
  • 2
    @Sobrique - Not sure about 10x but I agree overhead is high - at least in a naive setup. If we were really smart about things you could bring the overhead down. e.g. read-only access to logs is OK for a single sys-admin doing initial diagnostics; they just need a second pair of eyes to issue commands. – paj28 Jul 28 '16 at 15:25
  • 1
    The largest problem with a two-man system is that it increases overhead. Let's assume that it's only 2 times. Well, over the course of a 5 year period it's better to take the loss of income for the small outage than it is to pay a second person. And that's assuming that it's just 2x the cost. I think 10x is a bit conservative in many circumstances. – coteyr Jul 29 '16 at 03:02
  • @coteyr - The Snowden leak was more than a small outage, so for the NSA the overhead is worth it. I do think you're underestimating the possibility of doing two-man in a smart way and that a well designed system would have more like a 25% overhead. I'm wondering whether to ask a separate question about this. – paj28 Jul 29 '16 at 08:43
  • @paj28 It would be a question I would be interested in. – coteyr Jul 29 '16 at 14:22
  • 3
    *But bear in mind that (robust) 2 man control is EXTREMELY high overhead on productivity. We reckoned on about a factor of 10, given the respective interrupts a decent SA will be processing during a given day.* -- yes. Security has a cost and that cost needs to be balanced; disruption/delay in daily workflow when nothing "goes wrong" against the potential risk of someone going rogue. – Rob Moir Jul 30 '16 at 11:57
  • 1
    Companies I've seen don't use the 2-man control; instead there is a tremendous labyrinth of systems where everybody has control over only a small part of it. Only _very_ few have control over large parts, and probably nobody over the whole. -> It doesn't matter who goes rogue, he can't cause too big a problem. This poor rogue in the article caused only a small problem; the router configs were probably restored from backups. He probably didn't have any influence on the backups. And the folks doing the backups couldn't do anything to the routers. – peterh Jul 31 '16 at 23:21
  • 1
    @Sobrique Not just high overhead, but without oversight/auditing of the auditor, also pretty easy and tempting for them to decide "I'll just implement a script to automatically approve all submitted scripts". And potentially still not that difficult for a savvy attacker to subvert, for instance by hiding an obfuscated malicious command within an otherwise routine script. – aroth Aug 01 '16 at 05:31
  • Separation of concerns can work a lot better. Separate out 'backup' responsibility, especially, and make it someone's job to ensure they're in 'good order'. – Sobrique Aug 01 '16 at 08:18
27

One approach is to accept that rogue actions cannot be prevented and focus on making sure the damage can be repaired. For example, make sure the routers have a separate control plane via which they can be brought back online. Make sure you have read-only backups (e.g. off-site tapes), so if someone wipes out all hard drives you can recover the data. Make sure data and code can be rolled back to a known good state quickly.

These safeguards will also help a lot in the case of unintentional mistakes.
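
For example, the read-only backup part could start as a small snapshot job like the one below; the paths are placeholders, and real immutability needs separate media or a host the admins cannot reach:

```python
# Toy sketch: copy current device configs into a timestamped archive and drop
# write permission, so a known good state can always be restored.
import os
import shutil
import time

SOURCE_DIR = "/var/lib/netconfigs"       # hypothetical: where current configs live
ARCHIVE_DIR = "/mnt/backup/netconfigs"   # hypothetical: ideally an append-only, separate host

def snapshot() -> str:
    dest = os.path.join(ARCHIVE_DIR, time.strftime("%Y%m%d-%H%M%S"))
    shutil.copytree(SOURCE_DIR, dest)
    for root, _dirs, files in os.walk(dest):
        for name in files:
            os.chmod(os.path.join(root, name), 0o440)   # read-only for owner/group
    return dest

if __name__ == "__main__":
    print("snapshot written to", snapshot())
```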

Daniel Darabos
  • 540
  • 3
  • 6
  • 2
    I also think that layers of redundancy are the solution. Especially since they solve a lot of *other possible problems* as well. – Tomáš Zato - Reinstate Monica Jul 28 '16 at 19:29
  • 1
    This is the best answer. You can't stop a trusted user from doing something bad. But you can recover from it pretty fast. – coteyr Jul 29 '16 at 03:04
  • @coteyr: this answer does not make explicit that the privileged user must not also be able to affect the redundant part (unplug the control plane, wipe out the backups). It is something that needs to be stressed, for without it... – Matthieu M. Jul 29 '16 at 18:21
  • 2
    "read-only backups (e.g. off-site tapes)" I have never seen an admin awesome enough to teleport to the off site backups, and destroy those too. – coteyr Jul 29 '16 at 18:28
  • 1
    @coteyr - so you haven't seen Mr Robot? – paj28 Aug 01 '16 at 13:10
2

Audit. In particular, network traffic and the actions/operations performed on particular machines. You want to capture who did what, when they did it, and from where. Whilst this won't prevent an attack, it will help deter such actions if the insider believes that they will be identified and caught.

Then you have to get into the issue of tamper-proof auditing mechanisms.
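
One building block for tamper-evident logging (sketched here from first principles, not from any specific product) is a hash-chained log, where each entry commits to the previous one so silent edits or deletions become detectable:

```python
# Toy hash-chained audit log: modifying or removing an earlier entry breaks verification.
import hashlib
import json
import time

def _digest(prev_hash: str, entry: dict) -> str:
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log: list, user: str, action: str, source: str) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"ts": time.time(), "user": user, "action": action, "source": source}
    log.append({"entry": entry, "hash": _digest(prev, entry)})

def verify(log: list) -> bool:
    prev = "0" * 64
    for record in log:
        if record["hash"] != _digest(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True

log = []
append(log, "jdoe", "erase running-config", "10.0.0.5")   # illustrative entry
assert verify(log)
```

In practice you would also ship each entry to a separate log host that the administrators being audited cannot write to.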

Colin Cassidy
  • 1,880
  • 11
  • 19
  • 6
    If someone really wants to do malicious acts, knowing that they will be caught often won't stop them. If people really would be stopped by proper auditing, it would be a much more discussed topic in the infosec arsenal. – Nzall Jul 28 '16 at 13:41
  • 1
    No, it does not stop them (and I said as much), but if you knew that your actions were being logged securely, and in such a way that they could be used in a court action against you, you might at least think twice. If you're really on the ball you might detect actions that, whilst not malicious, could be 'testing the water', and knowing who that person was, and that they 'had a poor performance review', you can step in early and possibly prevent said malicious action – Colin Cassidy Jul 28 '16 at 14:02
  • 5
    This doesn't answer the question, which is specifically about preventing attacks from someone suddenly "snapping". The person in question plans on getting caught to make a statement. – TTT Jul 28 '16 at 14:07
  • 4
    Why do we have surveillance cameras? Why do we have police on the streets? Neither of these directly stop crime, nor will it stop the determined criminal. And yet their introduction reduces crime. These measures are simply another tool in a whole suite of tools to help. Likewise auditing actions and operations is another tool along with the others mentioned in this question. – Colin Cassidy Jul 28 '16 at 14:15
  • 4
    It STILL doesn't answer the question though. Whether or not it's a useful tool, your answer is about clean-up; his question is about prevention. – Eujinks Jul 28 '16 at 14:57
  • 6
    Deterrence is a kind of prevention – paj28 Jul 28 '16 at 15:01
  • I think it's pretty clear that the guy in question either didn't think about whether he was going to get caught, or didn't care. Either way, more audits wouldn't help. (There are situations where it might help - but I suspect it's much more useful for external breaches.) – Martin Bonner supports Monica Jul 29 '16 at 14:16
1

The question of protecting a system or network from an insider, most specifically from the people whose own job description includes creating and managing such systems, has always been a tricky one.

First, what one must understand is that, in the end, it is simply impossible to prevent every kind of attack against an infrastructure from the inside, because that would imply restricting all contact with the infrastructure, which would make it useless.

However, there are ways we can prevent and minimize damage to the system. In this process, I personally recognize three stages:

  1. The Two-Man rule
  2. The Accountability rule
  3. Division of Labour

These processes complement each other in helping any system remain secure from intruders working from the inside.

The Two-Man Rule

Let's start with the most obvious one, the Two-Man rule. An important part of IT and infrastructure security is making sure that all behavior inside the system is identifiable and desired; in other words, that whatever action is taken inside the system is trusted.

When giving an example of this, my favorite way of explaining it is the Git system of forking and pulling. In Git, everyone with access to the repository (the infrastructure in this case) can make a copy. Then, people with access can request to pull their code into the repository. However, for this to happen, the pulled code must be analyzed, marked as compatible, and then authorized by someone else.

The same could be said and done for a secure Infrastructure. All management personnel can change the code, but for the changes to go into production, they must be approved by one or more staff.

The Accountability Rule

Another common problem with certain types of systems and networks is that there is one management account whose password is known by all members with access. The first problem with accountability is raised here. Many companies, when dealing with rogue members making unauthorized changes on the server, rely on primitive methods such as checking the machine's IP address to locate who might have published changes to the system. This can be simply fixed by ensuring everyone has their own account, and making them aware that their changes are logged.

As mentioned in the last paragraph, logging is the second problem. The issue of trust rises to the surface again here: because the member is trusted to make certain changes to the system, in most cases the system does not properly log the user's actions.

This situation is the perfect point to implement action accountability. The management user needs to be aware not only that their actions are tracked at all times while modifying the infrastructure, but also that they have contract-bound responsibilities and penalties for deliberate damage.
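
A minimal sketch of that idea (function and file names are made up for illustration) is to route every privileged operation through a wrapper that records the named account performing it:

```python
# Toy sketch: every privileged action is attributed to a named account and logged.
import functools
import getpass
import logging

logging.basicConfig(filename="admin_actions.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def audited(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        operator = getpass.getuser()   # a real setup would use the corporate/SSO identity
        logging.info("user=%s action=%s args=%r", operator, func.__name__, args)
        return func(*args, **kwargs)
    return wrapper

@audited
def push_router_config(device: str, config_path: str) -> None:
    """Placeholder for the actual change."""
```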

Division of Labour

This is another overlooked concept in most IT infrastructure managerial positions. IT teams have a tendency to divide their tasks; however, it is not uncommon for most users to have access to perform any task.

The best way to prevent this is to have each specific system management task assigned to only two individuals (two rather than one, to cover cases where one individual is not available). While other users can still verify and approve changes, using the Two-Man rule, only a handful of users can actually start those changes in the first place.
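
In code terms this is nothing more than an explicit task-to-owners map with exactly two accounts per sensitive task; the names and tasks below are invented purely for illustration:

```python
# Toy sketch: each sensitive task is assigned to exactly two named accounts,
# so nobody else can start it and no task is orphaned if one person is away.
TASK_OWNERS = {
    "erase-router-config": {"alice", "bob"},
    "restore-from-backup": {"carol", "dave"},
}

def may_start(user: str, task: str) -> bool:
    return user in TASK_OWNERS.get(task, set())

assert may_start("alice", "erase-router-config")
assert not may_start("mallory", "erase-router-config")
```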

Personal Suggestion

A personal favorite way of implementing system-wide security, especially in large business environments, is having three server sets: Alpha, Beta and Production, the first two being clones of the last. Anyone can move changes to Alpha; we use this environment for testing how a change would behave in Production. Beta is for changes that have been tested and are ready to be deployed. To reach this stage, several members (~5) of the IT department must approve the change. At this stage, the IT department also documents the changes and sends them to Management and as a memo to IT. To reach Production, 3 high-profile management members must approve the change, using their own accounts, which cannot be accessed by the IT department.

Last Note

As you may have noticed, this is not an easy process. Implementing many of these ideas will slow down production. This is one of the quintessential trade-offs of security: the more secure a system is, the more difficult it becomes to change and modify. To keep your business productive, you must balance security and trust.

devSparkle
  • 111
  • 3
0

By ensuring that commands which would take infrastructure offline in a way that cuts off remote access can only be executed locally.

This can be achieved in multiple ways. For example, disallow the shutdown command when issued remotely. Another way is to have a watchdog (a hardware device that detects a change in availability) that will restart said infrastructure or run a recovery procedure if the infrastructure becomes unreachable.
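
A software variant of the watchdog idea might look roughly like this; the target address, check interval and recovery command are placeholders (and the ping flags assume Linux):

```python
# Toy availability watchdog: if the device stops answering, run an out-of-band
# recovery action that does not depend on the device's own remote access.
import subprocess
import time

TARGET = "192.0.2.1"                                       # hypothetical core router
RECOVERY_CMD = ["/usr/local/bin/oob-power-cycle", TARGET]  # hypothetical out-of-band tool
CHECK_INTERVAL = 30                                        # seconds

def reachable(host: str) -> bool:
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

while True:
    if not reachable(TARGET):
        subprocess.run(RECOVERY_CMD)   # e.g. power-cycle via a PDU or reload a saved config
    time.sleep(CHECK_INTERVAL)
```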

A third way is to ensure out-of-band remote access, for example by using KVM-over-IP solutions, and then tying these resources to controls that can only be manipulated on-site. Thus if the infrastructure is brought down, it can still be restored remotely.

Of course, it's important to have backups of configuration files, important systems and such. Since configuration files are seldom changed, I would say a backup of the config files should be made after each configuration change that is committed.

In case there is a need to disconnect infrastructure due to critical security events, an emergency system can be used that will disconnect the infrastructure in a way that is easily recoverable by management, without requiring an on-site visit.


The problem here was not really the rogue employee. What if that employee had done exactly the same thing, but by mistake? Let's say his intention was to repair a malfunctioning network device, not realizing that the equipment would be brought offline after a config erase; he runs that particular code and command mentioned in the question, intending to recreate the config file after it has been erased, but after having executed the command realizes "oops, can't connect to the device anymore".

He would thus cause the same type of damage. Trust me, I have made the same mistake, mostly with my own systems, which then required me to visit the physical location of the equipment (in that case the systems were not critical, but if they are, you ought to do something about it).

That's why safeties must exist, so that it is impossible to cause that type of damage remotely, intentional or not.

sebastian nielsen
  • 8,779
  • 1
  • 19
  • 33
0

Lots of good answers here, but one that seems to be missing is to embrace the Principle of Least Privilege (PoLP). If there's no legitimate use case for erasing the router config files, don't give anybody access to do that. If there is a legitimate use case but it's not relevant for day-to-day operations, require (and audit) an approval process to obtain that privilege (this is, in effect, a variation on the two-man rule that only applies to especially sensitive operations).

There are also backups and fail-safes. To take the original example, if a router's configuration is wiped, it should revert to an unremovable failsafe configuration (and raise an alarm). This should also happen if it fails an internal health check. Alternatively, if you have health checks, you could have the system revert to a "last known good" configuration, essentially restoring itself from a backup if anything goes wrong. Write access to these failsafes/backups/health checks should be under extremely tight security - nobody should have day-to-day privileges to them, or be able to get such privileges easily - such that even the most highly-trusted insider cannot unilaterally bypass or overwrite them.
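
As a sketch of the "last known good" idea only (the paths, reload hook and health probe are all assumptions, not a real router API):

```python
# Toy sketch: apply a change, run a health check, and roll back to the last
# known good configuration if the check fails.
import shutil
import socket
import subprocess

RUNNING = "/etc/router/running.conf"                 # hypothetical
LAST_GOOD = "/etc/router/last-good.conf"             # hypothetical; tightly write-protected
RELOAD = ["/usr/local/bin/reload-router", RUNNING]   # hypothetical reload hook

def healthy() -> bool:
    try:
        with socket.create_connection(("192.0.2.254", 22), timeout=2):  # probe the gateway
            return True
    except OSError:
        return False

def apply_change(new_conf: str) -> None:
    shutil.copy(new_conf, RUNNING)
    subprocess.run(RELOAD, check=True)
    if healthy():
        shutil.copy(RUNNING, LAST_GOOD)   # promote to last known good
    else:
        shutil.copy(LAST_GOOD, RUNNING)   # roll back
        subprocess.run(RELOAD, check=True)
```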

Obviously, all of these solutions have costs. There's almost always a tradeoff between security and expending resources (usually described as "convenience", but the resources can also be time and/or money). Really good PoLP means nobody gets genuine root (god-level) access to anything, for example, which slows things down for people who could probably be trusted with that much access (you can't ever truly know). Failsafe code is harder to write than code that just trusts whatever commands it's fed from a "trustworthy" source, even if that command is HCF. Paranoia has its price... but that price may well be less than what it costs if you lose 90% of your network connectivity.

CBHacking
  • 40,303
  • 3
  • 74
  • 98
0

A single employee should never have the power to cause such widespread damage. Administrative actions like that should always require authentication by two or more separate administrators, where the authentication system can only be disabled by the top-level administrators.

In the event that it has already happened, the employer would have every right to fire the employee.

Micheal Johnson
  • 1,746
  • 1
  • 10
  • 14
0

In every company I've seen, there were very strict internal security rules as well. We couldn't see anything from other projects, only our current tasks.

If we moved between projects/departments, our permissions were always precisely fine-tuned.

Only a narrow set of the sysadmins had access to large parts of the infrastructure, and they had all worked there at least 5-10 years. Probably nobody had access to the whole network.

Connecting anything to the company network without explicit permission (it was hopeless even to ask for it) was strictly forbidden. Once I attached my smartphone to the USB port (only to charge it), and within 5 minutes a coworker asked me what I was doing.

Our computers/laptops were given to us pre-installed with the company certs. When we left the company, we gave them back.

There were regular security checks, done by an external company (so nobody knew what they checked or how). The software we produced was also examined by them.

What we did on the company network was probably logged and backed up for eternity.

If we had done anything doubtful, we wouldn't have known how the logs would be stored and examined. After we left the company, our computers were also security-checked by the external company, and probably backed up at the sector level, in case a "problem" appeared later, to serve as evidence.

peterh
  • 2,938
  • 6
  • 25
  • 31