SSH access gateway for many servers

Question

We manage more than 90 servers with 3 DevOps engineers via Ansible. Everything works well, but there is a serious security problem: each engineer uses their own local SSH key to access the servers directly. Each engineer works from a laptop, and any of those laptops could be compromised, opening the entire network of prod servers to attack.

I am looking for a solution to centrally manage access, and thus block access for any given key. Not dissimilar to how keys are added to bitbucket or github.

Off the top of my head, I would assume the solution is a tunnel from one machine, the gateway, to the desired prod server. While passing through the gateway, the request would pick up a new key and use it to gain access to the prod server. The result would be that we can kill access for any engineer within seconds by denying access to the gateway.

Is this good logic? Has anyone seen a solution out there already to thwart this problem?

I've been trying out Kryptonite for SSH key management and 2FA lately, and it's been working pretty well for me. Their pro/enterprise package seems to give even more control and also auditing of logins. – Alex, Mar 18 '18 at 20:41

Sven · Accepted Answer · 2018-03-18T17:15:10.307

24

That's too complicated (checking if a key has access to a specific prod server). Use the gateway server as jump host that accepts every valid key (but can easily remove access for a specific key which removes access to all servers in turn) and then add only the allowed keys to each respective server. After that, make sure you can reach the SSH port of every server only via the jump host.

This is the standard approach.
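
As a minimal sketch of the "remove access for a specific key" step, assuming the jump host keeps its accepted keys in a plain `authorized_keys` file, the helper below drops every entry whose key material matches a revoked public key. The function names and the sample keys are hypothetical, not part of any tool mentioned here.

```python
def key_blob(public_key_line: str) -> str:
    """Return the base64 key material from an OpenSSH public key line."""
    for token in public_key_line.split():
        # OpenSSH key blobs begin with a 4-byte length field, so their
        # base64 form starts with "AAAA". Assumption: no option token does.
        if token.startswith("AAAA"):
            return token
    raise ValueError("no key material found")


def revoke_key(authorized_keys: str, revoked_public_key: str) -> str:
    """Return authorized_keys content without entries for the revoked key."""
    revoked = key_blob(revoked_public_key)
    kept = []
    for line in authorized_keys.splitlines():
        stripped = line.strip()
        # Keep blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        # Match on key material only, so a changed comment does not hide a key.
        if revoked in stripped.split():
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Engineers would then reach a prod server through the jump host, for example with OpenSSH's `-J` option (`ssh -J jump.example.com prod-01`, hostnames hypothetical), so removing a key on the jump host cuts off that laptop everywhere.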

edited Mar 18 '18 at 17:15

answered Mar 18 '18 at 11:36

Sven

2

Even better: do what @Sven says but also add 2FA at the jump host. Because you're only connecting directly from the laptop when you need to do something manually, right? Anything automated is running from a server inside the jump host? – Adam Mar 18 '18 at 20:26
1

If you have a local certificate authority (subordinate or isolated), you can use those certificates with SSH, allowing you to centrally invalidate a believed compromised certificate. – Randall Mar 19 '18 at 09:27

score 11 · Answer 2 · answered Mar 18 '18 at 14:37

Engineers should not be running ansible directly from their laptop, unless this is a dev/test environment.

Instead, have a central server that pulls the runbooks from git. This allows for additional controls (four eyes, code review).

Combine this with a bastion or jump-host to restrict access further.
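
A rough sketch of what such a central runner might execute, assuming the playbooks live in a Git checkout on that server. The function name, paths, and playbook names are hypothetical; only the `git` and `ansible-playbook` options shown are standard.

```python
def build_run_steps(repo_dir: str, playbook: str, inventory: str,
                    check_only: bool = False) -> list:
    """Return the commands a central runner would execute, in order."""
    ansible = [
        "ansible-playbook",
        "-i", f"{repo_dir}/{inventory}",  # inventory file inside the checkout
        f"{repo_dir}/{playbook}",
    ]
    if check_only:
        ansible.append("--check")  # dry run: report changes without applying
    return [
        # Only fast-forward, so the runner executes exactly what was reviewed
        # and merged upstream, never a local merge.
        ["git", "-C", repo_dir, "pull", "--ff-only"],
        ansible,
    ]
```

Each step could be passed to `subprocess.run(step, check=True)`; because the runner only pulls reviewed commits, the four-eyes control happens in Git rather than on a laptop.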

Henk Langeveld

1

Indeed, this is the problem that AWX (or its commercial version Tower) solves. – Michael Hampton Mar 18 '18 at 18:37

score 2 · Answer 3 · answered Apr 06 '22 at 12:44

Check out the open source CLD software; it solves that problem: https://github.com/classicdevops/cld

Your engineers will be able to access any server according to an access matrix; it also provides 2FA by IP address as an option.

Aleksandr Chendev

When you recommend your product you [should disclose your affiliation](https://serverfault.com/help/promotion). You should also at least describe how this solves the problem. – Gerald Schneider Apr 06 '22 at 12:48
Thanks, I'll research it and edit the post with details. – Aleksandr Chendev Apr 06 '22 at 12:51

score 2 · Answer 4 · answered Apr 07 '18 at 02:01

Netflix implemented a setup like yours and released some free software to help with that situation.

See this video https://www.oreilly.com/learning/how-netflix-gives-all-its-engineers-ssh-access or this presentation at https://speakerdeck.com/rlewis/how-netflix-gives-all-its-engineers-ssh-access-to-instances-running-in-production with the core point:

We’ll review our SSH bastion architecture, which at its core uses SSO to authenticate engineers, and then issues per user credentials with short lived certificates for SSH authentication of the bastion to an instance. These short lived credentials reduce the risk associated with them being lost. We’ll cover how this approach allows us to audit and automatically alert after the fact, instead of slowing down engineers before granting access.

Their software is available here: https://github.com/Netflix/bless

Some interesting takeaways, even if you do not implement their whole solution:

- They use SSH certificates instead of just keys; you can put far more metadata in a certificate, which allows many per-requirement constraints and also simpler audits.
- They use very short-lived certificates (validity of around 5 minutes); SSH sessions stay open even after the certificate expires.
- They use 2FA, which also makes scripting difficult and forces developers to find other solutions.
- A specific submodule, outside of their infrastructure and properly secured through the security mechanisms offered by the cloud where it runs, generates certificates dynamically so that each developer can access any host.
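
To make the short-lived certificate idea concrete, here is a minimal sketch that builds a stock OpenSSH `ssh-keygen` signing command. This is a swapped-in illustration, not how BLESS itself issues certificates; the function name, file paths, and identity format are assumptions.

```python
def build_signing_command(ca_key_path: str, user_pubkey_path: str,
                          username: str, validity_minutes: int = 5) -> list:
    """Build an ssh-keygen command that signs a user key with a CA key."""
    if validity_minutes <= 0:
        raise ValueError("validity must be a positive number of minutes")
    return [
        "ssh-keygen",
        "-s", ca_key_path,               # CA private key used for signing
        "-I", f"{username}-bastion",     # key identity, logged by sshd on use
        "-n", username,                  # principal the certificate is valid for
        "-V", f"+{validity_minutes}m",   # valid from now until now + N minutes
        user_pubkey_path,                # public key to certify
    ]
```

Running the resulting command writes the certificate next to the public key with a `-cert.pub` suffix. Target servers would trust the CA via `TrustedUserCAKeys` in `sshd_config`, so no per-user key has to be distributed to them.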

score 1 · Answer 5 · answered Jul 12 '19 at 13:07

OneIdentity (ex-Balabit) SPS is the exact thing you need in this scenario. With this appliance you can manage user identities on basically any machine, track user behavior, monitor and alert, and index whatever the users are doing for later review.

score 0 · Answer 6 · edited Jan 28 '20 at 14:54

My suggestion is to disallow SSH access from user machines.

Instead, you should:

- Host playbooks in Git.
- Turn the "access server" into a Jenkins server.
- Grant devops users only the Jenkins access they need.
- Execute Ansible plays on the Jenkins server through build jobs triggered via HTTP.
- As an additional security measure, disable the Jenkins CLI if needed.

Sample execution models:

Jenkins Ansible plugin: https://wiki.jenkins.io/display/JENKINS/Ansible+Plugin

OR

A classic "Execute shell" type of job. Add your build steps manually, including the git checkout.
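
Either kind of job could be started over HTTP through Jenkins' remote access API. The sketch below only builds the request and does not send it; it assumes a parameterized job and an API token for the user, and the URL, job name, and parameter name are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_request(jenkins_url: str, job_name: str, user: str,
                          api_token: str, params: dict) -> urllib.request.Request:
    """Build a POST request that triggers a parameterized Jenkins job."""
    query = urllib.parse.urlencode(params)
    job = urllib.parse.quote(job_name)
    url = f"{jenkins_url.rstrip('/')}/job/{job}/buildWithParameters?{query}"
    # Jenkins accepts HTTP Basic auth with the user's API token as password.
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )
```

Passing the result to `urllib.request.urlopen` would queue the build. Engineers then need only Jenkins credentials, not SSH keys for the prod servers.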

If you are limited in server resources, the same Jenkins server can host Git (scm-manager) as well, although there is an additional security risk if one of the developer machines is infected. You may be able to mitigate this by disconnecting the Jenkins server from the internet and resolving Ansible dependencies locally.
