
I need to set up a solution similar to GitHub, where users can SSH to their Git repositories.

This should scale to hundreds of thousands of users, so my idea is to use a distributed filesystem for the data (so every node can access all of it) and a replicated database to manage the users (so, again, every node can always access the entire list of users).

Using a normal authorized_keys file is impossible since users are not bound to a specific node, so I was looking for a way to read the key list from a database (https://serverfault.com/a/443230/125948).

The problem with AuthorizedKeysCommand is that it only receives the username (which in my case will be git for all users), so I would basically have to run SELECT pub_key FROM user and return the ENTIRE key list on every connection.
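To make the scaling problem concrete, here is a minimal Python sketch of what such an AuthorizedKeysCommand helper would have to print. Assumptions: sqlite3 stands in for the replicated database, and the `user_keys` table and the `gitserve` wrapper are hypothetical names. Because sshd hands the helper only the username (`git` for everyone), it cannot narrow the query, so it emits one authorized_keys line per stored key. Each line carries a `command=` option so the shared account can still tell which user authenticated.

```python
import sqlite3
from typing import List


def all_authorized_keys(conn: sqlite3.Connection) -> List[str]:
    """Naive AuthorizedKeysCommand body.

    sshd only passes the username ("git" for every user), so the helper
    cannot filter the query and must emit every key in the table.
    """
    lines = []
    for user_id, key_type, key_b64 in conn.execute(
        "SELECT user_id, key_type, key_b64 FROM user_keys ORDER BY user_id"
    ):
        # command= forces a per-user wrapper ("gitserve" is hypothetical), so
        # the shared "git" account still learns which user authenticated.
        options = (
            f'command="gitserve {user_id}",'
            "no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty"
        )
        lines.append(f"{options} {key_type} {key_b64}")
    return lines
```

In sshd_config this would be wired up with `AuthorizedKeysCommand` (plus `AuthorizedKeysCommandUser`). The output grows linearly with the number of stored keys, and sshd then checks the offered key against that list on every login.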

This obviously isn't the right solution, so I was looking for another way to authenticate. Basically my question is: how on earth is GitHub doing this?

Gilad Novik
  • Since you're already setting up a distributed file system, what leads you to the conclusion that "using normal authorized_keys file is impossible"? – HBruijn Jan 22 '14 at 07:17
  • Assuming there are a lot of users (say, millions), it seems really slow to compare against every public key when a user tries to log in, not to mention editing this file every time (to remove keys, for example). Am I wrong in this assumption? In addition, I don't want to maintain two sets of users (the DB and the keys) and keep them in sync; I'd prefer to have everything in the DB. – Gilad Novik Jan 23 '14 at 04:58
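On the cost of comparing against every key: the usual way around a linear scan is to index keys by a fingerprint. Below is a minimal Python sketch, assuming keys are stored as the base64 blob from an authorized_keys line; the function names are illustrative. The fingerprint follows the `SHA256:` format that `ssh-keygen -l` prints by default on current OpenSSH. This only helps if the server knows which key the client offered. Stock sshd of this era passes only the username to AuthorizedKeysCommand, while later OpenSSH releases (6.9 onward) also document `%f` (fingerprint) and `%k` (key) tokens for it; check `man sshd_config` for your version.

```python
import base64
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def sha256_fingerprint(key_b64: str) -> str:
    """OpenSSH-style SHA256 fingerprint of a base64-encoded public key blob:
    base64 of the SHA-256 digest of the decoded blob, '=' padding stripped."""
    digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def build_index(rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Map fingerprint -> user id. Built once (or stored in the database),
    so a login costs one hash plus one lookup instead of a scan of all keys."""
    return {sha256_fingerprint(key_b64): user_id for user_id, key_b64 in rows}


def user_for_key(index: Dict[str, int], key_b64: str) -> Optional[int]:
    """Return the owning user id, or None if the key is unknown."""
    return index.get(sha256_fingerprint(key_b64))
```

Removing a key is then a single `del index[fp]` (or one DELETE in the database) rather than rewriting a file.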

2 Answers


OK, I found the answer: https://github.com/blog/530-how-we-made-github-fast

They actually patched OpenSSH to perform the key lookup against a MySQL server: https://github.com/norbauer/openssh-for-git
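The idea behind that patch (a per-connection key lookup in the database instead of a file) can be sketched on the query side. This is not code from the linked repository. It is a minimal Python sketch in which sqlite3 stands in for MySQL, and the `ssh_keys` table and helper names are hypothetical. Keying the table by a key fingerprint (for example the `SHA256:...` string that `ssh-keygen -l` prints) keeps each login to a single indexed lookup. It also makes adding or revoking a key one INSERT or DELETE, with no separate authorized_keys file to keep in sync.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema; sqlite3 stands in for MySQL. The PRIMARY KEY on
# fingerprint gives an index, so lookups do not scan every row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ssh_keys (
    fingerprint TEXT PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    key_type    TEXT NOT NULL,
    key_b64     TEXT NOT NULL
)
"""


def add_key(conn: sqlite3.Connection, fingerprint: str, user_id: int,
            key_type: str, key_b64: str) -> None:
    with conn:  # the connection context manager commits on success
        conn.execute(
            "INSERT INTO ssh_keys VALUES (?, ?, ?, ?)",
            (fingerprint, user_id, key_type, key_b64),
        )


def remove_key(conn: sqlite3.Connection, fingerprint: str) -> None:
    with conn:
        conn.execute("DELETE FROM ssh_keys WHERE fingerprint = ?", (fingerprint,))


def find_key(conn: sqlite3.Connection,
             fingerprint: str) -> Optional[Tuple[int, str, str]]:
    """Return (user_id, key_type, key_b64) for a fingerprint, or None."""
    return conn.execute(
        "SELECT user_id, key_type, key_b64 FROM ssh_keys WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
```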

Gilad Novik

I think they use a TCP proxy called ProxyMachine that sniffs the packets.

mestachs