
When it comes to storing my data "in the cloud" (aka: on someone else's server), I always have a bad feeling that something like "Google's deleted an artist's blog, along with 14 years of his work" might happen to my data, too.

On the other hand, even big companies like Microsoft store lots of source code on GitHub.

My question:

What is the usual policy of companies storing their source code on external servers when it comes to minimizing risks of data loss?

E.g.

  • They could rely on GitHub making enough backups.
  • They could have a policy to always store data in local data centers before publishing to GitHub.
  • They could have special contracts with GitHub to get additional backups.
  • They could fetch the data through GitHub APIs and store it locally.
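
As a rough illustration of the last option, here is a minimal sketch of a scheduled local mirror. It assumes the `git` command-line tool is installed and relies on `git clone --mirror` and `git remote update`; the helper names `mirror_commands` and `run_backup`, the backup directory layout, and any repository URL passed in are hypothetical and not something any particular company is known to use.

```python
import subprocess
from pathlib import Path


def mirror_commands(repo_url: str, backup_root: Path) -> list[list[str]]:
    """Return the git commands that create or refresh a local mirror.

    Hypothetical helper: the names and directory layout are illustrative only.
    """
    # Derive a directory name such as "example-repo.git" from the URL.
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    target = backup_root / name

    if target.exists():
        # Existing mirror: fetch all refs again. No --prune, so branches
        # deleted upstream are not deleted from the backup.
        return [["git", "-C", str(target), "remote", "update"]]
    # First run: a bare mirror clone copies every branch and tag.
    return [["git", "clone", "--mirror", repo_url, str(target)]]


def run_backup(repo_url: str, backup_root: Path) -> None:
    """Execute the commands; raises CalledProcessError if git fails."""
    backup_root.mkdir(parents=True, exist_ok=True)
    for cmd in mirror_commands(repo_url, backup_root):
        subprocess.run(cmd, check=True)
```

Running `run_backup` from a scheduled job would keep a copy outside GitHub. Note that a mirror only covers what Git itself stores; issues, pull request discussions and wikis would need a separate export.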

So what I'm really trying to understand is why/how Microsoft (or other companies) can publish code to their public GitHub repositories, and which security strategies they apply to protect themselves from data loss.

Maybe my question is to some degree opinion-based; on the other hand, there is a chance that someone from Microsoft (or a similar company) reads this and can actually answer it.

Update 1

I'm not asking how Git(Hub) technically allows you to distribute your source code. I do hope that I understand most of these concepts.

I'm asking how to convince management, from a security point of view, to allow their intellectual property (i.e. source code) to be stored on external servers by an external company.

And since lots of companies, including big ones like Microsoft, actually do use GitHub, I'm interested in how they deal with this.

Update 2

I've added some extra words to hopefully make it clearer that I'm mostly concerned about data loss.

Don't know if this is still security related, though.

Uwe Keim
  • Are you actually familiar with the way Git works? Its distributed nature means pretty much everyone working on the project has a full copy of the "central" repo. As backups go, that should be enough for everyone. – Stephane Aug 04 '16 at 07:14
  • Thanks, @Stephane - I do think I understand most of this. I'm asking from a policy point of view. I.e. do the companies have any policy on how to deal with the fact that their source code is hosted externally? Once your data is (accidentally or not) gone from an external server, it is probably too late to hope that someone has created a local backup. – Uwe Keim Aug 04 '16 at 07:21
  • 1/ this isn't the question you actually asked. 2/ This is unrelated to security 3/ this is opinion-based: every company will deal with such a thing in their own terms. – Stephane Aug 04 '16 at 07:23
  • I've looked in http://security.stackexchange.com/help/on-topic and found "risk management" and "policy". Sorry if I misunderstood this. – Uwe Keim Aug 04 '16 at 07:26
  • Instead of using the free GitHub, an enterprise can subscribe to PaaS (Platform as a Service) from GitHub or even host its own GitHub server. Whether PaaS or self-hosted, it is the enterprise that controls its GitHub server. Go contact the provider or read this: https://enterprise.github.com/features#pricing – mootmoot Aug 04 '16 at 07:47
  • Securing digital intellectual property should include "availability" and "business continuity". – mootmoot Aug 04 '16 at 08:54
  • Microsoft probably does use GitHub for their open source work. I'm not sure they store the source for their proprietary stuff on someone else's cloud, since they have their own. – Neil Davis Aug 04 '16 at 10:27
  • Also, GitHub is used primarily as a platform for openness and sharing. I'm 100% sure that trying to sell management on its security for keeping proprietary source a secret is not a battle I'd pick to fight. Keeping trade secrets under lock and key is best done by keeping them under lock and key. I'm not saying I agree with that type of practice, but you don't leave the keys to your store at a stranger's house. You have NO idea what kind of people they have working on their infrastructure etc. – Neil Davis Aug 04 '16 at 10:49
  • @user356540 : does that make it clear that you can indeed purchase GitHub Enterprise and install it on your own server? – mootmoot Aug 04 '16 at 12:15
  • That's not "in the cloud" and isn't on external servers in someone else's hosting environment. GitHub enterprise living in your own network, run by your admins, and meeting your security and disaster recovery policies is a different situation than the OP is evangelizing, but is probably the solution he's really looking for and would be a -much- easier sale to management. – Neil Davis Aug 04 '16 at 12:31
  • Actually I'm just trying to understand why/how Microsoft can publish code to their _public_ GitHub repositories and which security strategies they are applying to protect themselves from data loss. – Uwe Keim Aug 04 '16 at 12:54
  • Sorry, but along the lines that Stephane has pointed out, git is a *terrible* example of this sort of thing. The only way that you could permanently lose data is if GitHub loses it *and* has no backups *and* no one in your company has a clone of the repo (i.e. no one is working on that part of the codebase and, for some reason, *every single programmer* has deleted it) *and* you have no backups of anyone's machine from a time they were working on it *and* (if available publicly) no one forked/cloned it. I mean seriously, you would have to be *both* massively irresponsible *and* extremely unlucky. – Jared Smith Aug 04 '16 at 13:09
  • After reading your updates, I think you're confusing *data loss* with *data theft*. – Jared Smith Aug 04 '16 at 13:12
  • @JaredSmith I have the same feelings every time I'm using Trello (and I use it a lot). I have no fear that someone _steals_ my data. I have a lot of fear that someday Trello tells me "Oops, sorry, we accidentally nuked our data center including backups". Same for GitHub. And despite this, big players like Microsoft still seem to use GitHub. I cannot understand why. – Uwe Keim Aug 04 '16 at 13:32
  • Because *git doesn't work like that*. When you clone/fork a repo, you *clone* it. Entirely. All the branches, all the history, everything. With git there *is no central source of truth*. Plenty of people (including, apparently, you) still try to use it that way, but it's an illusion of your own fabrication. Every programmer at your company has every piece of data. Everyone who works on that repo *is* a backup. If GitHub went down you wouldn't even lose that day's work, just have everyone pull from everyone else and bam, everyone's current. – Jared Smith Aug 04 '16 at 13:35
  • With git you have an independent local copy of *everything*. It doesn't live in the cloud. GitHub is just a convenient way to share stuff via git over the internet (as opposed to a LAN). It doesn't change the way git works. That's why, unless every programmer at your company spontaneously deletes everything, it's a non-issue. – Jared Smith Aug 04 '16 at 13:42
  • That seriously cannot be a management strategy by Microsoft to say "We hope our developers checked out everything, that is enough of a backup", can it?!? – Uwe Keim Aug 04 '16 at 13:45

1 Answer


Contracts

The artist was presumably using Google's free services. The terms and conditions of such services tend to boil down to "we own everything, you have no rights" (disclaimer: I don't know the exact terms for the artist).

When companies pay serious money for an enterprise cloud platform, they have a contract that is much more in their favour. There is still a risk the provider could mess up, but this particular scenario could not happen.

Some other approaches are multi-provider redundancy and offline backups. Multi-provider redundancy does seem like a good solution, although I've not seen it done in practice. Offline backups are problematic because of the high bandwidth needed to perform a backup.
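
For source code specifically, multi-provider redundancy could be sketched as pushing one repository to several independent Git hosts. The following is only an illustration under stated assumptions: it presumes a local bare mirror already exists and that `git` is installed, it uses `git push --mirror`, and the function names, paths and provider URLs are hypothetical.

```python
import subprocess
from pathlib import Path


def replication_commands(mirror_dir: Path, provider_urls: list[str]) -> list[list[str]]:
    """Build one `git push --mirror` command per secondary provider.

    Hypothetical helper. Note that --mirror also removes refs on the
    provider that no longer exist locally.
    """
    return [
        ["git", "-C", str(mirror_dir), "push", "--mirror", url]
        for url in provider_urls
    ]


def replicate(mirror_dir: Path, provider_urls: list[str]) -> list[str]:
    """Push to every provider and return the URLs whose push failed."""
    failed = []
    for cmd in replication_commands(mirror_dir, provider_urls):
        # Keep going after a failure so one broken provider
        # does not block replication to the others.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Returning the failed URLs rather than raising lets a caller retry or alert per provider.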

paj28
  • Thanks a lot! So although Microsoft's public GitHub repositories _look_ similar to my very own public GitHub repositories, they are most likely completely different "under the hood" when it comes to data protection, backup and the like?!? – Uwe Keim Aug 04 '16 at 12:59