I recently discovered a case where a colleague had accidentally committed their login credentials (host, username, and password) to a local source code repository, and then pushed these changes to a public repository on GitHub. Of course, this was not an isolated incident – a few years back, GitHub killed its full-code search feature after people discovered hundreds of private keys and other credentials in public repositories.
I'd like to make sure that this sort of thing hasn't happened in the past with any of our other public-facing repositories (and, in case it has, to scrub the private data, change the exposed passwords, revoke the exposed keys, etc.). It's no problem for me to cobble together a shell script to pull past commits to a given Git or Subversion repository so that I can scan them for private data. But what sort of filename and text patterns should I use? For example, I want to catch files whose name suggests that they contain private keys or credentials (`password.txt`, `id_dsa`, `id_rsa`, `secring.gpg`, `.netrc`, and probably several more standard ones that I'm forgetting or am not even aware of). Is there a list somewhere covering the most common cases? Similarly, I'd like to scan the contents of text and source files for patterns that indicate hard-coded login credentials. Perhaps someone has already produced a list of regular expressions to start from?
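To make the question concrete, here is a rough Python sketch of the kind of scanner I have in mind. Everything beyond the five filenames mentioned above is my own guess: the extra globs (`*.pem`, `*.key`, `.htpasswd`), the three regular expressions, and the function names `is_suspicious_name` and `scan_text` are illustrative assumptions, not a vetted list.

```python
import re
from fnmatch import fnmatchcase

# Filename globs, matched against the lowercased basename.
# The first five come from the question; the rest are guesses.
NAME_GLOBS = [
    "password*", "id_dsa", "id_rsa", "secring.gpg", ".netrc",
    "*.pem", "*.key", ".htpasswd",
]

# (label, pattern) pairs for file contents. Illustrative only.
CONTENT_PATTERNS = [
    # PEM/PGP private key header, e.g. "-----BEGIN RSA PRIVATE KEY-----"
    ("private key header",
     re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY(?: BLOCK)?-----")),
    # Assignment of a quoted literal to a name containing a secret-like word
    ("quoted-secret assignment",
     re.compile(
         r"(?:password|passwd|pwd|secret|api_?key)\w*\s*[:=]\s*"
         r"['\"][^'\"]+['\"]",
         re.IGNORECASE)),
    # user:password@ embedded in a URL
    ("credentials in URL",
     re.compile(r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s:@]+@", re.IGNORECASE)),
]


def is_suspicious_name(path: str) -> bool:
    """Return True if the file's basename matches any suspicious glob."""
    basename = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return any(fnmatchcase(basename, glob) for glob in NAME_GLOBS)


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every pattern hit in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CONTENT_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

The idea would be to run `is_suspicious_name` over every path that ever appeared in the history and feed `scan_text` the file contents or the output of something like `git log -p`. I expect plenty of false positives and misses with patterns this crude, which is exactly why I'm hoping for a better-curated list.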