34

I'm researching for a small talk about websecurity and I found one article about the formspring hack, which made me curious. They claim to have used SHA-256 + salt

We were able to immediately fix the hole and upgraded our hashing mechanisms from sha-256 with random salts to bcrypt to fortify security (July 10th, 2012)

… nevertheless the attackers claimed they have found ~200k passwords within the 400k published datasets (They have 11m users, it's IMHO likely that most of them have been copied).

About half of the 400,000 hashes have already been reconstructed by password crackers. (July 11th, 2012)

This looks suspicious to me. I know that without using PKCS#5 or similar techniques, the security of SHA-256 is limited, but how much calculation power would they need to find so many passwords so quick?

My guess is that formspring was lying about the hashing. Has someone some insights on that?

Jeff Atwood
  • 4,542
  • 6
  • 25
  • 29
  • 3
    summary: There is likely a difference between what the crackers *posted* and what they *know*. I agree with Remus in the comments to my answer. If the crackers have the hashes, they almost by definition have the salts. Assuming Formspring did a typical `salt + password` technique, they probably don't need access to the source code to figure out how the hashes were applied. A bit of trial and error alone could work to guess how the salts were applied. In conclusion **it is very likely that the crackers know more than they posted.** – Jeff Atwood Jul 30 '12 at 16:22
  • The reality of it among crackers with good lists, is ~45-50% of most lists get broken in the first pass because people use stupid passwords. – Ori Aug 03 '12 at 03:18

7 Answers7

36

None of the existing answers cover the critical part of this question to my satisfaction: what about the salts?

If just the password hash values were posted, other crackers can't possibly know:

  1. The actual per-password (supposedly random, per the source) salt value.

  2. How the salt is mixed with the password in the code.

All they have is the final, resulting hash!

There are lots of ways the password and salt can be combined to form the hash:

sha256(pass + salt)
sha256(salt + pass)
sha256(sha256(pass) + sha256(salt))
sha256(pass + sha256(salt))
sha256(sha256(...(salt + pass + salt)...))

But if the salt is the recommended 8 characters of pure randomness …

sha256("monkey1" + "w&yu2P2@")
sha256("w&yu2P2@" + "monkey1")

… this means a "typical" 7 or 8 character password becomes extremely difficult to brute force, because it is effectively 15 or more characters! Furthermore, to crack password hashes that you know are salted, unless you also have the salt, you have no other choice except brute force!

Based on research I did using GPU cracking, I achieved 8213.6 M c/s using two high end ATI cards brute force cracking MD5 password hashes. In my benchmarking this meant I could try:

all 6 character password MD5s   47 seconds
all 7 character password MD5s   1 hour, 14 minutes
all 8 character password MD5s   ~465 days
all 9 character password MD5s   fuggedaboudit

Note that SHA-256 is 13% of the speed of MD5 on GPUs using Hashcat. So multiply these numbers by about 8 to see how much longer that would take.

If the salts were not known, that means you're essentially brute forcing the equivalent of 12+ character passwords. That is far beyond the realm of any known computational power.

Now if you want to argue that …

  1. The original crackers also obtained the salts, but chose not to post them.

  2. The original crackers also have the source code (or it's open source) so they know how the passwords are salted, but chose not to post that information.

  3. Formspring is lying and their passwords were not salted or salted improperly such that the salts had no effect.

… then yes, cracking 200k of 400k password hashes in a few days is easily possible.

Jeff Atwood
  • 4,542
  • 6
  • 25
  • 29
  • Thanks, so my line of thinking wasn't so bad after all :-) – Patrick Cornelissen Jul 30 '12 at 08:27
  • 14
    Salts *are* public. They are usually stored along with the hashed value, in clear. And the algorithm how the salt is mixed in is also public, otherwise is just security-through-obscurity. – Remus Rusanu Jul 30 '12 at 08:47
  • @remus were the salts also published along with the hashes? Are salts typically published by crackers when they solicit help for cracking? What about the salting method -- prefix, suffix, double, or ...? The original source forum topic has been removed: http://www.h-online.com/security/news/item/Lost-Found-Revealing-tweets-a-phrack-eBook-and-self-blockades-1634124.html – Jeff Atwood Jul 30 '12 at 08:50
  • 9
    @Jeff: Speculation on my part, but if they obtained the hashes then is hard for me to see how could they *not* obtain the salt as well. They must be stored associated (eg. 2 columns in a table). Whichever means they had to access the hash, the same means *must* have give them access to the salt as well. If they gain control to a compromised server (as article suggests), simply by the fact that the server *must* be able to validate the password it follows that attacker had access to both the hash and salt, as well as the salting algorithm (a simple mater of reverse engineering). – Remus Rusanu Jul 30 '12 at 08:57
  • 1
    @remus that is true, but since the actual method of combining the salt and password (prefix? suffix? something more unusual?) is only in the *source code*, that means you'd need more than just access to the database table storing the password hash and salt -- you'd need to be able to view the source code as well. Or if it is open source, but I'm pretty sure Formspring is not open source. – Jeff Atwood Jul 30 '12 at 09:00
  • @Jeff: as for the salts being published, I'm not in the cracker community and I don't know how their protocol operates. I expect that there are private channels in which more is communicated that what makes it to public forums. – Remus Rusanu Jul 30 '12 at 09:00
  • @Jeff: if you compromised a server you have access to the source most times. Many pages are served by languages that deploy source on the front end servers (PHP, Ruby). Even if is a compiled source deployment (IL, native) reverse engineering the protocol is a game for hard core hackers. I saw same at work, and is pretty amazing how they read assembly like is a novel. And besides, there are tools to help. – Remus Rusanu Jul 30 '12 at 09:03
  • @remus I do not disagree, but if the crackers *only* published the password hashes, and not the necessary salts + combination information -- remember the LinkedIn breach was just password hashes, but they were *unsalted* -- then there is no way any fellow crackers could help them crack the hashes! Posting hashes alone means only Formspring could validate that yes, these hashes were real, with no other necessary information (the salt + combination method) presented. Basically it is like posting a "scalp" for the world to see. – Jeff Atwood Jul 30 '12 at 09:06
  • 1
    fyi (just because it popped up on my watchlist): http://www.youtu.be/0WPny7wk960 (cracking 400k pws and salting etc) – akira Jul 30 '12 at 09:15
  • 2
    OK, I see ur point. Give a DB of actual hashes and a DB of commonly used passwords, w/o the salt one cannot match the passwords to users (other than random collision probability, which is basically nothing). My point is that if the *published* the hash, they probably also *have* the salts(s). – Remus Rusanu Jul 30 '12 at 09:17
  • 3
    If they are a formspring account [it should be possible to find out the pasword+hash combination](http://security.stackexchange.com/a/17827/2304) by "brute force", right? – João Portela Jul 30 '12 at 10:43
  • This might be somewhat of a shameless plug, but I also think this article could be useful to any readers that want to understand the mechanics of password hacking. http://www.wegnerdesign.com/blog/passwords-part-1-how-they-get-hacked/ – jwegner Jul 30 '12 at 13:00
  • @akira I had remove `www` from your link to make it work. [This](http://youtu.be/0WPny7wk960) worked for me. – kasperd Jan 16 '16 at 14:43
17

Edit: All of the below assumes that the salts are known, because that's the industry-standard use of the word salt (3rd line).

Just as an example of how this often looks in the database, have a look at this SHA-256 Unix Crypt output:

$5$rounds=80000$wnsT7Yr92oJoP28r$cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5

.. where wnsT7Yr92oJoP28r is the salt in plaintext. The Python Passlib documentation has good descriptions of many common hashed password formats, and most of these store plaintext salts concatenated together with the hashed password representation.

The 'un-known to the attacker salt' which @Jeff Atwood discusses is frequently called a "pepper". See Password Hashing add salt + pepper or is salt enough?


In 2009 Daniel J. Bernstein wrote a highly optimized SHA-1 implementation for NVIDIA's CUDA framework. Back then they managed 690 million SHA-1 hashes per second on a single server using four graphics cards.

Brute force attacks against hashed passwords are an "embarrassingly parallel" workload. In 2010 Thomas Roth demonstrated brute force attacking ~10 hashes for ~2$ using Amazon EC2. This could easily be scaled up, if the rate isn't high enough, then just add more EC2 instances.

So essentially it's not a question of how long it will take to bruteforce the passwords. It's more a question of how much computational power the attacker is willing or able to pay for in order to get results faster.

the attackers claimed they have found ~200.000 passwords

One reasonable explanation is that the attackers didn't target specific hashed representations, they just went for the "low hanging fruit". In other words, they only looked for passwords that are 'easy to guess'. Perhaps something like:

  • A candidate list of the most common passwords in plaintext, built from publicized lists of actual user passwords. (Over the years there have been several leaks of real user passwords.)
  • And perhaps they additionally tried all pronounceable lowercase passwords up to a low upper bound, for example up to 6 characters. (In the 2006 MySpace data set above it was found that 17% of all end user passwords were of 6 or fewer characters.)

My guess is that formspring was lying about the hashing.

If I understand your links correctly, then the hashes were discovered on a forum on the 7th of July. But they could have been stolen from Formspring weeks or months before that?

Given what I describe above, it doesn't seem unreasonable to me that a single round of SHA256 with a unique salt per password was used, as Formspring said.

It all depends on how much time and effort the attackers put into the brute force cracking. But within the resources that a small group of individual hackers might have, it does seem plausible to discover 200.000 weak passwords from a data set of 11 million simple SHA-256 hashed representations.

  • Interesting, thanks. Let's me rethink some of my prior assumptions! – Patrick Cornelissen Jul 29 '12 at 19:45
  • 1
    In other words: **you should be using a passphrase rather than a password**. You don't want to be the low-hanging fruit. – Joel Coehoorn Jul 29 '12 at 23:51
  • what about the salt? –  Jul 30 '12 at 03:55
  • 1
    none of this covers the critical part of the question: *what about the salts?* Given that just the hashes were exposed, not the supposedly random salts they used, how does what you wrote cover that? Without knowledge of the salts, I don't see how you could even test for "common" passwords. – Jeff Atwood Jul 30 '12 at 08:02
  • 1
    Well, the relevant sites are down, so I wasn't able to find the original hashes. :-( – Patrick Cornelissen Jul 30 '12 at 08:24
  • Your cracking speed estimates are way off. Modern GPUs can compute [3,800 million SHA-1 hashes per second](http://hashcat.net/oclhashcat-lite/). There's a thread on the barswf forums where people show off their cracking rigs, I believe the reigning champion is ~45,000,000,000 MD5 hashes per second. – Polynomial Jul 30 '12 at 08:53
  • it doesn't matter if the term salt means public, the salt *was not published with the hashes*. Compare the LinkedIn breach, where unsalted password hashes were posted. I agree that it is strange, because the crackers very likely know the salts. But not sharing them and posting *just* the password hashes alone means nobody can crack any of those posted password hashes; based on the math, no known computing technology is fast enough. So you have to ask "what is the purpose of publishing just the hashes of salted passwords?" Is it merely to show off and prove you did it? – Jeff Atwood Jul 30 '12 at 16:55
9

Cracking unsalted hashes is fairly trivial: you just look up hash in a pre-computed table and find your answer. No sense brute-forcing the rest because your rainbow tables already cover your brute-force dictionary.

But cracking salted passwords is still simple; slower but still simple. In most brute-force attacks, the attacker goes for breadth not depth. Rather than trying 1 billion passwords against 1 account, he tries tens or hundreds or thousands of common passwords against ALL the accounts. Certainly rainbow tables are faster, but salt alone doesn't impede the classic brute-force methodology. Start at the top of your dictionary and try that password against all of the accounts: this obviously involves recomputing the hash for each account, but on the other hand, your chances of a hit are pretty high (that's precisely what "common password" means, anyway). Next, take your next most common password, then the next, etc.

In the end, you may only get through several thousand passwords, but statistically speaking, there's probably hundreds of accounts with the password "123456" in your list, and a fairly large number who use "password1" and "111111" and "test".

200K out of 11M is believable; only about 2%. 200K out of 400K less so. Not impossible but not as likely. The law of diminishing returns kicks in there somewhere; I'm not sure how far in but it's probably a really simple ratio like 1/e or something. Beyond that I get skeptical.

tylerl
  • 82,225
  • 25
  • 148
  • 226
  • My guess is that they stole all or almost all Hashes and published only a fraction of them – Patrick Cornelissen Jul 30 '12 at 09:07
  • even with a tiny 4 character salt, assuming formspring has a reasonable minimum password length (let's say 7 characters) that puts the brute forcing at a *minimum* of 11 characters which is deeply into "not possible with any known computing technology" territory. I think we have to conclude that the original crackers must have the salts (as Remus pointed out in the comments to my answer, this is extremely likely), and for some reason just chose not to publish them. – Jeff Atwood Jul 30 '12 at 17:06
6

It stands within reason that they are not lying.

  1. As @Jeff noted that access to the salt is essential for fast cracking, but as @Remus Rusanu noted: "if they obtained the hashes then is hard for me to see how could they not obtain the salt as well" since they must be stored in a such a way that they can be associated with each other. I would assume that: at least the initial hackers had access to the salt.1

  2. @Jeff also states that they must have access to the source code to know how the salt and the password are combined. I propose a different approach to find out how the salt and password are combined:

    1. let's assume that they have a formspring account before the attack (a pretty reasonable assumption)
    2. look for that specific account (salt, hash) in the obtained list
    3. knowing the password, the salt and the resulting hash, it should be easy to build a small script to test all the reasonable (and some unreasonable) ways of combining password and salt (no more that a few thousand ways?)
    4. ???
    5. profit!
  3. they claim to have "tens of millions of members" which makes @Jesper low hanging fruit theory sound very reasonable.

Ok, they can still be lying but at least it sounds reasonable that they aren't.

1. If this assumption is wrong - the initial hackers didn't publish the salt AND didn't try and crack the passwords themselves - the rest may not be possible.

João Portela
  • 203
  • 2
  • 10
  • 1
    true; even if you don't have real-time access to the database as the accounts are being created, I *suspect* most salting schemes are in the top 5 forms supported by John the Ripper, so attempting a few of those should rapidly tell you whether you are getting any "hits" on simple passwords that statistically must exist in all password sets.. – Jeff Atwood Jul 30 '12 at 16:48
3

hashcat can do just over 1 billion SHA256 hashes per second on Radeon HD7970, which is only half the speed of SHA1 and a quarter of the speed of MD5. However, this isn't even the bottleneck:

One more thing before we move on; did you notice the speed of the cracking was “only” 258.7M per second? What happened to the theoretical throughput of a couple of billion SHA1 hashes per second? The problem is that the higher throughput can only be achieved with “mutations” of the password dictionary values or by working through different password possibilities directly within the GPU. In other words, if you let the GPU take a dictionary value then transform it into a number of different possibilities it can work a lot faster. The GPU can’t be fed passwords fast enough to achieve the same level of throughput so effectively having a list of passwords is bottleneck.

So using SHA256 (or even SHA512) is no more secure than using SHA1 or MD5 against these kind of attacks.

mgorven
  • 596
  • 5
  • 11
  • _"using SHA256 (or even SHA512) is no more secure than using SHA1 or MD5 against these kind of attacks"_ Except that memory and data transport gets faster all the time. Better not use MD5 regardless of this current limitation. Also passwords might be cracked later, in 3 years or so. They might be much less relevant by then, but I bet at least half of them would still work. – Luc May 31 '13 at 11:35
3

Neither post makes mention of whether or not the perpetrators gained access to Formspring source code. If this were the case, the method of hash construction and random generation of the salts would be available to the attackers. This would make the task of cracking significantly easier, particularly if the method of generating the "random" salts limited them to a comparatively small set values.

Given the apparent speed with which this limited set was cracked, my best guess would be that this is the case.

mitchellrj
  • 131
  • 1
  • We know it was published in public. If you were the cracker, wouldn't you post the per-hash salts as well to aid in collaborative cracking, if you had them? Plus info on how the salts and passwords were combined in the source code, if you know that too? I mean, what's the incentive *not* to share this additional info with fellow crackers if you know it, since the goal is to crack the passwords with everyone else's help? – Jeff Atwood Jul 30 '12 at 08:47
1

If they were using hashes with a unique salt per password, then it normally would take a considerable time before one could crack these passwords. Unless the passwords were extremely weak or the passwords were words in the dictionary. Pure bruteforcing seems a bit unlikely if they were strong.

Tom's hardware has an article on GPGPU password cracking. Weak passwords (not more than 10 characters including uppercase,lowercase, signs,...) can be cracked in a matter of minutes, even with a salt if the salt is known.

Otherwise, it will take many days, months, longer passwords even thousands of years. But this is without a salt. With a salt it will take forever.

Lucas Kauffman
  • 54,169
  • 17
  • 112
  • 196