Does hashing speed vary significantly using different architectures (x86 vs ARM)?

Scenario

I am investigating the possibility of using a cluster of Raspberry Pis for cracking passwords hashed with bcrypt or scrypt.

As the Raspberry Pi uses a CPU based on the ARM architecture, I am wondering whether that affects hashing speed much.

2 Answers

ARM processors are a 32-bit architecture, and are relatively inefficient at 64-bit arithmetic: to compute a 64-bit addition, they must perform two 32-bit additions with carry propagation. This is somewhat equivalent to x86 used in 32-bit mode (except that recent enough x86 processors have access to SSE2 opcodes, which offer computations on 64-bit integers, even in 32-bit mode). All recent x86 processors can run in 64-bit mode (if the operating system allows it), which makes them much faster at computing hash functions that rely on 64-bit arithmetic operations, the prime example being SHA-512. On a 32-bit hash function like SHA-256, ARM and x86 cores will be "comparable".
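To make the carry propagation concrete, here is a minimal Python sketch of a 64-bit addition built only from 32-bit-wide pieces. The function name `add64_with_32bit_ops` is made up for this illustration; on a real 32-bit ARM core the same work is typically done by an add-with-flags instruction followed by an add-with-carry instruction.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits


def add64_with_32bit_ops(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide additions."""
    # Split each operand into a low and a high 32-bit word.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # First 32-bit addition: the low words. Bit 32 of the raw sum is the carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Second 32-bit addition: the high words plus the carry.
    # Masking makes the overall result wrap modulo 2**64, as hardware would.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

A 64-bit core does this in a single instruction, which is why a function such as SHA-512, built almost entirely from 64-bit additions and rotations, suffers on a 32-bit core.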

An additional twist of ARM is that it has several instruction sets; the original ARM used 32-bit opcodes, while newer versions allow for the "Thumb" instruction set, which has smaller but less powerful opcodes. "Thumb" was then extended into "Thumb-2", with a mixture of 16-bit and 32-bit opcodes. The newest ARM cores know only Thumb-2. "Thumb-2" code is supposed to be almost as efficient as the original ARM code, while being noticeably more compact. But a lot can hide in a word such as "almost".

For bcrypt, what matters is the availability of about 4 kB of "fast RAM". Almost all ARM and x86 cores you will encounter have L1 caches which are larger than that (to get a cache-less ARM, you have to look at 33 MHz ARM7TDMI or similar systems), so, then again, ARM and x86 will be comparable.
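The "about 4 kB" figure follows from the Blowfish state that bcrypt keeps hammering: four S-boxes of 256 32-bit words each, plus an 18-word subkey array. A small Python sketch of the arithmetic (the names and the 16 KiB cache size are illustrative assumptions, not measurements of any particular chip):

```python
SBOX_COUNT = 4         # Blowfish uses four S-boxes
SBOX_ENTRIES = 256     # each S-box has 256 entries
WORD_BYTES = 4         # each entry is a 32-bit word
P_ARRAY_ENTRIES = 18   # the subkey array holds 18 32-bit words


def blowfish_state_bytes() -> tuple[int, int]:
    """Return (S-box bytes, P-array bytes) for the Blowfish state."""
    sboxes = SBOX_COUNT * SBOX_ENTRIES * WORD_BYTES
    p_array = P_ARRAY_ENTRIES * WORD_BYTES
    return sboxes, p_array


def fits_in_cache(cache_bytes: int) -> bool:
    """True if the whole Blowfish state fits in a cache of the given size."""
    return sum(blowfish_state_bytes()) <= cache_bytes


# Hypothetical example: a 16 KiB L1 data cache.
print(blowfish_state_bytes(), fits_in_cache(16 * 1024))
```

The S-boxes alone come to 4096 bytes, so any core with an L1 data cache comfortably larger than that keeps the whole working set in fast memory.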

For scrypt, the situation is mixed. Scrypt uses an internal hash function to compute seemingly random accesses in a large RAM buffer, so this is a race between the time to compute the hash function on a small input, and the latency of the main RAM. Main RAM latency does not scale (and a "faster" bus means more bandwidth, not less latency). For a small ARM-based system like the Raspberry Pi, the hash function will be the bottleneck, especially if it is SHA-512. For bigger multicore x86, the memory latency will make the whole thing slow. The x86 will not give speedups proportionate to its higher price, especially when using a CPU with 6 or 8 cores: there will be too much contention on the shared memory bus.
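The access pattern described above can be sketched with a toy version of scrypt's memory-hard loop. This is not real scrypt: real scrypt mixes blocks with Salsa20/8, whereas this sketch substitutes SHA-256 from `hashlib` purely to show the structure, and `toy_romix` is a name invented here.

```python
import hashlib


def toy_romix(seed: bytes, n: int) -> bytes:
    """Toy memory-hard loop: sequential fill, then data-dependent reads."""

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # Phase 1: fill a table of n entries, each derived from the previous one.
    table = []
    x = h(seed)
    for _ in range(n):
        table.append(x)
        x = h(x)

    # Phase 2: n reads whose index depends on the current value, so the
    # next address is unknown until the previous hash has been computed.
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        x = h(bytes(a ^ b for a, b in zip(x, table[j])))
    return x
```

Each iteration of the second loop costs one hash computation plus one unpredictable memory read; whichever of the two is slower on a given machine sets the pace.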


That being said, the question is about economics: which architecture will yield the most hashes per second for a given budget? This includes the cost of hardware, but also the energy, which, in the long term, dominates (you pay for the hardware once, but you must feed it with electricity all year long). Energy costs must include cooling systems, which are not negligible in big computing farms. ARM processors are reputed to use less energy (it has long been their selling point, and that's why smartphones use ARM processors nowadays).
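One way to frame that comparison is a simple cost model: total hashes computed over a period, divided by hardware cost plus energy cost. The following Python sketch is only an illustration; the function name, the parameters, and the flat `cooling_overhead` factor are all assumptions, and you would have to plug in your own measured hash rates and power draws.

```python
def hashes_per_currency_unit(
    hash_rate: float,          # measured hashes per second for the whole machine
    hardware_cost: float,      # one-time purchase cost
    power_watts: float,        # average power draw under load
    price_per_kwh: float,      # electricity price
    hours: float,              # how long the machine runs
    cooling_overhead: float = 0.0,  # extra energy for cooling, as a fraction
) -> float:
    """Return hashes computed per unit of money spent over the given period."""
    energy_kwh = power_watts / 1000 * hours * (1 + cooling_overhead)
    total_cost = hardware_cost + energy_kwh * price_per_kwh
    total_hashes = hash_rate * hours * 3600
    return total_hashes / total_cost
```

As `hours` grows, the one-time hardware cost is amortized and the result approaches the machine's hashes per unit of energy spending, which is the sense in which energy dominates in the long term.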

So the ARM architecture should be, on average and for large clusters, a better choice for password cracking, except when dealing with a CPU-bound hashing process (like PBKDF2) which relies on a 64-bit hash function (e.g. SHA-512), in which case an x86 CPU (in 64-bit mode) will be a better deal. This will change with the advent of 64-bit ARM processors.

For an amateur who does not build a cluster beyond what fits in his room, and must use off-the-shelf hardware, these issues will be dwarfed by those of availability and market effects. You do not buy a lone x86 CPU; you need a motherboard and other things around it; ultimately, you buy a PC. The cost of the CPU will be a small proportion of the total hardware cost. Similarly, you do not buy an ARM, but a Raspberry Pi. Therefore, to get a correct estimate, you must perform benchmarks. For raw hash functions, I suggest using sphlib, a library of implementations of various hash functions (including SHA-256 and SHA-512), written in C; it comes with a benchmarking tool. For bcrypt and scrypt, peruse the reference implementations.
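For a quick first impression before building anything, a rough throughput test can be run on each machine with Python's standard `hashlib`. This is a sketch, not a substitute for the sphlib benchmark: it measures whatever hash implementation the local Python build links against, and `throughput_mb_per_s` is a helper name invented here.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str,
                        total_bytes: int = 8 * 1024 * 1024,
                        chunk: int = 8192) -> float:
    """Hash total_bytes of zeros in fixed-size chunks; return MB per second."""
    data = b"\x00" * chunk
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(total_bytes // chunk):
        h.update(data)
    h.digest()
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6
```

Calling `throughput_mb_per_s("sha256")` and `throughput_mb_per_s("sha512")` on both the ARM board and the x86 machine gives a first look at how far apart the two functions are on each architecture.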

(As a gut feeling, I expect a draw. The Raspberry Pi cluster will, of course, totally win against the x86 cluster if you take into account the "fun factor".)

Thomas Pornin

I believe that as long as the application is written and optimized for the correct architecture, there will be no difference in speed between equivalent processors. You'll also want to look at multi-threaded CPUs versus single-core CPUs. That said, I'm not sure why you would want to use Raspberry Pis to crack passwords when GPU power is much more efficient.

Hope that helps.

NULLZ
  • @D3C4FF Actually, GPU power is not more efficient when you are talking about bcrypt. bcrypt uses RAM-based lookup tables, which are slower on a GPU than on a CPU. This is one of the main differences between bcrypt and PBKDF2. http://stackoverflow.com/questions/6791126/how-is-bcrypt-more-future-proof-than-increasing-the-number-of-sha-iterations – JZeolla Jan 30 '13 at 14:21