Underhanded Password Hashing

33

5

In the spirit of the Underhanded C Contest, I'm starting an Underhanded Code Contest. The objective of this contest is to straightforwardly implement some code, while subtly hiding a nefarious bug inside it.

The contest

You are a secret Russian mole working in the IT department of an American spy agency. Your American boss has just asked you to implement a password hashing algorithm to use for encrypting secret messages.

Your boss wants you to implement the following function:

f: String -> byte[16]

which converts a password to a 16-byte quantity suitable for use as an AES key. Your boss wants a secure function, which in this context means that different password strings should generate different results with overwhelming probability. For instance, returning the md5 hash of the input would be a simple implementation of f.

Of course, your real boss in the Russian spy agency would want you to subvert this process. Your task is to implement f such that the Russians can decrypt all the secret messages encrypted using keys returned by f.

To do so, you must implement f so that it returns only a small subset of the 2^128 possible outputs. In particular, your f must return at most 2^16 different results so that the Russians can do an easy brute-force search for the correct key for each encrypted message they wish to decrypt.

Keep in mind, however, that espionage carries the death penalty. In order not to get caught, your function f must generate at least 2^8 different results, so that a cursory inspection of a few outputs of f is unlikely to reveal a duplicate. And most importantly, the code you introduce to limit the range of f must look unintentional, not deliberate. If you ever get hauled into a courtroom, there must be some reasonable doubt as to whether you introduced the bug deliberately or by accident.

Judging

I and two others I recruit will judge the entries (email me if you'd like to judge). I'm offering a 200 reputation bounty for the winning entry. Submissions must be uploaded by May 1.

Judging will take into account the following criteria:

  • Does f adhere to the spec, i.e. does it generate between 2^8 and 2^16 possible outputs. Don't feel that these are hard limits, but we'll deduct points if you're too far out of range.
  • Is the bug plausibly the result of an unintentional mistake?
  • Do the outputs of f look random?
  • The shorter your implementation of f, the better.
  • The clearer your implementation of f, the better.

Notes

You may use any language to implement your code. You're trying to hide a bug in plain sight, so obfuscated code is not suggested.

You might want to take a look at some of the previous Underhanded C contest winners to get a feel for what makes a good submission.

Input strings will be printable ascii (32 through 126, inclusive). You may assume a reasonable max length if you want.

Keith Randall

Posted 2012-03-28T22:22:06.583

Reputation: 19 865

Question was closed 2016-04-18T14:49:14.927

1is there any limitations on input string? like it's only consist of alphabet? – Ali1S232 – 2012-03-29T15:49:13.050

@Gajet: you must handle all of the printable ascii characters (32 through 126, inclusive). – Keith Randall – 2012-03-29T18:42:12.483

Is the output range all 16-byte strings, or just printable ones? – boothby – 2012-03-29T20:01:48.537

@boothby: all possible 16-byte values (2^128 possibilities) – Keith Randall – 2012-03-29T20:10:51.000

What's the closing date? – Peter Taylor – 2012-03-29T20:20:12.460

There is no limit to the length of the input strings? – ceased to turn counterclockwis – 2012-03-29T20:21:07.933

@Peter, let's say May 1. – Keith Randall – 2012-03-29T21:59:39.113

@leftaroundabout: Not as part of the spec. Feel free to assume a reasonable limit if you want to. – Keith Randall – 2012-03-29T22:01:04.323

Continuing @leftaroundabout's question - isn't there an advantage for short, simple programs? The underhanded C competition encourages simple programs, where you clearly see there's no bug (although there is) over complicated ones, where the bug is hidden in the complexity. – ugoren – 2012-03-30T08:31:44.660

I wish I was good enough to even try this. :-) – Gaffi – 2012-03-31T05:30:35.953

Should my description (or comments) include the flaw, or is it better to leave it to each one to figure out? – ugoren – 2012-04-01T12:47:24.337

@ugoren: you don't need to include a description of the flaw. Maybe just a hint. – Keith Randall – 2012-04-04T04:16:59.993

Looks like I started the bounty too soon and had to award it or lose it. I picked the entry I liked best so far, but I'll put up another 200 at the end (May 1) for the final judging. – Keith Randall – 2012-04-07T05:23:10.927

Shouldn't there also have been a jury to judge the entries? – mbomb007 – 2015-05-04T16:35:23.570

@mbomb007: yes, but no one volunteered. – Keith Randall – 2015-05-04T20:06:44.397

1

I'm voting to close this question as off-topic because underhanded challenges are no longer on-topic on this site. http://meta.codegolf.stackexchange.com/a/8326/20469

– cat – 2016-04-18T12:41:35.440

Answers

15

C

2^16 possible outputs (or 2^8 times the number of used characters).
Uses Linux's MD5 implementation, which is, AFAIK, fine. But this gives the same hash, for example, for "40" and "42".
EDIT: Renamed bcopy to memcpy (swapped parameters of course).
EDIT: Converted from program to function, to meet requirements better.

#include <string.h>
#include <openssl/md5.h>

void f(const unsigned char *input, unsigned char output[16]) {

    /* Put the input in a 32-byte buffer, padded with zeros. */
    unsigned char workbuf[32] = {0};
    strncpy(workbuf, input, sizeof(workbuf));

    unsigned char res[MD5_DIGEST_LENGTH];
    MD5(workbuf, sizeof(workbuf), res);

    /* NOTE: MD5 has known weaknesses, so using it isn't 100% secure.
     * To compensate, prefix the input buffer with its own MD5, and hash again. */
    memcpy(workbuf+1, workbuf, sizeof(workbuf)-1);
    workbuf[0] = res[0];
    MD5(workbuf, sizeof(workbuf), res);

    /* Copy the result to the output buffer */
    memcpy(output, res, 16);
}

/* Some operating systems don't have memcpy(), so include a simple implementation */
void *
memcpy(void *_dest, const void *_src, size_t n)
{
    const unsigned char *src = _src;
    unsigned char *dest = _dest;
    while (n--) *dest++ = *src++;
    return _dest;
}

ugoren

Posted 2012-03-28T22:22:06.583

Reputation: 16 527

is this a flaw with MD5? – Ali1S232 – 2012-04-01T11:56:45.347

@Gajet, No, I used Linux's MD5, which is perfectly OK (AFAIK). – ugoren – 2012-04-01T11:58:38.237

they why it doesn't generate more possible output? – Ali1S232 – 2012-04-01T12:01:44.233

1@Gajet: Consider what happens in the bcopy step... it is a nice bit of misdirection, since the actual BSD bcopy function would work properly here. – han – 2012-04-01T14:34:11.427

@han, Actually, I now see that my bcopy is buggy. I'll change it to memcpy, and then the same implementation will become valid. – ugoren – 2012-04-01T14:41:15.787

13

C

This may not be the flashiest contest entry, but I think the following is the kind of hash function that could have been made by any coder too clever for their own good, with a vague idea of the kind of operations you see in hash functions:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

void hash(const char* s, uint8_t* r, size_t n)
{
     uint32_t h = 123456789UL;
     for (size_t i = 0; i < n; i++) {
          for (const char* p = s; *p; p++) {
               h = h * 33 + *p;
          }
          *r++ = (h >> 3) & 0xff;
          h = h ^ 987654321UL;
     }
}

int main()
{
     size_t n = 1024;
     char s[n];
     size_t m = 16;
     uint8_t b[m];
     while (fgets(s, n, stdin)) {
          hash(s, b, m);
          for (size_t i = 0; i < m; ++i)
               printf("%02x", b[i]);
          printf("\n");
     }
}

In fact the hash function can return no more than L*2048 different results, where L is the number of different input string lengths that can occur. In practice, I tested the code on 1.85 million unique input lines from manual pages and html documents on my laptop, and got only 85428 different unique hashes.

han

Posted 2012-03-28T22:22:06.583

Reputation: 1 226

0

Scala:

// smaller values for more easy tests:
val len = 16
// make a 16 bytes fingerprint
def to16Bytes (l: BigInt, pos: Int=len) : List[Byte] = 
  if (pos == 1) List (l.toByte) else (l % 256L).toByte :: to16Bytes (l / 256L, pos-1)
/** if number isn't prime, take next */
def nextProbPrime (l: BigInt) : BigInt = 
  if (l.isProbablePrime (9)) l else nextProbPrime (l + 1)
/** Take every input, shift and add, but take primes */
def codify (s: String): BigInt = 
  (BigInt (17) /: s) ((a, b) => nextProbPrime (a * BigInt (257) + b))
/** very, very short Strings - less than 14 bytes - have to be filled, to obscure them a bit: */
def f (s: String) : Array [Byte] = {
  val filled = (if (s.size < 14) s + "secret" + s else s)
  to16Bytes (codify (filled + filled.reverse)).toArray.map (l => nextProbPrime (l).toByte) 
}

Test, if the result doesn't look similar for similar input:

val samples = List ("a", "aa", "b", "", "longer example", "This is a foolish, fishy test") 

samples.map (f) 

 List[Array[Byte]] = List(
Array (-41, -113, -79, 127, 29, 127, 31, 67, -19, 83, -73, -31, -101, -113, 97, -113), 
Array (-19, 7, -43, 89, -97, -113, 47, -53, -113, -127, -31, -113, -67, -23, 127, 127), 
Array (-41, -113, -79, 127, 29, 127, 31, 67, -19, 83, -73, -31, -101, -113, 97, -113), 
Array (37, -19, -7, 67, -83, 89, 59, -11, -23, -47, 97, 83, 19, 2, 2, 2), 
Array (79, 101, -47, -103, 47, -13, 29, -37, -83, -3, -37, 59, 127, 97, -43, -43), 
Array (37, 53, -43, -73, -67, 5, 11, -89, -37, -103, 107, 97, 37, -71, 59, 67))

The error is using just primes for the encoding. Instead of

scala> math.pow (256, 16)
res5: Double = 3.4028236692093846E38

values, we end with

scala> math.pow (54, 16)
res6: Double = 5.227573613485917E27

since there are 54 primes below 256.

user unknown

Posted 2012-03-28T22:22:06.583

Reputation: 4 210

25.22e27 >> 2^16. There's no way to brute force that many possibilities. – Keith Randall – 2012-04-04T19:40:54.693

you forgot the name of the language – ajax333221 – 2012-04-05T01:57:03.393

@ajax333221: Scala. I added it to the top. – user unknown – 2012-04-05T03:23:28.500

@KeithRandall: I could 'accidentally' only use positive Bytes, which would reduce the possibilities to math.pow (27, 16), but that is still about 8e22. – user unknown – 2012-04-05T03:27:48.027