2nd byte bias of RC4

7

4

Short version

RC4, designed in 1987, is one of the most famous stream ciphers. This question asks you to practically demonstrate its 2nd byte bias (for a theoretical proof, see section 3.2 of this paper).

Long version

OK, here I try to elaborate it for non-cryptographers. As you have noted, RC4 is a very popular cipher in the cryptographic domain, mainly because of its simplicity. It consists of a register (i.e., array) of length 256, each element of this register can hold up to 8 bits.

Here's a quick overview of how stream ciphers commonly work: a secret key is taken, a Key-scheduling algorithm (KSA) is performed (which takes the key as input), followed by a Pseudo-random generation algorithm (PRGA) (which outputs a sufficiently long random-looking key-stream). To encrypt (resp. decrypt), one has to bit-wise XOR this key-stream with the plaintext (resp. ciphertext).

For RC4, the key is of length 1 to 256. The KSA and PRGA are performed on the 256 byte register S as follows:

KSA

for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

PRGA

i := 0, j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K
endwhile

These K's constitute the key-stream. After getting first 256 bytes of the key-stream, you need to count the frequency of each number from 0 to 255 in the key-stream, occuring in each position for a given key.

The interesting thing is that, even if the keys are taken at random, you will notice the key-stream is far from random - "the frequency of the second byte is zero" is nearly double of what it actually should be (if it were truly random). This is known as distinguishing attack.

Your task is to:

  1. Take as much as you can random key of 16 bytes
  2. Generate the key-stream of first 2 bytes for each key
  3. Count the number of times zero appears at the 2nd byte of each the key-streams

You program should be capable of taking at least 2^19 - 2^20 keys, and make the code run as fast as you can.

Here's a quick try of me in Python:

def ksa(S):
    key = range(256)
    j = 0
    for i in range(256):
        j = (j + key [i] + S [i % len(S)]) % 256
        key [i], key [j] = key [j], key [i]
    return key

def prga (key):
    i = 0; j = 0
    while True:
        i = (i + 1) % 256
        j = (j + key [i]) % 256
        key [i], key [j] = key [j], key [i]
        yield key [(key [i] + key [j]) % 256]

def encrypt_or_decrypt (text, S):
    alt_text = []
    key = ksa (S)
    prga_generator = prga(key)
    for byte in text:
        alt_text.append (prga_generator. next())
    return tuple (alt_text)

if __name__ == '__main__':
    from time import time 
    stime = time ()
    import sys
    rounds = int (sys.argv [1]); text_length = 256; key_length = 16
    text = [0 for x in range(text_length)]
    out = list ()

    import random, numpy as np
    for r in range (rounds):
        S = [random.randint(0, 255) for x in range (key_length)]
        out.append (encrypt_or_decrypt (text, S))

    print str (range (256)) [1:-1]

    for o in np.transpose (out):
        temp = list(np.bincount (o))
        while len (temp) < 256:
            temp.append (0)
        print str (temp)[1:-1]
    print time () - stime

Pipe the output to a CSV file and note the content of A3 cell, after opening it with an office package (it should be nearly double of each of the other cells).

This one is of 524288 keys (it took 220.447999954s of my i5 processor, 4 GB RAM):

BONUS: Create the whole 256 by 256 frequency matrix, where rows indicate the frequency of a number (in 0 to 255) and columns indicate the corresponding byte position (from 1 to 256).

Winning conditions


  • Fastest in terms of # iterations/ second in my computer
  • Single-threaded execution only (multi-threaded versions are also welcome here, but not as a participant)
  • For tie (or nearly tie) cases, popularity and capability of more iterations will be considered (though I suppose there will be no such situation)
  • No need for true random numbers, pseudo random numbers will suffice, as it is not a cryptographic contest

Update In my computer, here are the results (2^24 iterations):

  1. bitpwner The 256 * 256 matrix in 50.075s
  2. Dennis The bias 19.22s

Since, there seems no more answer coming in, I declare Dennis as the winner. bitpwner did really good with the big matrix.

xxx---

Posted 2014-07-31T18:15:24.303

Reputation: 157

What do you mean by random? Do the keys of each iteration have to get selected uniformly and independently? – Dennis – 2014-08-09T15:47:05.890

How is the winner decided? Tested keys per seconds? – Dennis – 2014-08-09T15:48:14.507

@Dennis At each iteration, each key must be taken at random. But, I am not sure what do you exactly mean by taken uniformly and independently. – xxx--- – 2014-08-09T15:50:16.800

@Dennis It will surely depend on number of iterations per second. Additionally, I will keep in mind two things: Whether it is capable of more (like 2^30) iterations and which one is most popular. – xxx--- – 2014-08-09T15:52:51.777

Just saying randomly is meaningless unless you define the distribution. Uniformly means that each key should have the same probability of getting selected. Independently means that the selected keys should not depend on the previous ones. Uniform selection implies independent selection. Strictly speaking, this would make the existing answer invalid since rand() is deterministic. – Dennis – 2014-08-09T15:53:23.043

@Dennis Each key has same chance of getting selected. But, for now you can take pseudo-random numbers. – xxx--- – 2014-08-09T15:55:23.607

I have not quite decided yet for the winning conditions. That's a requirement for every open challenge. Questions without an objective winning criterion can (and will) get closed until they're edited. You should also decide if submissions have to be single threaded. – Dennis – 2014-08-09T15:55:51.507

Let us continue this discussion in chat.

– xxx--- – 2014-08-09T15:59:08.653

Answers

2

C

Takes about 111 secs to run on my Macbook Air for 2^24 = 16777216 trials.

To compile: gcc -o rc4 rc4.c

To pipe the output to a csv: ./rc4 > rc4out.csv

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdlib.h>
#include <sys/time.h>

#define KEY_LENGTH 16
#define ITERATIONS 16777216

void ksa(unsigned char state[], unsigned char key[]) {
    int i,j=0,t; 
    for (i=0; i<256; ++i)
        state[i] = i; 
    for (i=0; i<256; ++i) {
        j = (j + state[i] + key[i % KEY_LENGTH]) % 256; 
        t = state[i]; 
        state[i] = state[j]; 
        state[j] = t; 
    }
}

void prga(unsigned char state[], unsigned char out[]) {  
    int i=0,j=0,x,t; 
    unsigned char key; 
    for (x=0; x<256; ++x) {
        i = (i + 1) % 256; 
        j = (j + state[i]) % 256; 
        t = state[i];
        state[i] = state[j];
        state[j] = t;
        out[x] = state[(state[i] + state[j]) % 256];
    }
}

void makeRandomKey(unsigned char key[]) {
    int i;
    for (i=0; i<KEY_LENGTH; ++i) 
        key[i] = rand() % 256;
} 

int main(int argc, char *argv[]) {

    struct timeval time_started, time_now, time_diff;
    gettimeofday(&time_started, NULL);

    unsigned char state[256];
    unsigned char out[256];
    unsigned char key[KEY_LENGTH];
    int occurances[256][256];
    int i,j,bytePosition,charOccurance;

    for (i=0; i<256; ++i) 
        for (j=0; j<256; ++j)
            occurances[i][j] = 0;

    srand(time(NULL));
    for (i=0; i<ITERATIONS; ++i) {
        makeRandomKey(key);
        ksa(state, key);
        prga(state, out);

        for (j=0; j<256; ++j)
            ++occurances[j][out[j]];
    }

    for (bytePosition=0; bytePosition<256; ++bytePosition) {
        printf("%d,", bytePosition);
    }
    printf("\n");
    for (charOccurance=0; charOccurance<256; ++charOccurance) {
        for (bytePosition=0; bytePosition<256; ++bytePosition) {
            printf("%d,", occurances[charOccurance][bytePosition]);
        }
        printf("\n");
    }

    gettimeofday(&time_now, NULL);
    timersub(&time_now, &time_started, &time_diff);
    printf("Time taken,%ld.%.6ld s\n", time_diff.tv_sec, time_diff.tv_usec);

    return 0;
}

Results for 16777216 trials.

enter image description here

Multithreaded version here. Compile with gcc -o rc4 rc4.c -pthreads.

Vectorized

Posted 2014-07-31T18:15:24.303

Reputation: 3 486

Why have you used unsigned char? Also, please check that I have added winning conditions. – xxx--- – 2014-08-10T05:55:43.293

2Not much reason really. It can store 8 bits to represent 0-255. Int works too. – Vectorized – 2014-08-10T06:00:11.317

1

C (1,432,725 keys / second)

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define RAND_64() *Ks = rand(), Ks++, *Ks = rand(), Ks++, *Ks = rand(), Ks++, *Ks = rand()
#define KSA_1()   t = *s, j += t + *k, *s = S[j], S[j] = t, s++
#define KSA_4()   KSA_1(); k++; KSA_1(); k++; KSA_1(); k++; KSA_1()
#define KSA_16()  KSA_4(); k++; KSA_4(); k++; KSA_4(); k++; KSA_4(); k = K
#define KSA_64()  KSA_16(); KSA_16(); KSA_16(); KSA_16()

int main(int argc, char *argv[])
{
        int i, m = atoi(argv[1]), r = 0;
        uint8_t j, K[16], *k = K, S[256], *s, S0[256], t;
        uint16_t *Ks;
        for(i = 0; i < 256; i++) S0[i] = i;
        srand(time(NULL));

        for(i = 0; i < m; i++)
        {
                Ks = (uint16_t *) K, RAND_64(), Ks++, RAND_64();
                memcpy(S, S0, 256); s = S, j = 0;
                KSA_64(); KSA_64(); KSA_64(); KSA_64();
                j = S[1], S[1] = S[j], S[j] = j;
                t = S[2], j += t, S[2] = S[j], S[j] = t, t += S[2];
                r += (S[t] == 0);
        }
        printf("%u\n", r);
}

Benchmarking

$ gcc -Ofast rc4bias-rand.c && time ./a.out $[2**24]
130924
Real    11.71
User    11.71
Sys     0.00
$ bc <<< '2^24 / 11.71'
1432725

Tested on an Intel i7-3770 running Fedora 20.

Dennis

Posted 2014-07-31T18:15:24.303

Reputation: 196 637