Generate an expression yielding near-equidistributed result

5

1

Create a program (any language) which, given positive integer m, outputs a valid C expression that:

  1. Uses a single variable x assumed of 32-bit unsigned type (i.e. uint32_t)
  2. Would evaluate to range [0, m) for any of the 2³² possible x, reaching any of the m possible outcomes either 2³² / m or 2³² / m + 1 times (where / rounds down).
  3. Is limited to use of
    • variable x as operand
    • constants from 0 to 2³²-1, in decimal (42) or hexadecimal (0x2a)
    • operators among (listed by groups of decreasing priority in C)
      • + -  restricted to addition or subtraction of two operands
      • << >>  restricted to right argument constant in range [0, 31]
      • &  restricted to bitwise and
      • ^  bitwise xor
      • |  bitwise or
    • parenthesis for  (expression)
    • sequence idiom  (x=expression1),expression2  which evaluates expression1, assigns it to x, then evaluates to expression2. The sequence idiom can only be used to form the overall expression or expression2 in a sequence. , and = do not account as operator.

It is assumed all arithmetic is such that all quantities are truncated to 32-bit, and that right shift >> introduce zeroes.

When given input m, the first output of the program must be an expression valid for m, aiming at a minimal operator count. Further output, if any, should be expressions for incremental m.

Example output for input 1, with extra columns.

output                     m  count  alternative
0                          1    0    (0)
x&1                        2    1    x>>31
(x-(x>>1)+(x>>2))>>30      3    5    (x=x-(x>>2)),(x=x-(x>>31)),x>>30
x&3                        4    1    x>>30

Following comment, this is a code challenge. Ranking criteria:

  1. Producing conforming expressions with minimum operator count up to the highest m. Expressions posted or linked in other answers are used to assess minimum, after verification.
  2. Minimal expression depth up to the highest m. Depth is 0 for x or constant. The result of an operator has the maximum depth of the two operands when they differ or in a sequence, or 1 more otherwise. Expressions in the example output have depth 1, except for 0. The expression x+1&x+2^x+3&x+4 has depth 3, and any shorter expression has lower depth.
  3. Programs that convincingly produce minimal output (as defined above) are preferred.
  4. Programs which can be evaluated at compile time by a C/C++ compiler for constant m, so that use of R(x,m) generates the same object code as the expression for m does, are preferred. Nice-to-have: not requiring C++, not requiring inline.

Motivation: starting from x uniformly random, we want to restrict it to [0, m) with a distribution comparable to the expression x%m, in a way that is fast and constant-time. Count and depth match instruction and register count on LEG Brain M0, an imaginary 32-bit CPU.


Modifying the following C program changing E and M, then compiling and running it on a compiler with 32-bit int, exhaustively checks an expression against requirement 2, assuming that it is a well-formed C expression that matches requirements 1 and 3. Try it online!

// Verify an expression for requirement 2

// expression under test
#define E (x=x-(x>>2)),(x=x-(x>>31)),x>>30
#define M 3                     // value of m matching that expression

#define T uint32_t              // type for x (change at your own risk!)
#define N 32                    // number of bits in type T
#define C (T)                   // cast to type T; use in E only!

#include <stdint.h>
typedef T t;                    // type for x
#include <inttypes.h>
#if N==32
    #define pPRIuTN "%"PRIu32
    #define pPRIxTN "%08"PRIx32 
#elif N==16
    #define pPRIuTN "%"PRIu16
    #define pPRIxTN "%04"PRIx16 
#elif N==8
    #define pPRIuTN "%"PRIu8
    #define pPRIxTN "%02"PRIx8  
#endif
#include <stdio.h>
#include <string.h>

#define m ((t)M)     // m per our type
#define R(z) #z      // stringizer
#define S(z) R(z)

t c[m];              // counter array (initialized to 0)

int main(void) {
    t fail = sizeof(t)*8!=N || (t)(-1)<0 || ((t)(-1)>>(N-1))!=1,
        lo, hi, xlo, xhi, v, x, y = 0;
    if (fail) 
        printf("### fail, please check T: "S(T)" versus N: "S(N)"\n");
    else
        {
    // laboriously show the expression tested, without extra parenthesis
        #define LEN (sizeof(S((E)))-3)  // length of the expression
        char exp[LEN+1];                // the expression, NUL-terminated
        memcpy(exp,S((E))+1,LEN);
        exp[LEN]=0;
        printf("Testing expression %s for m="pPRIuTN"\n",exp,m);
    // compute counts
        do {
            x = y;
            x = (E);
            if (x<m)
                ++c[x];
            else if (!fail) {
                fail = 1;
                printf("### fail, value "pPRIuTN" at input x="pPRIuTN"=0x"pPRIxTN" should be less than m="pPRIuTN"\n",x,y,y,m);
            }
        } while(++y);
    // find minimum and maximum count, and where it occurs
        hi = xlo = xhi = 0;
        lo = (t)-1;
        v = m;
        do {
            y = c[--v];
            if (y>hi)
                hi = y, xhi = v;
            else if (y<lo)
                lo = y, xlo = v;
        } while(v);
    // show results
        if (hi==0 && m>1) // can occur if there was overflow or other error
            fail=1, printf("### fail, only the value "pPRIuTN" occurs\n",x);
        else {
            x = (t)(-m)/m+1;   // lower limit
            y = (t)(-1)/m+1;   // upper limit
            if (lo<x)
                fail=1, printf("### fail, value "pPRIuTN" occurs "pPRIuTN" times, minimum "pPRIuTN"\n",xlo,lo,x);
            if (hi>y)
                fail=1, printf("### fail, value "pPRIuTN" occurs "pPRIuTN" times, maximum "pPRIuTN"\n",xhi,hi,y);
            if (!fail)
                printf("### pass!\n");
            }
        }
    return fail;
}

fgrieu

Posted 2018-07-24T21:00:34.407

Reputation: 545

2What is the result of executing x=1+(x=2) or (x=1)+(x=2)+x? If it's undefined behavior, you should specify so. – user202729 – 2018-07-25T04:50:56.407

1Operator count should be tagged [tag:atomic-code-golf]? – qwr – 2018-07-25T09:35:35.520

@ W W: Is atomic-code-golf for when the program, not the output should have the minimum number of (language feature)? That's fine with me, but perhaps the tag's help should be clarified: "scored by the number of operations in a specific fragment of a language you define" applies, here I want the less possible operators in C, and that's ranking criteria #1. Or perhaps a code-challenge just can't be a code-golf..

– fgrieu – 2018-07-25T17:38:31.243

No not operator? – orlp – 2018-07-25T18:06:11.007

1Warning for those playing with the 16-bit or 8-bit reduced width versions of the validator: C’s promotion rules mean that you’ll need to explicitly mask out high bits for some operations. For example, the 8-bit validator accepts ((x+x+x&0xff)>>7)+(x>>7) but not ((x+x+x)>>7)+(x>>7) for m = 3. – Anders Kaseorg – 2018-07-26T01:25:43.730

Answers

3

Haskell

import Data.Bits

bitLen :: Int -> Int
bitLen n = finiteBitSize n - countLeadingZeros n

expr' :: Maybe Int -> Int -> String
expr' sh m =
  case sh of
    Nothing
      | m == 1 -> "0"
      | n == 0 -> "x" ++ s
      | o == 0 -> "x-(x-(x>>1)>>" ++ show (i - j - 1) ++ ")" ++ s
      | otherwise ->
        "x-(x-(" ++ expr' (Just j) o ++ ")>>" ++ show (i - j) ++ ")" ++ s
      where s
              | i == 32 = ""
              | otherwise = ">>" ++ show (32 - i)
    Just h
      | n == 0 -> "x>>" ++ show (h - i)
      | o == 0 -> "(x-(x>>" ++ show (i - j) ++ ")>>" ++ show (h - i) ++ ")-1"
      | otherwise ->
        "x-(x-(" ++
        expr' (Just j) o ++ ")>>" ++ show (i - j) ++ ")>>" ++ show (h - i)
  where
    i = bitLen (m - 1)
    n = bit i - m
    j = bitLen (n - 1)
    o = bit j - n

expr :: Int -> String
expr = expr' Nothing

Try it online!

The first few expressions are

  1. 0
  2. x>>31
  3. x-(x-(x>>1)>>1)>>30
  4. x>>30
  5. x-(x-(x>>2)>>1)>>29
  6. x-(x-(x>>1)>>1)>>29
  7. x-(x-(x>>1)>>2)>>29
  8. x>>29
  9. x-(x-(x>>3)>>1)>>28
  10. x-(x-(x>>2)>>1)>>28
  11. x-(x-((x-(x>>2)>>1)-1)>>1)>>28
  12. x-(x-(x>>1)>>1)>>28
  13. x-(x-(x>>2)>>2)>>28
  14. x-(x-(x>>1)>>2)>>28
  15. x-(x-(x>>1)>>3)>>28
  16. x>>28
  17. x-(x-(x>>4)>>1)>>27
  18. x-(x-(x>>3)>>1)>>27
  19. x-(x-((x-(x>>2)>>2)-1)>>1)>>27
  20. x-(x-(x>>2)>>1)>>27

C++ template version

template <unsigned m> unsigned Rs(unsigned x, unsigned h);

template <unsigned o>
unsigned Rs2(unsigned x, unsigned h, unsigned i, unsigned j) {
  return (x - ((x - Rs<o>(x, j)) >> (i - j))) >> (h - j);
}
template <> unsigned Rs2<0>(unsigned x, unsigned h, unsigned i, unsigned j) {
  return ((x - (x >> (i - j))) >> (h - i)) - 1;
}

template <unsigned n> unsigned Rs1(unsigned x, unsigned h, unsigned i) {
  constexpr unsigned j = bitlen<n - 1>();
  constexpr unsigned o = (1 << j) - n;
  return Rs2<o>(x, h, i, j);
}
template <> unsigned Rs1<0>(unsigned x, unsigned h, unsigned i) {
  return x >> (h - i);
}

template <unsigned m> unsigned Rs(unsigned x, unsigned h) {
  constexpr unsigned i = bitlen<m - 1>();
  constexpr unsigned n = (1 << i) - m;
  return Rs1<n>(x, h, i);
}

template <unsigned o> unsigned R2(unsigned x, unsigned i, unsigned j) {
  return (x - Rs<o>(x, j)) >> (i - j);
}
template <> unsigned R2<0>(unsigned x, unsigned i, unsigned j) {
  return (x - (x >> 1)) >> (i - j - 1);
}

template <unsigned n> unsigned R1(unsigned x, unsigned i) {
  constexpr unsigned j = bitlen<n - 1>();
  constexpr unsigned o = (1 << j) - n;
  return R2<o>(x, i, j);
}
template <> unsigned R1<0>(unsigned, unsigned) { return 0; }

template <unsigned m> unsigned R(unsigned x) {
  constexpr unsigned i = bitlen<m - 1>();
  constexpr unsigned n = (1 << i) - m;
  return (x - R1<n>(x, i)) >> (32 - i);
}

For example, gcc -O2 generates the following identical code for R<19>(x) and x-(x-((x-(x>>2)>>2)-1)>>1)>>27:

movl    %edi, %edx
movl    %edi, %ecx
leal    1(%rdi), %eax
shrl    $2, %edx
subl    %edx, %ecx
movl    %ecx, %edx
shrl    $2, %edx
subl    %edx, %eax
shrl    %eax
subl    %eax, %edi
movl    %edi, %eax
shrl    $27, %eax

How it works

The expression being computed is equivalent to \$\left\lfloor \frac{mx + m - c(m)}{2^{32}} \right\rfloor\$, where \$c(m)\$ satisfies \$0 < c(m) \le m\$. The specific choice of \$c(m)\$ made for golfing reasons is

$$ \begin{align*} c(2^{i + 2}k + 2^i) &= 1 \quad (k \ge 0, i \ge 0), \\ c(2^i - 2^j) &= 1 \quad (i > j \ge 0), \\ c(2^{h + 1}k + 2^h + 2^i - 2^j) &= 2^h \quad (k \ge 0, h > i > j + 1 \ge 1). \\ \end{align*} $$

The equivalence can be shown by repeated use of the identity

$$ -{\left\lfloor \frac{k}{2^i} \right\rfloor} = \left\lfloor \frac{2^i - 1 - k}{2^i} \right\rfloor. $$

Anders Kaseorg

Posted 2018-07-24T21:00:34.407

Reputation: 29 242

You were first with a much better algorithm. My prototype essentially consisted of (x > 0) + (x > 1) + (x > 2) + (x > 3) + .... – orlp – 2018-07-25T19:28:09.233

1@fgrieu I did some brute-force searching and found a couple of shortened expressions: (x-(x>>3)>>1)+(x>>2)>>28 for m = 11, x-(x-(x>>2)-(x>>4)>>1)>>27 for m = 21. I’m not sure how they might generalize, though. – Anders Kaseorg – 2018-07-26T07:00:39.583

Can you describe in arithmetic what these function do? At first I thought it was x // (2^32 // m) but that's not right. – orlp – 2018-07-26T09:33:37.563

@orlp: I get that they compute a close approximation of ((uint64_t)m*x)>>32 (wich always pass criteria 2), performing the multiplication by addition and subtraction of shifts of x, without overflow, and, critically, with a marginal error which stays within the tolerance of requirement 2. Beyond that uh.. I'm staring with admiration. – fgrieu – 2018-07-26T11:07:56.463

@orlp I’ve added an explanation of that at the end. – Anders Kaseorg – 2018-07-26T17:27:55.517

What did you use to brute force those shortened expressions? I can - given a template such as (x-(x>>_)>>_)+(x>>_)>>_ very quickly find constants that work but enumerating over all such 'templates' is a bit expensive... – orlp – 2018-07-27T18:36:09.460