Count number of hefty decimals between 2 numbers

16

5

Let's say we have a non-negative integer that is "hefty" (that is, "heavy") if its average digit value is greater than 7.

The number 6959 is "hefty" because:

(6 + 9 + 5 + 9) / 4 = 7.5

The number 1234 is not, because:

(1 + 2 + 3 + 4) / 4 = 2.5

Write a function, in any language,

HeftyDecimalCount(a, b)

which, when provided two positive integers a and b returns an integer indicating how many "hefty" integers are within the interval [a..b], inclusive.

For example, given a=9480 and b=9489:

9480   (9+4+8+0)/4 21/4 = 5.25 
9481   (9+4+8+1)/4 22/4 = 5.5
9482   (9+4+8+2)/4 23/4 = 5.75  
9483   (9+4+8+3)/4 24/4 = 6    
9484   (9+4+8+4)/4 25/4 = 6.25     
9485   (9+4+8+5)/4 26/4 = 6.5 
9486   (9+4+8+6)/4 27/4 = 6.75  
9487   (9+4+8+7)/4 28/4 = 7
9488   (9+4+8+8)/4 29/4 = 7.25   hefty 
9489   (9+4+8+9)/4 30/4 = 7.5    hefty

Two of the numbers in this range are "hefty" and so the function should return 2.

Some guidelines:

  • assume that neither a or b exceeds 200,000,000.
  • an n-squared solution will work, but will be slow -- what's the fastest we can solve this?

all_by_grace

Posted 2011-03-22T12:36:12.663

Reputation:

2what threw the TIMEOUT? – None – 2011-03-22T12:45:02.383

Answers

11

The problem can be solved in O(polylog(b)).

We define f(d, n) to be the number of integers of up to d decimal digits with digit sum less than or equal to n. It can be seen that this function is given by the formula

f(d, n)

Let's derive this function, starting with something simpler.

h(n,d) = \binom{n+d-1}{d-1} = \binom{(n+1)+(d-1)-1}{d-1}

The function h counts the number of ways to choose d - 1 elements from a multi-set containing n + 1 different elements. It's also the number of ways to partition n into d bins, which can be easily seen by building d - 1 fences around n ones and summing up each separated section . Example for n = 2, d = 3':

3-choose-2     fences        number
-----------------------------------
11             ||11          002
12             |1|1          011
13             |11|          020
22             1||1          101
23             1|1|          110
33             11||          200

So, h counts all numbers having a digit-sum of n and d digits. Except it only works for n less than 10, since digits are limited to 0 - 9 . In order to fix this for values 10 - 19, we need to subtract the number of partitions having one bin with a number greater than 9, which I'll call overflown bins from now on.

This term can be computed by reusing h in the following way. We count the number of ways to partition n - 10, and then choose one of the bins to put the 10 into, which results in the number of partitions having one overflown bin. The result is the following preliminary function.

g(n,d) = \binom{n+d-1}{d-1} - \binom{d}{1} \binom{n+d-1-10}{d-1}

We continue this way for n less or equal 29, by counting all the ways of partitioning n - 20, then choosing 2 bins where we put the 10's into, thereby counting the number of partitions containing 2 overflown bins.

But at this point we have to be careful, because we already counted the partitions having 2 overflown bins in the previous term. Not only that, but actually we counted them twice. Let's use an example and look at the partition (10,0,11) with sum 21. In the previous term, we subtracted 10, computed all partitions of the remaining 11 and put the 10 into one of the 3 bins. But this particular partition can be reached in one of two ways:

(10, 0, 1) => (10, 0, 11)
(0, 0, 11) => (10, 0, 11)

Since we also counted these partitions once in the first term, the total count of partitions with 2 overflown bins amounts to 1 - 2 = -1, so we need to count them once more by adding the next term.

g(n,d) = \binom{n+d-1}{d-1} -  \binom{d}{1} \binom{n+d-1-10}{d-1} + \binom{d}{2} \binom{n+d-1-20}{d-1}

Thinking about this a bit more, we soon discover that the number of times a partition with a specific number of overflown bins is counted in a specific term can be expressed by the following table (column i represents term i, row j partitions with j overflown bins).

1 0 0 0 0 0 . .
1 1 0 0 0 0 . .
1 2 1 0 0 0 . .
1 4 6 4 1 0 . .
. . . . . . 
. . . . . . 

Yes, it's Pascals triangle. The only count we are interested in is the one in the first row/column, i.e. the number of partitions with zero overflown bins. And since the alternating sum of every row but the first equals 0 (e.g. 1 - 4 + 6 - 4 + 1 = 0), that's how we get rid of them and arrive at the penultimate formula.

g(n,d) = \sum_{i=0}^{d} (-1)^i \binom{d}{i} \binom{n+d-1 - 10i}{d-1}

This function counts all numbers with d digits having a digit-sum of n.

Now, what about the numbers with digit-sum less than n ? We can use a standard recurrence for binomials plus an inductive argument, to show that

\bar{h}(n,d) = \binom{n+d}{d} = \binom{n+d-1}{d-1} + \binom{n+d-1}{d} = h(n,d) + \bar{h}(n-1,d)

counts the number of partitions with digit-sum at most n. And from this f can be derived using the same arguments as for g.

Using this formula, we can for example find the number of heavy numbers in the interval from 8000 to 8999 as 1000 - f(3, 20), beacuse there are thousand numbers in this interval, and we have to subtract the number of numbers with digit sum less than or equal to 28 while taking in to acount that the first digit already contributes 8 to the digit sum.

As a more complex example let's look at the number of heavy numbers in the interval 1234..5678. We can first go from 1234 to 1240 in steps of 1. Then we go from 1240 to 1300 in steps of 10. The above formula gives us the number of heavy numbers in each such interval:

1240..1249:  10 - f(1, 28 - (1+2+4))
1250..1259:  10 - f(1, 28 - (1+2+5))
1260..1269:  10 - f(1, 28 - (1+2+6))
1270..1279:  10 - f(1, 28 - (1+2+7))
1280..1289:  10 - f(1, 28 - (1+2+8))
1290..1299:  10 - f(1, 28 - (1+2+9))

Now we go from 1300 to 2000 in steps of 100:

1300..1399:  100 - f(2, 28 - (1+3))
1400..1499:  100 - f(2, 28 - (1+4))
1500..1599:  100 - f(2, 28 - (1+5))
1600..1699:  100 - f(2, 28 - (1+6))
1700..1799:  100 - f(2, 28 - (1+7))
1800..1899:  100 - f(2, 28 - (1+8))
1900..1999:  100 - f(2, 28 - (1+9))

From 2000 to 5000 in steps of 1000:

2000..2999:  1000 - f(3, 28 - 2)
3000..3999:  1000 - f(3, 28 - 3)
4000..4999:  1000 - f(3, 28 - 4)

Now we have to reduce the step size again, going from 5000 to 5600 in steps of 100, from 5600 to 5670 in steps of 10 and finally from 5670 to 5678 in steps of 1.

An example Python implementation (which received slight optimisations and testing meanwhile):

def binomial(n, k):
    if k < 0 or k > n:
        return 0
    result = 1
    for i in range(k):
        result *= n - i
        result //= i + 1
    return result

binomial_lut = [
    [1],
    [1, -1],
    [1, -2, 1],
    [1, -3, 3, -1],
    [1, -4, 6, -4, 1],
    [1, -5, 10, -10, 5, -1],
    [1, -6, 15, -20, 15, -6, 1],
    [1, -7, 21, -35, 35, -21, 7, -1],
    [1, -8, 28, -56, 70, -56, 28, -8, 1],
    [1, -9, 36, -84, 126, -126, 84, -36, 9, -1]]

def f(d, n):
    return sum(binomial_lut[d][i] * binomial(n + d - 10*i, d)
               for i in range(d + 1))

def digits(i):
    d = map(int, str(i))
    d.reverse()
    return d

def heavy(a, b):
    b += 1
    a_digits = digits(a)
    b_digits = digits(b)
    a_digits = a_digits + [0] * (len(b_digits) - len(a_digits))
    max_digits = next(i for i in range(len(a_digits) - 1, -1, -1)
                      if a_digits[i] != b_digits[i])
    a_digits = digits(a)
    count = 0
    digit = 0
    while digit < max_digits:
        while a_digits[digit] == 0:
            digit += 1
        inc = 10 ** digit
        for i in range(10 - a_digits[digit]):
            if a + inc > b:
                break
            count += inc - f(digit, 7 * len(a_digits) - sum(a_digits))
            a += inc
            a_digits = digits(a)
    while a < b:
        while digit and a_digits[digit] == b_digits[digit]:
            digit -= 1
        inc = 10 ** digit
        for i in range(b_digits[digit] - a_digits[digit]):
            count += inc - f(digit, 7 * len(a_digits) - sum(a_digits))
            a += inc
            a_digits = digits(a)
    return count

Edit: Replaced the code by an optimised version (that looks even uglier than the original code). Also fixed a few corner cases while I was at it. heavy(1234, 100000000) takes about a millisecond on my machine.

Sven Marnach

Posted 2011-03-22T12:36:12.663

Reputation: 236

Can anyone explain how you derived the formula formula for f(d,n) ? – Amol Sharma – 2012-04-07T09:45:29.537

@AmolSharma: Sorry, I don't think anybody can explain how I derived this formula. :) I didn't include an explanation at that time because I arrived at this in a rather esoteric way, and straightening out the details and writing a good explanation would have taken a significant amount of time. Now, more than a year later, it would take even more time, so you'll have to work it out yourself. – Sven Marnach – 2012-04-07T09:59:01.423

Hi, this solution works and it was a correct computation, however the time limit for small numbers was only 0.10 second, and time limit for big number was 0.35 second. The above code that you posted took about 1 second. Do you think there is any better way, and smart way of handling this, such that, to skip some numbers because we already know that the particular number would have a digit sum less than 7? Or maybe if there is a smarter way to handle this? For your information, this question was also tagged as a hard question. – None – 2011-03-22T20:50:30.853

1@Bob: The code is written in Python, and not optimised at all. If you want it to be fast, write it in C. But also in pure Python there is a lot of room for improvement. The first thing that needs optimisation is the binomial() function. There are also a few more things that can easily be improved. I'll post an update in a few minutes. – Sven Marnach – 2011-03-22T21:05:20.083

Or we can just use a lookup table with precomputed f(m,n). Given that 200,000,000 is the limit, the memory usage should be minimal. (You have my +1 already). – None – 2011-03-22T21:11:36.537

@Moron: That certainly seems to be the best option -- I'll try it. – Sven Marnach – 2011-03-22T21:14:24.183

@Moron: I'd need to include the lookup table in the source code. Usually f(d, n) is not called twice with the same parameters during one run of the program. – Sven Marnach – 2011-03-22T21:33:40.460

@Sven: My guess is the f(d,n) computation would be the bottleneck. So even if it is not called twice, having a lookup table will help. Of course, if you want to avoid putting the lookup table in the source, you could run your algorithm twice as a proof of concept. First time you time the run as you create the necessary entries, and second time you time the same run, but instead do a lookup instead of computing f(n,d). I guess it will give you a comparison too. – None – 2011-03-22T21:37:35.637

5

Recurse, and use permutations.

Suppose we define a general function that finds the values between a and b with a heaviness more than x:

heavy_decimal_count(a,b,x)

With your example of a=8675 to b=8689, the first digit is 8, so throw it away - the answer will be the same as 675 to 689, and again from 75 to 89.

The average weight of the first two digits 86 is 7, so the remaining digits need an average weight of more than 7 to qualify. Thus, the call

heavy_decimal_count(8675,8689,7)

is equivalent to

heavy_decimal_count(75,89,7)

So our range for the (new) first digit is 7 to 8, with these possibilities:

7: 5-9
8: 0-9

For 7, we still need an average of more than 7, which can only come from a final digit of 8 or 9, giving us 2 possible values.

For 8, we need an average of more than 6, which can only come from a final digit of 7-9, giving us 3 possible values.

So, 2+3 yields 5 possible values.

What is happening is that the algorithm is starting with the 4-digit number and dividing it into smaller problems. The function would call itself repeatedly with easier versions of the problem until it has something it can handle.

Phil H

Posted 2011-03-22T12:36:12.663

Reputation:

2So you are claiming Heavy(886,887) = Heavy(6,7)? – None – 2011-03-22T17:44:31.223

@Moron: No, because the first two 8s change the threshold for heaviness. In the example, the first two were 86, which average to 7 and thus don't change the threshold. If (8+8+x)/3 > 7, then x>5. So Heavy(886,887,7.0) == Heavy(6,7,5.0). – None – 2011-04-13T10:16:01.200

@Phil H, I don't think this idea as it stands would work: if you take 9900 and 9999, it would alter it to could the heavies between 0 and 99, taking e.g. 8 into account and 9908 is not a heavy number (@Aryabhatta). – Hans Roggeman – 2014-05-14T15:54:09.843

3

Maybe you can skip many candidates in the interval from a to b by accumulating their "heaviness".

if you know the length of you number you know that every digit can change the heaviness by only 1/length.

So, if you start at one number which is not heavy you should be able to calculate the next number which will be heavy, if you increase them by one.

In your example above starting at 8680 avg=5.5, which is 7-5.5=1.5 point away from you heaviness border, you'd know that there are 1.5/(1/4)=6 numbers in between, which are NOT heavy.

That should to the trick!

Thorben

Posted 2011-03-22T12:36:12.663

Reputation:

Same goes for a row of "heavy" numbers. You can just calculate the number and skip them! – None – 2011-03-22T12:52:39.637

1Just multiply everything by the number of digits and you will get rid of those pesky /lengths. – None – 2011-03-22T13:02:44.180

1

How about a simple recursive function? To keep things simple, it calculates all heavy numbers with digits digits, and a minimal digit sum of min_sum.

int count_heavy(int digits,int min_sum) {
  if (digits * 9 < min_sum)//impossible (ie, 2 digits and min_sum=19)
    return 0; //this pruning is what makes it fast

  if (min_sum <= 0)
      return pow(10,digits);//any digit will do,
      // (ie, 2 digits gives 10*10 possibilities)

  if (digits == 1)
  //recursion base
    return 10-min_sum;//only the highest digits

  //recursion step
  int count = 0;
  for (i = 0; i <= 9; i++)
  {
     //let the first digit be i, then
     count += count_heavy(digits - 1, min_sum - i);
  }
  return count;
}

count_heavy(9,7*9+1); //average of 7,thus sum is 7*9, the +1 is 'exceeds'.

Implemented this in python and it found all 9-digit heavy numbers in ~2 seconds. A little bit of dynamic programming could improve this.

Ishtar

Posted 2011-03-22T12:36:12.663

Reputation:

0

C, for interval [a,b] it is O(b-a)

c(n,s,j){return n?c(n/10,s+n%10,j+1):s>7*j;}

HeftyDecimalCount(a,b){int r; for(r=0;a<=b;++a)r+=c(a,0,0); return r;}

//the exercise

main()
{
 printf("[9480,9489]=%d\n", HeftyDecimalCount(9480,9489));
 printf("[0,9489000]=%d\n", HeftyDecimalCount(9480,9489000));
 return 0;
}

//the results

//[9480,9489]=2
//[0,9489000]=66575

RosLuP

Posted 2011-03-22T12:36:12.663

Reputation: 3 036

What does it mean "Standard loopholes"? – RosLuP – 2017-04-16T17:28:23.347

1@Riker Here the tag is not <codegolf> it is <fast algorithm > – RosLuP – 2017-12-13T20:47:32.797

0

This is one possible solution.

public int heavy_decimal_count(int A, int B)
{
    int count = 0;                       
    for (int i = A; i <= B; i++)
    {
        char[] chrArray = i.ToString().ToCharArray();
        float sum = 0f;
        double average = 0.0f;
        for (int j = 0; j < chrArray.Length; j++)
        {
            sum = sum + (chrArray[j] - '0');                   
        }
        average = sum / chrArray.Length;                
        if (average > 7)
            count++;
    }
    return count;
}

zohaib

Posted 2011-03-22T12:36:12.663

Reputation: 1

1Welcome to Code Golf. When a question is answered already, more answers are welcome if they're better than it in one of the winning criteria, or they show a new and interesting way to answer it. I don't see how your answer is either. – ugoren – 2012-10-20T14:17:03.213