Find the substring with the most 1's in a sequence

16

3

Introduction

I want to find the substring with the most 1's in a sequence of 0's and 1's.

Input

Your program has two inputs, the sequence and the substring length.

The sequence is any number of 0's and 1's:

01001010101101111011101001010100010101101010101010101101101010010110110110

The substring length is any positive non-zero integer:

5

Output

Your program should output the starting index of the first substring of the given length that contains the most 1's. With the above input, the output is:

10

The first character in the string starts at an index of 0.

Scoring

Shortest code wins!

Rules

  • Your program must always output the correct index for any valid inputs.
  • You can pick your input/output method from any answer with positive score on the default options. Please specify the method you pick in your answer.

hmatt1

Posted 2015-01-22T19:08:37.070

Reputation: 3 356

Your title and introduction says "find the substring with the most 1's". But your program description says you are giving a substring length and looking for the index of the first substring. So should we assume that the title and introduction are wrong? Most people seem to be solving the first part. Who wins? – swstephe – 2015-01-23T17:43:56.513

@swstephe I'm not sure I understand your confusion. If there are more than one substring tied for the most 1's, you output the first substring you found. You identify the substrings with the index of the first character in that substring. Does that help? – hmatt1 – 2015-01-23T17:52:15.097

Okay, so you are breaking the sequence in substrings and returning the index of the first substring with the most 1's? It sounded like you were looking for substrings of 1's. – swstephe – 2015-01-23T17:58:58.480

Does requirement "must always output the correct index for any given inputs" still apply if we give infeasible lengths, e.g. length=99? – smci – 2015-01-27T03:52:35.383

@smci you can assume for valid input. You don't have to handle a case where the substring length is longer than the sequence. – hmatt1 – 2015-01-28T17:00:32.950

Answers

11

Dyalog APL, 11

(-∘1+⍳⌈/)+/

Try it here. Usage:

   f ← (-∘1+⍳⌈/)+/
   4 f 0 1 1 0 1 1 1 0 0 0 0 1 1
1

Explanation

This is a dyadic (meaning binary) function that takes the substring length from the left, and the sequence from the right. Its structure is the following:

   ┌───┴────┐
 ┌─┴──┐     /
 ∘  ┌─┼─┐ ┌─┘
┌┴┐ + ⍳ / +  
- 1   ┌─┘    
      ⌈      

Explanation by explosion:

(-∘1+⍳⌈/)+/
(       )+/  ⍝ Take sums of substrings of given length, and feed to function in parentheses
    + ⌈/     ⍝ The array of sums itself, and its maximum
     ⍳       ⍝ First index of right argument in left
 -∘1         ⍝ Subtract 1 (APL arrays are 1-indexed)

As an example, let's take 4 and 0 1 1 0 1 1 1 0 as inputs. First we apply the function +/ to them and get 2 3 3 3 3. Then, + and ⌈/ applied to this array give itself and 3, and 2 3 3 3 3 ⍳ 3 evaluates to 2, since 3 first occurs as the second element. We subtract 1 and get 1 as the final result.

Zgarb

Posted 2015-01-22T19:08:37.070

Reputation: 39 083

In your example, the length is 4, but there are no 4 same items in a row (01101110), so why does it output anything at all? – Thomas Weller – 2015-01-24T20:18:46.423

@ThomasW. The example in the challenge has no 5 same items in a row either, and yet the output is 10. The way I interpret the task is that I need to find the first index of a substring of the given length that has m ones, where m is maximal. – Zgarb – 2015-01-24T21:37:47.507

10

Ruby, 42

f=->s,n{(0..s.size).max_by{|i|s[i,n].sum}}

Takes input by calling it, e.g.

f['01001010101101111011101001010100010101101010101010101101101010010110110110',5]

This compares substrings using their total ASCII value and returns the index of the maximum. I'm not sure if max_by is required by the Ruby spec to be stable but it appears to be in the C implementation.

histocrat

Posted 2015-01-22T19:08:37.070

Reputation: 20 600

6

Python 2, 56

lambda s,l:max(range(len(s)),key=lambda i:sum(s[i:i+l]))

Accepts an array of integers, then the length.

feersum

Posted 2015-01-22T19:08:37.070

Reputation: 29 566

This needs an array of integers as input, so if you start with a string, you need to do: [int(s) for s in "010010...0"] – smci – 2015-01-27T03:47:08.720

Bug: f(ss, 999) will return 0 (instead of None). Can you fix that? This arguably violates rule 1. – smci – 2015-01-27T03:51:10.713

@smci I have no idea what you're talking about. How am I supposed to know what's in the variable ss? None is never a desired output in any case as the answer is an integer. – feersum – 2015-01-27T13:02:41.817

5

Batch - 222

Batch is obviously the perfect language for this kind of operation.

@echo off&setLocal enableDelayedExpansion&set s=%1&set l=-%2
:c
if defined s set/Al+=1&set "s=%s:~1%"&goto c
set s=%1&set x=0&for /l %%a in (0,1,%l%)do set c=!s:~%%a,%2!&set c=!c:0=!&if !c! GTR !x! set x=!c!&set y=%%a
echo !y!

Un-golfed / dissected:

Initial setup. The variable s is the input string, and l will be the length of the input string, minus the sub-string length (initialized at negative %2 where %2 is the given sub-string length).

@echo off
setLocal enableDelayedExpansion
set s=%1
set l=-%2

Get the length of the input as l, using a pure Batch string length solution - this mangles the variable s containing the input string, so we then set it again.

:c
if defined s (
    set /A l += 1
    set "s=%s:~1%"
    goto c
)
set s=%1

The value of x is used to check which sub-string had the greatest number of 1's. Start a loop from 0 to the length of the string, minus the sub-string length (variable l). Get the sub-string starting from the current point in the loop (%%a), c is set as the input string starting at %%a, and taking %2 (the given sub-string length) characters. Any 0s are removed from c, then the value of c is compared to x - i.e. 111 is a greater number than 11 so we can just use the 'string' to do a greater than comparison. y is then set to current location in the string - which is finally outputted.

set x=0
for /l %%a in (0, 1, %l%) do (
    set c=!s:~%%a,%2!
    set c=!c:0=!
    if !c! GTR !x! (
        set x=!c!
        set y=%%a
    )
)
echo !y!

Using OPs example -

h:\>sub1.bat 01001010101101111011101001010100010101101010101010101101101010010110110110 5
10

unclemeat

Posted 2015-01-22T19:08:37.070

Reputation: 2 302

5

C# (Regex), 196

class Test{static void Main(string[]a){System.Console.Write(System.Text.RegularExpressions.Regex.Match(a[1],"(?=((?<o>1)|0){"+a[0]+"})(?!.+(?=[10]{"+a[0]+"})(?!((?<-o>1)|0){"+a[0]+"}))").Index);}}

The actual regex is not that long, but all the fluffs needed for a C# program to compile double the size of the code.

The actual regex, setting the length to 5:

(?=((?<o>1)|0){5})(?!.+(?=[10]{5})(?!((?<-o>1)|0){5}))
  • (?=((?<o>1)|0){5}): Look-ahead to read 5 characters without consuming, and push all 1's into "stack" o.
  • (?=[10]{5})(?!((?<-o>1)|0){5}): At a position which has 5 characters ahead, there is not enough item in "stack" o to pop out, i.e. the substring has strictly more 1 than what we have in the current position.
  • (?!.+(?=[10]{5})(?!((?<-o>1)|0){5})): A position as described above can't be found for the rest of the string, i.e. all position has less than or equal number of 1's.

Taking the first result gives the answer, since all substrings in front of it has some substring ahead with more 1's, and we have check that any index larger than the current index has less than or equal number of 1's.

(And I learn something nice: the "stack" is restored on backtracking).

n̴̖̋h̷͉̃a̷̭̿h̸̡̅ẗ̵̨́d̷̰̀ĥ̷̳

Posted 2015-01-22T19:08:37.070

Reputation: 5 683

1Very cool, I wouldn't have guessed you could do this with a regex. – histocrat – 2015-01-23T15:28:31.173

4

Pyth, 12

Mho/<>GNHZUG

This defines a function g, which requires a list of numbers and a number as input. E.g.

Mho/<>GNHZUGg[0 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0)5

You can test it here: Pyth Compiler/Executor

Explanation:

Mho/<>GNHZUG
M             defines a function g(G,H), G is the sequence, H the sequence length
  o       UG  orders the numbers between 0 and len(G)-1 according to the following key
    <>GNH     take the subsequence G[N:N+5]
   /     Z    count the zeros in this subsequence (this is the key)
 h            return the first value of the sorted list (minimum)

Alternative:

Mho_s<>GNHUG

Jakube

Posted 2015-01-22T19:08:37.070

Reputation: 21 462

You can get a same length answer using a program that takes a string of values ( 01001...) then the number: ho/<>zNQ\0Uz Sadly, count on a string doesn't auto-convert what you are looking for to a string :( – FryAmTheEggman – 2015-01-22T20:13:26.643

4

J, 15 14 chars

   ([:(i.>./)+/\)

   5 ([:(i.>./)+/\) 0 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0
10

randomra

Posted 2015-01-22T19:08:37.070

Reputation: 19 909

I find it interesting when real languages beat out languages specifically made for code golf. My K entry was eaten or I would have posted it, but it came to 20 characters anyways. – JasonN – 2015-01-25T03:33:26.193

4

Haskell, 64 62 Bytes

n#l=0-(snd$maximum[(sum$take n$drop x l,-x)|x<-[0..length l]])

Usage:

5#[0,1,0,0,1,0,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,1,0,0,1,0,1,1,0,1,1,0,1,1,0]

nimi

Posted 2015-01-22T19:08:37.070

Reputation: 34 639

You can save 2 bytes by defining an infix function: n#l=... – Zgarb – 2015-01-23T07:31:38.820

you could use an infix function for p. also, I think the 0 is redundant (although the parentheses aren't, and you might need a space instead of that 0). – proud haskeller – 2015-01-23T08:58:09.353

4

Matlab (42)

Let s denote the string and n the substring length. The result is r.

Compute convolution of s with a sequence of n ones, then find the maximum. Convolution is done easily with conv, and the max function returns the position of the first maximum. It's necessary to subtract 1 to the resulting index, because Matlab indexing starts at 1, not 0.

[~, r] = max(conv(s, ones(1,n), 'valid'));
r = r-1;

Golfed:

[~,r]=max(conv(s,ones(1,n),'valid'));r=r-1

Luis Mendo

Posted 2015-01-22T19:08:37.070

Reputation: 87 464

3

JavaScript (ES6) 73

A function returning the requested value. The for loop scans the input string keeping a running total, saving the position of the max value.

F=(a,n)=>(x=>{for(r=t=i=x;a[i];t>x&&(x=t,r=i-n))t+=a[i]-~~a[i++-n]})(0)|r

Ungolfed

F=(a, n) => {
   for(x = r = t = i = 0; a[i]; i++)
     t += a[i] - ~~a[i-n], // ~~ convert undefined values (at negative index) to 0
     t > x && (x=t, r=i-n+1);
   return r;
}

Test In FireFox/FireBug console

F("01001010101101111011101001010100010101101010101010101101101010010110110110",5)

Output 10

edc65

Posted 2015-01-22T19:08:37.070

Reputation: 31 086

To reduce your code, you don't need to define the variables x and r. This should reduce 4 bytes, being the final length of 69 bytes. Also, you probably might be able to replace && with &. But nice one with the ~~ trick! – Ismael Miguel – 2015-01-23T01:19:34.773

@IsmaelMiguel you need to init x, else error at first t > x. You need to init r: try F("00000"). And && is needed to emulate and if – edc65 – 2015-01-23T08:11:16.790

You are completely right. I didn't noticed that you were expecting it to ignore (x=t, r=i-n+1) if t was lower or equal than x. That's a good use of lazy evaluation! I wish it could be chopped off somewhere, but I guess you did all the work. – Ismael Miguel – 2015-01-23T09:11:09.773

3

Mathematica, 38 36

f=#-1&@@Ordering[-MovingAverage@##]&

Example:

f[{0,1,0,0,1,0,1,0,1,0,1,1,0,1,1,1,1,0,1,1,1,0,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,1,0,0,1,0,1,1,0,1,1,0,1,1,0},5]

Output:

10

alephalpha

Posted 2015-01-22T19:08:37.070

Reputation: 23 988

3

PHP (96)

for($a=$b=$c=0;(($d=@substr_count($s,1,$a,$n))>$c&&($b=$a)&&($c=$d))||$a++<strlen($s););echo $b;

http://3v4l.org/J4vqa

variables $s and $n should be defined on the command line to the search string and substring length, respectively.

This would also work in any C-like language with appropriate functions for substr_count() and strlen().

Stephen

Posted 2015-01-22T19:08:37.070

Reputation: 201

2

C# (Linq), 148 bytes

using System.Linq;class C{int F(string s,int l){return s.IndexOf(s.Skip(l-1).Select((c,i)=>s.Substring(i,l)).OrderBy(p=>-p.Sum(c=>c)).First());}}

Formatted:

using System.Linq;

class C
{
    int F(string s, int l)
    {
        return s.IndexOf(
            s
                .Skip(l - 1)
                .Select((c, i) => s.Substring(i, l))
                .OrderBy(p => -p.Sum(c => c))
                .First()
        );
    }
}

Takes input as method params.

What it does:

string result = s // string is also char collection
    .Skip(l - 1) // make it collection shorter by l-1
    .Select((c, i) => s.Substring(i, l)) // so we can iterate, and select all substrings
    .OrderBy(p => -p.Sum(c => c)) // order substrings descending by sum of characters
    .First() // take first (most ones)

return s.IndexOf(result); // find index of result string

Krzysztof

Posted 2015-01-22T19:08:37.070

Reputation: 121

2

Scala - 70 Bytes

readLine.sliding(readInt).zipWithIndex.maxBy(x=>x._1.count(_=='1'))._2

But with function names as long as zipWithIndex I guess Scala is not the best choice for code golf.

Dominik Müller

Posted 2015-01-22T19:08:37.070

Reputation: 131

2

C, 245 185

#include <stdio.h>
main(int argc,char **argv){char *p,*q;int i,s,m=0;for(p=argv[1];*p;p++){for(s=0,q=p;q-p<atoi(argv[2])&&*q;q++)s+=*q-'0';if(s>m){m=s;i=p-argv[1];}}printf("%d\n", i);}

Formatted:

#include <stdio.h>
main(int argc, char **argv) {
        char *p, *q;
        int i, s, m = 0;
        for (p = argv[1]; *p; p++) {
                for (s = 0, q = p; q - p < atoi(argv[2]) && *q; q++)
                        s += *q - '0';
                if (s > m) {
                        m = s;
                        i = p - argv[1];
                }
        }
        printf("%d\n", i);
}

Usage:

$ ./m1s 01001010101101111011101001010100010101101010101010101101101010010110110110 5
10

Ari Malinen

Posted 2015-01-22T19:08:37.070

Reputation: 121

1

CJam, 25 21 bytes

q~_,,{1$>2$<:+~}$(]W=

Test it here.

Takes input as an integer for the substring length, and an array of zeroes and ones as the sequence:

5 
[0 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 0 1 0 0 1 0 1 1 0 1 1 0 1 1 0]

Explanation

q~_,,{1$>2$<:+~}$(p];
q~                    "Read and evaluate the input.";
  _,                  "Duplicate the sequence and get its length N.";
    ,                 "Get an array [0 1 ... N-1].";
     {         }$     "Sort this array stably by the result of the given block.";
      1$              "Copy the sequence.";
        >             "Slice off the first i bits.";
         2$           "Copy the substring length.";
           <          "Truncate the sequence.";
            :+        "Get the sum to find the number of 1s.":
              ~       "Bitwise complement in order to sort from highest to lowest.";
                 (    "Shift off the first index from the sorted list.";
                  ]   "Wrap the entire stack in an array.";
                   W= "Extract the last element (the result), discarding the rest.";

The result is printed automatically at the end of the program.

Note that I'm also considering slices that start closer to the end than the desired substring length, but that's okay, because they are substrings of the last valid substring and will therefore never have more 1s than that last valid substring.

Martin Ender

Posted 2015-01-22T19:08:37.070

Reputation: 184 808

1

Java 329 bytes

was going to implent a .matches(regex), but it would have been near identical to the python solutions above, so i tried a sliding window instead. new here, so if anyone has any pointers be glad to hear them.

public class ssMostOnes{
public static void main(String[] a){
    int b=0,w=0;
    for(int i=0;i<a[0].length()-Integer.valueOf(a[1]);i++){
        int c=a[0].substring(i,i+Integer.valueOf(a[1])).length() - a[0].substring(i,i+Integer.valueOf(a[1])).replace("1","").length();
        if(c>w){w=c;b=i;}
    }
    System.out.println(b);
}

}

Bryan Devaney

Posted 2015-01-22T19:08:37.070

Reputation: 51

Some tips: You can initialize i in the third line. Most of the whitespace can be removed. Use System.out.print( (no newline needed). Instead of Integer.valueOf(, you can use new Integer(. – Ypnypn – 2015-01-23T18:00:03.463