Extraction of Bits

7

2

Your task is to devise a programme that extracts the bits, in order, represented by the first argument based on the set bits represented by the second argument (the mask).

Formal Definition:

  • Let the first argument x represent the binary string S=(xn)(xn-1)...(x1) with n>=0 and x1 being the least significant digit and xn = 1.

  • Let the second argument y represent the binary string T=(ym)(ym-1)...(y1) with m>=0 and y1 being the least significant digit and ym = 1.

  • Let the result represent 0 if x represents 0 or y represents 0.

  • Let the number of 1 digits in T be p. Let {di}, i in [1, p] be the set of binary strings of the form (dim)(di(m-1))...(d1) defined by (for all i, (for all k, |{dik such that dik = 1}| = 1)) and (if 1 <= i < j <= p then 0 < di < dj) and (dp | dp-1 | ... | d1 = T). In other words, {di} is the ordered set of strings with a single 1 digit that makes up the string T when | is applied.

  • Let the result of the programme represent the following binary string:

    ((dp & S) << (p - 1))| ... | ((d2 & S) << 1) | (d1 & S)

Examples:

  • 1111, 110 -> 11
  • 1010, 110 -> 01
  • 1010, 10110 -> 01
  • 1101010111, 11111000 -> 01010

Input and Output:

  • The two arguments may be taken in any way
  • The two arguments need not be binary strings. They can also be integers which represent binary strings.
  • Answers which do not take input representing binary strings of arbitrary length are permitted.

Winning: Shortest code wins.

Clarifications:

  • If your output gives a bit string, order of bits is not important. Simply state the order in your answer. As stated before, the output of your programme need only represent the bits required.

justinpc

Posted 2014-09-01T11:58:51.963

Reputation: 261

related on SO: Mask and aggregate bits

– Bergi – 2017-08-25T03:38:48.070

1What's your objective winning criterion, i.e. how do you determine who wins? Shortest code, or something else? – Doorknob – 2014-09-01T12:19:07.400

I assumed that shortest code was the default criterion. I will add this to the post. – justinpc – 2014-09-01T12:50:52.393

Is there a maximum number of bits in each binary string? For example, can we assume that both inputs will be less than 100 characters long? – Doorknob – 2014-09-01T12:52:53.393

@Doorknob "Answers which do not take input representing binary strings of arbitrary length are permitted." ? – Martin Ender – 2014-09-01T12:55:17.863

As per the formal definition, there is no maximum input size. As per the third criterion of the Input and Output section, reasonable limits on input size may be imposed by the programmer, such as using only 32bit or 64bit input. – justinpc – 2014-09-01T12:56:12.950

1Your clarification allows reversed output. I assume it also allows reversed input? – John Dvorak – 2014-09-01T13:51:58.640

1Yes, if the input involves bit strings. – justinpc – 2014-09-01T14:05:58.033

Would you, could you, change the subscripting to use <sub></sub>? It won't work in backticks, but I think it would improve readability considerably. – Scott Leadley – 2014-09-01T14:21:57.010

In the first step of "Formal Definition:", I think n>=0 should be n>0 to disallow (x_0)(x_1). – Scott Leadley – 2014-09-01T14:22:15.857

Regarding subscripts: I would rather not take what is in backticks outside of backticks. On n>=: I don't see what you mean. Could you please explain? – justinpc – 2014-09-01T14:41:36.227

You could use something like <code>x<sub>0</sub></code>; I'll edit the post when I get home. – Doorknob – 2014-09-01T17:55:02.193

And the reward for the first sensible edit description in this question goes to... @MartinBüttner (thank you!) – John Dvorak – 2014-09-01T18:23:48.783

This really left itself open to exploitation of standard loopholes. That's why I downvoted. – Isiah Meadows – 2014-09-02T03:49:04.550

Let's just say that there was absolutely nothing prohibiting people from using specializes functions. Also, it was poorly defined enough that I lost interest rather quickly. It isn't a very clear specification (the ES6 specification is easier than this, and it's a full WIP language specification, with a lot of new features including module support, which is really complex). – Isiah Meadows – 2014-09-02T05:58:37.370

Answers

5

CJam, 9 bytes

eaz{)~*}%

This reads the bit strings (LSB first) as command line arguments and prints the result (also LSB first).

To try this code in the CJam interpreter, change ea to qS% and separate the strings with a space.

Example run

$ cjam <(echo 'eaz{)~*}%') 1110101111 00011111; echo
01011

How it works

ea         " Read both command line arguments and collects them in an array. ";
  z        " Interleave the characters of both strings.                      ";
   {   }%  " For each pair of characters:                                    ";
    )~     " Pop the last character and interpret it as an integer.          ";
      *    " Repeat the remaining strings that many times.                   ";
  • If the mask character is 0, the string consisting of the corresponding character is repeated zero times.

  • If the mask character or the corresponding character do not exist, popping a character will result in an empty string to be repeated.

In either case, the result will be an empty string.

Dennis

Posted 2014-09-01T11:58:51.963

Reputation: 196 637

I'm accepting this as it's the shortest, and doesn't use any loopholes. Sorry it took half a decade. – justinpc – 2019-02-08T17:29:48.060

11

Edit: APL, 1 character

/, the compression operator (Documentation).

Example of use:

0 0 1 1 1 1 1 0 0 0/1 1 0 1 0 1 0 1 1 1
0 1 0 1 0

Try it yourself at http://ngn.github.io/apl/web/index.html

The mask is left, and the input bitstring is right. Disclaimer: I'm not an expert in APL and have never used it; I only submitted this to substitute the disqualification of my machine-code solution below. I admit that a limitation of the code above is that the left and right bitstrings must be of equal length in APL, and lack the APL skills to fix this.

Rejected as loophole:

x86-64 Machine Code (BMI2, Haswell+), 5 bytes

c4 e2 f2 f5 d0 is the hex representation of the raw bytes of the program as they would appear in an executable x86-64 binary program.

This represents the single instruction pext %rax, %rcx, %rdx, which accepts S in rax, T in rcx and stores the result in rdx. Max 64 bits. Runs in 3 clock cycles on Haswell, has throughput of 1 per clock cycle.

xxx@xxx:~/Documents/SO> cat > bmi2.s
.text
pext %rax, %rcx, %rdx
xxx@xxx:~/Documents/SO> gcc bmi2.s -c -o bmi2.o
xxx@xxx:~/Documents/SO> objdump -d bmi2.o

bmi2.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:

0:   c4 e2 f2 f5 d0          pext   %rax,%rcx,%rdx

Of course, requires a pretty modern assembler.

The documentation for PEXT is much clearer than OP's description, so here it is: Page in Intel Software Developer Manual describing PEXT's operation

Iwillnotexist Idonotexist

Posted 2014-09-01T11:58:51.963

Reputation: 219

c4 e2 f2 f5 d0 is hex, not binary... – user253751 – 2014-09-01T23:00:53.343

1@immibis The CPU has to execute the raw bytes in order for them to do what they must do. When I want to say "take these values and consider them as raw bytes" I generally use "binary", in the sense of "raw binary data". Is there a better way to convey that meaning? I can't paste the actual bytes c4 e2 f2 f5 d0, so I've given their hex expression, but they must be actually "binary" when executed. – Iwillnotexist Idonotexist – 2014-09-01T23:11:49.510

@immibis I've changed the wording a bit, better? – Iwillnotexist Idonotexist – 2014-09-01T23:41:01.963

1

This is a standard loophole. I've updated it to not be as Mathematica-specific, but it equally applies here.

– Isiah Meadows – 2014-09-02T00:11:02.890

1@impinball How can I invite you to chat? I'd like to discuss your characterization of my solution as a mere loophole. – Iwillnotexist Idonotexist – 2014-09-02T00:51:37.667

I forgot to link it...it's http://meta.codegolf.stackexchange.com/a/1078/14668 (the answer).

– Isiah Meadows – 2014-09-02T01:22:53.753

@IwillnotexistIdonotexist Chat room created: http://chat.stackexchange.com/rooms/16851/discussion-between-iwillnotexist-idonotexist-and-impinball

– Isiah Meadows – 2014-09-02T01:25:14.530

If that single opcode is a loophole (which it is), then the / solution is even less acceptable. Both because it's a loophole and because it doesn't follow the spec. Sorry! – John Dvorak – 2014-09-02T03:37:34.263

1@impinball downvoted the loophole and it's now just below the threshold. If a one-op solution exists, it's now officially a fault of the question, not of the answer. – John Dvorak – 2014-09-02T03:44:18.380

@JanDvorak Yeah... I discussed it in chat already. I refrained from going after the second because it would unnecessarily escalate the situation. It also clearly counts, but you're right. It's a fault of the question, not the answer. – Isiah Meadows – 2014-09-02T03:47:07.163

1If we're allowed APL's / on its own, then surely we're allowed J's # on its own? :) – justinpc – 2014-09-02T05:58:12.477

3

Haskell, 68 characters

f[c:d,'0':b]=f[d,b];f[c:d,_:b]=c:f[d,b];f _="";main=interact$f.words

Prints nothing if the input isn't exactly two words.
The first argument can contain any characters. So can the filter; only 0 is treated as "not select".
If arguments are of different length, they are taken left-aligned (LSB first; explicitly allowed since revision #4).

Haskell function, 48 characters

f(c:d)('0':b)=f d b;f(c:d)(_:b)=c:f d b;f _ _=""

Subtract two more characters if "function of two integer arrays" is an acceptable signature

John Dvorak

Posted 2014-09-01T11:58:51.963

Reputation: 9 048

2

Golfscript, 54 51 characters

'0':x;{.,64\-x*\+}:b~:m;'X':x;b{m(\:m;48={;}*}%'X'-

Only works for binary strings of length 64 or less.

Explanation:

'0':x;       store '0' in the x variable (which will be used in the following)
{.,64\-x*\+} block that pads strings to a length of 64 with the character stored in x
:b~          store the block in b and execute, second argument (y) is now padded with '0's
:m;          store second argument as mask (m)
'X':x;b      pad the first argument with 'X's (will be removed later)
{...}%       for each character in the first argument (x), ...
  m(\:m;     grab and delete the first character of m (mask)
  48={...}*  if this character is equal to 48 (0 in ASCII)...
    ;        remove the digit. Hence, this if statement keeps a digit only if the mask is 1.
'X'-         finally, remove the 'X's added in step 5

Doorknob

Posted 2014-09-01T11:58:51.963

Reputation: 68 138

49={}{;}if -> 48={;}* – John Dvorak – 2014-09-01T13:31:11.497

@JanDvorak Thanks; that's pretty interesting. Execute the block n times, where n is a boolean (0 or 1), meaning that it will only execute the block if the boolean is true. Great trick! – Doorknob – 2014-09-01T13:33:59.390

Taking the input LSB-first is now allowed. Can it shorten your code? – John Dvorak – 2014-09-01T13:50:26.080

2

J, 16 Characters:

(#=@i.@#)@],@:#[

Usage:

1 0 1 0 1 1 ((#=@i.@#)@],@:#[) 0 1 1 1 0 0 -> 0 1 0

Explanation:

First argument is left, second argument is right (mask).

(#=@i.@#)@],@:#[ -> (# =@i.@#)@] ,@:# [
  • Take right argument, produce identity matrix of size length of right argument.
  • Use set bits of right argument to choose rows of produced identity matrix.
  • Use chosen rows of identity matrix to choose digits of left argument.
  • Concatenate the digits chosen from left argument.

J, 13 Characters

According to the rules, the form of output is not important. Therefore, if we omit the concatenation of the output digits, we can have the following shorter answer.

`(#=@i.@#)@]#[`

justinpc

Posted 2014-09-01T11:58:51.963

Reputation: 261

2

J, 1 character

#, the tally operator (Documentation).

Example of use:

0 0 1 1 1 1 1 0 0 0#1 1 0 1 0 1 0 1 1 1
0 1 0 1 0

This solution does not cater for the case when the length of the mask is less than that of the string. Given that the above might be seen as a loophole, we have the following solution.

J, 11 characters

[#~],0$~-&#

This solution is like the original #, but also caters for the case when the length of the mask is less than that of the string. Note here that the mask is the right argument, and not the left argument as in the first solution.

x ([ #~ ] , 0 $~ -&#) y
   x #~ (y , (0 $~ (x -&# y)))
   x #~ (y , (0 $~ ((# x) - (# y))))
   x #~ (y , (((# x) - (# y)) $ 0))
   x #~ (y , 0 0 ... 0 0)
   x #~ y 0 0 ... 0 0
   y 0 0 ... 0 # x
  • [ and ] choose x and y respectively.
  • x (u&v) y -> (v x) u (v y).
  • (x -&# y) -> (# x) - (# y).
  • # x -> length of x.
  • x v~ y -> y v x.
  • n $ 0 -> list of n 0s.
  • (0 $~ (x -&# y)) -> m 0s, where m is the difference between the length of the string and the length of the mask.
  • (y , 0 0 ... 0 0) -> the list y 0 0 ... 0 0 such that its length is equal to that of the string.
  • a # b -> the elements of b chosen by the binary mask a.

Even though this solution is shorter than my other solution of 16 characters, I think it still makes unfair use of #. In any case, I think it's still interesting.

justinpc

Posted 2014-09-01T11:58:51.963

Reputation: 261

+1. The question somewhat begs for the use of compression/reduction/fold/inject operators, but I find it interesting how you pad the mask with 0s in the APL-derived languages. – Iwillnotexist Idonotexist – 2014-09-02T12:00:31.373

@IwillnotexistIdonotexist The fun thing is that the # operator can actually do many more things, like representing a number in an arbitrary base. – FUZxxl – 2014-09-02T13:16:26.597

2

Haskell, 33 characters

((map snd.filter((>0).fst)).).zip

Usage

(((map snd.filter((>0).fst)).).zip) [1, 0, 1, 0] [1, 0, 0, 1]
[1, 0]

(((map snd.filter((>0).fst)).).zip) [1, 1] [1, 1, 0, 1]
[1, 1]

justinpc

Posted 2014-09-01T11:58:51.963

Reputation: 261

1

J, 10 characters, no use of tally #

This solution does not cater for input where mask is shorter than string.

;@:(<@$"0)

Usage

1 0 0 1 (;@:(<@$"0)) 1 0 1 0
1 0

Explanation

  • This makes no use of #, which on its own can be used to solve this golfing problem.
  • This solution works by interleaving both sides, one element by one. This is expressed by "0, which makes <@$ applied to each pair of elements.
  • x $ y means x ys as a list. 3 $ 4 -> 4 4 4.
  • < x puts x in a "box". A box can contain an object of any dimension (list, matrix, nothing). Boxes can be concatenated into lists.
  • 1 $ x -> one-dimensional list consisting of x.
  • 0 $ x -> nothing
  • 1 (<@$"0) x -> one-dimensional list consisting of x inside a box.
  • 0 (<@$"0) x -> nothing in a box.
  • ; takes a list of boxes and concatenates its contents, throwing away empty boxes.
  • u@:v composes u and v in such a way that u is applied only after v has been applied to the whole of the arguments. In this case ; is applied only after the right side of the @: has been applied to each interleaved pair x and y made from the arguments.

justinpc

Posted 2014-09-01T11:58:51.963

Reputation: 261