36
6
We all know the age-old debate - should strings be length-prefixed or should they be null-terminated? Well, you've run into a blob of binary data and apparently whoever made it decided that the best solution would be a compromise and length-terminated their strings.
For the purposes of this challenge, a length-terminated sequence is a series of one or multiple integers such that the last item in the sequence is equal to the number of items in the sequence, and no prefix of the sequence is also a valid length-terminated sequence. For example, [1]
, [15, 2]
, and [82, 19, 42, 111, 11, 6]
are all valid length-terminated sequences. [15]
is not a valid length-terminated sequence (it hasn't been terminated), and neither is [15, 2, 3]
(it should have ended at the 2
).
Your task is to write a program or function that takes a list of integers, and outputs all the valid length-terminated sequences within that list. For example, for the input list [7, 6, 5, 4, 3, 2, 1]
, valid length-terminated sequences would be [1]
, [3, 2]
, [5, 4, 3]
and [7, 6, 5, 4]
.
- You must output all valid length-terminated sequences found in the input, in any order. The order may vary between runs of your program. You must output no length-terminated sequences not found within the input, and no sequences found within the input that aren't valid length-terminated sequences.
- If some length-terminated sequence occurs multiple times within the input, it should also be output multiple times. The sequence
[9, 8, 3]
occurs twice in the input[9, 8, 3, 9, 8, 3]
(once starting with the first number, and again starting at the fourth number) and should thus be output twice. - A number may be part of more than one length-terminated sequence. For example, the input
[9, 9, 2, 4]
contains both the length-terminated sequences[9, 9, 2, 4]
and[9, 2]
and they should both be output. - You may assume all numbers are between 0 and 127 (both inclusive).
- You may assume the input contains no more than 126 numbers.
- You may not make any further assumptions about the input, including the assumption that the input contains at least one valid length-terminated sequence.
- You may take input or produce output by any of the default methods. Input and output formats are flexible (within reason) and the input format need not be the same as the output format. However, the your input and output formats must not depend on the input, and must not vary between runs of your program.
- This is code-golf - make your code as small as possible.
Examples
[input] -> [output], [output], ...
[7, 6, 5, 4, 3, 2, 1] -> [7, 6, 5, 4], [5, 4, 3], [3, 2], [1]
[90, 80, 70] -> (no output)
[1, 2, 3, 4, 5] -> [1]
[100, 0, 2, 2, 2, 0, 0, 2, 4] -> [0, 2], [2, 2], [2, 2], [0, 0, 2, 4], [0, 2]
[] -> (no output)
[54, 68, 3, 54, 68, 3] -> [54, 68, 3], [54, 68, 3]
[4, 64, 115, 26, 20, 85, 118, 9, 109, 84, 64, 48, 75, 123, 99, 32, 35, 98, 14, 56, 30, 13, 33, 55, 119, 54, 19, 23, 112, 58, 79, 79, 45, 118, 45, 51, 91, 116, 7, 63, 98, 52, 37, 113, 64, 81, 99, 30, 83, 70] -> [85, 118, 9, 109, 84, 64, 48, 75, 123, 99, 32, 35, 98, 14], [118, 9, 109, 84, 64, 48, 75, 123, 99, 32, 35, 98, 14, 56, 30, 13, 33, 55, 119, 54, 19, 23, 112, 58, 79, 79, 45, 118, 45, 51, 91, 116, 7, 63, 98, 52, 37], [109, 84, 64, 48, 75, 123, 99, 32, 35, 98, 14, 56, 30, 13, 33, 55, 119, 54, 19], [84, 64, 48, 75, 123, 99, 32, 35, 98, 14, 56, 30, 13], [14, 56, 30, 13, 33, 55, 119, 54, 19, 23, 112, 58, 79, 79, 45, 118, 45, 51, 91, 116, 7, 63, 98, 52, 37, 113, 64, 81, 99, 30], [45, 118, 45, 51, 91, 116, 7]
Interesting problem, but surely the definition of a string is that it can store arbitrary quantities of data retrievably? Your third clause breaks that explicitly by misreporting of a substring as a separate string. If constructing a string from the values "9, 9, 2" gives you the output "9,9,2,9", then by definition this is not storing data retrievably. – Graham – 2019-10-14T16:32:15.963
12Let's give the OP the benefit of the doubt that they're proposing a fun code-golf challenge and not arguing that the given data structure should be used in real life. – Greg Martin – 2019-10-14T17:34:31.557
2@Graham If you know where in a blob of arbitrary data your string begins (or, indeed, where it ends), you can absolutely retrieve it. If you don't, it doesn't matter how you represent your strings, you're almost certainly going to find segments of memory that look like strings but aren't the string you're looking for. – Sara J – 2019-10-14T17:35:05.217
(that is not to say this is a practical way to store data) – Sara J – 2019-10-14T17:35:53.707
@Graham In a system with length-prefixed strings, the bytes
2 3 4 5 6
contain two strings (3 4
and4 5 6
), both of which can be referenced with separate pointers to the first or second byte of the list. – Sparr – 2019-10-14T18:42:56.8031@SaraJ But if you have a pointer to the start of a string, and you have stored some data in that string, and that string has not since been corrupted, then there must be one and only one result from reading that string which is identical to the string you stored. I'm sure you didn't mean for this to be a practical way to store data - it's a fun code golf thing. :) My point is that it basically isn't a way to store data, because if you don't get out what you put in then you fundamentally haven't stored that data. – Graham – 2019-10-14T20:39:43.480
@Sparr That's not the same thing at all though. Given a pointer to the start of a length-prefixed string, there is one and only one interpretation of the string stored at that address, which is exactly what you stored. You don't have to be able to pick up at a random location in the middle, but you do have to be able to read back the whole string from the start. My point is that a pointer to the start of SaraJ's not-strings fundamentally cannot do that. (It's still a nice idea for a challenge, of course. :) – Graham – 2019-10-14T20:46:33.873
1@Graham Say you want to store the numbers
2,3,5
. You find a place where you can store numbers, and put the numbers2,3,5
there, and finish with a4
as a string terminator. You later want the numbers back, so you go back to where you started writing. You read2,3,5,4
and notice that the4
is also the length, so you stop - the data ended and it was2,3,5
. You unambiguously got exactly the data you stored. Yes there are limits on the data you can store (the n-th value can't be n) which makes it impractical, but for any data that can be stored in this format it seems to work fine? – Sara J – 2019-10-14T21:32:04.0302@SaraJ I don't think it's quite that. - my reading of your rules is that all values must be >= their index in the array. For any value <= its own index, that value will be incorrectly recognised as a terminator. You also have a problem with your [9, 9, 2, 4] example - the point of a terminator is that parsing terminates there, so it can't also return 9,9,2 unless it's going to scan the entire system memory looking for matches (or at least the next 126 bytes, which may break array bounds). – Graham – 2019-10-14T22:52:42.140
4@Graham The definition of a length terminated string clearly states that the terminator is exactly the length. You might also notice that in the
9,9,2,4
example, the possible strings (9,9,2,4
and9,2
if we include the terminators) start in different locations, so if you know where the string starts you do get an unambiguous value - the2
won't terminate9,9,2
but will terminate9,2
. The input contains zero or more strings in unknown locations because scanning the entire memory for possible matches is exactly what the challenge is about. – Sara J – 2019-10-15T07:19:45.217@SaraJ OK - still not totally convinced, but then code golf isn't exactly supposed to be practical, so fair enough. :) A good challenge anyway! – Graham – 2019-10-15T07:28:52.340
4@Graham You have the problem with this format exactly backwards. If you attempt to store
9 9 2
at some memory location it will result in the memory pattern9 9 2 4
, and reading the memory location later will unambiguously give you back9 9 2
. The problem is that if you try to store9 9 2 4 9
at some location, that will result in the memory pattern9 9 2 4 9 6
, but reading it back will give you9 9 2
(and the last 3 numbers will be leaked memory) because the 4 in the data is mistaken for a terminator. – Ben – 2019-10-16T06:11:54.4571@Graham This isn't fundamentally different from the problem with trying to store strings containing null bytes with null-terminated conventions. There are some strings you just can't store without using an encoding scheme that avoids using the invalid characters, and storing the encoded strings instead of the logical ones. – Ben – 2019-10-16T06:13:44.373
1@Graham Scanning through the input string to find all possible valid length-terminated subsequences is part of the code golf problem statement, not a description that the rules for reading this storage format should be to always look for sub-sequences that don't start at the memory location you're trying to read! – Ben – 2019-10-16T06:15:23.167