Introduction
Inspired by Dyalog Ltd.'s 2016 student competition. The challenge there is to write good APL code, but this is a code-golf and it is open to all languages. Derived from this closed challenge.
The real challenge here is to combine the various (quite trivial) parts of the problem in a byte-efficient way by reusing values and/or algorithms.
If you find this challenge interesting and you use Dyalog APL (free download here), you may also want to participate in the original competition, but note that the objective there is to submit quality code – not necessarily short code. If you are a student, you can win USD 100 for Phase I. Phase II (not related to this code golf challenge) can win you up to USD 2000 and a free trip to Scotland!
Task
Take two strings (of any 7-bit ASCII characters) by any means and in any order (but please include the input format in your answer). You may take them as separate arguments/inputs or as a single list of strings. One string contains the delimiters used to split the other. E.g. given ',;/' and 'hello,hello;there;my/yes/my/dear,blue,world', the sub-strings are ['hello', 'hello', 'there', 'my', 'yes', 'my', 'dear', 'blue', 'world'].
You will do some statistics and ordering based on the lengths of those sub-strings that are not duplicated in the input. (You may choose to consider casing or not.) In the example given, those are ['there', 'yes', 'dear', 'blue', 'world'].
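The splitting and filtering steps above can be sketched in ungolfed Python. This is only an illustration, not a reference solution from the challenge; the function names `split_on_any` and `non_duplicated` are hypothetical. A character scan is used instead of a regular expression so that delimiters such as `^`, `]`, `-` or `\` need no escaping.

```python
from collections import Counter


def split_on_any(delimiters, text):
    """Split text at every character that appears in delimiters."""
    parts, current = [], []
    for ch in text:
        if ch in delimiters:
            # A delimiter closes the current sub-string (possibly empty).
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))  # the trailing sub-string always exists
    return parts


def non_duplicated(parts):
    """Keep only the sub-strings that occur exactly once (case-sensitive)."""
    counts = Counter(parts)
    return [p for p in parts if counts[p] == 1]
```

Note that an empty data string still yields one (empty) sub-string, and consecutive or trailing delimiters yield empty sub-strings.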
Give (by any means) a list of four elements in the overall structure [Number_1, Number_2, [List_3], [[List_4_1], [List_4_2]]]:
- The mean of the lengths of the non-duplicated strings. In our example, that is 4.2.
- The median of the lengths of the non-duplicated strings. In our example, that is 4.
- The mode(s) of the lengths of the non-duplicated strings. In our example, those are [4, 5].
- The original sub-strings (including duplicates), divided into two lists; the first holds those whose length is not one of the above mode lengths, and the second holds those whose length is. In our example: [['my', 'my', 'yes'], ['dear', 'blue', 'hello', 'hello', 'there', 'world']]
The combined final answer of our example case is: [4.2, 4, [4, 5], [['my', 'my', 'yes'], ['dear', 'blue', 'hello', 'hello', 'there', 'world']]]
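For orientation, here is a minimal ungolfed Python sketch that assembles the four-element structure. It is one possible reading of the task, not the challenge author's solution; the function name `summarize` and all local names are hypothetical. It treats casing as significant and returns the sub-lists in input order.

```python
from collections import Counter


def summarize(delimiters, text):
    # Split at every delimiter character (a plain scan, so no regex escaping).
    parts, current = [], ""
    for ch in text:
        if ch in delimiters:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)

    # Sorted lengths of the sub-strings that occur exactly once.
    occurrences = Counter(parts)
    lengths = sorted(len(p) for p in parts if occurrences[p] == 1)
    n = len(lengths)
    if n == 0:
        # Nothing is non-duplicated: zeros, no modes, everything is "non-mode".
        return [0, 0, [], [parts, []]]

    mean = sum(lengths) / n
    middle = lengths[(n - 1) // 2 : n // 2 + 1]  # 1 or 2 middle elements
    median = sum(middle) / len(middle)

    # Modes: every length that reaches the highest frequency.
    length_counts = Counter(lengths)
    top = max(length_counts.values())
    modes = [k for k, v in length_counts.items() if v == top]

    non_mode = [p for p in parts if len(p) not in modes]
    mode = [p for p in parts if len(p) in modes]
    return [mean, median, modes, [non_mode, mode]]
```

Calling `summarize(',;/', 'hello,hello;there;my/yes/my/dear,blue,world')` gives the same mean, median and modes as the worked example; the two sub-lists contain the same strings, in input order rather than the order printed above.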
Considerations
Don't forget to reuse code and values. E.g. the median is the average of the 1 or 2 middle elements, and you'll need the mode(s) to reorder the original list.
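As an illustration of that reuse, the median can be written as the mean of a one- or two-element slice of the sorted lengths. The sketch below is ungolfed Python with a hypothetical function name, `median`; returning 0 for an empty list follows the rule stated in the next consideration rather than any mathematical convention.

```python
def median(values):
    """Average of the 1 or 2 middle elements of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return 0  # assumption: the challenge's default when nothing is left
    # For odd n the slice has one element, for even n it has two.
    middle = ordered[(n - 1) // 2 : n // 2 + 1]
    return sum(middle) / len(middle)
```

Because the slice is averaged the same way as the whole list, a golfed answer can often share one averaging routine between mean and median.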
Watch out for division by zero; an empty input means one string with length zero, so the mean length is zero. If there are no non-duplicated sub-strings (i.e. every sub-string occurs more than once), return zero for both mean and median, an empty list for modes, and put all original sub-strings onto the non-mode list.
The list of modes, and the sub-lists of non-mode-length strings, and mode-length strings, do not need to be sorted in any way.
If all non-duplicated sub-strings have different lengths, then all the lengths are considered modes.
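This rule falls out naturally if the modes are defined as every length whose frequency equals the maximum frequency. A minimal Python sketch, with the hypothetical function name `modes`:

```python
from collections import Counter


def modes(lengths):
    """Return every value whose frequency equals the highest frequency."""
    counts = Counter(lengths)
    if not counts:
        return []  # no non-duplicated sub-strings, so no modes
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]
```

When every length occurs once, the maximum frequency is 1 and all lengths are returned, so no special case is needed.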
Wherever the term list is used, you may use any collection of multiple data pieces that your language supports, e.g. tuples, arrays, dictionaries, etc.
The numeric values only need to be within 1% of the actual value.
Example cases
Example A
',;/'
'hello,hello;there;my/yes/my/dear,blue,world'
gives
[4.2, 4, [4, 5], [['my', 'my', 'yes'], ['dear', 'blue', 'hello', 'hello', 'there', 'world']]]
Example B
'+|'
''
gives
[0, 0, [0], [[], ['']]]
Note: while there are two delimiter characters, neither of them occurs in the data string, so this is just one empty string. Mean, median, and mode are therefore zero. There are no strings in the non-mode list and just the single empty string in the mode list.
Example C
''
'ppcg.se.com'
gives
[11, 11, [11], [[], ['ppcg.se.com']]]
Note: no delimiters, so the entire input is one string.
Example D
',;'
',math,,math;'
gives
[0, 0, [], [['', 'math', '', 'math', ''], []]]
Note: there are three empty sub-strings, and since all sub-strings have at least one duplicate, the mean and median default to zero, as per Considerations, above.
Example E
',;'
'new,math,math;!;'
gives
[1.333, 1, [0,1,3], [['math', 'math'], ['new', '!', '']]]
Note: there is only one of each length among the non-duplicated sub-strings, so all three lengths are modes.
You can verify my explicit permission to post this challenge here by contacting the author of the original competition.
I presume the delimiters are always single characters? And that when you say "list" you also include tuples, in particular for the sake of strongly typed languages which require all elements of a list to have the same type? – Peter Taylor – 2016-08-11T10:41:04.153
@PeterTaylor The inputs are two plain strings, so the delimiters can only be single characters. I'm not sure I understand what the problem about tuples vs lists. But if you want to use tuples instead of lists, go ahead. – Adám – 2016-08-11T10:48:04.343
Ugh. The additional handling required for supporting empty substrings (like Example D) is brutal. :-/ – AdmBorkBork – 2016-08-11T16:30:10.740
The first number of Example A should be 4 – Leaky Nun – 2016-08-11T17:55:22.427
@TimmyD It is 2 not 3 – Leaky Nun – 2016-08-11T18:12:27.670
@TimmyD It is 'oh' not 'yes' – Leaky Nun – 2016-08-11T18:45:22.947
@LeakyNun Thanks, fixed. It was supposed to be the same as in the walk-through, but I changed the walk-through and forgot to update the example. – Adám – 2016-08-11T19:55:13.797
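Could ^, ], -, or \ be delimiters? – Neil – 2016-08-11T23:32:58.827
@Neil any 7-bit ASCII. I've edited that in. Thanks. – Adám – 2016-08-12T05:04:43.447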
be delimiters? – Neil – 2016-08-11T23:32:58.827@Neil any 7-bit ASCII. I've edited that in. Thanks. – Adám – 2016-08-12T05:04:43.447