The code-golfer way of searching a library

15

3

Challenge:

I have thousands of songs in my music collection, and luckily for me, my favorite player has a search function. I also have a great memory—I can remember the title of every song in my collection. However, I'm very lazy and don't like to type—each extra keystroke is a chore!

  • What is the shortest string I must search for to isolate one song? Help me memorize a list of keys I can use to minimize typing when searching!

This is , so shortest code wins.


Rules:

Given an input list of song titles, generate a list of search keys subject to the following constraints:

  1. Each song title should have a search key.
  2. The total number of characters in the output list must be as few as possible.
  3. My favorite music player is foobar2000:
    • The search function is not case-sensitive. (apple is the same as aPpLE).
    • Each search key must consist of one or more "words", in any order, separated by spaces:
      • Each word must be a substring of the corresponding song title.
      • If the same substring is specified multiple times, then it must occur that many times in its corresponding song title.
      • If a substring itself contains a space, then that substring must be surrounded with quotes.

Hints:

  • Often, for some song titles, there are multiple search keys meeting rule 2. In such a case, any one key will do, but you get brownie points for listing them all.
  • You may assume that the input list will only ASCII characters, but brownie points will be awarded for UTF-8 compatibility.
  • Was Rule 3 hard to follow? Here's how it works:

+----------------------+ +--------+ +----------------+ +------------------------------------+
|         Input        | | Output | |   Statistics   | |             Explanation            |
|----------------------| |--------| |----------------| |------------------------------------|
|                      | | Search | |  Key   |  # of | |                                    |
|      Song title      | |  Key   | | length | words | |   Why is the output what it is?    |
|----------------------| |--------| |--------|-------| |------------------------------------|
| Wanna Be A Wanna Bea | | e e    | |      3 |     2 | |      ["Wanna Be A Wanna Bea"]      |
|                      | |        | |        |       | |   is the only string containing    |
|                      | |        | |        |       | |            ["e"] twice             |
|----------------------| |--------| |--------|-------| |------------------------------------|
| Wanta Bea A Wanna B  | | t ea   | |      4 |     2 | |       ["Wanta Bea A Wanna B"]      |
|                      | |        | |        |       | | is the only string containing both |
|                      | |        | |        |       | |          ["t"] and ["ea"]          |
|----------------------| |--------| |--------|-------| |------------------------------------|
| Wanta Be A Wanna B   | | t "e " | |      6 |     2 | |       ["Wanta Be A Wanna B"]       |
|                      | |        | |        |       | | is the only string containing both |
|                      | |        | |        |       | | ["t"] and ['e' preceding a space]  |
+----------------------+ +--------+ +----------------+ +------------------------------------+

Total number of characters in output (par): 13

Example:

If my music collection consisted only of two albums, Michael Jackson's Off the Wall and Thriler:

+--------------------------------+ +--------+ +----------------+
|             Input              | | Output | |   Statistics   |
|--------------------------------| |--------| |----------------|
|                                | | Search | |  key   | # of  |
|           Song title           | |  key   | | length | words |
|--------------------------------| |--------| |--------+-------|
| Don't Stop 'Til You Get Enough | | Do     | |      2 |     1 |
| Rock with You                  | | Ro     | |      2 |     1 |
| Working Day and Night          | | Wo     | |      2 |     1 |
| Get on the Floor               | | Fl     | |      2 |     1 |
| Off the Wall                   | | ff     | |      2 |     1 |
| Girlfriend                     | | lf     | |      2 |     1 |
| She's out of My Life           | | Sh     | |      2 |     1 |
| I Can't Help It                | | Ca     | |      2 |     1 |
| It's the Falling in Love       | | v      | |      1 |     1 |
| Burn This Disco Out            | | Bu     | |      2 |     1 |
| Wanna Be Startin' Somethin'    | | nn     | |      2 |     1 |
| Baby Be Mine                   | | Ba     | |      2 |     1 |
| The Girl Is Mine               | | G M    | |      3 |     2 |
| Thriller                       | | hr     | |      2 |     1 |
| Beat It                        | | Bea    | |      3 |     1 |
| Billie Jean                    | | J      | |      1 |     1 |
| Human Nature                   | | Hu     | |      2 |     1 |
| P.Y.T. (Pretty Young Thing)    | | .      | |      1 |     1 |
| The Lady in My Life            | | La     | |      2 |     1 |
+--------------------------------+ +--------+ +----------------+

Total number of characters in output (par): 37

You may use the lists above to test your program. Here is the raw version of the second list:

["Don't Stop 'Til You Get Enough","Rock with You","Working Day and Night","Get on the Floor","Off the Wall","Girlfriend","She's out of My Life","I Can't Help It","It's the Falling in Love","Burn This Disco Out","Wanna Be Startin' Somethin'","Baby Be Mine","The Girl Is Mine","Thriller","Beat It","Billie Jean","Human Nature","P.Y.T. (Pretty Young Thing)"]

ayane

Posted 2017-04-30T00:14:09.090

Reputation: 929

1Do you have an example that requires multiple strings for a key? – Jonathan Allan – 2017-04-30T01:06:05.377

1How about ["Wanta Be A Wanna B","Wanta Bea A Wanna B","Wanna Be A Wanna Bea"]? – Jonathan Allan – 2017-04-30T01:11:11.270

...but what should they / could they be if no spaces are allowed in the substrings themselves - note that all whole words collide. – Jonathan Allan – 2017-04-30T01:22:05.903

Why is the raw version in a spoiler? – Leaky Nun – 2017-04-30T02:38:45.887

What is the length of the output (and the individual parts) in the multi-string example? What do we do with a (or even multiple) literal " in our output parts (or are we guaranteed not to receive input containing them)? – Jonathan Allan – 2017-04-30T03:19:01.910

@JonathanAllan The input will never contain a " character; each " in the the output is treated as one extra character. I also updated the example to show how length is counted. – ayane – 2017-04-30T03:41:37.230

Example 2 is not optimal, nt "e " is shorter. And nn nn is the same as just nn, because you specified no order/multiple rules. – orlp – 2017-04-30T12:23:37.023

Just type one character at a time until the one you want is the only one that shows up. – Esolanging Fruit – 2017-05-02T01:57:50.000

1Related – Robert Fraser – 2017-05-02T19:04:44.560

Does the order of the words in the search key matter? That is, would "M G" also be a valid search key for "The Girl is Mine"? – RootTwo – 2017-05-03T18:18:00.080

@RootTwo No, order doesn't matter. – ayane – 2017-05-03T20:31:06.573

Answers

4

Python 2, 556 bytes

Try it online.

-10 bytes, thanks to @Riker, @ovs

Took me couple evenings to make everything works.
Outputs song name, array of search keys and lenght of search keys combined (including spaces and quotes)

import re
S=map(str.lower,input())
T=len
for s in S:
 y=s;n=T(s)
 def R(r,t):
    global n,y
    l=T(' '.join(t))+2*sum(map(lambda x:' 'in x,t))
    if l>n:return
    if(lambda K:T([s for s in S if T(s)-T(reduce(lambda x,y:re.sub(re.escape(y),'',x,1),K,s))==T(''.join(K))])==1)(t)and l<n:y=t;n=l
    u=[(lambda s,n:n and[''.join(y) for y in eval('zip(s%s)'%(''.join(',s[%s:]'%-~x for x in range(n))))]or list(s))(r,i)for i in range(T(r))]
    for i in range(T(r)):
     for j in range(T(r)-i):R(r[j+T(u[i][j]):],t+[u[i][j]])
 R(s,[])
 print[' 'in x and'"%s"'%x or x for x in y]

Some explanation:

T=len

Function len() used very often here, so this renaming saves bytes


L=lambda s,n:n and[''.join(y) for y in eval('zip(s%s)'%(''.join(',s[%s:]'%-~x for x in range(n))))]or list(s)

Evaluates all possible substrings of string s lenght n.
eval(...) creates command zip(s,s[1:],s[2:],...,s[n:])
It creates substrings of lenght n from every index of s if possible. So for s='budd' and n='2' it will produce bu, ud, dd


F=lambda K:len([s for s in S if len(s)-len(reduce(lambda x,y:re.sub(re.escape(y),'',x,1),K,s))==len(''.join(K))])==1

Filter to check if provided keys (K) are for unique song name.
re.sub is requered for multiple identical keys, like ['nn','nn'] in examples.


Inner function def R(r,t) is a recursive one to create all possible combinations of substring, that may describe song name.
Every combination is compared with currently shortest one (if there was any) to lower number of created combinations - if it is larger, it won't be accepted as all of it derivatives.
Function uses 2 variable to track state: n for lenght of current shortest key combination and y for combination itself


l=T(' '.join(t))+2*sum(map(lambda x:' 'in x,t))

This calculates length of key combination. ' '.join add spaces between keys and 2*sum(...) calculates number of needed quotes for keys with spaces.


u=[L(r,i)for i in range(0,T(r))]

Uses first lambda function to get all possible key combination (of every possible length) for current string.


Two for cycles to look through all generated keys and pass them individually to the next recursive step. Key place (j) is needed to correctly slice string at the end of it: r[j+T(u[i][j]):].
Slice provides string, that started where current key ends, so there won't be any overlaps.
If place is unknown, equal keys would mess everything up.


[' 'in x and'"%s"'%x or x for x in y]

Much longer than just y, but keys with spaces should be surrounded by quotes

Dead Possum

Posted 2017-04-30T00:14:09.090

Reputation: 3 256

This... is amazing. You're the first one to get rule 3 right! – ayane – 2017-05-04T16:20:00.910

1By the way, you should be able to shave off two bytes by removing the 0, in one of your ranges: u=[L(r,i)for i in range(0,T(r))] => u=[L(r,i)for i in range(T(r))]. – notjagan – 2017-05-04T16:41:42.887

1You could save a few more bytes: in your output, you don't have to show the input strings and the size of the output strings. – ayane – 2017-05-04T20:34:19.557

@彩音M Thanks! I've trimmed this few bytes from range and output. – Dead Possum – 2017-05-05T10:19:50.680

You can replace the S=[x.lower()for x in input()] with S=map(str.lower,input()), for 4 less bytes (I think). – Rɪᴋᴇʀ – 2017-05-06T00:34:04.873

1S=map(str.lower,input()) for -5 bytes – ovs – 2017-05-08T17:41:05.200

Congratulations! You win the bounty. (Although you had the only answer...haha!) – ayane – 2017-05-09T01:31:42.310

@Riker bytes saved by replacing spaces with tabs, thats why it was 566 and not 587 – Dead Possum – 2017-05-16T15:59:53.090

@DeadPossum ah, sorry. Didn't think of that. (curse SE's tab collapsing..) Hope skipping the lambdas helped. – Rɪᴋᴇʀ – 2017-05-16T16:02:53.803

@Riker Yeah, it saved bunch of bytes, thank you! – Dead Possum – 2017-05-16T16:49:12.300