Optimizing swiping across a 1D keyboard

16

2

This is a with a custom scoring system, where the lowest score wins.

Introduction

Many smartphones allow to enter text by swiping your finger across the 2D virtual keyboard. This technology is usually combined with a prediction algorithm that outputs a list of guessed words, sorted from most likely to least likely.

In this challenge:

  • We are going to swipe across a one-dimensional keyboard restricted to a subset of the 26 letters.
  • There will be no prediction algorithm: we want each word to be uniquely identified by its 'swipe sequence'.
  • We want the keyboard to be optimized in such a way that the total number of moves for a given list of words is minimized.

Swiping in one dimension

Below is a lexicographically sorted 1D keyboard with all letters:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

NB: This may be displayed on several rows if you're browsing from mobile. Please think of it as a single row.

To enter the word 'GOLF' on such a keyboard, we will:

  • start at G
  • swipe to the right to O
  • swipe to the left to F

Because L is located between O and F, we just go on swiping without stopping there.

So the swipe sequence of 'GOLF' on this keyboard is GOF.

More generally:

  • The first and last letters are always included.
  • Other letters are included if and only if a direction change is required just after them.
  • Repeated letters must be treated the same way as single letters. For instance, on the above keyboard:

    • 'LOOP' would be encoded as LP (with no stop on O)
    • 'GOOFY' would be encoded as GOFY (the O is included because there's a direction change there -- not because it's doubled)

Keyboard optimization

Let's consider the following list of words: ['PROGRAMMING', 'PUZZLES', 'AND', 'CODE', 'GOLF'].

We need 16 distinct letters to type these words, so we just need a 16-letter keyboard. The following one is -- again -- lexicographically sorted:

ACDEFGILMNOPRSUZ

With this keyboard, the words will be encoded this way:

  • PROGRAMMING: PRGRAMING (9 moves)
  • PUZZLES: PZES (4 moves)
  • AND: AND (3 moves)
  • CODE: CODE (4 moves)
  • GOLF: GOF (3 moves)

That's a total of 23 moves for all words.

But we can do much better with this keyboard:

CGODSELZNUIFRPAM

Which gives:

  • PROGRAMMING: PGMG (4 moves)
  • PUZZLES: PS (2 moves)
  • AND: AD (2 moves)
  • CODE: CE (2 moves)
  • GOLF: GF (2 moves)

for a total of only 12 moves.

Keyboard scoring

For a given list of \$n\$ words, the score of a keyboard is given by:

$$\sum_{k=1}^{n}{m_k}^2$$

where \$m_k\$ is the number of moves for the \$k\$-th word.

For instance, the two keyboards described in the previous paragraph would be scored \$9^2+4^2+3^2+4^2+3^2=131\$ and \$4^2+2^2+2^2+2^2+2^2=32\$ respectively.

The lower, the better.

The challenge

  • Given a list of words, your code must output a valid keyboard for this list. A keyboard is considered valid if each word generates a unique swipe sequence.
  • You'll be given 11 independent lists of words. Your score will be equal to:

    $$S + L$$ where \$S\$ is the sum of the scores of your keyboards for all lists and \$L\$ is the length of your code in bytes.

    You can use this script to check your score. The score() function expects your code length as first parameter and an array of 11 keyboard strings as second parameter (the case doesn't matter).

  • The submission with the lowest score wins. In case of a tie, the submission that was submitted first wins.

Additional rules

  • Your code must be deterministic (i.e. it must always return the same output for a given input).
  • You must either A) provide a test link (e.g. on TIO) that does not time out, or B) include the generated keyboards within the body of your answer.
  • You may take the words in either full uppercase or full lowercase. Mixed cases are forbidden.
  • The input is guaranteed to have at least one solution.
  • All words are made of at least 2 distinct letters.
  • Your code must work for any valid input. It will be tested with an undisclosed list of words to make sure that it's not relying on hard-coded results.
  • I reserve the right to increase the size of the test suite at any time to make sure that the submissions are not optimized for the initial test cases.

Word lists

1) Sanity check #1 (only 4 valid solutions: HES, SEH, ESH or HSE)
SEE, SHE

2) Sanity check #2 (16 valid solutions, of which 4 are optimal: COLD, DOLC, DLOC or CLOD)
COLD, CLOD

3) Sanity check #3
ACCENTS, ACCESS

4) Warm-up
RATIO, NATION, NITRO, RIOT, IOTA, AIR, ART, RAT, TRIO, TRAIN

5) Pangram
THE, QUICK, BROWN, FOX, JUMPS, OVER, LAZY, DOG

6) Common prepositions
TO, OF, IN, FOR, ON, WITH, AT, BY, FROM, UP, ABOUT, INTO, OVER, AFTER

7) Common verbs
BE, HAVE, DO, SAY, GET, MAKE, GO, KNOW, TAKE, SEE, COME, THINK, LOOK, WANT, GIVE, USE, FIND, TELL, ASK, WORK, SEEM, FEEL, TRY, LEAVE, CALL

8) Common adjectives
GOOD, NEW, FIRST, LAST, LONG, GREAT, LITTLE, OWN, OTHER, OLD, RIGHT, BIG, HIGH, DIFFERENT, SMALL, LARGE, NEXT, EARLY, YOUNG, IMPORTANT, FEW, PUBLIC, BAD, SAME, ABLE

9) Common nouns
TIME, PERSON, YEAR, WAY, DAY, THING, MAN, WORLD, LIFE, HAND, PART, CHILD, EYE, WOMAN, PLACE, WORK, WEEK, CASE, POINT, GOVERNMENT, COMPANY, NUMBER, GROUP, PROBLEM, FACT

10) POTUS
ADAMS, ARTHUR, BUCHANAN, BUREN, BUSH, CARTER, CLEVELAND, CLINTON, COOLIDGE, EISENHOWER, FILLMORE, FORD, GARFIELD, GRANT, HARDING, HARRISON, HAYES, HOOVER, JACKSON, JEFFERSON, JOHNSON, KENNEDY, LINCOLN, MADISON, MCKINLEY, MONROE, NIXON, OBAMA, PIERCE, POLK, REAGAN, ROOSEVELT, TAFT, TAYLOR, TRUMAN, TRUMP, TYLER, WASHINGTON, WILSON

11) Transition metals
SCANDIUM, TITANIUM, VANADIUM, CHROMIUM, MANGANESE, IRON, COBALT, NICKEL, COPPER, ZINC, YTTRIUM, ZIRCONIUM, PLATINUM, GOLD, MERCURY, RUTHERFORDIUM, DUBNIUM, SEABORGIUM, BOHRIUM, HASSIUM, MEITNERIUM, UNUNBIUM, NIOBIUM, IRIDIUM, MOLYBDENUM, TECHNETIUM, RUTHENIUM, RHODIUM, PALLADIUM, SILVER, CADMIUM, HAFNIUM, TANTALUM, TUNGSTEN, RHENIUM, OSMIUM

Arnauld

Posted 2018-12-19T15:28:00.883

Reputation: 111 334

Sandbox (now deleted). – Arnauld – 2018-12-19T15:31:04.290

Guess who else knows the struggle... – Erik the Outgolfer – 2018-12-19T16:20:07.557

If you're going to include this rule: "Your code must run in less than 1 minute for each list", it might be good to specify where it will be running. Code that runs on one computer in one minute could take hours on another. – mypetlion – 2018-12-19T21:32:28.647

@mypetlion What really matters here is that the code actually outputs something (and not just runs forever), so I've relaxed this rule. – Arnauld – 2018-12-19T21:59:37.670

"A keyboard is considered valid if each word generates a unique swipe sequence." - what does unique mean here? for instance, is alphabetic order an invalid solution for the words 'abda','acda'? – ngn – 2018-12-20T07:55:20.580

@ngn Indeed. These 2 words would have the same swipe sequence ADA, making them undistinguishable. So this keyboard is invalid for this list of words. – Arnauld – 2018-12-20T09:17:00.337

Answers

5

Python 3 + Google OR-Tools, 1076 + 1971 = 3047

This always finds optimal solutions (but spends a lot of code to do so). It runs tests 1–9 in a few seconds, test 10 in six minutes, and test 11 in one minute.

Code

from ortools.sat.python.cp_model import*
from itertools import*
C=combinations
R=range
L=len
T=lambda w:[*zip(w,w[1:],w[2:])]
W=[(*(g[0]for g in groupby(w)),)for w in input().split()]
K={*sum(W,())}
m=CpModel()
V=m.NewBoolVar
B={c:V(f"B{c}")for c in C(K,2)}
for a,b in[*B]:B[b,a]=B[a,b].Not()
for a,b,c in permutations(K,3):m.AddBoolOr([B[a,b],B[b,c],B[c,a]])
M={t:V(f"M{t}")for t in{*sum(map(T,W),[])}}
for a,b,c in M:m.AddBoolXOr([B[a,b],B[b,c],M[a,b,c].Not()])
N={(w,n):V(f"N{w,n}")for w in W for n in R(1,L(w))}
for w in W:
 for n in R(1,L(w)-1):s=sum(M[t]for t in T(w));m.Add(s>=n).OnlyEnforceIf(N[w,n]);m.Add(s<n).OnlyEnforceIf(N[w,n].Not())
for a,b in C(W,2):
 if(a[0],a[-1])==(b[0],b[-1]):m.AddForbiddenAssignments([M[t]for t in T(a)+T(b)],[[x in X for x in R(L(a)-2)]+[y in Y for y in R(L(b)-2)]for n in R(L(a))for X in C(R(L(a)-2),n)for Y in C(R(L(b)-2),n)if[a[x+1]for x in X]==[b[y+1]for y in Y]])
m.Minimize(sum((2*n+3)*N[w,n]for w,n in N))
s=CpSolver()
s.Solve(m)
o={k:0for k in K}
for c in C(K,2):o[c[s.Value(B[c])]]+=1
print(*sorted(K,key=lambda k:o[k]),sep="")

Result

  1. SEH, 13
  2. DOLC, 20
  3. TNSECA, 13
  4. RATION, 80
  5. TYKCIDBRFHJUEVOXWNGZALQMPS, 32
  6. REWINTHUVOFABMPY, 66
  7. FYCWORTMHAGINDKVESULB, 125
  8. TSHRDABXLYOWUPMIENGCF, 213
  9. PVKEFDLBMUSWOIHACNYTRG, 212
  10. XHGTPMCKSUABYORDLJEIWNFV, 596
  11. PYLFNAVEKBOCHTRGDSIZUM, 601

Check score

Anders Kaseorg

Posted 2018-12-19T15:28:00.883

Reputation: 29 242