Implement t9-like functionality

10

1

Your challenge today is to implement a t9-like functionality.

You will implement a function that will only have 2 parameters.
You will receive 1 phone number in a string and the content of a text file with a list of words (don't assume a specific newline style).
You can use the link https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt to test the functionality, or use /usr/share/dict/words (check A text file with a list of words [closed] for more information).

You can assume that you will always receive at least 2 numbers.

Given the number, you will read from a list of words and returns the words starting with the letters mapping to those words. This means that the input should be only numbers from 2 to 9.
You can do whatever you want if you receive invalid input.

If no match is found, you can return an empty list, null/nil or 0.

Remember that the cellphone keys are mapped to their equivalent chars:

  • 0 and 1 are invalid
  • 2 matches [abc]
  • 3 matched [def]
  • 4 matches [ghi]
  • 5 matches [jkl]
  • 6 matches [mno]
  • 7 matches [pqrs]
  • 8 matches [tuv]
  • and 9 matches [wxyz]

Examples:

f('52726')
//returns ["Japan","japan","Japanee","Japanese","Japanesque"...,"larbowlines"]

f('552')
//returns ["Kjeldahl","kjeldahlization","kjeldahlize"...,"Lleu","Llew"]

f('1234')
//makes demons fly out your nose or divide by 0

f('9999')
//returns ["Zyzzogeton"]

f('999999')
//returns [] or null/nil or 0

After you run your function, you can print it in any way you wish.

Rules:

  • Standard loopholes are INVALID
  • You must return something, even if it is null/nil
    Javascript will return undefined if you don't return something, therefore this rule.
  • You cannot use or re-implement other's answers or copy my implementation.
  • You can assume, for Javascript, that the browser will be already opened and that the innerText/textContent of the automatic element will be passed as the 2nd parameter
  • For compiled languages, you cannot pass special arguments to the compiler
  • You can receive the file name over compiler arguments
  • Variables, macros, global variables, constants, non-standard classes and all the sort passing other values inside the function will be considered invalid.
  • In Javascript, variables without the keyword var render your code invalid
  • Your function will be named f
  • You can only and only have 2 arguments on your function
  • Try to keep your code under 500 seconds to run.
  • You don't have to worry about whitespace
  • You must use only ASCII printable characters.
    Exceptions are languages that only use non-printable characters (APL and whitespace are 2 examples).

Scoring:

  • Lowest number of bytes win
  • Having invalid ASCII printable characters in your answer, will count as the answer being encoded in UTF-32
    The exception to the encoding will make your answer be count by characters.
  • Only the function body counts, don't count anything else you do outside it
  • Bonus of -30% if you make a prediction system based on the neighborhood or most common words
  • Bonus of -20% in size if you only return the first 5 matches for each letter corresponding to the first number (e.g.: 245 would returns 5 words starting with 'a', 5 starting with 'b' and 5 starting with 'c').

Here is an example of an implementation, using Javascript:

function f(phone, words)
{
    var keypad=['','','abc','def','ghi','jkl','mno','pqrs','tuv','wxyz'];
    var regex='';

    for(var i=0,l=phone.length;i<l;i++)
    {
        regex+='['+keypad[phone[i]]+']';
    }

    var regexp=new RegExp('\\s('+regex+'[a-z]*)\\s','gi');

    return words.match(regexp);
}

To run it, open the list link and run, for example:

f('9999',document.getElementsByTagName('pre')[0].innerText);
//returns [" Zyzzogeton "]

This example was tested and works under Opera 12.17 64bits on Windows 7 Home Edition 64bits.

Ismael Miguel

Posted 2014-12-06T20:01:19.500

Reputation: 6 797

Is the second argument to the program a file name containing the words or the list of words itself ? – Optimizer – 2014-12-06T20:54:53.700

@MartinBüttner UTF-8 isn't unfair (it still counts ASCII chars as being 1 byte), but I changed the rule. – Ismael Miguel – 2014-12-06T20:57:29.257

@Optimizer The 2nd argument is a list of the words. You can pass the filename over a compiler argument and read the file, if you want to. But the only thing that counts is the function body. – Ismael Miguel – 2014-12-06T20:58:24.357

@MartinBüttner By counting as ASCII, it is being counted as bytes. You want me to say that APL code will have 1 byte having the size of 8 bits? – Ismael Miguel – 2014-12-06T21:00:06.017

@MartinBüttner I think it is more fair now. The required encoding is specified (printable ASCII characters) but APL answers will be count by characters. – Ismael Miguel – 2014-12-06T21:06:20.717

@MartinBüttner Now it is more fair: All languages that only use non-printable ascii chars are counted by chars. All the others are counted by bytes. – Ismael Miguel – 2014-12-06T21:09:44.917

Can you explain your first bonus a bit more please ? – Optimizer – 2014-12-06T21:41:05.847

I'm pretty sure space, tab and newline (which are the only characters Whitespace uses) are printable. Also, correct me if I'm wrong, but by "You don't have to worry about whitespace" I assume you mean whitespace doesn't count. In that case, a Whitespace answer would instantly win with a score of 0. – nyuszika7h – 2014-12-06T21:48:05.473

@Optimizer The t9 function tries to predict the most used words. You can use this to, for example, list the most commonly used words in english. An example would be to only return ["Japan", "japan", "Japanese"] with the example number "52726". – Ismael Miguel – 2014-12-06T21:48:10.990

@nyuszika7h The whitespace being returned don't count. you can return ["a"] or [" a "]. And you made a nice point about whitespace. In this case, what I am referring to is characters that show when you print. A space won't be printed, unless you put a background from a different color than the paper (some printers optimize its printing based on the paper color), but you don't have to go on this extend. – Ismael Miguel – 2014-12-06T21:51:16.953

2-1 for inappropriate restrictions – AJMansfield – 2014-12-08T03:47:31.520

@AJMansfield "-1 for inappropriate restrictions" --> please, develop more about this. This type of comment is as useful as going to a concert with an air guitar. – Ismael Miguel – 2014-12-08T04:20:05.030

@IsmaelMiguel Specifically, rules 2, 3, 5, 6, 7, 8, and 9 are all inappropriate. Additionally, rules 4, 12, S2, and S3 are also poor. – AJMansfield – 2014-12-09T14:10:36.827

@AJMansfield rule 2: if it doesn't return something, it isn't a function. 3: you want to copy people's answers? 5: the goal is to create a function, not a full-blown program. 6: that is to enforce the scoring, to avoid cheaters. 7: refer to rule 6. 8: this is to avoid multiple function names. 9: check the implementations: you only need 2 arguments. 'poor rules': 4: indeed the wording is poor, but check my javascript implementation for the reason why. 12: indeed it is badly worded. s2: refer to rule 13. s3: the goal is to make a function, this is FAIR – Ismael Miguel – 2014-12-09T14:24:51.240

@IsmaelMiguel Which leads to the heart of the issue. Limiting allowed programs to just functions is unreasonable, as that makes a large class of possible solutions 'invalid', such as shell scripts or command-line applications. Those rules get in the way of the fun of making super-optimized programs. – AJMansfield – 2014-12-09T14:30:13.217

So 2 is invalid because the goal shouldn't be to make a function. 3 is invalid because it means that I can't post a super-tightly optimized version of another solution. 5 is simply wrong. 6 is already covered by standard loopholes, and your version of it is bad. Same for 7, plus the 'function' thing. 8 is overly restrictive. 9 but that may preclude certain types of optimization, such as multi-argument recursion. 4 is just an excuse to let you shorten your code. You should just use one of the regular scoring rules; S2 is worse than any of them. S3 is bad because of the function thing. – AJMansfield – 2014-12-09T14:41:44.120

@AJMansfield 2 is perfectly fine. Look up at the tags. 3 is perfectly fine too. The goal is exactly that. If you want to answer, make your own answer. 5 is fine: no need for compiler arguments when all you have to do is a function. Without the rule 8, I would have to count the whole code and making a function wouldn't make any sense. 9: if you want recursion, use eval or lambda functions. 4 is not an excuse but a facility. If you post an answer in javascript, you can count that the page will already be opened on the file. s3 is perfect. you can make shorter functions with this. – Ismael Miguel – 2014-12-09T20:40:39.170

Answers

3

CJam, 28 bytes

q~{el{'h-_9/-D+3/}%s1$#!},p;

Takes input in the form of "<number>" [<list of words>]

Example:

"52726" ["Japan" "japan" "Japanee" "Japanese" "Japanesque" "larbowlines" "ablution" "ablutionary" "abluvion" "ably" "abmho" "Abnaki" "abnegate"]

Output:

["Japan" "japan" "Japanee" "Japanese" "Japanesque" "larbowlines"]

Not going for any bonus for now.

Try the code online here but for actual time measurements, run it on the Java compiler

Note that CJam represents empty lists like ""

To convert the raw wordlist into CJam list use the following code with the wordlist as input:

qN/p

Optimizer

Posted 2014-12-06T20:01:19.500

Reputation: 25 836

"You will receive 1 phone number in a string and the content of a text file with a list of words" -> can you implement, on a different block, the needed code to read the file into an usable list? – Ismael Miguel – 2014-12-06T21:22:04.593

@IsmaelMiguel You mean not the part of this code, but just a helper code to convert the list to correct format ? – Optimizer – 2014-12-06T21:22:55.727

Exactly. Which your code isn't enough to prove it can use a list of words as the provided examples. But I upvoted anyway, I just wanted that helper code. – Ismael Miguel – 2014-12-06T21:24:19.067

Can you add it to the answer? As an edit, in a different part – Ismael Miguel – 2014-12-06T21:26:05.167

Exactly. That's what I'm talking about! Lets see if you can optimize it even further – Ismael Miguel – 2014-12-06T21:28:42.440

@IsmaelMiguel note that the online compiler is not the official one and is at least 10 times slower than the official command line Java compiler. – Optimizer – 2014-12-06T21:29:29.187

Don't worry, timing isn't critical. As stated on the rules: "Try to keep your code under 500 seconds to run". Nobody wants to wait 300 years for a word list while writing an important SMS for your boss. – Ismael Miguel – 2014-12-06T21:31:28.007

If you want to go for the 20% bonus, check my answer. You have an example of how to implement it. – Ismael Miguel – 2014-12-07T00:33:39.600

2

Java: 395

This forms a regex pattern based on the letters that are allowed for each number, and then tacks on a .* at the end to match any following characters.

Here is the golfed version:

static ArrayList<String> f(String n,ArrayList<String> d){String[] k={"","","([A-Ca-c])","([D-Fd-f])","([G-Ig-i])","([J-Lj-l])","([M-Om-o])","([P-Sp-s])","([T-Vt-v])","([W-Zw-z])"};String r="";for(int i=0;i<n.length();++i)r+=k[n.charAt(i)-'0'];r += ".*";Pattern p=Pattern.compile(r);ArrayList<String> a=new ArrayList<String>();for(String w:dictionary)if(p.matcher(w).matches())a.add(w);return a;}

And here is the ungolfed version for read-ability

public static ArrayList<String> f(String phoneNumber, ArrayList<String> dictionary) {

    String[] KEY_VALUES = {"", "", "([A-Ca-c])", "([D-Fd-f])", "([G-Ig-i])",
                                            "([J-Lj-l])", "([M-Om-o])", "([P-Sp-s])",
                                            "([T-Vt-v])", "([W-Zw-z])"};

    String regex = "";
    for (int i = 0; i < phoneNumber.length(); ++i) {
        regex += KEY_VALUES[phoneNumber.charAt(i) - '0'];
    }
    regex += ".*";
    Pattern p = Pattern.compile(regex);
    ArrayList<String> answers = new ArrayList<String>();
    for (String word : dictionary) {
        if (p.matcher(word).matches()) {
            answers.add(word);
        }
    }
    return answers;
}

Brian J

Posted 2014-12-06T20:01:19.500

Reputation: 653

Nice!!! I don't know if it is possible, but maybe you can store (inside the function) a reference to ArrayList and Pattern? (I don't know if this is possible: I'm not a Java programmer) But really nice looking code! You really golfed the heck out of it! – Ismael Miguel – 2014-12-11T23:50:11.317

@IsmaelMiguel when you say store a reference to, do you mean as a way to shorten the names? – Brian J – 2014-12-12T14:33:55.057

Exactly, as long as it is inside the body. You have a few repeated data types (I think I can call them that) that you can try to shorten. – Ismael Miguel – 2014-12-12T15:02:36.120

Your code goes against rule number 7: "Variables, macros, global variables, constants, non-standard classes and all the sort passing other values inside the function will be considered invalid." and it kinda goes against rule number 3: "You cannot use or re-implement other's answers or copy my implementation.", but on your code it is kinda debatable. And it also goes against the rule 9: "Your function will be named f". – Ismael Miguel – 2014-12-07T07:24:15.863

@IsmaelMiguel Oops. Rule 7 can be easily fixed by moving the constant inside the function. I was just pulling it outside the function for better programming style. Rule 9 is also an easy fix. I confess I didn't read your answer, so I didn't intentionally try to copy it. I can remove my answer if you think it is too close for the contest. – Brian J – 2014-12-07T18:37:31.767

Your answer is alright. You have a bug on your code. On the last constant (([W-Zw-z)]) it should be ([W-Zw-z]). And on Code-golf you don't have to worry about programming styles and good practices: your code must simply do it's thing withing the required parameters. If you check my answer, you will see this line: $s=[2=>abc,def,ghi,jkl,mno,pqrs,tuv,wxyz];. This is an awful 'crime' in PHP. Basically I am forcing PHP to convert non-existing constants into strings. This is perfectly acceptable. You will also see that I'm not even setting the variable $t to an array before using it as such – Ismael Miguel – 2014-12-08T01:21:07.840

@IsmaelMiguel Good catch on the regex error. Thanks for pointing it out. I'll try to actually golf it tomorrow; maybe find some Java examples on this site. – Brian J – 2014-12-08T01:53:01.063

I'm not a java programmer, but I tell you some things. You can check http://codegolf.stackexchange.com/questions/6671/tips-for-golfing-in-java to get some tips. General tips include removing useless whitespace (newlines, spaces, tabs), one-letter-long variable names and do all that you can to reduce the code size as much as possible.

– Ismael Miguel – 2014-12-08T03:15:42.257

1

C# .NET 4.5 235

This should work:

IEnumerable<string>F(string n,string d){IEnumerable<string>w=d.Split(null).ToList();string[]a={"","","abc","def","ghi", "jkl","mno","pqrs","tuv","wxyz"};foreach(var i in n){w=w.Where(x=>x.IndexOfAny(a[i-'0'].ToArray())>0);}return w;}

Chaossie

Posted 2014-12-06T20:01:19.500

Reputation: 11

Welcome to PPCG. Your code will work, but you still need to reduce it a lot more. By removing all useless whitespace (spaces, tabs, newlines) I have managed to reduce your code to 167 bytes. This code can be reduced a lot more, I'm sure of it. I recommend reading http://codegolf.stackexchange.com/questions/173/tips-for-code-golfing-in-c to shorten even more your code. To help you a little, the word list is a string separated by newlines, and you seem to assume it is already possible to use a foreach in it. If you expect it to already be IEnumerable, include the code used outside

– Ismael Miguel – 2014-12-08T13:04:26.687

@IsmaelMiguel TY I will look at it. List are a IEnumerable there is no code outside what I posted. – Chaossie – 2014-12-08T18:03:11.320

If you look at the specification of the function, you will see the 2nd parameter is also a string. (Quoting: "You will receive 1 phone number in a string and the content of a text file with a list of words (don't assume a specific newline style).") And you have 1 useless whitespace on your a var. – Ismael Miguel – 2014-12-08T21:45:15.907

I've noticed the improvements on your question, and I gave you an upvote. But you still can save a byte on your a var. But I really see noticeable improvements! Keep up the good work. – Ismael Miguel – 2014-12-09T13:46:19.493

1

Python 2 (155 bytes)

Should also work in Python 3 with the appropriate replacements (string->bytes, b prefix on strings, etc.).

I wasn't sure if having the maketrans call outside the function is considered "fair"; if not, the function is 134 bytes with it moved inside.

EDIT: Dropped one byte from a stupid oversight.

With prepared maketrans, 67 bytes:

from string import maketrans
t=maketrans('abcdefghijklmnopqrstuvwxyz','22233344455566677778889999')

def f(n,w):
    return[x for x in w.split()if x.lower().translate(t).startswith(n)]

With maketrans in body, 134 bytes:

from string import maketrans

def f(n,w):
    return[x for x in w.split()if x.lower().translate(maketrans('abcdefghijklmnopqrstuvwxyz','22233344455566677778889999')).startswith(n)]

With import and maketrans in body, 155 bytes:

def f(n,w):
    return[x for x in w.split()if x.lower().translate(__import__('string').maketrans('abcdefghijklmnopqrstuvwxyz','22233344455566677778889999')).startswith(n)]

Test call:

print f('9999',open('words.txt','rt').read())

criptych stands with Monica

Posted 2014-12-06T20:01:19.500

Reputation: 181

Those were alright, you could keep them anyway – Ismael Miguel – 2014-12-11T23:46:41.103

The maketrans is part of the function body. You should move it. I don't know if it is even possible, but you can try to directly use the import. I think I saw it somewhere... But your code is really nice! – Ismael Miguel – 2014-12-10T22:17:06.080

Do you mean to move the import and call into the body? Yes, I think that can be done, too. – criptych stands with Monica – 2014-12-11T13:52:14.267

I was thinking about t=(from stirng import maketrans)([...]). I have no idea if it is even possible. But maybe you can use from string import as x t=x([...]) which I'm not sure if it is possible too :/ – Ismael Miguel – 2014-12-11T13:58:16.300

The correct version is the last one. But the answer as-is is acceptable in my opinion. +1 for __import__('string').maketran. – Ismael Miguel – 2014-12-11T14:45:24.330

OK, thanks. I've removed the invalid answers. – criptych stands with Monica – 2014-12-11T20:08:47.457

0

PHP 5.4+ (171 186-20% = 148.8 bytes):

Well, this is quite a huge answer, but well.

I hope this brings more people to answer.

This function expects the raw content being read.

Here is the code:

function f($_,$a){$s=[2=>abc,def,ghi,jkl,mno,pqrs,tuv,wxyz];$a=preg_split('@\r\n|\r|\n@',$a);for($i=0;$c=$_[$i];++$i)foreach($a as$k=>$v)if(!strpos(1..$s[$c],$v[$i])||$t[$v[0]]++>4)unset($a[$k]);return$a;}

This works by verifying that the letter is in the list of allowed letters.

Example: the input 36 would make to check if 1abc has the first letter of the word and that 1def has the 2nd letter.

I append 1 so it doesn't check if the letter is in the 1st position (which would return 0 and that would evaluate to false). if(!strpos(1..$s[$c],$v[$i])) or if(!strpos($c.$s[$c],$v[$i])) will have the same effect, but the 1st confuses more and I like it.

Failing to do so, will remove the word.

With no words left, it returns an empty array.

To test this online, go to http://writecodeonline.com/php/ and create a simple variable with a word for line.

A testable example:

function f($_,$a)
{
    $s=array(2=>abc,def,ghi,jkl,mno,pqrs,tuv,wxyz);
    $a=preg_split('@\r\n|\r|\n@',$a);

    for($i=0;$c=$_[$i];++$i)
        foreach($a as$k=>$v)
            if(!strpos(1..$s[$c],$v[$i]) || $t[$v[0]]++>4)
                unset($a[$k]);
    return$a;
}

$lines=<<<WORDS
one
two
three
four
five
six
seven
eight
nine
ten
WORDS;

var_dump(f('36',$lines));

This should output:

array(1) {
    [3]=>
      string(4) "four"
}

To work on older php versions, replace $s=[2=>abc,def,ghi,jkl,mno,pqrs,tuv,wxyz]; by $s=array(2=>abc,def,ghi,jkl,mno,pqrs,tuv,wxyz);


For the 20% bonus:

To reduce the code I simply added ||$t[$v[0]]++>4, which checks how many times the first letter was used.

In php, $t doesn't need to be defined, helping to reduce a big chunk of 37.2 bytes.

To see this effect, use the following variable as the 2nd argument:

$lines=<<<WORDS
one
two
three
four
five
six
seven
eight
nine
ten
twelve
time
tutor
test
truth
WORDS;

Ismael Miguel

Posted 2014-12-06T20:01:19.500

Reputation: 6 797