For context, this problem is based on an old chat-bot project I did.
Problem:
Given a string of words containing any of the characters:
" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
Find the frequency of each triplet of words. All non-alphanumeric characters should be ignored, and input/output will be case-insensitive.
For this challenge, the "triplets" of a phrase are the overlapping runs of 3 consecutive words along the string.
For example, for the string
"Oh hi there guy. What's up? Oh hi there."
the "triplets" are
[["oh", "hi", "there"], ["hi", "there", "guy"], ["there", "guy", "whats"], ["guy", "whats", "up"],
["whats", "up", "oh"], ["up", "oh", "hi"], ["oh", "hi", "there"]]
The frequency of each triplet is 1, except for ["oh", "hi", "there"], which appears twice.
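As an ungolfed illustration of the definition above, here is a minimal Python sketch. The function name `triplet_frequencies` is made up for this example, and it assumes that a "word" left empty after stripping punctuation is simply dropped, which the challenge does not spell out.

```python
import re
from collections import Counter

def triplet_frequencies(text):
    """Count each run of 3 consecutive words (hypothetical helper, not golfed)."""
    # Lower-case first, then delete everything that is not a letter, digit, or space.
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    # split() with no argument also discards any word that became empty (assumption).
    words = cleaned.split()
    # Zipping three staggered views of the list yields every overlapping triplet.
    return Counter(zip(words, words[1:], words[2:]))

counts = triplet_frequencies("Oh hi there guy. What's up? Oh hi there.")
print(counts[("oh", "hi", "there")])  # 2
```

Keys are tuples of words, so the result maps directly onto any of the accepted output formats.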
Input
Input will be a string of space-delimited "words" that may contain any of the characters mentioned above. Although punctuation is ignored when forming the words, your submission must still accept input containing it and strip it out.
You can assume the input will always contain at least 3 words, and that there won't be consecutive whitespace.
Output
Output can be anything that shows the frequency of each triplet.
For the string "Oh hi there guy.", possible outputs could be:
{"oh hi there":1, "hi there guy":1}
["oh hi there", 1, "hi there guy", 1]
"oh hi there|1 hi there guy|1"
^ Or any other delimiter
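To make the delimiter-based format concrete, here is a small Python sketch that renders a mapping of triplets to counts as a single string. The name `format_counts` and the `sep` parameter are hypothetical, and the input is assumed to be a dict keyed by 3-tuples of words.

```python
def format_counts(counts, sep="|"):
    """Render {(w1, w2, w3): n} as 'w1 w2 w3|n ...' (hypothetical helper)."""
    # Join the three words with spaces, then attach the count after the delimiter.
    return " ".join(f"{' '.join(triplet)}{sep}{n}" for triplet, n in counts.items())

example = {("oh", "hi", "there"): 1, ("hi", "there", "guy"): 1}
print(format_counts(example))  # oh hi there|1 hi there guy|1
```

Since output order doesn't matter, the order in which the dict happens to iterate is fine.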
Test Cases (Output order doesn't matter):
"Oh hi there guy. What's up? Oh hi there."
{["oh" "hi" "there"] 2,
["hi" "there" "guy"] 1,
["there" "guy" "whats"] 1,
["guy" "whats" "up"] 1,
["whats" "up" "oh"] 1,
["up" "oh" "hi"] 1}
"aa aa aa aa"
{["aa" "aa" "aa"] 2}
"aa bb a bb a bb a cc a bb a"
{["aa" "bb" "a"] 1,
["bb" "a" "bb"] 2,
["a" "bb" "a"] 3,
["bb" "a" "cc"] 1,
["a" "cc" "a"] 1,
["cc" "a" "bb"] 1}
"99 bottles of beer"
{["99" "bottles" "of"] 1,
["bottles" "of" "beer"] 1}
"There are two main types of chatbots, one functions based on a set of rules, and the other more advanced version uses artificial intelligence. The chatbots based on rules, tend to be limited in functionality, and are as smart as they are programmed to be. On the other end, a chatbot that uses artificial intelligence, understands language, not just commands, and continuously gets smarter as it learns from conversations it has with people."
{["main" "types" "of"] 1,
["rules" "and" "the"] 1,
["of" "chatbots" "one"] 1,
["to" "be" "limited"] 1,
["artificial" "intelligence" "understands"] 1,
["it" "has" "with"] 1,
["chatbots" "based" "on"] 1,
["smarter" "as" "it"] 1,
["the" "chatbots" "based"] 1,
["other" "more" "advanced"] 1,
["commands" "and" "continuously"] 1,
["chatbots" "one" "functions"] 1,
["tend" "to" "be"] 1,
["a" "chatbot" "that"] 1,
["continuously" "gets" "smarter"] 1,
["advanced" "version" "uses"] 1,
["functionality" "and" "are"] 1,
["are" "two" "main"] 1,
["based" "on" "rules"] 1,
["on" "a" "set"] 1,
["there" "are" "two"] 1,
["the" "other" "more"] 1,
["just" "commands" "and"] 1,
["the" "other" "end"] 1,
["that" "uses" "artificial"] 1,
["based" "on" "a"] 1,
["limited" "in" "functionality"] 1,
["smart" "as" "they"] 1,
["are" "as" "smart"] 1,
["from" "conversations" "it"] 1,
["other" "end" "a"] 1,
["intelligence" "the" "chatbots"] 1,
["functions" "based" "on"] 1,
["in" "functionality" "and"] 1,
["intelligence" "understands" "language"] 1,
["chatbot" "that" "uses"] 1,
["more" "advanced" "version"] 1,
["gets" "smarter" "as"] 1,
["rules" "tend" "to"] 1,
["on" "rules" "tend"] 1,
["as" "it" "learns"] 1,
["are" "programmed" "to"] 1,
["and" "the" "other"] 1,
["understands" "language" "not"] 1,
["and" "are" "as"] 1,
["of" "rules" "and"] 1,
["has" "with" "people"] 1,
["end" "a" "chatbot"] 1,
["set" "of" "rules"] 1,
["and" "continuously" "gets"] 1,
["as" "they" "are"] 1,
["they" "are" "programmed"] 1,
["as" "smart" "as"] 1,
["two" "main" "types"] 1,
["a" "set" "of"] 1,
["uses" "artificial" "intelligence"] 2, # <----- 2 Here
["it" "learns" "from"] 1,
["be" "limited" "in"] 1,
["programmed" "to" "be"] 1,
["types" "of" "chatbots"] 1,
["conversations" "it" "has"] 1,
["one" "functions" "based"] 1,
["be" "on" "the"] 1,
["not" "just" "commands"] 1,
["version" "uses" "artificial"] 1,
["learns" "from" "conversations"] 1,
["artificial" "intelligence" "the"] 1,
["to" "be" "on"] 1,
["on" "the" "other"] 1,
["language" "not" "just"] 1}
Your submission can be a function or full program, and can take input via stdin, or as an argument. It may output by returning, or printing to the stdout.
This is code golf, so the shortest submission in bytes wins.
Should digits be removed or retained in the output? – Martin Ender – 2017-03-04T23:22:38.613
Retained. Note the 4th test case. Do I have contradictory information somewhere? I changed that from when it was in the sandbox. Might have missed updating something. – Carcigenicate – 2017-03-04T23:24:38.653
Never mind, I overlooked that test case. – Martin Ender – 2017-03-04T23:26:20.233
Are repeated spaces ever going to occur in the input, and if so how should they be treated? (At least one current solution would parse "spam eggs ham" (with double spaces, which markdown removes) as ["spam", "", "eggs", "", "ham"] and at least one as ["spam", "eggs", "ham"].) – Jonathan Allan – 2017-03-05T07:38:16.530
You can assume consecutive spaces won't exist. Updated input specification. – Carcigenicate – 2017-03-06T00:37:43.143
Given that you are not lost in another planet: do you think that 10 out of 13 answers do not even deserve an upvote? – edc65 – 2017-03-06T15:46:30.830
@edc65 I've worked the past 2 nights, while dealing with an infected wisdom tooth, while packing for a trip. I glanced down the answers, which is why I commented on yours. Once I have a sec I'll go over and check and upvote the answers I can. – Carcigenicate – 2017-03-06T15:56:02.253
@edc65 Unless everyone needs exactly 10 rep now, I didn't think it would be an issue. – Carcigenicate – 2017-03-06T15:57:47.193
Of course it won't. Seeing so many answers with no feedback just gives a bad feeling. Best wishes for a speedy recovery – edc65 – 2017-03-06T16:12:26.373
@edc65 Thanks. I should have internet on my computer tonight when I get to the hotel, so I'll go over them then. I'm low on data so I don't want to push my luck on the way there. – Carcigenicate – 2017-03-06T16:15:11.180