Add TIO link to messages!

7

It's annoying when you have to search for a language homepage, get its link, and add it to your SE comment/answer, right? Let's write a script to do that for you!

Challenge

For each language name (intentional or not) in the input, replace it with a markdown link to that language's home page, as listed on TIO.

Rules

  • (important) You can only connect to https://tio.run.
  • Use the language list on TIO at the time your program is run.

  • The input is a string. You may assume it contains no newlines (multiline messages can't contain markdown anyway). The string may contain Unicode characters, since chat messages can.

  • The input is valid markdown, though this is unlikely to matter for the challenge.

  • You can assume there is no link inside the input. (Whoops, I missed this while the challenge was in the sandbox; thanks to mercator for pointing it out. As far as I can see, this should not invalidate any existing answers.)

  • Matching may be case-sensitive or case-insensitive, whichever you choose.

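  • Only matches at word breaks count. A word break is a position whose two neighbouring characters are not both English letters; the start and end of the input are also word breaks. A language name is replaced only if there is a word break both before and after it. (For example, matlab should not have its first four letters linked to MATL, but 47 should be converted to [4](https://github.com/urielieli/py-four)[7](https://esolangs.org/wiki/7).)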

    Note: It's too hard to define what a "character" is in Unicode, so if either of the two surrounding characters is not printable ASCII, your program may treat the position as a word break or not.

  • Only use markdown links [...](...), not HTML links <a href="...">...</a>, because the latter don't work in chat.

  • Markdown formatting in messages (* or _ for italic, ** for bold, --- for strikethrough in chat) does not matter, because AFAIK there are no languages with such names on TIO.

  • Use the language name (Wolfram Language (Mathematica)), not language ID.

  • Replacements must not overlap.
  • If there are multiple valid outputs, pick any of them. See the examples below.

  • Although chat markdown doesn't render links inside code blocks by default, for simplicity you may assume that code blocks need no special treatment.
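The word-break rule can be read as a test on the positions between characters. The Python sketch below is one reading of it, not an official reference. `is_word_break` and `match_allowed` are hypothetical helper names. Characters outside printable ASCII are simply treated as non-letters, which the Unicode note above permits.

```python
from string import ascii_letters


def is_word_break(s: str, i: int) -> bool:
    """Return True if the position just before s[i] is a word break.

    A position is a word break unless the characters on both sides of it
    are English letters. The start and end of the string always count as
    word breaks. Non-ASCII neighbours are treated as non-letters here,
    which is one of the two behaviours the rules allow.
    """
    if i <= 0 or i >= len(s):
        return True
    return not (s[i - 1] in ascii_letters and s[i] in ascii_letters)


def match_allowed(s: str, start: int, name: str) -> bool:
    """True if `name` occurs at index `start` with a word break at both ends."""
    end = start + len(name)
    return (s.startswith(name, start)
            and is_word_break(s, start)
            and is_word_break(s, end))


print(match_allowed("matlab", 0, "matl"))  # False: "l" and "a" are both letters
print(match_allowed("47", 0, "4"), match_allowed("47", 1, "7"))  # True True
```

Under this reading there is a break inside `a2sable` between `a` and `2`, because only one side is a letter.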

Winning criteria

This is code-golf, so the shortest code in bytes wins.

Test cases

Input -> Output
matlab -> matlab
47 -> [4](https://github.com/urielieli/py-four)[7](https://esolangs.org/wiki/7)
/// -> [///](https://esolangs.org/wiki////)
//// -> [///](https://esolangs.org/wiki////)/ or /[///](https://esolangs.org/wiki////)
////// -> [///](https://esolangs.org/wiki////)[///](https://esolangs.org/wiki////) or //[///](https://esolangs.org/wiki////)/
Uṅıċȯḋė -> Uṅıċȯḋė

More test cases

(Suggested by Adám. Any output is fine as long as the replacements don't overlap, i.e. the output is valid markdown source.)

\///
S.I.L.O.S.I.L.O.S
Del|m|t
Del|m|tcl
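Because several outputs can be valid, a checker cannot just compare strings. The Python sketch below tests one necessary condition: stripping every `[text](url)` back to `text` must reproduce the input. It assumes link texts contain no `]` and URLs contain no `)`, which holds for the test cases above. `unlink` and `round_trips` are hypothetical helpers, and the check does not verify that a link points to the right TIO page.

```python
import re

# One inline markdown link: [text](url). Assumes the text has no "]" and the
# URL has no ")", which holds for the test cases in this challenge.
LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def unlink(markdown: str) -> str:
    """Replace every [text](url) with just its text."""
    return LINK.sub(lambda m: m.group(1), markdown)


def round_trips(inp: str, out: str) -> bool:
    """Necessary (not sufficient) condition for a valid answer: removing the
    links from the output must give back the input unchanged."""
    return unlink(out) == inp


cases = {
    "////": ["[///](https://esolangs.org/wiki////)/",
             "/[///](https://esolangs.org/wiki////)"],
    "47": ["[4](https://github.com/urielieli/py-four)[7](https://esolangs.org/wiki/7)"],
}
for inp, outs in cases.items():
    print(inp, [round_trips(inp, o) for o in outs])
```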

user202729

Posted 2018-02-03T15:52:20.333

Reputation: 14 620

"AFAIK there are no languages with such names on TIO" - congratulations, now there are going to be a dozen within a week because of this – Mego – 2018-02-05T08:01:25.007

Why can matlab not become [matl]ab but 47 can become [4][7]? – HyperNeutrino – 2018-02-05T13:09:25.387

@HyperNeutrino Because word break. – user202729 – 2018-02-05T13:52:28.203

Looks like people hate word breaks. I don't know. 2 downvotes already. [blame-sandbox] – user202729 – 2018-02-05T13:56:53.590

@HyperNeutrino I told you, [blame-sandbox]. I didn't know about the behavior of \b before asking, and now I can't invalidate answers. – user202729 – 2018-02-05T14:28:54.087

matl isn't the name of a supported language. It's MATL. So the matlab test case doesn't cover the word break rule for answers that are case-sensitive. – mercator – 2018-02-05T22:20:26.230

And what should happen with input like a2sable or Ada (GNAT)foo? Are there word breaks at the start/end of those language names? And when you say the input is valid markdown, does this mean the input can also be a name that's already linked, like [4](https://en.wikipedia.org/wiki/4)? What should happen then? – mercator – 2018-02-05T22:41:11.497

@mercator Hopefully fixed? I don't understand, what's the problem with those inputs? – user202729 – 2018-02-06T15:57:09.607

Looks good, yes. :) The problem with those inputs is I'm still not entirely sure what word breaks are. Is there a break in the middle of a2? The way it's described now seems to me like it's a word break if the language name is on the right (2), but not if the language is on the left (a). If there is a word break, none of the answers pass. And I'm wondering if I'm going to spend the effort trying to fix my answer. :P Or if I can convince you to drop the rule. ;) – mercator – 2018-02-06T22:20:34.237

@mercator But existing answers... I will test them (and probably change the rules accordingly) later. – user202729 – 2018-02-07T01:25:02.373

Answers

6

JavaScript (ES6) + jQuery, 247 bytes

s=>c=>$.getJSON('https://tio.run/static/645a9fe4c61fc9b426252b2299b11744-languages.json',data=>{for(l of Object.values(data))s=s.replace(eval(`/([^a-zA-Z]|^)${l.name.replace(/[\\/+.|()]/g,"\\$&")}(?![a-zA-Z])/g`),`$1[${l.name}](${l.link})`);c(s)})

Explanation (less golfed)

s=>c=>
// Load TIO languages using jQuery
$.getJSON('https://tio.run/static/645a9fe4c61fc9b426252b2299b11744-languages.json',data=>{
  // For every language
  for(const l of Object.values(data)) {
    // Construct a regex that matches the language name: escape \ / + . | ( )
    // in the name, require a non-letter or the start of the string before it
    // (captured as $1) and forbid a letter right after it
    const regex = eval(`/([^a-zA-Z]|^)${l.name.replace(/[\\/+.|()]/g,"\\$&")}(?![a-zA-Z])/g`);
    // Replace all occurrences of the language name with a markdown link,
    // putting back the captured preceding character
    s=s.replace(regex,`$1[${l.name}](${l.link})`);
  }
  // Return using callback
  c(s);
});

f=
s=>c=>$.getJSON('https://tio.run/static/645a9fe4c61fc9b426252b2299b11744-languages.json',data=>{for(l of Object.values(data))s=s.replace(eval(`/([^a-zA-Z]|^)${l.name.replace(/[\\/+.|()]/g,"\\$&")}(?![a-zA-Z])/g`),`$1[${l.name}](${l.link})`);c(s)})

f("47")(console.log);
f("///")(console.log);
f("////")(console.log);
f("//////")(console.log);
f("Uṅıċȯḋė")(console.log);
f("\\///")(console.log);
f("S.I.L.O.S.I.L.O.S")(console.log);
f("Del|m|t")(console.log);
f("Del|m|tcl")(console.log);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
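For comparison, here is a Python sketch of the same one-regex-per-language idea. It is not part of the answer. It swaps the hand-written escaping for `re.escape` and uses a look-behind instead of capturing the preceding character. The `LANGS` table holds only the three names and links from the challenge's test cases, standing in for TIO's full list, and `link_languages` is a hypothetical name.

```python
import re


def link_languages(s: str, languages: dict) -> str:
    """One substitution pass per language, in the spirit of the jQuery answer.

    `languages` maps a language name to its home-page URL. Python's re
    supports fixed-width look-behind, so the "no letter before" test does
    not need to capture and re-insert the preceding character as $1.
    """
    for name, link in languages.items():
        # re.escape covers every metacharacter, not just \ / + . | ( )
        pattern = re.compile(r"(?<![a-zA-Z])" + re.escape(name) + r"(?![a-zA-Z])")
        # A function replacement keeps backslashes in the name or URL literal
        s = pattern.sub(lambda m: f"[{name}]({link})", s)
    return s


# Sample entries copied from the challenge's test cases (not the live TIO list)
LANGS = {
    "4": "https://github.com/urielieli/py-four",
    "7": "https://esolangs.org/wiki/7",
    "///": "https://esolangs.org/wiki////",
}
print(link_languages("47", LANGS))
print(link_languages("//////", LANGS))
```

The look-behind does not consume the character before a match, so adjacent names such as the two in `//////` can both be linked. As in the answer, though, each later pass rescans the whole string, including links inserted by earlier passes. A name that appears inside an earlier URL could therefore be rewritten. A single left-to-right pass avoids that.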

Herman L

Posted 2018-02-03T15:52:20.333

Reputation: 3 611

2

Python + Selenium, 496 bytes

I decided that using the website's front end would be cooler, and since nearly everything on it is loaded via JavaScript, Selenium was my best bet.

from selenium import webdriver
from re import*
d=webdriver.Chrome("a.exe")
d.get("https://tio.run/#")
t={e.text.lower():e.get_attribute("data-id")for e in d.find_elements_by_xpath('//*[@id="results"]/div')}
def q(reg):s=reg.group(0);d.get("https://tio.run/#"+t[s.lower()]);d.find_element_by_id("permalink").click();a=d.find_elements_by_tag_name("textarea")[4].get_attribute('value');return"["+s+"]("+search(': (ht.*)\n',a).group(1)+")"
print(sub('|'.join(map(escape,t.keys())),q,input(),flags=I))

Pseudo-Ungolfed Version

I thought providing an in-between would be useful for understanding.

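from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import re

# Hard-coded test input; delete this line to read from stdin instead
input = lambda: "47////////"

s = input()
# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service("chromedriver.exe"))
driver.get("https://tio.run/#")
# Map each lower-cased language name in TIO's language list to its data-id
temp = {ele.text.lower(): ele.get_attribute("data-id")
        for ele in driver.find_element(By.ID, "results").find_elements(By.TAG_NAME, "div")}

def abc(reg):
    s = reg.group(0)
    # Open the matched language on TIO and generate a permalink
    driver.get("https://tio.run/#" + temp[s.lower()])
    driver.find_element(By.ID, "permalink").click()
    # Read the fifth textarea and take the first URL that follows ": " on a line
    a = driver.find_elements(By.TAG_NAME, "textarea")[4].get_attribute("value")
    return "[" + s + "](" + re.search(": (ht.*)\n", a).group(1) + ")"

# Match all names at once (one escaped alternation), ignoring case
bob = re.sub("|".join(map(re.escape, temp.keys())), abc, s, flags=re.I)
print(bob)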
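Joining the names with `|` works because `re.sub` never lets matches overlap. However, Python's `re` takes the first alternative that matches rather than the longest, so the order of the keys could let a short name shadow a longer one. Below is a minimal sketch of a longest-first variant, with a hard-coded table instead of Selenium. `link_all` is a hypothetical name, and the `//` entry and its example.com URL are made up purely to show one name shadowing another.

```python
import re


def link_all(s: str, links: dict) -> str:
    """Link every language name in one re.sub pass, case-insensitively.

    Python's re tries the branches of an alternation from left to right and
    keeps the first that matches, so the names are sorted longest first;
    otherwise a short name could shadow a longer name that starts with it.
    re.sub scans the input once and never rescans replacement text, so
    replacements cannot overlap. Word breaks are ignored, as in the answer.
    """
    table = {name.lower(): url for name, url in links.items()}
    ordered = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, ordered)), flags=re.I)
    # Keep the text as typed in the message, as the answer does
    return pattern.sub(lambda m: f"[{m.group(0)}]({table[m.group(0).lower()]})", s)


# "//" is a made-up name with a placeholder URL, used only to show shadowing
demo = {"//": "https://example.com/placeholder", "///": "https://esolangs.org/wiki////"}
print(link_all("///", demo))  # [///](https://esolangs.org/wiki////)
```

Without the sort, the pattern would be `//|///` and the same input would come out as `[//](https://example.com/placeholder)/`.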

Neil

Posted 2018-02-03T15:52:20.333

Reputation: 2 417

1

Python 2, 301 bytes (previously 291)

import json,urllib
s=raw_input().decode('u8');m=json.loads(urllib.urlopen('https://tio.run/languages.json').read());n=0
while n<len(s):
 for l in sorted(m,None,lambda x:m[x]['name'],1):
  L=m[l]['name'];r='[%s](%s)'%(L,m[l]['link'])
  if s.find(L,n)==n:s=s[:n]+r+s[n+len(L):];n+=len(r)-1
 n+=1
print s

Strictly speaking, this doesn't follow the rules, since it ignores word breaks. At each position it takes the longest matching language name.
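Here is a Python 3 sketch that keeps this answer's longest-first scan but adds the word-break check it skips. It assumes the JSON maps each language id to an object with `name` and `link` fields, which is how the answers read it. The inline `SAMPLE_JSON` stands in for the downloaded file (which could be fetched with `urllib.request.urlopen`), and its ids are made up. `break_at` and `add_links` are hypothetical names.

```python
import json
from string import ascii_letters

# Stand-in for the contents of https://tio.run/languages.json. Only the
# "name" and "link" fields read by the answers are included, and the ids
# ("four", "seven", "slashes") are made up for this example.
SAMPLE_JSON = """{
  "four": {"name": "4", "link": "https://github.com/urielieli/py-four"},
  "seven": {"name": "7", "link": "https://esolangs.org/wiki/7"},
  "slashes": {"name": "///", "link": "https://esolangs.org/wiki////"}
}"""


def break_at(s: str, i: int) -> bool:
    """Word break unless both neighbours of position i are English letters."""
    if i <= 0 or i >= len(s):
        return True
    return not (s[i - 1] in ascii_letters and s[i] in ascii_letters)


def add_links(s: str, languages_json: str) -> str:
    langs = json.loads(languages_json).values()
    # Reverse lexicographic order puts any name before its own prefixes,
    # so the first name that fits at a position is the longest that fits.
    names = sorted(((l["name"], l["link"]) for l in langs), reverse=True)
    out, n = [], 0
    while n < len(s):
        for name, link in names:
            end = n + len(name)
            if s.startswith(name, n) and break_at(s, n) and break_at(s, end):
                out.append(f"[{name}]({link})")
                n = end  # skip past the match so replacements never overlap
                break
        else:  # no language name starts here: copy one character
            out.append(s[n])
            n += 1
    return "".join(out)


print(add_links("47", SAMPLE_JSON))
print(add_links("////", SAMPLE_JSON))
```

Building a new list instead of splicing into `s` means inserted links are never rescanned.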

mercator

Posted 2018-02-03T15:52:20.333

Reputation: 359