Match the Stack Exchange URLs

15

1

Prologue

After installing an anti-XSS browser extension, Stack Snippets suddenly stopped working all across the Stack Exchange network. I could no longer learn from Stack Overflow, see working demos on User Experience and, worst of all, could not test JavaScript answers on Programming Puzzles and Code Golf! Desperately I searched for a remedy, and found a small input box in the settings, in which I could place a single regex. I couldn't fit all of the Stack Exchange sites in this one, small box, so I asked for help. This is that question.

Task

Your task is to create a regular expression that matches all of the Stack Exchange website URLs, without matching any domains not owned by Stack Overflow Inc.

Your regular expression must match all URLs with the following parts:

  • protocol: this will either be http:// or https://.
  • domain: this will be an item from this list:

    stackoverflow.com
    www.stackoverflow.com
    facebook.stackoverflow.com
    serverfault.com
    superuser.com
    meta.stackexchange.com
    webapps.stackexchange.com
    nothingtoinstall.com
    meta.webapps.stackexchange.com
    meta.nothingtoinstall.com
    gaming.stackexchange.com
    arqade.com
    thearqade.com
    meta.gaming.stackexchange.com
    meta.arqade.com
    meta.thearqade.com
    webmasters.stackexchange.com
    webmaster.stackexchange.com
    meta.webmasters.stackexchange.com
    meta.webmaster.stackexchange.com
    cooking.stackexchange.com
    seasonedadvice.com
    meta.cooking.stackexchange.com
    meta.seasonedadvice.com
    gamedev.stackexchange.com
    meta.gamedev.stackexchange.com
    photo.stackexchange.com
    photography.stackexchange.com
    photos.stackexchange.com
    meta.photo.stackexchange.com
    meta.photography.stackexchange.com
    meta.photos.stackexchange.com
    stats.stackexchange.com
    statistics.stackexchange.com
    crossvalidated.com
    meta.stats.stackexchange.com
    meta.statistics.stackexchange.com
    math.stackexchange.com
    maths.stackexchange.com
    mathematics.stackexchange.com
    meta.math.stackexchange.com
    diy.stackexchange.com
    meta.diy.stackexchange.com
    meta.superuser.com
    meta.serverfault.com
    gis.stackexchange.com
    meta.gis.stackexchange.com
    tex.stackexchange.com
    meta.tex.stackexchange.com
    askubuntu.com
    ubuntu.stackexchange.com
    meta.askubuntu.com
    meta.ubuntu.stackexchange.com
    money.stackexchange.com
    basicallymoney.com
    www.basicallymoney.com
    meta.money.stackexchange.com
    english.stackexchange.com
    elu.stackexchange.com
    meta.english.stackexchange.com
    stackapps.com
    ux.stackexchange.com
    ui.stackexchange.com
    meta.ux.stackexchange.com
    meta.ui.stackexchange.com
    unix.stackexchange.com
    linux.stackexchange.com
    meta.unix.stackexchange.com
    meta.linux.stackexchange.com
    wordpress.stackexchange.com
    meta.wordpress.stackexchange.com
    cstheory.stackexchange.com
    meta.cstheory.stackexchange.com
    apple.stackexchange.com
    askdifferent.com
    meta.apple.stackexchange.com
    rpg.stackexchange.com
    meta.rpg.stackexchange.com
    bicycles.stackexchange.com
    bicycle.stackexchange.com
    cycling.stackexchange.com
    bikes.stackexchange.com
    meta.bicycles.stackexchange.com
    meta.bicycle.stackexchange.com
    programmers.stackexchange.com
    programmer.stackexchange.com
    meta.programmers.stackexchange.com
    electronics.stackexchange.com
    chiphacker.com
    www.chiphacker.com
    meta.electronics.stackexchange.com
    android.stackexchange.com
    meta.android.stackexchange.com
    boardgames.stackexchange.com
    boardgame.stackexchange.com
    meta.boardgames.stackexchange.com
    physics.stackexchange.com
    meta.physics.stackexchange.com
    homebrew.stackexchange.com
    homebrewing.stackexchange.com
    brewadvice.com
    meta.homebrew.stackexchange.com
    meta.homebrewing.stackexchange.com
    security.stackexchange.com
    itsecurity.stackexchange.com
    meta.security.stackexchange.com
    meta.itsecurity.stackexchange.com
    writers.stackexchange.com
    writer.stackexchange.com
    writing.stackexchange.com
    meta.writers.stackexchange.com
    video.stackexchange.com
    avp.stackexchange.com
    meta.video.stackexchange.com
    meta.avp.stackexchange.com
    graphicdesign.stackexchange.com
    graphicsdesign.stackexchange.com
    graphicdesigns.stackexchange.com
    meta.graphicdesign.stackexchange.com
    dba.stackexchange.com
    meta.dba.stackexchange.com
    scifi.stackexchange.com
    sciencefiction.stackexchange.com
    fantasy.stackexchange.com
    meta.scifi.stackexchange.com
    codereview.stackexchange.com
    meta.codereview.stackexchange.com
    codegolf.stackexchange.com
    meta.codegolf.stackexchange.com
    quant.stackexchange.com
    meta.quant.stackexchange.com
    pm.stackexchange.com
    meta.pm.stackexchange.com
    skeptics.stackexchange.com
    skeptic.stackexchange.com
    skepticexchange.com
    meta.skeptics.stackexchange.com
    fitness.stackexchange.com
    meta.fitness.stackexchange.com
    drupal.stackexchange.com
    meta.drupal.stackexchange.com
    mechanics.stackexchange.com
    garage.stackexchange.com
    meta.mechanics.stackexchange.com
    meta.garage.stackexchange.com
    parenting.stackexchange.com
    meta.parenting.stackexchange.com
    sharepoint.stackexchange.com
    sharepointoverflow.com
    www.sharepointoverflow.com
    meta.sharepoint.stackexchange.com
    music.stackexchange.com
    guitars.stackexchange.com
    guitar.stackexchange.com
    meta.music.stackexchange.com
    sqa.stackexchange.com
    meta.sqa.stackexchange.com
    judaism.stackexchange.com
    mi.yodeya.com
    yodeya.com
    yodeya.stackexchange.com
    miyodeya.com
    meta.judaism.stackexchange.com
    german.stackexchange.com
    deutsch.stackexchange.com
    meta.german.stackexchange.com
    japanese.stackexchange.com
    meta.japanese.stackexchange.com
    philosophy.stackexchange.com
    meta.philosophy.stackexchange.com
    gardening.stackexchange.com
    landscaping.stackexchange.com
    meta.gardening.stackexchange.com
    travel.stackexchange.com
    meta.travel.stackexchange.com
    productivity.stackexchange.com
    meta.productivity.stackexchange.com
    crypto.stackexchange.com
    cryptography.stackexchange.com
    meta.crypto.stackexchange.com
    meta.cryptography.stackexchange.com
    dsp.stackexchange.com
    signals.stackexchange.com
    meta.dsp.stackexchange.com
    french.stackexchange.com
    meta.french.stackexchange.com
    christianity.stackexchange.com
    meta.christianity.stackexchange.com
    bitcoin.stackexchange.com
    meta.bitcoin.stackexchange.com
    linguistics.stackexchange.com
    linguist.stackexchange.com
    meta.linguistics.stackexchange.com
    hermeneutics.stackexchange.com
    meta.hermeneutics.stackexchange.com
    history.stackexchange.com
    meta.history.stackexchange.com
    bricks.stackexchange.com
    meta.bricks.stackexchange.com
    spanish.stackexchange.com
    espanol.stackexchange.com
    meta.spanish.stackexchange.com
    scicomp.stackexchange.com
    meta.scicomp.stackexchange.com
    movies.stackexchange.com
    meta.movies.stackexchange.com
    chinese.stackexchange.com
    meta.chinese.stackexchange.com
    biology.stackexchange.com
    meta.biology.stackexchange.com
    poker.stackexchange.com
    meta.poker.stackexchange.com
    mathematica.stackexchange.com
    meta.mathematica.stackexchange.com
    cogsci.stackexchange.com
    meta.cogsci.stackexchange.com
    outdoors.stackexchange.com
    meta.outdoors.stackexchange.com
    martialarts.stackexchange.com
    meta.martialarts.stackexchange.com
    sports.stackexchange.com
    meta.sports.stackexchange.com
    academia.stackexchange.com
    academics.stackexchange.com
    meta.academia.stackexchange.com
    cs.stackexchange.com
    computerscience.stackexchange.com
    meta.cs.stackexchange.com
    workplace.stackexchange.com
    meta.workplace.stackexchange.com
    windowsphone.stackexchange.com
    meta.windowsphone.stackexchange.com
    chemistry.stackexchange.com
    meta.chemistry.stackexchange.com
    chess.stackexchange.com
    meta.chess.stackexchange.com
    raspberrypi.stackexchange.com
    meta.raspberrypi.stackexchange.com
    russian.stackexchange.com
    meta.russian.stackexchange.com
    islam.stackexchange.com
    meta.islam.stackexchange.com
    salesforce.stackexchange.com
    meta.salesforce.stackexchange.com
    patents.stackexchange.com
    askpatents.com
    askpatents.stackexchange.com
    meta.patents.stackexchange.com
    meta.askpatents.com
    meta.askpatents.stackexchange.com
    genealogy.stackexchange.com
    meta.genealogy.stackexchange.com
    robotics.stackexchange.com
    meta.robotics.stackexchange.com
    expressionengine.stackexchange.com
    meta.expressionengine.stackexchange.com
    politics.stackexchange.com
    meta.politics.stackexchange.com
    anime.stackexchange.com
    meta.anime.stackexchange.com
    magento.stackexchange.com
    meta.magento.stackexchange.com
    ell.stackexchange.com
    meta.ell.stackexchange.com
    sustainability.stackexchange.com
    meta.sustainability.stackexchange.com
    tridion.stackexchange.com
    meta.tridion.stackexchange.com
    reverseengineering.stackexchange.com
    meta.reverseengineering.stackexchange.com
    networkengineering.stackexchange.com
    meta.networkengineering.stackexchange.com
    opendata.stackexchange.com
    meta.opendata.stackexchange.com
    freelancing.stackexchange.com
    meta.freelancing.stackexchange.com
    blender.stackexchange.com
    meta.blender.stackexchange.com
    mathoverflow.net
    mathoverflow.stackexchange.com
    mathoverflow.com
    meta.mathoverflow.net
    space.stackexchange.com
    thefinalfrontier.stackexchange.com
    meta.space.stackexchange.com
    sound.stackexchange.com
    socialsounddesign.com
    sounddesign.stackexchange.com
    meta.sound.stackexchange.com
    astronomy.stackexchange.com
    meta.astronomy.stackexchange.com
    tor.stackexchange.com
    meta.tor.stackexchange.com
    pets.stackexchange.com
    meta.pets.stackexchange.com
    ham.stackexchange.com
    meta.ham.stackexchange.com
    italian.stackexchange.com
    meta.italian.stackexchange.com
    pt.stackoverflow.com
    br.stackoverflow.com
    stackoverflow.com.br
    meta.pt.stackoverflow.com
    meta.br.stackoverflow.com
    aviation.stackexchange.com
    meta.aviation.stackexchange.com
    ebooks.stackexchange.com
    meta.ebooks.stackexchange.com
    alcohol.stackexchange.com
    beer.stackexchange.com
    dranks.stackexchange.com
    meta.alcohol.stackexchange.com
    meta.beer.stackexchange.com
    softwarerecs.stackexchange.com
    meta.softwarerecs.stackexchange.com
    arduino.stackexchange.com
    meta.arduino.stackexchange.com
    cs50.stackexchange.com
    meta.cs50.stackexchange.com
    expatriates.stackexchange.com
    expats.stackexchange.com
    meta.expatriates.stackexchange.com
    matheducators.stackexchange.com
    meta.matheducators.stackexchange.com
    meta.stackoverflow.com
    earthscience.stackexchange.com
    meta.earthscience.stackexchange.com
    joomla.stackexchange.com
    meta.joomla.stackexchange.com
    datascience.stackexchange.com
    meta.datascience.stackexchange.com
    puzzling.stackexchange.com
    meta.puzzling.stackexchange.com
    craftcms.stackexchange.com
    meta.craftcms.stackexchange.com
    buddhism.stackexchange.com
    meta.buddhism.stackexchange.com
    hinduism.stackexchange.com
    meta.hinduism.stackexchange.com
    communitybuilding.stackexchange.com
    moderator.stackexchange.com
    moderators.stackexchange.com
    meta.communitybuilding.stackexchange.com
    meta.moderators.stackexchange.com
    startups.stackexchange.com
    meta.startups.stackexchange.com
    worldbuilding.stackexchange.com
    meta.worldbuilding.stackexchange.com
    ja.stackoverflow.com
    jp.stackoverflow.com
    meta.ja.stackoverflow.com
    emacs.stackexchange.com
    meta.emacs.stackexchange.com
    hsm.stackexchange.com
    meta.hsm.stackexchange.com
    economics.stackexchange.com
    meta.economics.stackexchange.com
    lifehacks.stackexchange.com
    meta.lifehacks.stackexchange.com
    engineering.stackexchange.com
    meta.engineering.stackexchange.com
    coffee.stackexchange.com
    meta.coffee.stackexchange.com
    vi.stackexchange.com
    vim.stackexchange.com
    meta.vi.stackexchange.com
    musicfans.stackexchange.com
    meta.musicfans.stackexchange.com
    woodworking.stackexchange.com
    meta.woodworking.stackexchange.com
    civicrm.stackexchange.com
    meta.civicrm.stackexchange.com
    health.stackexchange.com
    meta.health.stackexchange.com
    ru.stackoverflow.com
    hashcode.ru
    stackoverflow.ru
    meta.ru.stackoverflow.com
    meta.hashcode.ru
    rus.stackexchange.com
    russ.hashcode.ru
    russ.stackexchange.com
    meta.rus.stackexchange.com
    mythology.stackexchange.com
    meta.mythology.stackexchange.com
    law.stackexchange.com
    meta.law.stackexchange.com
    opensource.stackexchange.com
    meta.opensource.stackexchange.com
    elementaryos.stackexchange.com
    meta.elementaryos.stackexchange.com
    portuguese.stackexchange.com
    meta.portuguese.stackexchange.com
    computergraphics.stackexchange.com
    meta.computergraphics.stackexchange.com
    hardwarerecs.stackexchange.com
    meta.hardwarerecs.stackexchange.com
    es.stackoverflow.com
    meta.es.stackoverflow.com
    3dprinting.stackexchange.com
    threedprinting.stackexchange.com
    meta.3dprinting.stackexchange.com
    ethereum.stackexchange.com
    meta.ethereum.stackexchange.com
    latin.stackexchange.com
    meta.latin.stackexchange.com
    languagelearning.stackexchange.com
    meta.languagelearning.stackexchange.com
    retrocomputing.stackexchange.com
    meta.retrocomputing.stackexchange.com
    crafts.stackexchange.com
    meta.crafts.stackexchange.com
    korean.stackexchange.com
    meta.korean.stackexchange.com
    monero.stackexchange.com
    meta.monero.stackexchange.com
    ai.stackexchange.com
    meta.ai.stackexchange.com
    esperanto.stackexchange.com
    meta.esperanto.stackexchange.com
    sitecore.stackexchange.com
    meta.sitecore.stackexchange.com
    
  • page: this will either be the empty string, / or / followed by any string

The URL will be the string created by appending protocol, domain and page to each other, in that order.
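The construction described above can be sketched in a few lines of Python. This is only an illustration: the three domains are a small sample of the list, and the example page `/questions/1234` and the helper name `build_urls` are assumptions made for the sketch, not part of the challenge.

```python
from itertools import product

PROTOCOLS = ["http://", "https://"]
# A small sample of the listed domains; the full list is given above.
DOMAINS = ["stackoverflow.com", "meta.codegolf.stackexchange.com", "hashcode.ru"]
# Pages: the empty string, "/", and "/" followed by an arbitrary string.
PAGES = ["", "/", "/questions/1234"]


def build_urls(protocols, domains, pages):
    """Concatenate protocol, domain and page, in that order."""
    return [proto + dom + page for proto, dom, page in product(protocols, domains, pages)]


urls = build_urls(PROTOCOLS, DOMAINS, PAGES)
```

Every string in `urls` is one that a valid submission has to match.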

Testcases

Your regex must match:

https://codegolf.stackexchange.com/
http://retrocomputing.stackexchange.com
https://facebook.stackoverflow.com/questions/1234
http://meta.nothingtoinstall.com/thisisa404.php?file=command.com

Your regex must not match:

http//codegolf.stackexchange.com/
https://meta.stackoverflow.com.fakesite.dodgy/cgi-bin/virus.cgi?vector=apt
file://www.stackoverflow.com/
http://ripoff.com/stackoverflow.com/q/1234/

Your regex may match:

http://panda.stackexchange.com/
https://www.meta.codegolf.stackexchange.com
http://alpha.beta.charlie.delta.chiphacker.com
https://stackexchange.com/sites/

because these are owned by Stack Exchange Inc. and so will not be vulnerable to XSS attacks.
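The mandatory test cases above can be checked mechanically. The following is a minimal sketch in Python, assuming the extension applies the regex as a search over the URL (the question does not say how the extension applies it). The `check` helper and the `NAIVE` pattern are hypothetical; `NAIVE` is deliberately too loose and is not one of the submitted answers.

```python
import re

MUST_MATCH = [
    "https://codegolf.stackexchange.com/",
    "http://retrocomputing.stackexchange.com",
    "https://facebook.stackoverflow.com/questions/1234",
    "http://meta.nothingtoinstall.com/thisisa404.php?file=command.com",
]
MUST_NOT_MATCH = [
    "http//codegolf.stackexchange.com/",
    "https://meta.stackoverflow.com.fakesite.dodgy/cgi-bin/virus.cgi?vector=apt",
    "file://www.stackoverflow.com/",
    "http://ripoff.com/stackoverflow.com/q/1234/",
]


def check(pattern):
    """Return (missed, wrongly_matched) for a candidate pattern."""
    rx = re.compile(pattern)
    missed = [u for u in MUST_MATCH if not rx.search(u)]
    wrongly_matched = [u for u in MUST_NOT_MATCH if rx.search(u)]
    return missed, wrongly_matched


# Hypothetical naive candidate: looks for the domain anywhere after the protocol.
NAIVE = r"https?://.*stack(overflow|exchange)\.com"
missed, wrongly_matched = check(NAIVE)
```

With the naive pattern, `missed` contains the `nothingtoinstall.com` URL and `wrongly_matched` contains the `fakesite.dodgy` and `ripoff.com` URLs, which is why a submission needs to pin the domain directly to the end of the host name.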

This is a code-golf challenge, so the shortest regular expression wins!

wizzwizz4

Posted 2016-09-27T17:57:12.227

Reputation: 1 895

What do you mean 'may match'? Shouldn't those also be in 'must match'? We generally do not take kindly to 'bonus objectives' here, as in a code golf context they'll virtually always be skipped to save bytes. – orlp – 2016-09-27T19:06:53.603

@orlp I see this as more like ASCII-art challenges that say, "Your program may output any amount of trailing whitespace as long as the output looks like the example." In other words, these are some cases that the programmer doesn't have to worry about explicitly disallowing. If they fail, fine; if they match, fine. – DLosc – 2016-09-27T19:10:52.980

@orlp I added them because, for most implementation techniques, they save bytes. – wizzwizz4 – 2016-09-27T20:32:16.630

Answers

16

337 336 333 327 bytes

^https?://([^/]+\.)*(stackoverflow\.(com(\.br)?|ru)|mathoverflow\.(com|net)|hashcode\.ru|((the)?arqade|chiphacker|(mi)?yodeya|ask(ubuntu|different|patents)|(seasoned|brew)advice|s(erverfault|(tack|keptic)exchange|uperuser|tackapps|harepointoverflow|ocialsounddesign)|nothingtoinstall|crossvalidated|basicallymoney)\.com)(/.*)?$

Doesn't use any fancy regex features, so it should work in any regex flavour.
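As a rough check of the claim above, the pattern can be run unchanged through Python's `re` module. The sketch below copies the regex verbatim (split over several string literals only for readability) and tries it on a sample of the listed domains, including the ones that do not end in `.com`. The sample list, the example pages and the two spoof host names are assumptions chosen for illustration.

```python
import re

# The answer's pattern, verbatim; adjacent raw strings are concatenated.
ORLP = re.compile(
    r"^https?://([^/]+\.)*(stackoverflow\.(com(\.br)?|ru)|mathoverflow\.(com|net)"
    r"|hashcode\.ru|((the)?arqade|chiphacker|(mi)?yodeya|ask(ubuntu|different|patents)"
    r"|(seasoned|brew)advice|s(erverfault|(tack|keptic)exchange|uperuser|tackapps"
    r"|harepointoverflow|ocialsounddesign)|nothingtoinstall|crossvalidated"
    r"|basicallymoney)\.com)(/.*)?$"
)

SAMPLE_DOMAINS = [
    "stackoverflow.com", "stackoverflow.com.br", "stackoverflow.ru",
    "meta.hashcode.ru", "mathoverflow.net", "mi.yodeya.com", "miyodeya.com",
    "meta.thearqade.com", "3dprinting.stackexchange.com",
    "www.sharepointoverflow.com", "skepticexchange.com", "socialsounddesign.com",
]
PAGES = ["", "/", "/questions/1234?ref=a.com"]

# Every protocol + domain + page combination should match.
unmatched = [
    proto + dom + page
    for proto in ("http://", "https://")
    for dom in SAMPLE_DOMAINS
    for page in PAGES
    if not ORLP.search(proto + dom + page)
]

# Hypothetical look-alike hosts that should be rejected.
spoofs = ["https://stackoverflow.com.evil.example/", "http://evilstackoverflow.com"]
spoof_hits = [u for u in spoofs if ORLP.search(u)]
```

Both `unmatched` and `spoof_hits` come out empty: the `([^/]+\.)*` prefix only admits whole dot-terminated labels in front of a known domain, and `(/.*)?$` forces the host name to end right after it.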

orlp

Posted 2016-09-27T17:57:12.227

Reputation: 37 067

@TimmyD When I tried it, it matched one of the optional ones too... regexr – wizzwizz4 – 2016-09-27T20:37:43.263

@wizzwizz4 I forgot stackexchange and start/end markers. – orlp – 2016-09-27T21:17:09.090

@orlp It doesn't match anything now... Maybe start/end markers don't work in regexr? regexr – wizzwizz4 – 2016-09-27T21:35:38.613

@wizzwizz4 No clue, I never use regexr. Try inputting one string at a time rather than a full text. – orlp – 2016-09-27T21:43:04.930

@wizzwizz4 You need to enable the multi line flag (upper right corner) else it matches start/end of the entire text, rather than start/end of the line. – AdmBorkBork – 2016-09-27T22:11:03.410

You can save a few bytes by combining seasonedadvice with brewadvice instead of with the other s words. – Riley – 2016-09-27T22:35:50.300

I was going to post an answer but I realised it was essentially identical to yours except for using (\w+\.) at the beginning. – curiousdannii – 2016-09-28T04:43:47.663
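The multi-line behaviour discussed in the comments above is not specific to regexr. A minimal sketch in Python, using a shortened, hypothetical pattern with the same `^...$` anchoring as the answer, shows the difference when several URLs are pasted as one block of text:

```python
import re

# Hypothetical, shortened pattern with the same ^...$ anchoring as the answer.
PATTERN = r"^https?://([^/]+\.)*stackexchange\.com(/.*)?$"

# Two URLs pasted as a single text, one per line.
TEXT = "https://codegolf.stackexchange.com/\nhttp://retrocomputing.stackexchange.com"


def full_matches(pattern, text, flags=0):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


# Without the flag, ^ and $ anchor to the start and end of the whole text.
without_flag = full_matches(PATTERN, TEXT)
# With re.MULTILINE, ^ and $ also anchor at the start and end of each line.
with_flag = full_matches(PATTERN, TEXT, re.MULTILINE)
```

Here `without_flag` is empty, while `with_flag` holds both URLs. Testing one URL per string, as orlp suggests, sidesteps the issue entirely.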

5

359 348 bytes

https?:\/\/(([^/]+\.)*((stack(overflow|apps|exchange)|ask(ubuntu|different|patents)|(the)?arqade|serverfault|superuser|nothingtoinstall|(seasoned|brew)advice|crossvalidated|basicallymoney|chiphacker|skepticexchange|(sharepoint|math)overflow|(mi)?yodea|socialsounddesign)\.com)|(stackoverflow(\.com\.br|\.ru)|hashcode\.ru|mathoverflow\.net))(\/.*)?$

Test it out on regexr

AdmBorkBork

Posted 2016-09-27T17:57:12.227

Reputation: 41 581

3

2179 2092 1966 bytes

https?:\/\/((((www|facebook|jp|(meta\.?)?(es|ru|ja|pt|br)?)\.)?stackoverflow\.com)|(stackoverflow\.(ru|com\.br))|(((russ|meta)\.)?hashcode\.ru)|(crossvalidated|socialsounddesign|mathoverflow|(mi\.?)?yodeya|(www\.)?(sharepointoverflow|basicallymoney)|skepticexchange|brewadvice|(www\.)?chiphacker|ask(different|ubuntu)|stackapps|(meta\.)?(nothingtoinstall|arqade|thearqade|seasonedadvice|superuser|serverfault|ask(ubuntu|patents)))\.com|((meta\.)?mathoverflow\.net)|(((meta\.)?((3|three)dprinting|(ask)?patents|(community|world)building|(econo|acade)mics|(it)?security|(reverse|network)?engineering|a(cademia|i|lcohol|ndroid|nime|pple|rduino|stronomy|viation|vp)|b((e|lend)er|i(cycles?|kes|ology|tcoin)|oardgames?|ricks|uddhism)|c(h(emistry|ess|inese|ristianity)|ivicrm|o(de(review|golf)|ffee|gsci|mputer(science|graphics)|oking)|r(aftcms|afts|ypto(graphy)?)|s(50)?|stheory)|d(ba|eutsch|iy|ranks|rupal|sp)|(earth|data)science|e(books|l(ectronics|ementaryos|l|u)|macs|nglish|spanol|speranto|thereum|xp(at(s|riates)|ressionengine))|f(antasy|itness|re(elancing|nch))|g(a(medev|ming|rage|rdening)|erman|enealogy|is|raphics?designs?|uitars?)|h(am|ealth|ermeneutics|induism|istory|omebrew(ing)?|sm)|(hard|soft)warerecs|islam|italian|j(apanese|oomla|udaism)|korean|l(a((ndscap|nguagelearn)ing|tin|w)|i(fehacks|nguist(ics)?|nux|))|m(a(gento|rtialarts|th(educators|ematica|ematics|s|overflow)?)|eta|echanics|o(derators?|nero|ney|vies)|usic(fans)?|ythology)|o(pen(data|source)|utdoors)|(cycl|parent|retrocomput)ing|p(ets|h(ilosophy|oto(graphy|s)?|ysics)|m|o(ker|litics|rtuguese)|roductivity|rogrammers?|uzzling)|quant|r(aspberrypi|obotics|pg|us(s|sian)?)|s(alesforce|ci(comp|encefiction|fi)|harepoint|ignals|itecore|keptics?|ound(design)?|p(ace|anish|orts)|qa|tartups|tat(s|istics)|ustainability)|t(ex|hefinalfrontier|or|ravel|ridion)|u(buntu|i|nix|x)|vi(deo|m)?|w(eb(apps|masters?)|indowsphone|o(odworking|rdpress|rkplace)|rit(ers?|ing))|(yodeya)))\.stackexchange\.com))(\/|$)

Matches exactly the domains listed and nothing else. I did most of the compressing by hand. I'm a bit embarrassed that I spent this much time on it.

Riley

Posted 2016-09-27T17:57:12.227

Reputation: 11 345

Is there any more .com compressing you can do? – wizzwizz4 – 2016-09-27T21:32:57.697

@wizzwizz4 I'm sure I could compress a lot more. I might come back to this later. – Riley – 2016-09-27T21:39:51.697

All subdomains are considered safe, so you don't need to list any of them. – curiousdannii – 2016-09-28T04:47:45.513

@curiousdannii I know, but I wanted to see how small I could make it and only match the given subdomains. – Riley – 2016-09-28T13:28:51.213

2

142 140 334 bytes

#^https?:\/\/([^\/]*\.)?(hashcode\.ru|mathoverflow\.(com|net)|stackoverflow\.(ru|com(\.br)?)|((stack|skeptic)exchange|stackapps|ask(different|patents|ubuntu)|(brew|seasoned)advice|(the)?arqade|basicallymoney|chiphacker|crossvalidated|nothingtoinstall|serverfault|sharepointoverflow|socialsounddesign|superuser|(mi)?yodeya)\.com)(/|$)#
  • matches everything on the specified second level domains, path or no path
  • uses # as delimiter so / needs no escaping (saved two bytes)
  • compressed manually
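The `#` characters at both ends are pattern delimiters, as used by PHP's preg functions, rather than part of the regex itself. To try the pattern in an engine without delimiters, they have to be removed first. The sketch below does this in Python and also checks that the remaining `\/` escapes behave the same as a plain `/`. The three test URLs are assumptions chosen for illustration.

```python
import re

# The pattern exactly as posted, including the # delimiters.
POSTED = (
    r"#^https?:\/\/([^\/]*\.)?(hashcode\.ru|mathoverflow\.(com|net)"
    r"|stackoverflow\.(ru|com(\.br)?)|((stack|skeptic)exchange|stackapps"
    r"|ask(different|patents|ubuntu)|(brew|seasoned)advice|(the)?arqade"
    r"|basicallymoney|chiphacker|crossvalidated|nothingtoinstall|serverfault"
    r"|sharepointoverflow|socialsounddesign|superuser|(mi)?yodeya)\.com)(/|$)#"
)

# Python's re module has no delimiters, so drop the leading and trailing "#".
body = POSTED[1:-1]
# "\/" and "/" denote the same literal character inside the pattern.
unescaped = body.replace(r"\/", "/")

URLS = [
    "https://meta.codegolf.stackexchange.com/q/1",
    "http://stackoverflow.com.br",
    "https://stackoverflow.com.fakesite.example/",
]
# For each URL: (result with escapes kept, result with escapes removed).
results = {
    u: (bool(re.search(body, u)), bool(re.search(unescaped, u))) for u in URLS
}
```

Both variants agree on every URL. Note that the trailing `(/|$)` only requires the host name to be followed by a slash or the end of the string, so this pattern relies on being applied as a search rather than as a whole-string match.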

Titus

Posted 2016-09-27T17:57:12.227

Reputation: 13 814