[Br]eaking Code Golf [Ba]d

20

4

Consider the following string:

Tin Snips

This string contains several atomic symbols on the periodic table. We could rewrite this string to identify several of them:

[Ti][N] [Sn][I][P][S]

Of course, we could also write it this way:

T[In] [S][Ni][P][S]

The rules for rewriting the input are as follows:

  1. The case of the input does not matter in terms of matching atomic symbols.
  2. If an element is used in an atomic symbol, its case must change so the symbol is correct. Ex: h would become [H].
  3. All element symbols are encased in ASCII square brackets, [ and ].
  4. Whitespace is preserved: Big ego cannot combine the "g" and "e" into [Ge].
  5. Not all input characters need be combined into an atomic symbol: if an input character is not put into a symbol, it is passed through as-is (case does not matter).
  6. If a symbol can be made, it must be made. In other words, it is not allowed to output Tin in the above example because it is possible to create at least one symbol in that word. The only time a character may be passed through unused is when it cannot be used to construct an atomic symbol.
  7. For the purposes of this challenge, all elements from Hydrogen (1) to Oganesson (118) are valid. No higher elements are valid.
  8. Some of the higher elements have ambiguous names and symbols: for the purposes of this challenge, the version at Wikipedia shall be used. For convenience, the allowable atomic symbols are here: H, He, Li, Be, B, C, N, O, F, Ne, Na, Mg, Al, Si, P, S, Cl, Ar, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Br, Kr, Rb, Sr, Y, Zr, Nb, Mo, Tc, Ru, Rh, Pd, Ag, Cd, In, Sn, Sb, Te, I, Xe, Cs, Ba, La, Ce, Pr, Nd, Pm, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, Lu, Hf, Ta, W, Re, Os, Ir, Pt, Au, Hg, Tl, Pb, Bi, Po, At, Rn, Fr, Ra, Ac, Th, Pa, U, Np, Pu, Am, Cm, Bk, Cf, Es, Fm, Md, No, Lr, Rf, Db, Sg, Bh, Hs, Mt, Ds, Rg, Cn, Nh, Fl, Mc, Lv, Ts, Og.

Write a program or function that generates all possible outputs from a single provided input. Both input and output may be in any form of your choosing. This could be a string, array of characters, or some other data structure: whatever is both convenient and clearly represents the input and output. Both input and output may be passed in/out of your code however you choose: standard in/out, function argument/return, or something else.

  • Input shall be a string (see previous paragraph) of positive length containing only ASCII characters of arbitrary case and the space (0x20) character.
  • Your code must generate all output strings that can be created using the input rules above.
  • The order of the output is implementation-defined. The only requirement is that all output strings are present.
  • If presented with a valid input string that does not contain any atomic symbols, simply output the input string.
  • If presented with an input string that is not valid per the rules above (null, zero characters, contains illegal characters, etc.) your program may do anything (crash, blank output, etc.)
  • Output is case-insensitive other than atomic symbols needing to match the periodic table.
  • Standard loopholes not allowed.

Test cases:

Tin Snips
[Ti][N] [Sn][I][P][S]
[Ti][N] [S][Ni][P][S]
[Ti][N] [S][N][I][P][S]
T[In] [Sn][I][P][S]
T[In] [S][Ni][P][S]
T[In] [S][N][I][P][S]
T[I][N] ...

Quack
Q[U][Ac][K]
Q[U]a[C][K]

hehe
[H]e[H]e
[H]e[He]
[He][H]e
[He][He]

Stack Exchange
[S][Ta][C][K] Ex[C][H]a[N][Ge]
[S]t[Ac][K] Ex[C][H]a[N][Ge]

This is code golf, so let me see your shortest code!

user18932

Posted 2017-03-10T21:23:13.687

Reputation:

1Per @Rassars comment Tin would be T[I][N] not [T][I][N] because T is not an element. My question (and possibly Rassar's) is: do we only have to give 1. Only outputs where the maximium number of element subsitutions are made? 2. Only the minimum amount of wastage? (The HeHe with hydrogens indicates the answer to this one is no) 3. All outputs where matches are completely exhausted? (in this case T[I][N] as well as T[In] would be valid.) I think the correct interpretation is 3. – Level River St – 2017-03-11T00:59:56.893

@LevelRiverSt you are correct, give each unique output where no letters are passed through that can be combined into an atomic symbol. So both T[I][N] and T[In] work, but T[I]n does not because the n can be worked into an atomic symbol. – None – 2017-03-11T03:35:38.767

1

I think this is a dup

– Digital Trauma – 2017-03-11T05:20:26.400

1So there are 2 possibilities for Quack: Q[U][Ac][K] and Q[U]a[C][K]. RIght? – RootTwo – 2017-03-11T05:35:31.290

@DigitalTrauma It's definitely related, but I don't think it's a dupe. The use of a wordlist as opposed to input in the other one differentiates it, plus the lack of strict specification on what versions of the ambiguous atomic symbols are to be used make the other one unclear and thus not a good dupe target. – Mego – 2017-03-11T07:04:08.650

Fixed Quack and Stack exchange cases. – CalculatorFeline – 2017-03-11T17:11:20.440

Checking other test cases for possibilities... – CalculatorFeline – 2017-03-11T17:14:30.503

1All cases verified. – CalculatorFeline – 2017-03-11T17:19:17.263

@CalculatorFeline thank you! I was going to do that when I had time but you beat me to it. – None – 2017-03-11T17:20:48.493

You're welcome :D Here's the verifier link. It outputs all elements that can fit into the input. Might miss a few errors (ones with no new elements), but the only way to solve that is to answer this question :P

– CalculatorFeline – 2017-03-11T17:23:49.647

What if you greedily find a short one and then fail to find a longer one? For example, for Heads, you might do [H]ea[Ds] rather than [He]a[Ds]? – Esolanging Fruit – 2017-03-12T04:54:02.053

1@Challenger5 "Your code must generate all output strings that can be created using the input rules above" – Jonathan Allan – 2017-03-12T07:23:47.683

Answers

5

Python 3, 289 263 bytes

Found a more complete library on Pypi: mendeleev

from mendeleev import*
Z={element(i).symbol for i in range(1,119)}
L,R='[]'
def f(h,r=''):t=h.title();return((Z&{t[0]}and f(h[1:],r+L+t[0]+R)or[])+(Z>{(t+L)[:2]}and f(h[2:],r+L+t[:2]+R)or[])+(not{(r[-1:]+t[0]).title(),t[0]}&Z and f(h[1:],r+h[0])or[]))if h else[r]

Old answer:

from elements import*
Z={e.symbol for e in ELEMENTS}|{*'Cn Ds Fl Lv Mc Nh Og Rg Ts'.split()}
L,R='[]'
def f(h,r=''):t=h.title();return((Z&{t[0]}and f(h[1:],r+L+t[0]+R)or[])+(Z>{(t+L)[:2]}and f(h[2:],r+L+t[:2]+R)or[])+(not{(r[-1:]+t[0]).title(),t[0]}&Z and f(h[1:],r+h[0])or[]))if h else[r]

Uses a library elements.py from http://www.lfd.uci.edu/~gohlke/code/elements.py.html. It is missing elements 110 to 118, but it was the most up to date library I could find. Cost 40 bytes to add the missing elements.

The trickiest part was the logic for when a character can be passed through without being part of an element symbol.

RootTwo

Posted 2017-03-10T21:23:13.687

Reputation: 1 749

1Uhh wait, wasn't mendeleev a user, not a library? – Matthew Roh – 2017-03-18T05:38:24.900

3

Jelly,  192  191 bytes

-1 by use of Ɗ (a since-developed quick)

“¦UV2ḤF2ı½ṅḶḊ⁼5JI5MẇvẋẊẊ¬Ḥḳ'ƈ<ḷėƤ7*⁾ṾxMæS⁺`?^Ƭb¦ɗDß⁼pþɲOṃ⁽2Ė>,Ḣ(ḞŒƊOGƤ×⁺ṇṂ®ȤT0y°^Ẇ⁺:Þ]ṢṬ¶ịṪḂƇ ñAƬCṫ$wÆĿĖỴỴƇẓƊqḌ@;ẏ`ṃFƥḣ⁽²»ḟ⁶s2;“¤²R.ȯ7ŒL£ɦ»Œt
Œte¢
ŒṖµL€=1oÇ€ṂµÐfµṡ2;€ÇÐfÇ€€S€¬SµÐḟ⁾[]jŒtƊ¹Ç?€€

Try it online! - Too inefficient for the "Stack Exchange" test case to complete within the 60s limit (running it offline gives the correct result within 2 minutes).

How?

The first line of code is a niladic link to create a list containing all 118 element symbols. To do so it concatenates two lists, the first containing all length 2 lists of characters (i.e. strings) the second a list of characters and title cases the resulting list. The two lists themselves are created mostly by looking up words in Jelly's dictionary to make single strings.

The first of these compressions is:

“¦UV2ḤF2ı½ṅḶḊ⁼5JI5MẇvẋẊẊ¬Ḥḳ'ƈ<ḷėƤ7*⁾ṾxMæS⁺`?^Ƭb¦ɗDß⁼pþɲOṃ⁽2Ė>,Ḣ(ḞŒƊOGƤ×⁺ṇṂ®ȤT0y°^Ẇ⁺:Þ]ṢṬ¶ịṪḂƇ ñAƬCṫ$wÆĿĖỴỴƇẓƊqḌ@;ẏ`ṃFƥḣ⁽²»

which yields

" biznagas sepmag ratbag catchflies paracmes mdse bharal ramcat monopteros irrepressibilities lunarnauts geniculate hopbinds rutabaga potlache broghs bergamas crossbirth purblind xebecs nonhardy classism fleurets moneybag scarce corf Mg Sr Zr CD HG CF FM Lr SG TM Gd Bk Fr Rh Fe Sn lv cndbmnnbkrmtpdnp"

Where all but the final entry (split by spaces) are entries in Jelly's dictionary. The spaces are filtered out with ḟ⁶, and then the result is split into twos:

["bi","zn","ag","as","se","pm","ag","ra","tb","ag","ca","tc","hf","li","es","pa","ra","cm","es","md","se","bh","ar","al","ra","mc","at","mo","no","pt","er","os","ir","re","pr","es","si","bi","li","ti","es","lu","na","rn","au","ts","ge","ni","cu","la","te","ho","pb","in","ds","ru","ta","ba","ga","po","tl","ac","he","br","og","hs","be","rg","am","as","cr","os","sb","ir","th","pu","rb","li","nd","xe","be","cs","no","nh","ar","dy","cl","as","si","sm","fl","eu","re","ts","mo","ne","yb","ag","sc","ar","ce","co","rf","Mg","Sr","Zr","CD","HG","CF","FM","Lr","SG","TM","Gd","Bk","Fr","Rh","Fe","Sn","lv","cn","db","mn","nb","kr","mt","pd","np"]

The second,

“¤²R.ȯ7ŒL£ɦ»

is formed from the concatenation of the words "finch", "pub", "sky", and "vow" (without spaces), and as such is a list of characters:

['f','i','n','c','h','p','u','b','s','k','y','v','o','w']

The two lists are concatenated with ; and every entry is title-cased using Œt, yielding:

["Bi","Zn","Ag","As","Se","Pm","Ag","Ra","Tb","Ag","Ca","Tc","Hf","Li","Es","Pa","Ra","Cm","Es","Md","Se","Bh","Ar","Al","Ra","Mc","At","Mo","No","Pt","Er","Os","Ir","Re","Pr","Es","Si","Bi","Li","Ti","Es","Lu","Na","Rn","Au","Ts","Ge","Ni","Cu","La","Te","Ho","Pb","In","Ds","Ru","Ta","Ba","Ga","Po","Tl","Ac","He","Br","Og","Hs","Be","Rg","Am","As","Cr","Os","Sb","Ir","Th","Pu","Rb","Li","Nd","Xe","Be","Cs","No","Nh","Ar","Dy","Cl","As","Si","Sm","Fl","Eu","Re","Ts","Mo","Ne","Yb","Ag","Sc","Ar","Ce","Co","Rf","Mg","Sr","Zr","Cd","Hg","Cf","Fm","Lr","Sg","Tm","Gd","Bk","Fr","Rh","Fe","Sn","Lv","Cn","Db","Mn","Nb","Kr","Mt","Pd","Np","F","I","N","C","H","P","U","B","S","K","Y","V","O","W"]

A list containing all 118 element symbols as required (there are duplicates, but that's fine).

The second line of code is a monadic link (a helper function designed to take one input) that returns a 1 if the input, titled-cased exists in the list created above and a 0 otherwise.

The third line of code is the main link, a monadic function that takes a string and returns a list of lists of characters (i.e. strings) as required:

ŒṖµL€=1oÇ€ṂµÐfµṡ2;€ÇÐfÇ€€S€¬SµÐḟ⁾[]jŒtƊ¹Ç?€€ - Main link: s
ŒṖ                                           - all partitions of s
  µ        µÐf                               - filter keep:
   L€=1                                      -     length €ach equals (vectorises) 1
       o                                     -     or
        ǀ                                   -     last link as a monad (is an element when title-cased)
          Ṃ                                  -     minimum 
                                             - (i.e. all partitions that are all single characters OR are strings that when title-cased are elements)
              µ              µÐḟ             - filter discard:
               ṡ2                            -     slices of length 2
                 ;€                          -     concatenate €ach
                    Ðf                       -     filter keep:
                   Ç                         -         last link as a monad (is an element when title-cased)
                      Ç€€                    -     last link as a monad for €ach for €ach
                         S€                  -     sum €ach
                           ¬                 -     logical not
                            S                -     sum
                                             - (i.e. discard any partitions that contain a run of two that joined together and title-cased ARE an element but separately NEITHER are)
                                         ?€€ - if then else for €ach (partition) for €ach (part):
                                        Ç    -     IF: last link as a monad (is an element when title-cased)
                                             -   THEN:
                                      Ɗ      -         last three links as a monad:
                                ⁾[]                      "[]"
                                   j         -           joined by:
                                    Œt       -           title case the part
                                             -   ELSE:
                                       ¹     -         the part itsef (¹ is the identity atom)

Jonathan Allan

Posted 2017-03-10T21:23:13.687

Reputation: 67 804

1

C++11, 944 928 bytes

Here's a piece of truly terrible code, but it should work. Could still be made a lot shorter probably.

#import<iostream>
#import<set>
using namespace std;int r,i;set<string>O;S(string&s){s[0]-=s[0]>90?32:0;if(s[1])s[1]+=s[1]<91?32:0;char*l="HHeLiBeBCNOFNeNaMgAlSiPSClArKCaScTiVCrMnFeCoNiCuZnGaGeAsSeBrKrRbSrYZrNbMoTcRuRhPdAgCdInSnSbTeIXeCsBaLaCePrNdPmSmEuGdTbDyHoErTmYbLuHfTaWReOsIrPtAuHgTlPbBiPoAtRnFrRaAcThPaUNpPuAmCmBkCfEsFmMdNoLrRfDbSgBhHsMtDsRgCnNhFlMcLvTsOg";for(r=0;*l++;)if(*l>90){if(*(l++-1)==s[0]&&*(l-1)==s[1])r=1;}else if(*(l-1)==s[0]&&!s[1])r=1;}P(set<string>*V,string s,string o,int b,int l=0,int m=0){if(!s[b])O.insert(o);else if(l)P(V,s,o,b+1);else if(V[b].size()==0)P(V,s,o+s[b],b+1);else for(auto t:V[b]){P(V,s,o+"["+t+"]",b+1,t.length()-1);if(t.length()>1&&V[b].size()==1&&V[b+1].size()>0&&!m)P(V,s,o+s[b],b+1,0,1);}}F(string s){set<string>V[s.length()];for(i=0;s[i++];){string t="";t+=s[i-1];S(t);if(r)V[i-1].insert(t);t+=s[i];S(t);if(r&&s[i])V[i-1].insert(t);}P(V,s,"",0);for(auto o:O)cout<<o<<"\n";O.clear();}

Call with:

int main()
{
    F("Tin Snips");cout << "\n";
    F("Quack");cout << "\n";
    F("hehe");cout << "\n";
    F("Stack Exchange");
}

Steadybox

Posted 2017-03-10T21:23:13.687

Reputation: 15 798