VSEPR Strikes Back! [Revamped Again]


(Note: I know that the VSEPR method fails sometimes, and that there are exceptional molecules. The challenge is about the molecules which conform.)

Most people who have taken an introductory chemistry course know about molecules, and (probably) the VSEPR theory of chemical bonding. Basically, the theory predicts the shape of a molecule given three main properties:

  • A: The central atom. You may assume this is the first element appearing in the chemical formula; for example, in CH4, you may assume the central atom is C.
  • X: The number of atoms bonded to the central atom A.
  • E: The number of "lone electron pairs" on the central atom. For the purposes of this challenge, the number of outer electrons A has is given by the last digit of the element's column number on the periodic table.

    • If A is bonded to an element X in column 17 of the periodic table, there will be a mutual sharing of electrons, so that A effectively has one additional electron for each atom of X attached to it. (This does not apply for elements such as O or S in column 16.)

Each bond requires two of these electrons. Leftover electrons not used in bonding with other atoms are found in pairs, called lone pairs. So E is the number of electrons on the central atom A that are not used in bonding, divided by 2. (It may be assumed that input will always lead to A having an even number of electrons.)

For example, a molecule which has 1 lone pair and 3 atoms bonded to the main atom is AX3E1, which is the trigonal pyramidal configuration. (There is always only 1 central atom, so A never has a subscript.)
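The electron-counting rule above can be sketched in a few lines of Ruby. This is only an illustration under the rules stated in this challenge; the names `VALENCE`, `HALOGENS` and `lone_pairs` are made up for the example, and NF3 is used as a sample AX3E1 molecule.

```ruby
# Outer-electron counts: last digit of each element's periodic-table column.
VALENCE  = { "C" => 4, "N" => 5, "O" => 6, "S" => 6, "F" => 7, "Cl" => 7, "Br" => 7 }
HALOGENS = %w[F Cl Br] # column 17: each bonded atom shares one electron with A

# Returns E, the number of lone pairs on central atom `a`
# when it is bonded to `n` atoms of element `x`.
def lone_pairs(a, x, n)
  electrons = VALENCE.fetch(a)
  electrons += n if HALOGENS.include?(x) # one extra electron per halogen atom
  leftover = electrons - 2 * n           # each bond uses two electrons
  if leftover.negative? || leftover.odd?
    raise ArgumentError, "#{a}#{x}#{n} does not leave an even, non-negative electron count"
  end
  leftover / 2
end

puts lone_pairs("N", "F", 3) # NF3: (5 + 3 - 2*3) / 2 = 1, i.e. AX3E1
```

Raising on an odd or negative leftover is a design choice for the sketch; the challenge itself guarantees that such input never occurs.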

The Challenge

Your job is, given a string representing a covalent chemical compound, to output the shape of that molecule. "But wait!", exclaim the exasperated programmers, "you can't expect us to input all of the molecular data for every element!" Of course we can, but as that wouldn't be fun, we'll only consider the compounds formed by the following 7 elements:

Carbon    C
Sulfur    S
Oxygen    O
Nitrogen  N
Chlorine  Cl
Bromine   Br
Fluorine  F

The input is any molecule composed of the above 7, such as CO2 or SF4, that is in the form AXN, where A and X are each one of the above seven element symbols, X is not C or N, and N is a number between 2 and 6 inclusive.
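One possible way to split an input of the form AXN into its three parts is a single anchored regular expression. The sketch below assumes the input is exactly as described (one central atom, one ligand element, a digit from 2 to 6); `ELEMENTS`, `LIGANDS`, `FORMULA` and `parse_formula` are hypothetical names chosen for the example.

```ruby
# "Cl" and "Br" are listed first so the regex tries them before "C".
ELEMENTS = %w[Cl Br C S O N F]
LIGANDS  = ELEMENTS - %w[C N] # X is never C or N

# A, then X, then a single digit N in 2..6, with nothing else around them.
FORMULA = /\A(#{ELEMENTS.join("|")})(#{LIGANDS.join("|")})([2-6])\z/

# Returns [A, X, N] for a formula such as "CO2" or "BrF3".
def parse_formula(str)
  m = FORMULA.match(str.strip) or raise ArgumentError, "not of the form AXN: #{str.inspect}"
  [m[1], m[2], m[3].to_i]
end

p parse_formula("CCl2") # A is "C", X is "Cl", N is 2
```

The call to `strip` is there so that a trailing newline from `gets` does not break the match.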

The output should be the name of the geometric shape of the molecule, for example "linear".

The following (don't get scared, I swear they're less than they seem!) is a list of all possible configurations along with their AXE number and a drawing of how they look (adapted from Wikipedia's nifty table):

[Image: table of all AXE configurations]
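In code, the configurations can be held in a lookup table keyed by the pair [X, E]. The sketch below assumes the standard VSEPR shape names for 2 to 6 bonded atoms, with "seesaw" used in place of "disphenoidal"; `SHAPES` and `shape_name` are illustrative names, not part of the challenge.

```ruby
# Keys are [X, E]: bonded atoms and lone pairs on the central atom.
SHAPES = {
  [2, 0] => "linear",
  [2, 1] => "bent",
  [2, 2] => "bent",
  [2, 3] => "linear",
  [3, 0] => "trigonal planar",
  [3, 1] => "trigonal pyramidal",
  [3, 2] => "T-shaped",
  [4, 0] => "tetrahedral",
  [4, 1] => "seesaw",
  [4, 2] => "square planar",
  [5, 0] => "trigonal bipyramidal",
  [5, 1] => "square pyramidal",
  [6, 0] => "octahedral",
}.freeze

# Looks up the shape for AX<x>E<e>; unknown pairs raise instead of returning nil.
def shape_name(x, e)
  SHAPES.fetch([x, e]) { raise ArgumentError, "no shape for AX#{x}E#{e}" }
end

puts shape_name(3, 1) # AX3E1
```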

Test Cases

CO2: CO2 -> linear
OF2: OF2 -> bent
CCl2: CCl2 -> bent
BrF3: BrF3 -> T-shaped
SF4: SF4 -> seesaw

Notes:
- We won't consider compounds in which the first element has a subscript other than 1, such as C2H4.
- Standard loopholes apply.
- This is code golf, so the shortest answer wins (obviously).

Daccache

Posted 2015-07-20T07:39:46.853

Reputation: 155

Welcome to PPCG. Apart from the fact that ethylene is C2H4, there are multiple ambiguities here. 1. I think you should abandon substances like ethylene and guarantee that the first named element will be A and there will be only one. 2. Your explanation is inadequate for a non-chemist; you should at least explain that the number of electrons in the outer shell can be found by looking at the periodic table. 3. Is X guaranteed to be mono- or divalent? Is it only halogens, hydrogen and oxygen? 4. There is nothing in the basic spec about ionic compounds such as NaCl; which ones have to be detected? – Level River St – 2015-07-20T11:28:31.030

Voting to close as unclear. I suggest you simplify the bonuses and give a fixed list of elements that must be supported (and that edge cases where VSEPR fails, including ionic compounds, be ignored, otherwise it's a chemistry question not a programming one.) You should also list the molecule shapes exactly as they are to be output. Writing a good question is hard. I suggest you consider using http://meta.codegolf.stackexchange.com/q/2140/15599 – Level River St – 2015-07-20T11:31:23.917

BTW, I don't think this deserves a downvote (if whoever gave it is reading.) There's potential for a good question. It's just that codegolf requires a really tight spec to be fair and sporting. A few more points: 5. What is the maximum value of X+E to be supported? The examples go up to 6, I'm used to going up to 7, but Wikipedia goes up to 9! 6. You should explain subtleties around AX3E2: T-shaped is preferred to keep the lone pairs far apart. 7. Angles can only be found by looking them up, not calculating, so have no place in a programming question like this. BrF3 is actually 86.2deg, not 90. – Level River St – 2015-07-20T18:41:43.720

@steveverrill Hello and thanks for such a thorough breakdown of where my question doesn't add up. I actually did place the question in the sandbox a day or two before, but got no feedback, so I decided to just try my luck. About your suggestions (thanks again for those!): 1. Changed. 2. Tried to make clearer and more elementary. 3. I only allowed 7 elements, so that should make the question less broad. 4. I defined what counts as an ionic compound here, didn't fully understand what you meant though. Simplified the bonuses and other small things like removing the angles. – Daccache – 2015-07-20T19:39:36.357

5. I think X + E is bounded by the restrictions on the possible compounds, otherwise 6 should be good (and I doubt it goes higher than that.) 6. I'd prefer to remove it altogether so as not to make the question too technical. 7. Took them out. – Daccache – 2015-07-20T19:44:10.920

    Feedback in the sandbox can be patchy and slow, especially when your question needs knowledge of a particular field. With the restriction on elements I agree X+E is limited to 6. Of the remaining bonuses 2 doesn't make sense because the elements in wikipedia's exceptions are outside the ones listed. Bonus 1 requires a library of all nonmetals, but some elements are ambiguous. A cheat implementation would be to consider all elements not on the list of 7 to be "metals." for example if I said "ionic" for iodine it would pass, because the program isn't supposed to cope with that element. – Level River St – 2015-07-20T20:54:35.747

    +1 for what you've fixed so far, but in order to cast a reopen vote, I'd like to see: bonuses gone as they don't seem to make sense; removal of the examples containing metals, as they'd no longer be relevant; but on the other hand, if you want to include ions, give some examples of that. And most importantly, include the table of AXnEm with the required output for each one (all relevant lines from AX to AX5E / AX6). I know it's halfway down the wikipedia article, but it would be far less ambiguous if it were in the question. – Level River St – 2015-07-20T21:01:52.623

    @steveverrill Changed as per your suggestions, although on seeing the table of configurations, I'm starting to think a steric number of 6 is a bit too high. Maybe I should reduce it further to 5? Thanks again for all your time on this question. – Daccache – 2015-07-21T19:22:20.200

    I've made some edits to your explanation. A does not necessarily have 8 electrons, it actually has (X+E)*2 electrons, which can be a reduced octet as in BF3 or SO3 (6 electrons) or an increased octet as in SF4 (10 electrons.) That's why you get the different shapes. The issue of pi backdonation by O in compounds like SO3 or CO2 (to create a double bond) is best ignored in VSEPR. I deleted the bit about ions as it creates some issues (though I'm no longer sure they're serious.) Roll back if you disagree. – Level River St – 2015-07-21T20:22:59.610

    I don't see the need to reduce steric number. 6 is a natural number for the elements you've restricted it to, as in SF6 or SCl5 (I'm not sure if steric number refers to X or X+E.) Reducing it would be a new arbitrary change. Does output have to be exactly per the table? (It's a lot of characters, that's why I've wanted to clarify.) I assume seesaw is accepted instead of disphenoidal. If you agree with what I've done I'll cast a reopen vote. It's a shame it's taken a while to get this clear. Next time leave it a bit longer in the sandbox; it's recommended to wait for some upvotes or a comment. – Level River St – 2015-07-21T20:35:31.990

    @steveverrill thank you (and Thomas) for your edits. Your help has both really made the question do a 180 in terms of quality and clarity (although on second thought any subsequent science questions I might do in the future should be much less technical.) I do agree 6 is the natural value of X + E imposed by the elements chosen, and reducing it would probably cause more harm than good. Oh and yes, the names of the shapes should be exact, and as they're long, I don't really expect to see any ultra-short answers anyway. (and seesaw is accepted, who uses disphenoidal anyways?) – Daccache – 2015-07-22T05:13:03.130

    Answers


    Ruby, 250

    s=gets
    e=7-"FBOSN C".index(s[0])/2+(s[1]==?l?3:0)
    x=s.reverse!.to_i
    e+="Frl".index(s[2])?x:0
    t="trigonal "
    puts [["linear",b="bent",b],[t+"planar",t+p="pyramidal","T-shaped"],["tetr"+h="ahedral","seesaw"],[t+"bi"+p,"square "+p],["oct"+h]][x-2][e/2-x]
    

    Note that with the rules as they are, AX2E3 and AX4E2 are not possible with the given elements, so they are not supported. All test cases passed.
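The claim that AX2E3 and AX4E2 cannot occur can be checked by brute force. The following sketch is not part of the golfed answer; it enumerates every input the stated rules admit (without regard to whether the compound actually exists) and collects the resulting [X, E] pairs. The names `VALENCE`, `HALOGENS`, `LIGANDS` and `reachable_configurations` are made up for the example.

```ruby
# Outer electrons per element (last digit of the periodic-table column).
VALENCE  = { "C" => 4, "N" => 5, "O" => 6, "S" => 6, "F" => 7, "Cl" => 7, "Br" => 7 }
HALOGENS = %w[F Cl Br]
LIGANDS  = %w[O S F Cl Br] # X may be any of the seven elements except C and N

# Returns the sorted, de-duplicated list of [X, E] pairs that some
# valid input (even, non-negative leftover electron count) can produce.
def reachable_configurations
  configs = []
  VALENCE.each_value do |valence|
    LIGANDS.each do |x|
      (2..6).each do |n|
        electrons = valence + (HALOGENS.include?(x) ? n : 0)
        leftover  = electrons - 2 * n
        next if leftover.negative? || leftover.odd? # input the challenge excludes
        configs << [n, leftover / 2]
      end
    end
  end
  configs.uniq.sort
end

reachable_configurations.each { |x, e| puts "AX#{x}E#{e}" }
```

Under these assumptions the enumeration yields 11 configurations, which matches the 11 entries in the answer's output array; neither [2, 3] nor [4, 2] appears, because both would need a central atom with 8 outer electrons.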

    with comments

    s=gets                                     #get input with newline attached
    e=7-"FBOSN C".index(s[0])/2+(s[1]==?l?3:0) #calculate the number of electrons (not lone pairs) from A. Note the addition required to differentiate C and Cl.
    x=s.reverse!.to_i                          #reverse the input string so the number is at the beginning (after the newline) and evaluate the string as a number to get X.
    e+="Frl".index(s[2])?x:0                   #if the character after the number is in "Frl" we have a halogen so increment number of electrons accordingly.
    t="trigonal "                              #as "trigonal " goes at the beginning of the string, we cannot define it on the fly, it is defined as a constant here.
                                               #array of outputs below. note the subscripts at the end: x-2 to put it in the range 0..4 and e/2-x to convert e into the actual number of lone pairs.
    puts [["linear",b="bent",b],[t+"planar",t+p="pyramidal","T-shaped"],["tetr"+h="ahedral","seesaw"],[t+"bi"+p,"square "+p],["oct"+h]][x-2][e/2-x]
    

    Level River St

    Posted 2015-07-20T07:39:46.853

    Reputation: 22 049

    Nice answer. I like the way you were able to cut down on the bytes for the names by concatenating bits and pieces of the words. Cheers! – Daccache – 2015-07-23T05:56:57.517