Drawing Lewis Structures of Alkanes

I just had a lesson in school about alkanes and figured it would probably make for a great code golf challenge! Don't worry, it isn't as complicated as it may look!

A Quick Rehash

(Please Note: To keep this brief, not all information is 100% accurate.)

Alkanes are strings of carbon and hydrogen. Every carbon atom has 4 bonds, and every hydrogen atom 1 bond. All the carbon atoms of the alkane form a string where each C-atom is connected to 2 other C-atoms (left and right in the Lewis structure) and 2 H-atoms (up and down), except for the ends of the string, where the C-atom is connected to only 1 other C but 3 Hs. Here's a basic example for pentane (an alkane with 5 C-atoms and 12 H-atoms):

  H H H H H
  | | | | |
H-C-C-C-C-C-H
  | | | | |
  H H H H H
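The layout rule above (two H-atoms per inner carbon, three per end carbon) can be sketched in a few lines of Python 3. The function names `draw_straight_alkane` and `hydrogen_count` are made up for this illustration and are not part of the challenge; the sketch only covers unbranched chains.

```python
def hydrogen_count(n):
    """Number of H-atoms in an unbranched alkane with n C-atoms (2n + 2)."""
    return 2 * n + 2


def draw_straight_alkane(n):
    """Return the Lewis structure of an unbranched alkane as a list of lines."""
    pad = "  "                                   # room for the leading "H-"
    hydrogens = pad + " ".join("H" * n)          # one H above/below each C
    bonds = pad + " ".join("|" * n)              # vertical C-H bonds
    chain = "H-" + "-".join("C" * n) + "-H"      # the carbon string with end H-atoms
    return [hydrogens, bonds, chain, bonds, hydrogens]


if __name__ == "__main__":
    print("\n".join(draw_straight_alkane(5)))
```

Calling `draw_straight_alkane(5)` reproduces the pentane drawing above, and `hydrogen_count(5)` gives the 12 H-atoms mentioned in the text.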

Alkanes also can have branches. But don't worry, all alkanes in this challenge can be expressed with only 1 level of branching. Example:

        H
        |
      H-C-H
  H H H | H
  | | | | |
H-C-C-C-C-C-H
  | | | | |
  H H H H H

To complete this challenge you also need to understand the IUPAC naming convention for branched alkanes. First off, there's the root alkane. In our previous example, this would be the "C-C-C-C-C" part. Its name depends on the length of the chain: 1 C is called methane, 2 C ethane, 3 C propane, then butane, pentane, hexane, heptane, octane, nonane and decane (10 C). Each branch then adds a prefix to that name. First comes the index (offset) of the C-atom the branch is attached to, counted from the left. In the example, this would be 4 (i.e. the 4th C-atom from the left). Then comes a hyphen (this symbol: "-"), followed by another name indicating the size of the branch. The size of a branch is named almost the same way as the size of the root, except that you append "yl" instead of "ane". With that, the full name of the example would be

4-methylpentane

If you have multiple branches, then they are prepended as well, separated by another hyphen. Example:

2-butyl-5-methylhexane

One last thing: if you have multiple branches of the same size, they get grouped. Their offsets are separated by commas and they share the same size-name, which is prefixed with an extra syllable depending on how many branches are grouped: "di" for 2 branches, "tri" for 3, "tetra" for 4 (you don't need more for this challenge). Example:

2-ethyl-2,4,6-trimethyldecane
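To make the naming rules concrete, here is a minimal Python 3 sketch that goes the other way round and assembles a name from its parts. The function `build_name` and its argument layout (a list of `(offsets, size)` groups in the order they should appear) are assumptions made for this illustration; the sketch follows the simplified convention of this challenge, not full IUPAC rules.

```python
PREFIXES = ["meth", "eth", "prop", "but", "pent",
            "hex", "hept", "oct", "non", "dec"]          # sizes 1..10
GROUPING = {1: "", 2: "di", 3: "tri", 4: "tetra"}       # syllable per group size


def build_name(root_size, groups):
    """Assemble a name from the root length and a list of (offsets, size) groups."""
    parts = []
    for offsets, size in groups:
        locants = ",".join(str(o) for o in offsets)     # e.g. "2,4,6"
        parts.append(f"{locants}-{GROUPING[len(offsets)]}{PREFIXES[size - 1]}yl")
    # Branch prefixes are joined by hyphens; the root follows without one.
    return "-".join(parts) + PREFIXES[root_size - 1] + "ane"


if __name__ == "__main__":
    print(build_name(10, [([2], 2), ([2, 4, 6], 1)]))
```

With the groups `[([2], 2), ([2, 4, 6], 1)]` and a root of 10 this yields the example name above.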

FYI, the molecule could look somewhat like this (H-atoms omitted; this shows the general shape, not necessarily the exact output the rules below ask for):

   |
  -C-
   |       |
  -C-     -C-
 | | | | | | | | | |
-C-C-C-C-C-C-C-C-C-C-
 | | | | | | | | | |
  -C- -C-
   |   |

Nomenclature Cheatsheet

Prefixes indicating numbers:
| Num  | Prefix |
|------|--------|
| 1    | meth   |
| 2    | eth    |
| 3    | prop   |
| 4    | but    |
| 5    | pent   |
| 6    | hex    |
| 7    | hept   |
| 8    | oct    |
| 9    | non    |
| 10   | dec    |

Suffix root:   ane
Suffix branch: yl
Grouping prefixes: di (2), tri (3), tetra (4)
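The cheatsheet maps fairly directly onto a small parser. The following Python 3 sketch (3.8+ because of the `:=` operator) is one possible readable approach, not a golfed solution; the names `parse_name`, `BRANCH` and `ROOT` are invented for this illustration, and the return format (root size plus a flat list of `(offset, size)` pairs) is an assumption.

```python
import re

PREFIXES = ["meth", "eth", "prop", "but", "pent",
            "hex", "hept", "oct", "non", "dec"]
SIZE = {prefix: i + 1 for i, prefix in enumerate(PREFIXES)}
STEM = "|".join(PREFIXES)

# One branch group: single-digit offsets, a hyphen, an optional grouping
# syllable, the size prefix, "yl", and an optional hyphen before the next group.
BRANCH = re.compile(rf"(\d(?:,\d)*)-(?:di|tri|tetra)?({STEM})yl-?")
ROOT = re.compile(rf"({STEM})ane")


def parse_name(name):
    """Return (root_size, [(offset, branch_size), ...]) for a challenge-style name."""
    pos = 0
    branches = []
    while (m := BRANCH.match(name, pos)):
        size = SIZE[m.group(2)]
        branches += [(int(d), size) for d in m.group(1).split(",")]
        pos = m.end()
    root = ROOT.fullmatch(name, pos)        # the rest must be exactly the root
    if root is None:
        raise ValueError(f"cannot parse {name!r}")
    return SIZE[root.group(1)], branches


if __name__ == "__main__":
    print(parse_name("2-ethyl-2,4,6-trimethyldecane"))
```

The grouping syllable is matched but not stored, since the number of offsets already carries that information.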

The Rules

Write a program that reads such an IUPAC name from STDIN, program arguments or equivalent and draws it as an ASCII-art Lewis structure to STDOUT (or equivalent).

  • For simplicity, you DO NOT have to draw the H-atoms (otherwise you would run into spacing issues).
  • You are NOT allowed to print any empty leading or trailing horizontal lines.
  • The chains you have to parse won't be longer than 10 C-atoms, and the maximum number of branches in a "group" is 4.
  • The maximum "offset" of a branch is 9 (meaning that you don't have to parse more than 1 digit).
  • Your branches have to alternate between going up and down after every new branch. If that spot is already taken by another branch at the same offset, you have to draw it on the other side of the root (up->down, down->up).
  • On corrupt, incorrectly formatted or otherwise undrawable input, your program's behavior is unspecified.
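The alternation rule leaves some details open, so the following Python 3 sketch fixes them by assumption: the first branch goes up, branches are processed in the order given, and after a forced switch the alternation continues from the side that was actually used. The function name `place_branches` and the `[up, down]` slot layout are likewise invented for this illustration.

```python
def place_branches(root_size, branches):
    """Assign each (offset, size) branch to the up or down side of the root.

    Returns one [up, down] pair per root carbon; 0 means "no branch".
    """
    slots = [[0, 0] for _ in range(root_size)]
    side = 0                                  # 0 = up, 1 = down (assumed start)
    for offset, size in branches:
        slot = slots[offset - 1]              # offsets are 1-based
        if slot[side]:                        # spot taken: use the other side
            side = 1 - side
        if slot[side]:                        # both sides taken: not drawable
            raise ValueError(f"no free side at offset {offset}")
        slot[side] = size
        side = 1 - side                       # alternate for the next branch
    return slots


if __name__ == "__main__":
    print(place_branches(10, [(2, 1), (2, 2), (4, 1), (6, 1)]))
```

Whether branches should be placed in name order or sorted by offset is not specified either; the caller can sort the list before passing it in.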

This is code golf, shortest code in bytes wins!

Happy golfing! :)

Thomas Oltmann

Posted 2015-11-17T16:47:09.933

Reputation: 471

Should 4-methylpropane say 4-methylpentane? 4-<anything>propane seems unlikely, unless I've completely misunderstood something. – Peter Taylor – 2015-11-17T17:14:12.947

Yes, you're right. Edited it! – Thomas Oltmann – 2015-11-17T17:23:49.067

The last molecule you have is 3-3-5-7-methyldodecane because the longest continuous carbon chain is 12 long. Also, you said not all the info in the question was accurate, but I think it's worth pointing out that the second molecule is 2-methylpentane, not 4-methylpentane because you start at the carbon with the closest branching. – Arcturus – 2015-11-17T17:30:28.360

I know, but that was exactly the inaccuracy I was disclaiming. That's just the trade off for keeping it short enough for a code golf challenge! :) – Thomas Oltmann – 2015-11-17T17:33:27.930

1. "Your branches have to alternate between going up and down after every new branch." Your example violates this rule. 2. What is the maximum chain length we have to support? (Parsing the prefixes will be a part of the challenge.) You should link (or preferably copy) a list of nomenclature. – Level River St – 2015-11-17T18:20:32.690

@steveverrill 1. Sorry, by that rule I meant something else. Edited it now as "You are NOT allowed to print any empty leading or trailing horizontal lines". 2. Max chain size 10, max group size 4. Hope this makes things clearer! :) – Thomas Oltmann – 2015-11-17T18:59:20.737

I wasn't referring to the rule you edited. I was referring to the one about alternating branches. Your example shows 2 consecutive methyl groups above the line, which is in violation of your rule. Please clarify. – Level River St – 2015-11-17T19:06:00.660

Oh that; that's not really an example of what the program should output, but of what "2-ethyl-2,4,6-trimethyldecane" should generally look like. But you're right, I think it would cause less confusion if it conformed to the rules. – Thomas Oltmann – 2015-11-17T19:12:02.327

    Answers

    Python 2, 620 bytes

    import re
    i=input()
    s='m|e|pr|b|p|hex|h|o|n|de';d=dict(zip(s.split('|'),range(1,11)))
    z=[[eval('['+a+']'),d[b]]for a,b in re.findall('(?:(\d[,\d]*).*?[\-ia]|l)('+s+')',i[:-3])]
    v=z[-1][1]
    l=[[0,0]for _ in range(v)]
    c=0
    for a,b in sorted([(i,b)for a,b in z[:-1]for i in a]):l[a-1][c]=b;c=~c
    m=[max(x) for x in zip(*l)]
    L,R=[[[' 'for _ in '_'*2*i]for _ in '_'*(2*v+1)]for i in m]
    c=[' |'*v+' ']
    C=c+['-C'*v+'-']+c
    for i in range(len(l)):
     X=L;q=2*i+1
     for a in l[i]:
      if a>0:
       for j in range(0,2*a,2):
        X[q][j]='C'
        X[q-1][j]=X[q+1][j]='-'
        X[q][j+1]='|'
      X=R
    for l in zip(*L)[::-1]+C+zip(*R):print ''.join(l)
    

    Explanation

    Input: '2-ethyl-2,4,6-trimethyldecane'

    First, the input string is parsed with a regex (the last group is the root):

    [[[2], 2], [[2, 4, 6], 1], [[], 10]]

    Each branch is then written into an array of length len(root), with one [up, down] pair per root carbon (alternating up/down is handled here):

    [[0, 0], [1, 2], [0, 0], [1, 0], [0, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]

    The 'left' and 'right' grids (L, R) and the 'root' rows (C) are initialized. L and R hold the upper and lower side of the drawing in transposed form, hence the names.

    Each branch is then added to the corresponding grid (big loop).

    The two sides and the center are printed at the end:

       |   |             
      -C- -C-            
     | | | | | | | | | | 
    -C-C-C-C-C-C-C-C-C-C-
     | | | | | | | | | | 
      -C-     -C-        
       |       |         
      -C-                
       |                 
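For readers who find the golfed loop hard to follow, here is a non-golfed Python 3 sketch of the final rendering step. It is not TFeld's code: the function name `render` is invented, and it assumes the input is the intermediate array shown above, i.e. one `[up, down]` pair of branch sizes per root carbon.

```python
def render(slots):
    """Draw the structure for a list of [up, down] branch sizes per root carbon."""
    n = len(slots)
    width = 2 * n + 1                        # "-C" per carbon plus a closing "-"

    def side_rows(idx, height):
        # Rows are ordered from the root outwards; two rows per branch carbon.
        rows = [[" "] * width for _ in range(2 * height)]
        for i, slot in enumerate(slots):
            col = 2 * i + 1                  # column of the i-th root carbon
            for k in range(slot[idx]):
                rows[2 * k][col - 1:col + 2] = "-C-"
                rows[2 * k + 1][col] = "|"
        return ["".join(row) for row in rows]

    up = max(slot[0] for slot in slots)
    down = max(slot[1] for slot in slots)
    bonds = " |" * n + " "
    chain = "-C" * n + "-"
    # The upper side is flipped so that its outermost row is printed first.
    return side_rows(0, up)[::-1] + [bonds, chain, bonds] + side_rows(1, down)


if __name__ == "__main__":
    example = [[0, 0], [1, 2], [0, 0], [1, 0], [0, 0],
               [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
    print("\n".join(render(example)))
```

Because the side grids are sized from the tallest branch on each side, a side without branches contributes no rows, which keeps the output free of empty leading or trailing lines.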
    

    TFeld

    Posted 2015-11-17T16:47:09.933

    Reputation: 19 246