Molecules to Atoms

44

0

The Challenge

Write a program that can break down an input chemical formula (see below), and output its respective atoms in the form element: atom-count.


Input

Sample input:

H2O

Your input will always contain at least one element, but no more than ten. Your program should accept inputs that contain parentheses, which may be nested.

Elements in the strings will always match [A-Z][a-z]*, meaning they will always start with an uppercase letter. Numbers will always be single digits.


Output

Sample output (for the above input):

H: 2
O: 1

Your output can be optionally followed by a newline.


Breaking Down Molecules

Numbers to the right of a set of parentheses are distributed to each element inside:

Mg(OH)2

Should output:

Mg: 1
O: 2
H: 2

The same principle applies to individual atoms:

O2

Should output:

O: 2

And also chaining:

Ba(NO2)2

Should output:

Ba: 1
N: 2
O: 4

Examples

> Ba(PO3)2
Ba: 1
P: 2
O: 6

> C13H18O2
C: 13
H: 18
O: 2

> K4(ON(SO3)2)2
K: 4
O: 14
N: 2
S: 4

> (CH3)3COOC(CH3)3
C: 8
H: 18
O: 2

> (C2H5)2NH
C: 4
H: 11
N: 1

> Co3(Fe(CN)6)2
Co: 3
Fe: 2
C: 12
N: 12

Inputs are denoted by an arrow (greater-than sign; >).

Scoreboard

For your score to appear on the board, it should be in this format:

# Language, Score

Or if you earned a bonus:

# Language, Score (Bytes - Bonus%)

function getURL(e){return"https://api.stackexchange.com/2.2/questions/"+QUESTION_ID+"/answers?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+ANSWER_FILTER}function getAnswers(){$.ajax({url:getURL(answer_page++),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){answers.push.apply(answers,e.items),answers_hash=[],answer_ids=[],e.items.forEach(function(e){var s=+e.share_link.match(/\d+/);answer_ids.push(s),answers_hash[s]=e}),useData(answers)}})}function getOwnerName(e){return e.owner.display_name}function useData(e){var s=[];e.forEach(function(e){var a=e.body.replace(/<s>.*<\/s>/,"").replace(/<strike>.*<\/strike>/,"");console.log(a),VALID_HEAD.test(a)&&s.push({user:getOwnerName(e),language:a.match(VALID_HEAD)[1],score:+a.match(VALID_HEAD)[2],link:e.share_link})}),s.sort(function(e,s){var a=e.score,r=s.score;return a-r}),s.forEach(function(e,s){var a=$("#score-template").html();a=a.replace("{{RANK}}",s+1+"").replace("{{NAME}}",e.user).replace("{{LANGUAGE}}",e.language).replace("{{SCORE}}",e.score),a=$(a),$("#scores").append(a)})}var QUESTION_ID=58469,ANSWER_FILTER="!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe",answers=[],answer_ids,answers_hash,answer_page=1;getAnswers();var VALID_HEAD=/<h\d>([^\n,]*)[, ]*(\d+).*<\/h\d>/;
body{text-align:left!important}table thead{font-weight:700}table td{padding:10px 0 0 30px}#scores-cont{padding:10px;width:600px}#scores tr td:first-of-type{padding-left:0}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script><link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b"><div id="scores-cont"><h2>Scores</h2><table class="score-table"><thead> <tr><td></td><td>User</td><td>Language</td><td>Score</td></tr></thead> <tbody id="scores"></tbody></table></div><table style="display: none"> <tbody id="score-template"><tr><td>{{RANK}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SCORE}}</td></tr></tbody></table>

Edit: Square brackets are no longer a part of the question. Any answers posted before 3AM UTC time, September 23, are safe and will not be affected by this change.

Zach Gates

Posted 2015-09-22T15:37:42.837

Reputation: 6 152

What are the allowed forms of input? – Oberon – 2015-09-22T15:48:33.910

1@ZachGates It's better that we're allowed to support either, but keep in mind that square brackets is still incorrect. AFAIK in chemical formulae square brackets are only used to indicated concentration. E.g.: [HCl] = 0.01 mol L^-1. – orlp – 2015-09-22T15:50:10.917

They are, but for all intensive purposes we're going to use them for grouping, too. @orlp Unless it's really a big deal; in which case, I'll removed brackets completely. – Zach Gates – 2015-09-22T15:51:05.247

See the "Examples" section. Is there something specific you're asking about? @Oberon Inputs are denoted by a >. – Zach Gates – 2015-09-22T15:51:31.080

Could you maybe add some really complex examples for people to test with? – orlp – 2015-09-22T16:53:12.997

I've just added a few more examples. Currently, only valid chemical formulas are shown in the post. I could perhaps add some non-real examples; just let me know. @orlp – Zach Gates – 2015-09-22T17:58:02.200

The prompt is needed? Is it enough to process one formula at once? The prompt is '> ' or just '>'? – uno20001 – 2015-09-22T18:43:16.423

No, it should handle just one input. The > format was purely for demonstration. @uno20001 – Zach Gates – 2015-09-22T18:47:43.133

Must the output format be exactly as above? Or are similar formats allowed? – orlp – 2015-09-22T19:15:37.760

Each element must be in the format element: atom-count followed by a newline. (the format above) @orlp – Zach Gates – 2015-09-22T19:18:08.550

+1 for a great challenge, but is there any good reason you are allowing square brackets? They are chemically incorrect and arguably don't add much to the question. – March Ho – 2015-09-23T02:58:34.883

I've just removed them. :-) Any answers posted before 3AM UTC, September 23, are safe and will not be affected by this change, as per the above edit. @MarchHo – Zach Gates – 2015-09-23T03:02:37.020

Would it be safe to assume that the numbers are always single digit? – Reto Koradi – 2015-09-23T04:17:55.817

Yes. I just edited to add this detail, also. @RetoKoradi – Zach Gates – 2015-09-23T04:19:02.197

Challenge would be so much better if you had to output the elements in Atomic Number order :-) – corsiKa – 2015-09-23T16:08:39.570

Perhaps the next one will have that element. @corsiKa – Zach Gates – 2015-09-23T17:38:06.687

I hope so, it will energize the community. – corsiKa – 2015-09-23T17:39:01.500

1Just a note, the examples still have elements with multiple digit atom counts. – ProgrammerDan – 2015-09-23T22:07:01.430

In Perl 5, it seems to me that s/(?|([A-Z][a-z]*)|\(((?0)+)\))(\d)?/$1x($2||1)/ge should work to expand the string (preparatory to parsing it further), but it doesn't. Maybe someone can work with that, though. – msh210 – 2016-04-22T20:09:08.483

Answers

11

CJam, 59 57 bytes

q{:Ci32/")("C#-"[ ] aC~* Ca C+"S/=~}%`La`-S%$e`{~": "@N}/

Try it online in the CJam interpreter.

How it works

q             e# Read all input from STDIN.
{             e# For each character:
  :Ci         e#   Save it in C and cast to integer.
  32/         e#   Divide the code point by 32. This pushes
              e#   2 for uppercase, 3 for lowercase and 1 for non-letters.
  ")("C#      e#   Find the index of C in that string. (-1 if not found.)
  -           e#   Subtract. This pushes 0 for (, 1 for ), 2 for digits,
              e#   3 for uppercase letters and 4 for lowercase letters.

 "[ ] aC~* Ca C+"

 S/           e#   Split it at spaces into ["[" "]" "aC~*" "Ca" "C+"].
 =~           e#   Select and evaluate the corresponding chunk.
              e#     (   : [    : Begin an array.
              e#     )   : ]    : End an array.
              e#     0-9 : aC~* : Wrap the top of the stack into an array
              e#                  and repeat that array eval(C) times.
              e#     A-Z : Ca   : Push "C".
              e#     a-z : C+   : Append C to the string on top of the stack.
}%            e#
`             e# Push a string representation of the resulting array.
              e# For input (Au(CH)2)2, this pushes the string
              e# [[["Au" [["C" "H"] ["C" "H"]]] ["Au" [["C" "H"].["C" "H"]]]]]
La`           e# Push the string [""].
-             e# Remove square brackets and double quotes from the first string.
S%            e# Split the result at runs of spaces.
$e`           e# Sort and perform run-length encoding.
{             e# For each pair [run-length string]:
  ~           e#   Dump both on the stack.
  ": "        e#   Push that string.
  @N          e#   Rotate the run-length on top and push a linefeed.
}/            e#

Dennis

Posted 2015-09-22T15:37:42.837

Reputation: 196 637

10

Python3, 157 154 bytes

import re
s=re.sub
f=s("[()',]",'',str(eval(s(',?(\d+)',r'*\1,',s('([A-Z][a-z]*)',r'("\1",),',input()))))).split()
for c in set(f):print(c+":",f.count(c))

Only supports input using regular brackets.

Before creating the golfed solution using eval above I created this reference solution, which I found very elegant:

import re, collections

parts = filter(bool, re.split('([A-Z][a-z]*|\(|\))', input()))
stack = [[]]
for part in parts:
    if part == '(':
        stack.append([])
    elif part == ')':
        stack[-2].append(stack.pop())
    elif part.isdigit():
        stack[-1].append(int(part) * stack[-1].pop())
    else:
        stack[-1].append([part])

count = collections.Counter()
while stack:
    if isinstance(stack[-1], list):
        stack.extend(stack.pop())
    else:
        count[stack.pop()] += 1

for e, i in count.items():
    print("{}: {}".format(e, i))

orlp

Posted 2015-09-22T15:37:42.837

Reputation: 37 067

10

Pyth, 66 65 bytes

VrSc-`v::z"([A-Z][a-z]*)""('\\1',),"",?(\d+)""*\\1,"`(k))8j": "_N

Port of my Python answer. Only supports input using regular brackets.

orlp

Posted 2015-09-22T15:37:42.837

Reputation: 37 067

3+1. Three answers in an hour? Nice. – Zach Gates – 2015-09-22T19:51:07.673

6

JavaScript ES6, 366 bytes

function f(i){function g(a,b,c){b=b.replace(/[[(]([^[(\])]+?)[\])](\d*)/g,g).replace(/([A-Z][a-z]?)(\d*)/g,function(x,y,z){return y+((z||1)*(c||1))});return(b.search(/[[(]/)<0)?b:g(0,b)}return JSON.stringify(g(0,i).split(/(\d+)/).reduce(function(q,r,s,t){(s%2)&&(q[t[s-1]]=+r+(q[t[s-1]]||0));return q},{})).replace(/["{}]/g,'').replace(/:/g,': ').replace(/,/g,'\n')}

JS Fiddle: https://jsfiddle.net/32tunzkr/1/

I'm pretty sure this can be shortened, but I need to get back to work. ;-)

styletron

Posted 2015-09-22T15:37:42.837

Reputation: 231

2I'm pretty sure it can be shortened as well. Since you clame to be using ES6, you could start by using the big-arrow notation to create functions. And the implicit return statement. That should be enough for now. – Ismael Miguel – 2015-09-22T22:59:07.397

You also use replace a lot so you can save some bytes by using xyz[R='replace'](...) the first time and abc[R] (...) each subsequent time. – DankMemes – 2015-09-23T15:25:30.807

6

SageMath, 156 148 bytes

import re
i=input()
g=re.sub
var(re.findall("[A-Z][a-z]?",i))
print g("(\d+).(\S+)\D*",r"\2: \1\n",`eval(g("(\d+)",r"*\1",g("([A-Z(])",r"+\1",i)))`)

Try it online here (hopefully link will work, might need an online account)

Note: If trying online, you will need to replace input() with the string (e.g. "(CH3)3COOC(CH3)3")

Explanation

Sage allows you to simplify algebraic expressions, providing they are in the right format (see 'symbolic manipulation' of this link). The regexes inside the eval() basically serve to get the input string into the right format, for example something like:

+(+C+H*3)*3+C+O+O+C+(+C+H*3)*3

eval() will then simplify this to: 8*C + 18*H + 2*O, and then it's just a matter of formatting the output with another regex substitution.

Jarmex

Posted 2015-09-22T15:37:42.837

Reputation: 2 045

5

Python 3, 414 bytes

I hope that the order of the result doesn't count.

import re
t=input().replace("[", '(').replace("]", ')')
d={}
p,q="(\([^\(\)]*\))(\d*)","([A-Z][a-z]*)(\d*)"
for i in re.findall(q,t):t = t.replace(i[0]+i[1],i[0]*(1if i[1]==''else int(i[1])))
r=re.findall(p,t)
while len(r)>0:t=t.replace(r[0][0]+r[0][1],r[0][0][1:-1]*(1if r[0][1]==''else int(r[0][1])));r=re.findall(p,t)
for i in re.findall(q[:-5], t):d[i]=d[i]+1if i in d else 1
for i in d:print(i+': '+str(d[i]))

uno20001

Posted 2015-09-22T15:37:42.837

Reputation: 321

5

Javascript (ES6), 286 284

Not that much shorter than the other ES6 one but I gave it my best. Note: this will error out if you give it an empty string or most invalid inputs. Also expects all groups to have a count of more than 1 (ie, no CO[OH]). If this breaks any challenge rules, let me know.

a=>(b=[],c={},a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11").match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g).reverse().map(d=>(d*1==d&&b.push(d*1),d.match(/\(|\[/)&&b.pop(),d.match(/[A-Z]/)&&eval('e=b.reduce((f,g)=>f*g,1),c[d]=c[d]?c[d]+e:e,b.pop()'))),eval('g="";for(x in c)g+=x+`: ${c[x]}\n`'))

Uses a stack-based approach. First, it preprocesses the string to add 1 to any element without a number, ie Co3(Fe(CN)6)2 becomes Co3(Fe1(C1N1)6)2. Then it loops through in reverse order and accumulates element counts.

a=>(
  // b: stack, c: accumulator
  b=[], c={},

  // adds the 1 to every element that doesn't have a count
  a.replace(/([A-Z][a-z]*)(?![0-9a-z])/g, "$11")

    // gathers a list of all the elements, counts, and grouping chars
    .match(/[A-Z][a-z]*|[0-9]+|[\[\(]/g)

    // loops in reverse order
    .reverse().map(d=>(

       // d*1 is shorthand here for parseInt(d)
       // d*1==d: true only if d is a number
       // if it's a number, add it to the stack
       d * 1 == d && b.push(d * 1),

       // if there's an opening grouping character, pop the last item
       // the item being popped is that group's count which isn't needed anymore
       d.match(/\(|\[/) && b.pop(),

       // if it's an element, update the accumulator
       d.match(/[A-Z]/) && eval('

         // multiplies out the current stack
         e = b.reduce((f, g)=> f * g, 1),

         // if the element exists, add to it, otherwise create an index for it
         c[d] = c[d] ? c[d] + e : e,

         // pops this element's count to get ready for the next element
         b.pop()
       ')
  )),

  // turns the accumulator into an output string and returns the string
  eval('
    g="";

    // loops through each item of the accumulator and adds it to the string
    // for loops in eval always return the last statement in the for loop
    // which in this case evaluates to g
    for(x in c)
      g+=x+`: ${c[x]}\n`
  ')
)

Fiddle

DankMemes

Posted 2015-09-22T15:37:42.837

Reputation: 2 769

5

Perl, 177 172 bytes

171 bytes code + 1 byte command line parameter

Ok, so I may have got a little carried away with regex on this one...

s/(?>[A-Z][a-z]?)(?!\d)/$&1/g;while(s/\(([A-Z][a-z]?)(\d+)(?=\w*\W(\d+))/$2.($3*$4).$1/e||s/([A-Z][a-z]?)(\d*)(\w*)\1(\d*)/$1.($2+$4).$3/e||s/\(\)\d+//g){};s/\d+/: $&\n/g

Usage example:

echo "(CH3)3COOC(CH3)3" | perl -p entry.pl

Jarmex

Posted 2015-09-22T15:37:42.837

Reputation: 2 045

2

Mathematica, 152 bytes

f=TableForm@Cases[PowerExpand@Log@ToExpression@StringReplace[#,{a:(_?UpperCaseQ~~___?LowerCaseQ):>"\""<>a<>"\"",b__?DigitQ:>"^"<>b}],a_. Log[b_]:>{b,a}]&

The above defines a function f which takes a string as input. The function takes the string and wraps each element name into quotes and adds an infix exponentiation operator before each number, then interprets the string as an expression:

"YBa2Cu3O7" -> ""Y""Ba"^2"Cu"^3"O"^7" -> "Y" "Ba"^2 "Cu"^3 "O"^7

Then it takes the logarithm of that and expands it out (mathematica doesn't care, what to take the logarithm of :)):

Log["Y" "Ba"^2 "Cu"^3 "O"^7] -> Log["Y"] + 2 Log["Ba"] + 3 Log["Cu"] + 7 Log["O"]

and then it finds all occurrences of multiplication of a Log by a number and parses it into the form of {log-argument, number} and outputs those in a table. Some examples:

f@"K4(ON(SO3)2)2"
K   4
N   2
O   14
S   4


f@"(CH3)3COOC(CH3)3"
C   8
H   18
O   2


f@"Co3(Fe(CN)6)2"
C   12
Co  3
Fe  2
N   12

LLlAMnYP

Posted 2015-09-22T15:37:42.837

Reputation: 361

1

Java, 827 bytes

import java.util.*;class C{String[]x=new String[10];public static void main(String[]a){new C(a[0]);}C(String c){I p=new I();int[]d=d(c,p);for(int i=0;i<10;i++)if(x[i]!=null)System.out.println(x[i]+": "+d[i]);}int[]d(String c,I p){int[]f;int i,j;Vector<int[]>s=new Vector();while(p.v<c.length()){char q=c.charAt(p.v);if(q=='(')s.add(d(c,p.i()));if(q==')')break;if(q>='A'&&q<='Z'){f=new int[10];char[]d=new char[]{c.charAt(p.v),0};i=1;if(c.length()-1>p.v){d[1]=c.charAt(p.v+1);if(d[1]>='a'&&d[1]<='z'){i++;p.i();}}String h=new String(d,0,i);i=0;for(String k:x){if(k==null){x[i]=h;break;}if(k.equals(h))break;i++;}f[i]++;s.add(f);}if(q>='0'&&q<='9'){j=c.charAt(p.v)-'0';f=s.get(s.size()-1);for(i=0;i<10;)f[i++]*=j;}p.i();}f=new int[10];for(int[]w:s){j=0;for(int k:w)f[j++]+=k;}return f;}class I{int v=0;I i(){v++;return this;}}}

Git repository w/ ungolfed source (not perfect parity, ungolfed supports multi-character numbers).

Been a while, figured I'd give Java some representation. Definitely not going to win any awards :).

ProgrammerDan

Posted 2015-09-22T15:37:42.837

Reputation: 1 129

1

ES6, 198 bytes

f=s=>(t=s.replace(/(([A-Z][a-z]?)|\(([A-Za-z]+)\))(\d+)/,(a,b,x,y,z)=>(x||y).repeat(z)))!=s?f(t):(m=new Map,s.match(/[A-Z][a-z]?/g).map(x=>m.set(x,-~m.get(x))),[...m].map(([x,y])=>x+": "+y).join`\n`)

Where \n is a literal newline character.

Ungolfed:

function f(str) {
    // replace all multiple elements with individual copies
    // then replace all groups with copies working outwards
    while (/([A-Z][a-z]?)(\d+)/.test(str) || /\(([A-Za-z]+)\)(\d+)/.test(str)) {
        str = RegExp.leftContext + RegExp.$1.repeat(RegExp.$2) + RegExp.rightContext;
    }
    // count the number of each element in the expansion
    map = new Map;
    str.match(/[A-Z][a-z]?/g).forEach(function(x) {
        if (!map.has(x)) map.set(x, 1);
        else map.set(x, map.get(x) + 1);
    }
    // convert to string
    res = "";
    map.forEach(function(value, key) {
        res += key + ": " + value + "\n";
    }
    return res;
}

Neil

Posted 2015-09-22T15:37:42.837

Reputation: 95 035

1

Pip, 85 77 + 1 = 78 bytes

Non-competing answer because it uses language features that are newer than the challenge. Takes the formula as a command-line argument, and uses the -n flag for proper output formatting.

Y(VaRl:`([A-Z][a-z]*)``"&"`R`\d+``X&`R`(?<=\d|")[("]``.&`l)u:UQyu.": ".Y_NyMu

Try it online!

The main trick is to transform the formula via regex replacements into a Pip expression. This, when eval'd, will do the repetition and resolve parentheses for us. We then post-process a bit to get the atom counts and format everything correctly.

Ungolfed, with comments:

                         a is command-line arg (implicit)
l:`([A-Z][a-z]*)`        Regex matching element symbols
aR:l `"&"`               Replace each symbol in a with symbol wrapped in quotes
aR:`\d+` `X&`            Add X before each number
aR:`(?<=\d|")[("]` `.&`  Add . before ( or " if it's preceded by a digit or "
Y (Va)@l                 Eval result, findall matches of l, and yank resulting list into y
u:UQy                    Remove duplicates and store in u
u.": ".(_Ny M u)         Map function {a IN y} to u, returning list of element counts;
                           append this (with colon & space) itemwise to list of symbols
                         Print that list, newline-separated (implicit, -n flag)

Here's how the input Co3(Fe(CN)6)2 is transformed:

Co3(Fe(CN)6)2
"Co"3("Fe"("C""N")6)2
"Co"X3("Fe"("C""N")X6)X2
"Co"X3.("Fe".("C"."N")X6)X2
CoCoCoFeCNCNCNCNCNCNFeCNCNCNCNCNCN

Then:

["Co" "Co" "Co" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "Fe" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N" "C" "N"]
["Co" "Fe" "C" "N"]
[3 2 12 12]
["Co: 3" "Fe: 2" "C: 12" "N: 12"]

DLosc

Posted 2015-09-22T15:37:42.837

Reputation: 21 213