Print The Formula

8

Introduction

In chemistry, the .xyz file format (https://en.wikipedia.org/wiki/XYZ_file_format) lists one atom per line: a chemical element followed by its coordinates in 3D space. This is very useful for chemists to understand chemical compounds and to visualize them in 3D. I thought it would be fun to print the chemical formula given a .xyz file.

Challenge

Given an .xyz file, print the chemical formula of the compound in any programming language in the smallest possible number of bytes. Note:

  • Originally, the input was to be given as a file. As has been pointed out to me, this constrains the challenge unnecessarily. Therefore you may assume the input is a list/array of strings, each representing a line of the .xyz file.
  • There are no restrictions in the ordering of the elements.
  • Each element should be printed with an underscore "_" separating the element from the number of times it appears.
  • The first two lines of any .xyz file are the number of atoms and a comment line (keep that in mind).

Example Input and Output

Suppose you have a file p.xyz which contains the following (where the first line is the number of atoms and the second is a comment). Input:

5  
A mystery chemical formula...  
Ba      0.000   0.000  0.000  
Hf      0.5     0.5    0.5  
O       0.5     0.5    0.000  
O       0.5     0.000  0.5  
O       0.000   0.5    0.5  

Output:
Ba_1Hf_1O_3
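
For reference, here is an ungolfed sketch of the rules above in Python, assuming the input arrives as a list of strings. The function name `formula` is hypothetical, and the first-appearance ordering is just one choice, since the challenge places no restriction on ordering.

```python
from collections import Counter

def formula(lines):
    """Build the formula string from the lines of an .xyz file."""
    # Skip the atom-count line and the comment line; ignore blank lines.
    atoms = [line.split()[0] for line in lines[2:] if line.strip()]
    # Counter keeps keys in first-appearance order (Python 3.7+).
    counts = Counter(atoms)
    return "".join(f"{element}_{n}" for element, n in counts.items())

example = [
    "5",
    "A mystery chemical formula...",
    "Ba      0.000   0.000  0.000",
    "Hf      0.5     0.5    0.5",
    "O       0.5     0.5    0.000",
    "O       0.5     0.000  0.5",
    "O       0.000   0.5    0.5",
]
print(formula(example))  # Ba_1Hf_1O_3
```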


Testing

A quick test is the example above. For a more thorough test, here is a larger .xyz file; since it is thousands of lines long, I'll share it as a link:
https://gist.github.com/nachonavarro/1e95cb8bbbc644af3c44

McGuire

Posted 2016-01-26T22:35:23.617

Reputation: 189

Requiring input to be read from a file unnecessarily and unfairly prohibits a very significant portion of programming languages from participating in your challenge. See: http://meta.codegolf.stackexchange.com/a/8077/3808, http://meta.codegolf.stackexchange.com/q/2447/3808

– Doorknob – 2016-01-26T22:37:58.650

@Doorknob Good point. I've changed that. – McGuire – 2016-01-26T22:41:56.350

Code golf, code challenge, and fastest code are mutually-exclusive tags. The score should be in bytes, not characters. The file format should be completely described in the question, instead of requiring outside resources. Use a GitHub Gist instead of an untrusted file hosting service. Because of these issues, I'm voting to close as unclear. Please use the Sandbox in the future.

– Mego – 2016-01-26T22:46:19.537

2 @Mego how about now? :) – McGuire – 2016-01-26T22:49:14.770

@nacho There are still many issues with it. You should probably delete this question, post it in the Sandbox, and then post it to main after getting feedback and fixing issues that others point out. – Mego – 2016-01-26T22:50:45.190

5 what is the answer for the large test case? – Maltysen – 2016-01-26T23:08:23.150

3 Does ordering matter in the output? – Digital Trauma – 2016-01-26T23:18:16.380

Answers

2

Pyth - 18 bytes

sjL\__MrShMcR;ttQ8

Try it online here.

Maltysen

Posted 2016-01-26T22:35:23.617

Reputation: 25 023

2

Japt, 21 bytes

U=¢m¸mg)â £X+'_+Uè_¥X

Test it online! Input is given as an array of strings (which can be formatted as in the link).

Ungolfed and explanation

U=¢   m¸  mg)â £    X+'_+Uè_  ¥ X
U=Us2 mqS mg)â mXYZ{X+'_+UèZ{Z==X

          // Implicit: U = input array of strings
Us2       // Slice off the first two items of U.
mqS mg    // Map each item by splitting at spaces, then taking the first item.
U=    )   // Set U to the result.
â mXYZ{   // Uniquify, then map each item X to:
UèZ{Z==X  //  Count the number of items Z in U where Z == X.
X+'_+     //  Prepend X and an underscore.
          // Implicit output
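
The same steps can be written out in Python as a rough sketch. This is not a translation of Japt's semantics: the name `formula_steps` is hypothetical, and Python's `split()` splits on runs of whitespace rather than single spaces, which gives the same first field here.

```python
def formula_steps(u):
    # Slice off the first two items, split each remaining line at
    # whitespace, and keep the first field (the element symbol).
    u = [line.split()[0] for line in u[2:]]
    # Uniquify while keeping first-appearance order.
    unique = list(dict.fromkeys(u))
    # Map each symbol x to x + "_" + (number of items in u equal to x).
    return "".join(x + "_" + str(u.count(x)) for x in unique)

print(formula_steps(["2", "comment", "H 0 0 0", "H 0 0 1"]))  # H_2
```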

ETHproductions

Posted 2016-01-26T22:35:23.617

Reputation: 47 880

1

AWK, 44

NR>2{a[$1]++}END{for(i in a)printf i"_"a[i]}

Try it online.

Digital Trauma

Posted 2016-01-26T22:35:23.617

Reputation: 64 644

0

Shell + GNU Utilities, 67

sed '1d;2d;s/ .*//'|sort|uniq -c|sed -Ez 's/\s*(\S+) (\S+)/\2_\1/g'

Try it online.

Digital Trauma

Posted 2016-01-26T22:35:23.617

Reputation: 64 644

1d;2d → 1,2d – manatwork – 2016-01-27T18:29:33.473

Just because trailing spaces in the output are not forbidden: tail -n+3|cut -c-3|sort|uniq -c|sed -rz 's/\s*(\S+) (\S+)/\2_\1/g' – manatwork – 2016-01-27T18:41:41.980

0

Mathematica, 53 bytes (previously 79)

StringRiffle[Tally@StringExtract[#[[3;;]],1],"","_"]&

Quite simple.

LegionMammal978

Posted 2016-01-26T22:35:23.617

Reputation: 15 731