Introduction
RNA is the less famous cousin of DNA. Its main purpose is to control the production of proteins in cells through a process called translation. In this challenge, your task is to implement a part of this process where the RNA is split into codons.
This challenge is thematically related, but concentrates on another part of the translation process.
Codons
We will think of RNA as a long string over the alphabet of base pairs, AUCG.
In translation, RNA is split into non-overlapping chunks of three base pairs, called codons.
The process begins at a start codon, AUG, and ends at a stop codon, one of UAA, UAG or UGA.
Each codon (except the stop codons) corresponds to an amino acid, and the resulting string of amino acids forms the protein.
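As a rough illustration of the chunking described above, here is a minimal Python sketch. The names `chunk_codons` and `STOP_CODONS` are made up for this example and are not part of the challenge; the sketch assumes the string already starts at a codon boundary and simply drops any incomplete tail.

```python
# Stop codons as listed in the text above.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def chunk_codons(rna):
    """Split rna into consecutive, non-overlapping triples.

    Any trailing one or two bases that do not form a full codon are dropped.
    """
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]

codons = chunk_codons("AUGGAUGGACUGUAACC")
print(codons)                               # ['AUG', 'GAU', 'GGA', 'CUG', 'UAA']
print([c in STOP_CODONS for c in codons])   # [False, False, False, False, True]
```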
Input
Your input is a non-empty string of RNA.
Output
Your output is the list of codons in which the RNA is split, in any reasonable format.
In this simplified model, the process begins at the leftmost start codon AUG, which is included in the output.
It ends when a stop codon is encountered, or when we run out of RNA.
If the input contains no start codon, the output shall be an empty list.
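The rules above could be implemented, ungolfed, along the following lines. This is only a sketch of one possible reading of the specification, not a reference solution; the function name `split_codons` and the choice of a Python list as the "reasonable format" are assumptions made for the example.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_codons(rna):
    """Return the codons from the leftmost AUG up to (not including) a stop codon."""
    start = rna.find(START_CODON)
    if start == -1:
        # No start codon anywhere: the output is an empty list.
        return []
    result = []
    # Step through full triples only; an incomplete tail is ignored.
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break  # the stop codon itself is not part of the output
        result.append(codon)
    return result

print(",".join(split_codons("ACAUGGAUGGACUGUAACCCCAUGC")))  # AUG,GAU,GGA,CUG
```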
Examples
Consider the input sequence
ACAUGGAUGGACUGUAACCCCAUGC
The parsing begins at the leftmost occurrence of AUG, at index 2.
It continues as follows:
AC AUG GAU GGA CUG UAA CCCCAUGC
   *   ^   ^   ^   +
The codon marked with * is the start codon, and those marked with ^ are also part of the output.
The stop codon is marked with +.
The correct output is
AUG,GAU,GGA,CUG
For the shorter input
ACAUGGAUGGACUGU
the process goes
AC AUG GAU GGA CUG U
   *   ^   ^   ^
This time, a stop codon is not encountered, so the process stops when we run out of base pairs. The output is the same as above.
Rules and scoring
You can write a full program or a function. The lowest byte count wins, and standard loopholes are disallowed.
Test cases
GGUACGGAUU ->
GGCGAAAUCGAUGCC -> AUG
ACAUGGAUGGACUGU -> AUG,GAU,GGA,CUG
AUGACGUGAUGCUUGA -> AUG,ACG
UGGUUAGAAUAAUGAGCUAG -> AUG,AGC
ACAUGGAUGGACUGUAACCCCAUGC -> AUG,GAU,GGA,CUG
CUAAGAUGGCAUGAGUAAUGAAUGGAG -> AUG,GCA
AAUGGUUUAAUAAAUGUGAUAUGAUGAUA -> AUG,GUU
UGUCACCAUGUAAGGCAUGCCCAAAAUCAG -> AUG
UAUAGAUGGUGAUGAUGCCAUGAGAUGCAUGUUAAU -> AUG,GUG,AUG,AUG,CCA
AUGCUUAUGAAUGGCAUGUACUAAUAGACUCACUUAAGCGGUGAUGAA -> AUG,CUU,AUG,AAU,GGC,AUG,UAC
UGAUAGAUGUAUGGAUGGGAUGCUCAUAGCUAUAAAUGUUAAAGUUAGUCUAAUGAUGAGUAGCCGAUGGCCUAUGAUGCUGAC -> AUG,UAU,GGA,UGG,GAU,GCU,CAU,AGC,UAU,AAA,UGU
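For anyone wanting to check a solution against the list above, here is a hedged sketch of a small test harness in Python. It uses a regex-based approach as one possible alternative formulation; the names `CODON_RUN`, `split_codons_regex` and `CASES` are invented for this example, and the code assumes the input contains only the letters A, C, G and U. Only a handful of the test cases are copied in.

```python
import re

# AUG followed by as many full triples as possible that are not stop codons.
CODON_RUN = re.compile(r"AUG(?:(?!UAA|UAG|UGA)[ACGU]{3})*")

def split_codons_regex(rna):
    """Find the leftmost AUG and return the run of codons before any stop codon."""
    match = CODON_RUN.search(rna)
    if match is None:
        return []
    run = match.group(0)  # length is always a multiple of three
    return [run[i:i + 3] for i in range(0, len(run), 3)]

# A few of the test cases listed above, as input -> comma-joined output.
CASES = {
    "GGUACGGAUU": "",
    "GGCGAAAUCGAUGCC": "AUG",
    "AUGACGUGAUGCUUGA": "AUG,ACG",
    "ACAUGGAUGGACUGUAACCCCAUGC": "AUG,GAU,GGA,CUG",
    "UGUCACCAUGUAAGGCAUGCCCAAAAUCAG": "AUG",
}

for rna, expected in CASES.items():
    got = ",".join(split_codons_regex(rna))
    assert got == expected, (rna, got, expected)
```

The negative lookahead rejects a triple before it is consumed, so the match stops just before the first stop codon or at the last complete triple, whichever comes first.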
The relationship of DNA to RNA to protein was once explained to me in computing terms that I could understand: DNA equates to a program on a hard disk; RNA equates to that program loaded into memory; and protein equates to the output data produced as a result of that program running. – Digital Trauma – 2016-01-16T01:42:47.753
The Dogma of molecular biology is "DNA makes RNA makes protein." So DNA is fairly rare, and RNA is less famous, but far more common. Protein is most common of all. – Level River St – 2016-01-16T01:53:27.270
@DigitalTrauma: As a geneticist I need to point out that this analogy is woefully inadequate to describe the reality of how DNA works. DNA is not some dead thing waiting to be transcribed into RNA so it can do something. – Jack Aidley – 2016-01-16T10:08:51.070
What actually occurs in practice if a piece of mRNA terminates before a stop codon (as in the simple example), meaning no stop triplet for a release factor to bind to? – Reinstate Monica - ζ-- – 2016-01-16T11:52:11.003
@Jack hard disk contents are not necessarily dead things either - upgrades, auto updates, etc, though of course not self-healing to the extent I understand DNA to be. But you're right - it is a weak analogy. However I think it got my non-geneticist self a little closer to a layman understanding. – Digital Trauma – 2016-01-16T17:31:33.533
The Dogma is also horribly wrong when it comes to some types of viruses. – ApproachingDarknessFish – 2016-06-12T05:11:33.773
@DigitalTrauma DNA -> RNA -> protein chain -> protein folding -> goes off and does awesome nature – noɥʇʎԀʎzɐɹƆ – 2016-06-12T20:13:57.353
@DigitalTrauma My analogy would be DNA = github repo online, RNA = downloaded source code, protein chain = after ./configure, protein folding = compiled program (it's super duper complicated, our computers can't even fold programming) – noɥʇʎԀʎzɐɹƆ – 2016-06-12T20:17:31.940