11
As you may know, in DNA there are four bases — adenine (A
), cytosine (C
), guanine (G
) and thymine (T
). Typically A
bonds with T
and C
bonds with G
, forming the "rungs" of the DNA double helix structure.
We define the complement of a base to be the base it bonds to — i.e. the complement of A
is T
, the complement of T
is A
, the complement of C
is G
and the complement of G
is C
. We can also define the complement of a DNA string to be the string with each base complemented, e.g. the complement of GATATC
is CTATAG
.
Because of the double-stranded structure of DNA, the bases on one strand are complementary to the bases on the other strand. However DNA has a direction, and DNA transcription occurs in opposite directions on the two strands. Hence molecular biologists are often interested in the reverse complement of a DNA string — quite literally the reverse of the complement of the string.
To extend our previous example, the reverse complement of GATATC
is CTATAG
backwards, so GATATC
. As you may have noticed, in this example the reverse complement is equal to the original string — we call such a string a reverse palindrome.*
Given a string of DNA, can you find the longest substring which is a reverse palindrome?
*I use the term "reverse palindrome", taken from Rosalind, to differentiate from the usual meaning of palindrome.
Input
Input will be a single string consisting of only the characters ACGT
in upper case. You may write either a function or a full program for this challenge.
Output
You may choose to output via either printing or returning (the latter choice is only available in the case of a function).
Your program should output the longest reverse palindromic substring of the input string, if there is a unique solution. If multiple solutions exist, then you may either output any single one of them, or all of them (your choice). Duplicates are okay if you choose to output all of them.
The input is guaranteed to have a solution of at least length 2.
Worked example
ATGGATCCG -> GGATCC
The reverse complement of GGATCC
is itself (GGATCC --complement--> CCTAGG --reverse--> GGATCC
), so GGATCC
is a reverse palindrome. GATC
is also a reverse palindome, but it is not the longest one.
Test cases
AT -> AT
CGT -> CG
AGCA -> GC
GATTACA -> AT, TA
ATGGATCCG -> GGATCC
CCCCCGGGGG -> CCCCCGGGGG
ACATATATAGACT -> ATATAT, TATATA
ATTCGATCTATGTAAAGAGG -> TCGA, GATC
CGCACGTCTACGTACCTACGTAG -> CTACGTAG
TCAATGCATGCGGGTCTATATGCAT -> ATGCAT, GCATGC [, ATGCAT]
CGCTGAACTTTGCCCGTTGGTAGAACGGACTGATGTGAACGAGTGACCCG -> CG, GC, TA, AT [, GC, CG, CG, CG, CG]
CTCGCGTTTGCATAACCGTACGGGCGGAACAGTCGGCGGTGCCTCCCAGG -> CCGTACGG
Scoring
This is code golf, so the solution in the fewest bytes wins.
It would have been nicer if printing all of them had some sort of bonus. – Optimizer – 2014-12-05T06:03:53.867
@Optimizer isn't printing just the longest more difficult than printing all of them? – trichoplax – 2014-12-06T15:03:40.550
Or do you mean printing all of the longest ones? – trichoplax – 2014-12-06T15:04:09.173
@githubphagocyte yes, your second comment. – Optimizer – 2014-12-06T15:12:13.783