Golf Text into DNA

26

3

Text to DNA golf

Challenge

Convert input into a DNA output.

Algorithm

  • Convert text into ASCII code points (e.g. codegolf -> [99, 111, 100, 101, 103, 111, 108, 102])
  • String the ASCII codes together (e.g. 99111100101103111108102)
  • Convert to binary (e.g. 10100111111001101001011010001000011001101011011110000110010111111011000000110)
  • Pad 0s onto the end to make an even number of characters (e.g. 101001111110011010010110100010000110011010110111100001100101111110110000001100)
  • Replace 00 with A, 01 with C, 10 with G, and 11 with T (e.g. GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA)
  • Output
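
For readers who want to check the steps by hand, here is a minimal, ungolfed Python 3 sketch of the algorithm above. The function name text_to_dna is an illustrative choice, not part of the challenge, and the sketch assumes non-empty input.

```python
def text_to_dna(text: str) -> str:
    """Follow the challenge steps literally (a sketch, not a golfed answer)."""
    # Steps 1-2: take the code points and concatenate their decimal digits.
    digits = "".join(str(ord(ch)) for ch in text)
    # Step 3: binary form of that integer (Python ints are arbitrary precision).
    bits = bin(int(digits))[2:]
    # Step 4: pad with one trailing 0 if the number of bits is odd.
    if len(bits) % 2:
        bits += "0"
    # Step 5: map each pair of bits to a base.
    table = {"00": "A", "01": "C", "10": "G", "11": "T"}
    return "".join(table[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(text_to_dna("codegolf"))
```

Because Python integers are unbounded, the 23-digit intermediate value for codegolf needs no special handling here.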

Test Cases

codegolf > GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA
ppcg > GGCTAATTGTCGCACTT
} > TTGG (padding)

Specifications

  • This is code-golf.
  • Your program must accept spaces in input.
  • Your program must work for codegolf.

NoOneIsHere

Posted 2016-05-02T19:04:46.360

Reputation: 1 916

I think you should add a test case that requires the padding behaviour. The lazy choice would be } which I believe becomes TTGG. – FryAmTheEggman – 2016-05-02T19:22:37.207

How large of input do we need to support? 99111100101103111108102 for example is larger than uint-64, so some languages may struggle with bigger conversions. – AdmBorkBork – 2016-05-02T19:46:09.400

That is not how you string ASCII codes together if you want to ever be able to decode them again. – user253751 – 2016-05-04T03:22:54.853

@immibis I know. – NoOneIsHere – 2016-05-04T03:51:04.630

Answers

17

Jelly, 15 13 bytes

OVBs2UḄị“GCTA

Try it online! or verify all test cases.

How it works

OVBs2UḄị“GCTA    Main link. Argument: s (string)

O                Ordinal; replace each character with its code point.
 V               Eval. This converts the list to a string before evaluating, so it
                 returns the integer that results from concatenating all the digits.
  B              Binary; convert from integer to base 2.
   s2            Split into chunks of length 2.
     U           Upend; reverse the digits of each chunk.
                 Reversing means that we would have to conditionally PREPEND a zero
                 to the last chunk, which makes no difference for base conversion.
      Ḅ          Unbinary; convert each chunk from base 2 to integer.
                 `UḄ' maps:
                     [0, 1   ] -> [1,    0] -> 2
                     [1, 0(?)] -> [0(?), 1] -> 1
                     [1, 1   ] -> [1,    1] -> 3
                     [0, 0(?)] -> [0(?), 0] -> 0
       ị“GCTA    Replace each number by the character at that index.
                 Indexing is 1-based, so the indices are [1, 2, 3, 0].

Dennis

Posted 2016-05-02T19:04:46.360

Reputation: 196 637

9

CJam, 24 23 bytes

Thanks to Dennis for saving 1 byte in a really clever way. :)

l:isi2b2/Wf%2fb"AGCT"f=

Test it here.

Explanation

A very direct implementation of the specification. The only interesting bit is the padding to an even number of binary digits (which was actually Dennis's idea). Instead of treating the digits in each pair in the usual order, we make the second bit the most significant one. That way, a trailing single bit has the same value as that bit followed by an appended zero, so we don't have to append the zero at all.
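
The reversal trick can be sketched in Python 3 as follows. This is only an illustration of the idea, not a translation of the CJam code; the function name is hypothetical.

```python
def text_to_dna_reversed_pairs(text: str) -> str:
    """Sketch of the pair-reversal trick: no explicit padding is needed."""
    bits = bin(int("".join(str(ord(ch)) for ch in text)))[2:]
    # Split into chunks of up to two bits; the last chunk may be a single bit.
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    # Reverse each chunk so the second bit becomes the most significant one.
    # A lone "1" has value 1, the same as "10" reversed to "01".
    # The lookup string is "AGCT" (G and C swapped) to undo the reversal.
    return "".join("AGCT"[int(chunk[::-1], 2)] for chunk in chunks)


if __name__ == "__main__":
    print(text_to_dna_reversed_pairs("}"))
```

For } the bits are 1111101, which split into 11, 11, 10 and a lone 1; the lone 1 is read as value 1 and maps to G, just as a padded 10 would.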

l          e# Read input.
:i         e# Convert to character codes.
si         e# Convert to flat string and back to integer.
2b         e# Convert to binary.
2/         e# Split into pairs.
Wf%        e# Reverse each pair.
2fb        e# Convert each pair back from binary, to get a value in [0 1 2 3].
"AGCT"f=   e# Select corresponding letter for each number.

Martin Ender

Posted 2016-05-02T19:04:46.360

Reputation: 184 808

I don't know anything about CJam, but why do you need to reverse each pair? Can you not convert them directly back from binary? – Value Ink – 2016-05-03T01:41:12.450

@KevinLau-notKenny Reversing each pair avoids appending zeroes to get an even length. In the reversed pairs, you'd have to prepend zeroes, which doesn't matter for base conversion. – Dennis – 2016-05-03T01:52:41.120

Nice trick! It would probably have saved a ton of bytes on my own solution if I had thought about that trick – Value Ink – 2016-05-03T02:05:08.780

6

Python 2, 109 103 bytes

lambda s,j=''.join:j('ACGT'[int(j(t),2)]for t in
zip(*[iter(bin(int(j(`ord(c)`for c in s))*2)[2:])]*2))

Test it on Ideone.

Dennis

Posted 2016-05-02T19:04:46.360

Reputation: 196 637

4

Python 3, 130 bytes.

Saved 2 bytes thanks to vaultah.
Saved 6 bytes thanks to Kevin Lau - not Kenny.

I hate how hard it is to convert to binary in Python.

def f(x):c=bin(int(''.join(map(str,map(ord,x)))))[2:];return''.join('ACGT'[int(z+y,2)]for z,y in zip(*[iter(c+'0'*(len(c)%2))]*2))

Test cases:

assert f('codegolf') == 'GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA'
assert f('ppcg') == 'GGCTAATTGTCGCACTT'

Morgan Thrapp

Posted 2016-05-02T19:04:46.360

Reputation: 3 574

Looks like you have 1 extra pair of brackets after the second ''.join – vaultah – 2016-05-02T19:54:49.923

@vaultah Oops, yup, you're right. – Morgan Thrapp – 2016-05-02T19:55:36.047

Use 'ACGT'[int(z+y,2)] instead, converting directly out of binary instead of using your longer string and converting from base 10. Also, unsure how much difference it would make but look at using re.sub instead of your messy join trick? – Value Ink – 2016-05-02T19:56:21.787

@KevinLau-notKenny Oooo, thanks. I forgot you can specify a base with int. I'll look into re.sub, thanks for the suggestion. – Morgan Thrapp – 2016-05-02T19:58:45.830

Nice approach, I came up with (almost) exactly the same code without having looked at yours. :) – Byte Commander – 2016-05-03T11:48:30.950

4

Ruby, 59 bytes

$_='%b0'.%$_.bytes*''
gsub(/../){:ACGT[$&.hex%7]}
chomp'0'

A full program. Run with the -p flag.

xsot

Posted 2016-05-02T19:04:46.360

Reputation: 5 069

how did you even... i don't understand – Value Ink – 2016-05-03T18:45:13.610

3

Ruby, 80 bytes

->s{s=s.bytes.join.to_i.to_s 2;s+=?0*(s.size%2)
s.gsub(/../){"ACGT"[$&.to_i 2]}}

Value Ink

Posted 2016-05-02T19:04:46.360

Reputation: 10 608

As straightforward as the problem is, it's possible to squeeze a lot more bytes out of this :) – xsot – 2016-05-03T12:41:17.783

3

Mathematica, 108 bytes

{"A","C","G","T"}[[IntegerDigits[Mod[Floor@Log2@#,2,1]#&@FromDigits[""<>ToString/@ToCharacterCode@#],4]+1]]&

Takes a string as input, and outputs a list of bases.

LegionMammal978

Posted 2016-05-02T19:04:46.360

Reputation: 15 731

3

Python 3, 126 bytes

lambda v:"".join(["ACGT"[int(x,2)]for x in map(''.join,zip(*[iter((bin(int("".join([str(ord(i))for i in v])))+"0")[2:])]*2))])

Hunter VL

Posted 2016-05-02T19:04:46.360

Reputation: 321

Welcome to Programming Puzzles & Code Golf! In case you're wondering about the downvote, this is what happened. – Dennis – 2016-05-15T02:44:32.907

2

Pyth, 25 bytes

sm@"ACGT"id2Pc.B*4sjkCMQ2

Try it here!

Explanation

Borrowing the padding trick from Martin's CJam answer.

sm@"ACGT"id2Pc.B*4sjkCMQ2    # Q = input

                     CMQ     # Map each character of Q to its character code
                  sjk        # Join into one string and convert to an integer
              .B*4           # Multiply by 4 and convert to binary
             c          2    # Split into pairs
            P                # Discard the last pair
 m                           # Map each pair d
         id2                 # Convert pair from binary to decimal
  @"ACGT"                    # Use the result ^ as index into a lookup string
s                            # Join the resulting list into one string
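
The multiply-by-4 step can be sketched in Python 3 like this. It is an illustrative re-creation of the idea under the assumption that chopping leaves a short final chunk, not a line-by-line port of the Pyth program; the function name is hypothetical.

```python
def text_to_dna_times_four(text: str) -> str:
    """Sketch: append "00", chop into pairs, then drop the last chunk."""
    n = int("".join(str(ord(ch)) for ch in text))
    bits = bin(n * 4)[2:]  # multiplying by 4 appends two zero bits
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    # Even original length: the last chunk is the extra "00", so drop it.
    # Odd original length: the last chunk is a lone "0", and the chunk
    # before it already ends in the padding zero.
    chunks.pop()
    return "".join("ACGT"[int(chunk, 2)] for chunk in chunks)


if __name__ == "__main__":
    print(text_to_dna_times_four("}"))
```

For } the bits 1111101 become 111110100, which chops into 11, 11, 10, 10 and a lone 0 that is discarded.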

Denker

Posted 2016-05-02T19:04:46.360

Reputation: 6 639

2

05AB1E, 23 bytes

Code:

SÇJb00«2÷¨C3210"TGCA"‡á

Uses CP-1252 encoding. Try it online!

Adnan

Posted 2016-05-02T19:04:46.360

Reputation: 41 965

2

Java, 194 bytes

String a(int[]a){String s="",r=s;for(int i:a)s+=i;s=new BigInteger(s).toString(2)+0;for(int i=0,y,n=48;i<(s.length()/2)*2;r+=s.charAt(i++)==n?y==n?'A':'G':y==n?'C':'T')y=s.charAt(i++);return r;}

Ungolfed

String a(int[] a) {
    String s = "", r = s;
    for (int i : a) s += i;
    s = new BigInteger(s).toString(2) + 0;
    for (int i = 0, y, n = 48; i < (s.length() / 2) * 2; 
        r += s.charAt(i++) == n 
                 ? y == n 
                 ? 'A' 
                 : 'G' 
                 : y == n 
                 ? 'C' 
                 : 'T')
        y = s.charAt(i++);
    return r;
}

Note

  • Input is an array of chars (which should count as a form of String); the parameter is of type int[] because that's one byte shorter than char[].

Output

Input:  codegolf
Output: GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA

Input:  .
Output: GTG

Input:  }
Output: TTGG

Input:  wow
Output: TGATAGTTGTGCTG

Input:  programming puzzles
Output: GTGTCAGAGTTGAAGGCCGTTCCGCAGTGCATTTGGCTCGTCTGGTGTCTACTAGCCTGCGAGAGGAGTTACTTTGGATCCTTGACTTGT

Marv

Posted 2016-05-02T19:04:46.360

Reputation: 839

2

MATL, 21 bytes

'CGTA'joV4Y2HZa2e!XB)

Try it online!

Explanation

'CGTA'   % Push string to be indexed into
j        % Take input string
o        % Convert each char to its ASCII code
V        % Convert to string (*). Numbers are separated by spaces
4Y2      % Push the string '0123456789'
H        % Push number 2
Za       % Convert string (*) from base '0123456789' to base 2, ignoring spaces
2e       % Reshape into a 2-column matrix, padding with a trailing 0 if needed
!        % Transpose
XB       % Convert from binary to decimal
)        % Index into string with the DNA letters. Indexing is 1-based and modular
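
The 1-based, modular indexing in the last step is what lets the lookup string be 'CGTA' rather than 'ACGT'. A small Python 3 sketch that emulates this indexing (Python itself is 0-based, so the shift by one is written out explicitly; the function name is hypothetical):

```python
def text_to_dna_modular(text: str) -> str:
    """Sketch: emulate 1-based modular indexing into the string 'CGTA'."""
    bits = bin(int("".join(str(ord(ch)) for ch in text)))[2:]
    bits += "0" * (len(bits) % 2)  # pad with a trailing 0 if needed
    lookup = "CGTA"
    out = []
    for i in range(0, len(bits), 2):
        k = int(bits[i:i + 2], 2)  # value of the pair, 0..3
        # 1-based modular index k: 1 -> C, 2 -> G, 3 -> T, 0 wraps to A.
        out.append(lookup[(k - 1) % len(lookup)])
    return "".join(out)


if __name__ == "__main__":
    print(text_to_dna_modular("ppcg"))
```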

Luis Mendo

Posted 2016-05-02T19:04:46.360

Reputation: 87 464

1

Pyth, 23 bytes

sm@"AGCT"i_d2c.BsjkCMQ2

Try it online!

Explanation

Borrowing the trick from Dennis' Jelly answer.

sm@"AGCT"i_d2c.BsjkCMQ2
                   CMQ   convert each character to its byte value
                sjk      convert to a string and then to integer
              .B         convert to binary
             c        2  chop into pairs
 m         d             for each pair:
          _                  reverse it
         i  2                convert from binary to integer
  @"AGCT"                    use it as an index into "AGCT"
s                        join into a single string

Leaky Nun

Posted 2016-05-02T19:04:46.360

Reputation: 45 011

1

Groovy, 114 bytes

{s->'ACGT'[(new BigInteger(((Byte[])s).join())*2).toString(2).toList().collate(2)*.with{0.parseInt(it.join(),2)}]}

Explanation:

{s->
    'ACGT'[ //access character from string
        (new BigInteger( //create Big Integer from string
           ((Byte[])s).join() //split string to bytes and then join to string
        ) * 2) //multiply by 2 to add 0 at the end in binary
        .toString(2) //change to binary string
        .toList() //split to characters
        .collate(2) //group characters by two
        *.with{
            0.parseInt(it.join(),2) //join every group and parse to decimal
        }
     ]
}

Krzysztof Atłasik

Posted 2016-05-02T19:04:46.360

Reputation: 189

Great answer! Can you please add an explanation? – NoOneIsHere – 2016-05-03T17:46:16.200

First version was not working, because I forgot to append 0. I fixed it, and went down with bytes btw. – Krzysztof Atłasik – 2016-05-03T19:08:43.000

1

Python 2.7, 135 bytes

def f(A):g=''.join;B=bin(int(g(map(str,map(ord,A)))))[2:];B+=len(B)%2*'0';return g('ACGT'[int(B[i:i+2],2)] for i in range(len(B))[::2])

Ungolfed:

def f(A):
    g = ''.join
    B = bin(int(g(map(str,map(ord,A)))))[2:] # convert string input to binary
    B += len(B)%2 * '0' # add extra 0 if necessary
    return g('ACGT'[int(B[i:i+2],2)] for i in range(len(B))[::2]) # map every two characters into 'ACGT'

Output

f('codegolf')
'GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA'

deustice

Posted 2016-05-02T19:04:46.360

Reputation: 61

@DrGreenEggsandHamDJ I have the g(...) function in there twice, so I believe replacing it with the join would add 2 bytes? – deustice – 2016-05-03T19:55:12.760

Ah, I missed that. My bad! – James – 2016-05-03T19:56:06.403

1

Julia 0.4, 77 bytes

s->replace(bin(BigInt(join(int(s)))),r"..?",t->"AGCT"[1+int("0b"reverse(t))])

This anonymous function takes a character array as input and returns a string.

Try it online!

Dennis

Posted 2016-05-02T19:04:46.360

Reputation: 196 637

1

J, 52 bytes

 3 :'''ACGT''{~#._2,\#:".,&''x''":(,&:(":"0))/3&u:y'

Usage: 3 :'''ACGT''{~#._2,\#:".,&''x''":(,&:(":"0))/3&u:y' 'codegolf' ==> GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA

ljeabmreosn

Posted 2016-05-02T19:04:46.360

Reputation: 341

1

Javascript ES7, 105 103 bytes

s=>((+[for(c of s)c.charCodeAt()].join``).toString(2)+'0').match(/../g).map(x=>"ACGT"['0b'+x-0]).join``

The ES7 part is the for(c of s) part.

ES6 version, 107 105 bytes

s=>((+[...s].map(c=>c.charCodeAt()).join``).toString(2)+'0').match(/../g).map(x=>"ACGT"['0b'+x-0]).join``

Ungolfed code

dna = (str)=>{
  var codes = +[for(c of str)c.charCodeAt()].join``;
  var binaries = (codes.toString(2)+'0').match(/../g);
  return binaries.map(x=>"ACGT"['0b'+x-0]).join``
}

This is my first try at golfing on PPCG, feel free to correct me if something's wrong.

Thanks @AlexA for the small improvement.

BusyBeingDelicious

Posted 2016-05-02T19:04:46.360

Reputation: 189

This is a nice first golf! Since the function isn't recursive and we don't require functions to be named, you should be able to remove f=, saving 2 bytes. :) – Alex A. – 2016-05-15T21:16:21.443

1

Common Lisp (Lispworks), 415 bytes

(defun f(s)(labels((p(e f)(concatenate'string e f)))(let((b"")(d""))(dotimes(i(length s))(setf b(p b(write-to-string(char-int(elt s i))))))(setf b(write-to-string(parse-integer b):base 2))(if(oddp #1=(length b))(setf b(p b"0")))(do((j 0(+ j 2)))((= j #1#)d)(let((c(subseq b j(+ j 2))))(cond((#2=string="00"c)(setf d(p d"A")))((#2#"01"c)(setf d(p d"C")))((#2#"10"c)(setf d(p d"G")))((#2#"11"c)(setf d(p d"T")))))))))

ungolfed:

(defun f (s)
  (labels ((p (e f)
             (concatenate 'string e f)))
  (let ((b "") (d ""))
    (dotimes (i (length s))
      (setf b
            (p b
               (write-to-string
                (char-int (elt s i))))))
    (setf b (write-to-string (parse-integer b) :base 2))
    (if (oddp #1=(length b))
        (setf b (p b "0")))
      (do ((j 0 (+ j 2)))
          ((= j #1#) d)
        (let ((c (subseq b j (+ j 2))))
          (cond ((#2=string=  "00" c)
                 (setf d (p d "A")))
                ((#2# "01" c)
                 (setf d (p d "C")))
                ((#2# "10" c)
                 (setf d (p d "G")))
                ((#2# "11" c)
                 (setf d (p d "T")))))))))

Usage:

CL-USER 2060 > (f "}")
"TTGG"

CL-USER 2061 > (f "golf")
"TAAAAATTATCCATAAATA"

sadfaf

Posted 2016-05-02T19:04:46.360

Reputation: 101

0

Perl, 155 148 137 + 1 (-p flag) = 138 bytes

#!perl -p
s/./ord$&/sge;while($_){/.$/;$s=$&%2 .$s;$t=$v="";$t.=$v+$_/2|0,$v=$_%2*5
for/./g;s/^0// if$_=$t}$_=$s;s/(.)(.)?/([A,C],[G,T])[$1][$2]/ge

Test it on Ideone.

Denis Ibaev

Posted 2016-05-02T19:04:46.360

Reputation: 876

0

Perl 6, 57 + 1 (-p flag) = 58 bytes

$_=(+[~] .ords).base(2);s:g/..?/{<A G C T>[:2($/.flip)]}/

Step by step explanation:

The -p flag causes the Perl 6 interpreter to run the code once per input line, putting the current line into $_ and printing $_ at the end.

.ords - If there is nothing before a period, a method is called on $_. ords method returns list of codepoints in a string.

[~] - [] is the reduction metaoperator, which takes the operator to reduce with between its brackets. In this case, it's ~, the string concatenation operator. For example, [~] 1, 2, 3 is equivalent to 1 ~ 2 ~ 3.

+ converts its argument to a number, needed because base method is only defined for integers.

.base(2) - converts an integer to a string in base 2

$_= - assigns the result to $_.

s:g/..?/{...}/ - a regex substitution that replaces every (:g, global mode) match of the regex ..? (one or two characters). The second part is the replacement, which in this case is code (in Perl 6, curly brackets in strings and replacement patterns are executed as code).

$/ - a regex match variable

.flip - reverses a string. It implicitly converts $/ (a regex match object) to a string. The flip is needed because a trailing single character 1 should be treated as 10, as opposed to 01. Because of the flip, the array has G and C swapped relative to the usual A C G T order.

:2(...) - parses a base-2 string into an integer.

<A G C T> - array of four elements.

...[...] - array access operator.

What does that mean? The program gets the list of all codepoints in the string, concatenates them, and converts the result to base 2. Then it replaces each group of two characters (or one, at the end) with one of the letters A, G, C, T, depending on the value of the flipped group read as a binary number.
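
The same substitution idea can be sketched in Python 3 with re.sub. This is an analogue for illustration, not a translation of the Perl 6 code; the function name is hypothetical.

```python
import re


def text_to_dna_regex(text: str) -> str:
    """Sketch: replace each match of ..? using the flipped bits as an index."""
    bits = bin(int("".join(str(ord(ch)) for ch in text)))[2:]
    # "..?" greedily matches two characters where possible, so only the
    # final match can be a single character.
    # Flipping the match makes a lone "1" count as 1, like "10" flipped,
    # which is why the lookup string is "AGCT" rather than "ACGT".
    return re.sub("..?", lambda m: "AGCT"[int(m.group()[::-1], 2)], bits)


if __name__ == "__main__":
    print(text_to_dna_regex("codegolf"))
```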

Konrad Borowski

Posted 2016-05-02T19:04:46.360

Reputation: 11 185

0

Hoon, 148 138 bytes

|*
*
=+
(scan (reel +< |=({a/@ b/tape} (weld <a> b))) dem)
`tape`(flop (turn (rip 1 (mul - +((mod (met 0 -) 2)))) |=(@ (snag +< "ACGT"))))

"abc" is a list of atoms. Interpolate them into strings (<a>) while folding over the list, joining them together into a new string. Parse the number with ++dem to get it back to an atom.

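Multiply the number by (bit length % 2) + 1 to pad it, which doubles it when the bit length is odd. Use ++rip to split the atom into a list of two-bit chunks, map over the list using each chunk as an index into the string "ACGT", and reverse the result with ++flop, since ++rip yields the least significant chunk first.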
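
The arithmetic version of the padding can be sketched in Python 3 as below. It mirrors the idea described above rather than the Hoon code itself; the function name is hypothetical, and the sketch assumes the concatenated number is nonzero.

```python
def text_to_dna_arithmetic(text: str) -> str:
    """Sketch: pad by doubling, then read off base-4 digits."""
    n = int("".join(str(ord(ch)) for ch in text))
    # Doubling appends one zero bit; do it only when the bit length is odd.
    n *= 1 + n.bit_length() % 2
    bases = []
    while n:
        n, digit = divmod(n, 4)  # peel off the lowest two bits
        bases.append("ACGT"[digit])
    # Digits come out least significant first, so reverse them.
    return "".join(reversed(bases))


if __name__ == "__main__":
    print(text_to_dna_arithmetic("}"))
```

For } the value 125 has 7 bits, so it is doubled to 250, whose base-4 digits are 3, 3, 2, 2, giving TTGG.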

> =a |*
  *
  =+
  (scan (reel +< |=({a/@ b/tape} (weld <a> b))) dem)
  `tape`(flop (turn (rip 1 (mul - +((mod (met 0 -) 2)))) |=(@ (snag +< "ACGT"))))
> (a "codegolf")
"GGCTTGCGGCCGGAGACGCGGTCTGACGCCTTGTAAATA"
> (a "ppcg")
"GGCTAATTGTCGCACTT"
> (a "}")
"TTGG"

RenderSettings

Posted 2016-05-02T19:04:46.360

Reputation: 620