Check if a pattern appears in a DNA sequence

2

1

In a project I am working on, I am checking whether certain DNA sequences appear in certain genes in E. coli. I have written a program in Java that does what is described below. However, since I wrote my program in Java, the universe needs to be corrected and rid of verbosity by having you write your code in as few bytes as possible.

My program took a DNA sequence and a regex as input (for this challenge, order doesn't matter). The DNA sequence contains only the letters A, C, G, and T. The program determines whether the DNA sequence contains any contiguous subsequences (substrings) that match the regex, and outputs the one-indexed location(s) of the first character of each matching substring in the DNA sequence, in your choice of format. If there is no such location, you may output any reasonable indicator of this.

The regex I used had the following specifications.

  • A, C, G, and T match themselves.
  • N matches any of A, C, G, and T.
  • (X/Y) means that at the current location in the sequence, the next character(s) in the sequence match X or Y. X and Y can be multiple characters long, and there can be more than one slash; for example, (AA/AG/GA) matches AA, AG, or GA. However, all alternatives within one pair of parentheses must have the same length.

You may assume that the DNA sequence is longer than any string the regex can match, and that both the DNA sequence and the regex are valid.
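For reference, here is an ungolfed sketch of the rules above in Python. It is only one possible reading of the spec, not the original Java program: it assumes the pattern can be translated to a standard regex by turning N into [ACGT] and / into |, and it wraps the result in a zero-width lookahead so that overlapping matches are all reported. The names to_python_regex and match_positions are made up for this illustration.

```python
import re

def to_python_regex(pattern):
    # Hypothetical helper: N becomes a character class and '/' becomes
    # alternation; the parentheses already act as regex groups.
    return pattern.replace("N", "[ACGT]").replace("/", "|")

def match_positions(dna, pattern):
    # A lookahead consumes no characters, so finditer visits every start
    # index and overlapping matches are not skipped.
    lookahead = re.compile("(?=" + to_python_regex(pattern) + ")")
    # m.start() is zero-indexed; the challenge asks for one-indexed output.
    return [m.start() + 1 for m in lookahead.finditer(dna)]

print(match_positions("GGGGCAGCAGCTGACTA", "C(NA/AN)"))
```

An empty list would serve as the "no match" indicator in this sketch.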


Test cases

Here, the DNA sequence is first, and the regex is second.

AAAAAAAAAA
AAA
==> 1 2 3 4 5 6 7 8

ACGTAATGAA
A(A/C)(T/G/A)
==> 1 5

GGGGCAGCAGCTGACTA
C(NA/AN)
==> 5 8 15

ACGTTAGTTAGCGTGATCGTG
CGTNA
==> 2 12

This is code-golf. The shortest answer in bytes wins. Standard rules apply.

Arcturus

Posted 2016-06-09T21:08:50.713

Reputation: 6 537

Inb4 someone asks me why I didn't use Java's native regex system; I don't know. – Arcturus – 2016-06-09T21:09:34.623

Answers

1

Python 3, 125 bytes

import re,sys
a=sys.argv
s=re.sub
print([i+1 for i in range(len(a[1]))if re.match(s('N','[ACGT]',s('/','|',a[2])),a[1][i:])])
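Ungolfed, the approach above amounts to roughly the following. This is a readability sketch rather than the submitted answer: the function name positions is invented here, and it takes its two inputs as arguments instead of reading sys.argv.

```python
import re

def positions(dna, pattern):
    # Same two substitutions as the golfed code: '/' -> '|', then 'N' -> '[ACGT]'.
    regex = re.sub("N", "[ACGT]", re.sub("/", "|", pattern))
    # re.match only matches at the start of its input, so testing every
    # suffix dna[i:] checks each start position, overlaps included.
    return [i + 1 for i in range(len(dna)) if re.match(regex, dna[i:])]

print(positions("ACGTAATGAA", "A(A/C)(T/G/A)"))
```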

RootTwo

Posted 2016-06-09T21:08:50.713

Reputation: 1 749

Can you use a=input() instead of the sys stuff? – Rɪᴋᴇʀ – 2016-07-02T13:36:41.897

0

Pyth, 28 bytes

fT*VSlQ_:RX+\^z"/N.|")0_M.__

Simple regex matching.

I wish there were a function that returned the position of each match, but there is not.

Test suite.

Leaky Nun

Posted 2016-06-09T21:08:50.713

Reputation: 45 011