Insert a string or blank line after specific search criteria, in a loop

I'm wondering if someone could help me with a specific coding question. I have a DNA sequencing file that looks something like this (example):

Plate1A1_R1_AGTAGTACGACTAGCATCAGCATACGATCAGCATCAGCATCAG
Plate1A1_R1_GTAGATCGATGCATGCATGCTAGCTAGCTAGCTAGCTAGCTAA
Plate1A1_R1_AGCTAGCATCGATCGATGCTAGCATGCATCGATCGATGCATGC
Plate1A1_R2_AGCATCGATGCAGCATGCTAGCTAGCTAGCTAGCAGCTAGTCT
Plate1A1_R2_AGCATGCATCGATCGTAGCTAGCAGCGAGCGGCATCGATCGAT
Plate1A2_R1_CAGCTAGATGCATCGATCGATCGATCGATCGATGCTAGCTTAC
Plate1A2_R1_CAGTAGCATGCATGCATGCATGCATGCATCGATGCTAGCTAGC
Plate1A2_R1_ACAACGTAGCTAGCTAGCTACTACTAGTCATCATCGATGCTAG
Plate1A2_R1_CAGCTAGCTAGCTAGCTAGGCTACATCGATCGTAGCTAGTCGA
Plate1A2_R1_CAGTCAGCATGCTATCGATCGTAGCTAGTCATCGATGTAGTGA
....etc.

You can see that consecutive lines share the same starting pattern (here: Plate1A1_R1, Plate1A1_R2, Plate1A2_R1). I'd like to place a blank line after each group, e.g.:

Plate1A1_R1_AGTAGTACGACTAGCATCAGCATACGATCAGCATCAGCATCAG
Plate1A1_R1_GTAGATCGATGCATGCATGCTAGCTAGCTAGCTAGCTAGCTAA
Plate1A1_R1_AGCTAGCATCGATCGATGCTAGCATGCATCGATCGATGCATGC

Plate1A1_R2_AGCATCGATGCAGCATGCTAGCTAGCTAGCTAGCAGCTAGTCT
Plate1A1_R2_AGCATGCATCGATCGTAGCTAGCAGCGAGCGGCATCGATCGAT

Plate1A2_R1_CAGCTAGATGCATCGATCGATCGATCGATCGATGCTAGCTTAC
Plate1A2_R1_CAGTAGCATGCATGCATGCATGCATGCATCGATGCTAGCTAGC
Plate1A2_R1_ACAACGTAGCTAGCTAGCTACTACTAGTCATCATCGATGCTAG
Plate1A2_R1_CAGCTAGCTAGCTAGCTAGGCTACATCGATCGTAGCTAGTCGA
Plate1A2_R1_CAGTCAGCATGCTATCGATCGTAGCTAGTCATCGATGTAGTGA

....etc.

This means I need to take the first 11 characters of each line, compare them with the first 11 characters of the line below, and insert a blank line wherever they differ.
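One way to express that comparison is a bash while-read loop that keeps the previous line's prefix in a variable declared outside the loop. This is a minimal sketch, assuming (as stated in the question) that every group key is exactly 11 characters long; the function name group_by_prefix is hypothetical.

```shell
#!/usr/bin/env bash
# Print a blank line after each run of consecutive lines that share the
# same first 11 characters (e.g. "Plate1A1_R1"). Reads standard input.
group_by_prefix() {
  local line key
  local prev=""   # declared outside the loop, so it survives between iterations
  local seen=0    # set to 1 once the first line has been printed
  while IFS= read -r line || [[ -n $line ]]; do
    key=${line:0:11}                         # first 11 characters of this line
    if (( seen )) && [[ $key != "$prev" ]]; then
      echo                                   # prefix changed: end the previous group
    fi
    printf '%s\n' "$line"
    prev=$key
    seen=1
  done
  if (( seen )); then
    echo                                     # blank line after the last group
  fi
}
```

Usage: group_by_prefix < filename.txt > grouped.txt. It prints a blank line whenever the prefix changes and one more after the final group, which matches the desired output shown in the question. Write to a different file than the input; redirecting output onto filename.txt itself would truncate it before it is read.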

I've tried sed, awk, and 'while read line' loops, but I can't find a way to hold the first 11 characters in a search variable that carries over to the following lines of the file; the variable seems to be 'stuck' inside the processing of each individual line.
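On the "stuck variable" problem: in awk, ordinary variables keep their values from one input line to the next, so the previous line's prefix is still available when the following line is read. A minimal sketch of the same logic in awk, wrapped in a hypothetical shell function group_by_prefix_awk and again assuming fixed 11-character keys:

```shell
#!/usr/bin/env bash
# awk variables such as "prev" persist across input lines, so each line
# can be compared with the prefix remembered from the line before it.
group_by_prefix_awk() {
  awk '
    { key = substr($0, 1, 11) }           # first 11 characters of the current line
    NR > 1 && key != prev { print "" }    # prefix changed: close the previous group
    { print; prev = key }                 # echo the line, then remember its prefix
    END { if (NR > 0) print "" }          # blank line after the final group
  ' "$@"
}
```

Usage: group_by_prefix_awk filename.txt, or group_by_prefix_awk < filename.txt. In bash, a variable assigned inside a while-read loop also survives between iterations; it is only lost after the loop when the loop runs in a subshell, for example on the right-hand side of a pipe (cat file | while read ...).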

I'm hoping for a solution that reads the file through a redirect (<). The file has hundreds of lines of DNA sequence data in this format, with a couple of hundred distinct 'plate names' encountered as the script moves through it line by line. For example: while read line ; do echo "${line:0:11}" ; done < filename.txt
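With a couple of hundred plate names, some names may be longer than 11 characters (a hypothetical Plate1A10_R1, for instance), and a fixed ${line:0:11} would then cut them short. The sketch below, still reading through a redirect, assumes instead that the group key is everything before the last underscore, i.e. that the sequence itself never contains "_". It also takes an optional argument so the same loop can insert a string rather than a blank line, as the title mentions. The function name group_by_key and its separator argument are hypothetical.

```shell
#!/usr/bin/env bash
# Group on everything before the last "_" instead of a fixed 11 characters.
# Assumption: the DNA sequence after the last underscore never contains "_".
# $1 (optional) is the text to insert after each group; default is an empty line.
group_by_key() {
  local sep="${1-}"        # separator line; empty string means a blank line
  local line key prev="" seen=0
  while IFS= read -r line || [[ -n $line ]]; do
    key=${line%_*}                         # drop the last "_" and the sequence after it
    if (( seen )) && [[ $key != "$prev" ]]; then
      printf '%s\n' "$sep"                 # key changed: end the previous group
    fi
    printf '%s\n' "$line"
    prev=$key
    seen=1
  done
  if (( seen )); then
    printf '%s\n' "$sep"                   # separator after the last group
  fi
}

# Usage, reading the file through a redirect as in the question:
#   group_by_key < filename.txt > grouped.txt
#   group_by_key '----' < filename.txt     # insert "----" instead of a blank line
```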

kehmsen

Posted 2016-03-25T01:17:27.683

Please take a look at: What should I do when someone answers my question?

– Cyrus – 2016-03-25T08:28:13.547

Answers