I'm wondering if someone could help me with a specific coding question. I have a DNA sequencing file that reads something like this (as an example):
Plate1A1_R1_AGTAGTACGACTAGCATCAGCATACGATCAGCATCAGCATCAG
Plate1A1_R1_GTAGATCGATGCATGCATGCTAGCTAGCTAGCTAGCTAGCTAA
Plate1A1_R1_AGCTAGCATCGATCGATGCTAGCATGCATCGATCGATGCATGC
Plate1A1_R2_AGCATCGATGCAGCATGCTAGCTAGCTAGCTAGCAGCTAGTCT
Plate1A1_R2_AGCATGCATCGATCGTAGCTAGCAGCGAGCGGCATCGATCGAT
Plate1A2_R1_CAGCTAGATGCATCGATCGATCGATCGATCGATGCTAGCTTAC
Plate1A2_R1_CAGTAGCATGCATGCATGCATGCATGCATCGATGCTAGCTAGC
Plate1A2_R1_ACAACGTAGCTAGCTAGCTACTACTAGTCATCATCGATGCTAG
Plate1A2_R1_CAGCTAGCTAGCTAGCTAGGCTACATCGATCGTAGCTAGTCGA
Plate1A2_R1_CAGTCAGCATGCTATCGATCGTAGCTAGTCATCGATGTAGTGA
....etc.
You can see that consecutive lines share the same starting pattern (here: Plate1A1_R1, Plate1A1_R2, Plate1A2_R1). I'd like to place a blank line after each group, e.g.:
Plate1A1_R1_AGTAGTACGACTAGCATCAGCATACGATCAGCATCAGCATCAG
Plate1A1_R1_GTAGATCGATGCATGCATGCTAGCTAGCTAGCTAGCTAGCTAA
Plate1A1_R1_AGCTAGCATCGATCGATGCTAGCATGCATCGATCGATGCATGC

Plate1A1_R2_AGCATCGATGCAGCATGCTAGCTAGCTAGCTAGCAGCTAGTCT
Plate1A1_R2_AGCATGCATCGATCGTAGCTAGCAGCGAGCGGCATCGATCGAT

Plate1A2_R1_CAGCTAGATGCATCGATCGATCGATCGATCGATGCTAGCTTAC
Plate1A2_R1_CAGTAGCATGCATGCATGCATGCATGCATCGATGCTAGCTAGC
Plate1A2_R1_ACAACGTAGCTAGCTAGCTACTACTAGTCATCATCGATGCTAG
Plate1A2_R1_CAGCTAGCTAGCTAGCTAGGCTACATCGATCGTAGCTAGTCGA
Plate1A2_R1_CAGTCAGCATGCTATCGATCGTAGCTAGTCATCGATGTAGTGA
....etc.
This means I need to be able to grab the first 11 characters of each line, search for where that pattern no longer occurs in the line below, and insert a blank line at that point.
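For illustration, the comparison described above can be sketched in a few lines of awk wrapped in a shell function. The function name split_groups_awk is hypothetical, and the sketch assumes the group key is always exactly the first 11 characters and that lines of one group are already adjacent in the file.

```shell
# Hypothetical helper: reads lines on stdin and prints them with a blank
# line inserted wherever the first 11 characters change.
split_groups_awk() {
    awk '{
        key = substr($0, 1, 11)                  # e.g. "Plate1A1_R1"
        if (NR > 1 && key != prev) print ""      # prefix changed: new group
        print                                    # echo the line unchanged
        prev = key                               # remember for the next line
    }'
}

# Usage (filename.txt is the example name used in this question):
# split_groups_awk < filename.txt
```

This places blank lines between groups only; nothing is added after the final group.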
I've tried sed and awk attempts with 'while read line' loops, but can't find a way to hold the first 11 characters in a variable that carries over to the following lines of the file; the variable seems to be 'stuck' inside the processing of each individual line.
I'm hoping someone can help with a solution that lets the file be read through a redirect (<). The file has hundreds of lines of DNA sequence data in this format, and a couple of hundred distinct 'plate names' come up as the script moves through it line by line, e.g. while read line ; do echo "${line:0:11}" ; done < filename.txt
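A minimal sketch of the loop form, under the assumption that the key is always the first 11 characters: a variable assigned at the end of one iteration is still set at the start of the next, so the previous prefix can simply be kept in prev. The function name split_groups_loop and the variable names are hypothetical. printf '%.11s' is used as a portable way to take the first 11 characters; in bash, "${line:0:11}" does the same job.

```shell
# Hypothetical helper: reads lines on stdin and prints them with a blank
# line inserted wherever the first 11 characters differ from the line before.
split_groups_loop() {
    prev=""
    first=1
    # The "|| [ -n "$line" ]" part also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        key=$(printf '%.11s' "$line")     # first 11 characters of the line
        if [ "$first" -eq 0 ] && [ "$key" != "$prev" ]; then
            echo                          # prefix changed: emit a blank line
        fi
        printf '%s\n' "$line"
        prev=$key                         # carried into the next iteration
        first=0
    done
}

# Usage with a redirect, as in the example above:
# split_groups_loop < filename.txt
```

For a file with only hundreds of lines the per-line command substitution is cheap enough; for very large files a single awk pass would likely be faster.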
Please take a look at: What should I do when someone answers my question?
– Cyrus – 2016-03-25T08:28:13.547