Here is a version that doesn't require looping and uses only four calls to sed. Granted, my version doesn't check that the two numeric parts are equal. In fact, the second one is ignored and can even be omitted, as with "Gene Code (91K - Q) D2 fragment, D74F". Also, the bounds can appear in either order: if the first letter comes after the second, the output sequence is reversed.
$ cat foo
#!/usr/bin/env bash
# Script to expand $1 passed as:
# "Gene Code (91K - 91Q) D2 fragment, D74F"
#
# into the output:
#
# Gene Code, 91K, D2 fragment, D74F
# Gene Code, 91L, D2 fragment, D74F
# Gene Code, 91M, D2 fragment, D74F
# Gene Code, 91N, D2 fragment, D74F
# Gene Code, 91O, D2 fragment, D74F
# Gene Code, 91P, D2 fragment, D74F
# Gene Code, 91Q, D2 fragment, D74F
# Copy $1 into FMT_STRING, replacing the " (91K - 91Q)" bit with a ', %s,'
# printf directive, such as 'Gene Code, %s, D2 fragment, D74F':
FMT_STRING="$(sed -e 's/ (.* - .*)/, %s,/' <<< "$1")"
# Parse the beginning and ending bounds and format them with just a
# space between, such as '91K 91Q':
BOUNDS="$(sed -e 's/^[^(]*(\(.*\) - \(.*\)) .*/\1 \2/' <<< "$1")"
# Extract the (first) static numeric part from BOUNDS, e.g. '91'
NUMERIC="$(sed -e 's/[^0-9].*//' <<< "$BOUNDS")"
# remove all digits [0-9] from BOUNDS, e.g. 'K Q'
BOUNDS="$(sed -e 's/[0-9]//g' <<< "$BOUNDS")"
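# Substitute '<NUMERIC>%c' for the %s, giving a jot format such as
# 'Gene Code, 91%c, D2 fragment, D74F':
FMT_STRING="$(printf "$FMT_STRING" "${NUMERIC}%c")"
# Let jot step through the letters; $BOUNDS is deliberately left unquoted
# so that it splits into separate begin and end arguments:
jot -w "$FMT_STRING" - $BOUNDS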
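As noted, the script does not check that the two numeric parts agree. Here is a minimal sketch of such a check. It assumes a hypothetical helper named check_bounds, which is not part of the script above, and reuses the same sed patterns:

```shell
# check_bounds is a hypothetical helper, not part of the original script.
# Exit status: 0 if the numeric parts agree (or the second is omitted),
#              1 if they differ, 2 if no "(X - Y) " range was found.
check_bounds() (
    # Pull out the two bounds, e.g. '91K 91Q' (same pattern as BOUNDS above).
    bounds="$(printf '%s\n' "$1" | sed -e 's/^[^(]*(\(.*\) - \(.*\)) .*/\1 \2/')"
    # If sed changed nothing, the input held no recognizable range.
    [ "$bounds" != "$1" ] || exit 2
    first="${bounds%% *}"     # e.g. '91K'
    second="${bounds##* }"    # e.g. '91Q', or just 'Q'
    # Keep only the leading digits of each bound.
    num1="$(printf '%s\n' "$first" | sed -e 's/[^0-9].*//')"
    num2="$(printf '%s\n' "$second" | sed -e 's/[^0-9].*//')"
    # An omitted second numeric part is accepted, as in the original script.
    [ -z "$num2" ] || [ "$num1" = "$num2" ]
)
```

Calling something like `check_bounds "$1" || exit 1` before the jot line would reject "Gene Code (91K - 92Q) D2 fragment, D74F" while still accepting the form with the second number omitted.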
Sample output:
$ ./foo "Gene Code (737L - 737X) D2 fragment, D74F"
Gene Code, 737L, D2 fragment, D74F
Gene Code, 737M, D2 fragment, D74F
Gene Code, 737N, D2 fragment, D74F
Gene Code, 737O, D2 fragment, D74F
Gene Code, 737P, D2 fragment, D74F
Gene Code, 737Q, D2 fragment, D74F
Gene Code, 737R, D2 fragment, D74F
Gene Code, 737S, D2 fragment, D74F
Gene Code, 737T, D2 fragment, D74F
Gene Code, 737U, D2 fragment, D74F
Gene Code, 737V, D2 fragment, D74F
Gene Code, 737W, D2 fragment, D74F
Gene Code, 737X, D2 fragment, D74F
Reversing the bounds reverses the output:
$ ./foo "Gene Code (737X - 737L) D2 fragment, D74F"
Gene Code, 737X, D2 fragment, D74F
Gene Code, 737W, D2 fragment, D74F
Gene Code, 737V, D2 fragment, D74F
Gene Code, 737U, D2 fragment, D74F
Gene Code, 737T, D2 fragment, D74F
Gene Code, 737S, D2 fragment, D74F
Gene Code, 737R, D2 fragment, D74F
Gene Code, 737Q, D2 fragment, D74F
Gene Code, 737P, D2 fragment, D74F
Gene Code, 737O, D2 fragment, D74F
Gene Code, 737N, D2 fragment, D74F
Gene Code, 737M, D2 fragment, D74F
Gene Code, 737L, D2 fragment, D74F
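jot is a BSD utility and is not installed by default on every system. Where it is missing, the letter stepping can be approximated in plain shell, at the cost of bringing back a loop. The following is only a sketch; letter_range is a hypothetical name, and it stands in just for the character-counting part of the jot call, not for the -w formatting:

```shell
# letter_range is a hypothetical stand-in for the letter stepping that
# jot performs in the script. It prints each ASCII character from $1 to
# $2, one per line, counting down when $1 sorts after $2.
letter_range() (
    begin="$(printf '%d' "'$1")"   # ASCII code of the first letter
    end="$(printf '%d' "'$2")"     # ASCII code of the last letter
    step=1
    [ "$begin" -le "$end" ] || step=-1
    i="$begin"
    while :; do
        # Convert the code back to a character via an octal escape.
        printf "\\$(printf '%03o' "$i")\n"
        [ "$i" -eq "$end" ] && break
        i=$((i + step))
    done
)
```

With this, `letter_range L X` counts up from L to X and `letter_range X L` counts down, mirroring the two sample runs above; each letter could then be passed to printf together with the numeric part.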
Is this performance-sensitive? A simple solution with a for loop would not be very fast. – Eugen Rieck – 2018-12-30T21:53:04.223