unix command to verify span of word in text


What unix command(s) can I use to determine the line span that a word appears in text? The "span" being equal to the line number of the last instance of a word minus the line number of the first instance of the word.

1| unix is on two lines
2| once above, and once below
3| unix

In the example above the "span" of 'unix' would be 2 (3-1).

So far I've been trying to make use of grep -n but I don't think that grep is powerful enough. Maybe some use of sed or awk?


Ocasta Eshu

Posted 2012-06-27T02:14:15.613

Reputation: 778

Although I already answered: is the span 2 because the word unix appears on two lines, or because it appears twice on the same line? – fmanco – 2012-06-27T02:20:01.957

Span = (last line with 'unix' - first line with 'unix') so because 'unix' appears on lines 1,2,3 (or 0,1,2 if you prefer) 3-1 equals 2 (or again 2-0=2), so the "span" is 2. Sorry that wasn't clear. – Ocasta Eshu – 2012-06-27T02:45:36.167

Post edited for clarity. – Ocasta Eshu – 2012-06-27T02:55:01.303



Using awk


awk '{ if($0 ~ /PATTERN/) { if(!FIRST) FIRST=NR; LAST=NR } } END { print LAST-FIRST }' FILE

How it works

  • awk '{ COMMANDS } END { FINALCOMMAND }' FILE executes COMMANDS for every line of FILE.

    Afterwards, it executes FINALCOMMAND.

  • if($0 ~ /PATTERN/) { ... } checks if PATTERN occurs in the line ($0).

    If it does, ... gets executed.

  • The first time the pattern occurs, FIRST will be empty.

    Therefore, if(!FIRST) FIRST=NR will store the line number (NR) in FIRST.

  • For every occurrence, LAST=NR will store the line number (NR) in LAST.

    After processing all occurrences, LAST will hold the line number of the last occurrence.

  • print LAST-FIRST prints the difference between the last and first line number.
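
As a minimal sketch of the steps above, the one-liner can be wrapped in a shell function and run on the example text from the question. The function name span, the awk variable pat and the temporary file are assumptions made for illustration, not part of the answer. Passing the word with -v avoids editing the awk program for each search.

```shell
# span WORD FILE: print (last matching line) - (first matching line).
# "span" is a hypothetical helper name used only for this sketch.
span() {
    awk -v pat="$1" '
        $0 ~ pat { if (!first) first = NR; last = NR }  # remember first and latest match
        END      { print last - first }                 # 0 when there are 0 or 1 matching lines
    ' "$2"
}

# The example text from the question: "unix" is on lines 1 and 3.
sample=$(mktemp)
printf '%s\n' 'unix is on two lines' 'once above, and once below' 'unix' > "$sample"
span unix "$sample"    # prints 2
rm -f "$sample"
```

Note that a word that occurs on a single line and a word that never occurs both give 0, exactly as in the original one-liner.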

Using only grep, head and tail


MATCHES=$(grep -n PATTERN FILE)
FIRST=$(echo "$MATCHES" | head -n 1 | grep -Po "^\d+"); [ $FIRST ] || FIRST=0
LAST=$(echo "$MATCHES" | tail -n 1 | grep -Po "^\d+"); [ $LAST ] || LAST=0
echo $(($LAST - $FIRST))

How it works

  • grep -n PATTERN FILE shows all lines in FILE matching PATTERN, preceded by their line number.

  • echo "$MATCHES" | head -n 1 shows the first line of MATCHES, and grep -Po "^\d+" filters out everything but the line number.

    Afterwards, [ $FIRST ] || FIRST=0 checks whether FIRST is non-empty. If it is empty (no match), it gets set to 0.

  • echo "$MATCHES" | tail -n 1 shows the last line of MATCHES, and grep -Po "^\d+" filters out everything but the line number.

    Afterwards, [ $LAST ] || LAST=0 checks whether LAST is non-empty. If it is empty (no match), it gets set to 0.

  • $(($LAST - $FIRST)) calculates the difference between the last and first line number.
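
grep -P is a GNU extension, so here is a hedged variant of the same idea that extracts the line number with cut instead. It relies on grep -n printing each match as NUMBER:TEXT. The function name span_grep and the lowercase variable names are assumptions made for this sketch.

```shell
# span_grep PATTERN FILE: hypothetical helper name used only for this sketch.
span_grep() {
    # grep -n prints "NUMBER:TEXT"; "|| true" keeps a no-match exit status
    # from aborting scripts that run under "set -e".
    matches=$(grep -n -- "$1" "$2" || true)
    first=$(printf '%s\n' "$matches" | head -n 1 | cut -d: -f1)
    last=$(printf '%s\n' "$matches" | tail -n 1 | cut -d: -f1)
    # With no match both variables are empty and default to 0.
    echo $(( ${last:-0} - ${first:-0} ))
}
```

Called as span_grep unix FILE on the three-line example from the question, this should print the same span as the commands above.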


Posted 2012-06-27T02:14:15.613

Reputation: 42 934


This will find the span between the first and last occurrence of a word (i.e., intermediate occurrences are not considered)...

Note: The sed commands i and a (insert and append) must be the last command on a line.

eval "$(sed -ne "1 i b=
                 /\<$word\>/{=; i ;e=
                 $ {a ;echo \$((e-b))
                " "$file" | tr -d '\n')"

Or this one, which pipes sed to sed, but is perhaps simpler.

eval "$(sed -n "/\<$word\>/=" "$file" |
        sed -n '1{i b=
             p;   a;echo \$((e-b))
              }' | tr -d '\n')"   


Posted 2012-06-27T02:14:15.613

Reputation: 2 743


This might work for you:

sed '/unix/=;d' file | sed '1h;$!d;G;s/\n/-/' | bc
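
To see what each stage contributes, here is a sketch that runs the first two stages on the example from the question and prints the intermediate expression. The function name span_expr and the temporary file are assumptions for illustration, and shell arithmetic is swapped in for bc in the last step so the sketch has no extra dependency.

```shell
# span_expr WORD FILE: hypothetical helper that prints "LAST-FIRST".
span_expr() {
    sed "/$1/=;d" "$2" |      # "=" prints the number of each matching line, "d" drops the text
    sed '1h;$!d;G;s/\n/-/'    # hold the first number, keep only the last, join them with "-"
}

# The example text from the question: "unix" is on lines 1 and 3.
sample=$(mktemp)
printf '%s\n' 'unix is on two lines' 'once above, and once below' 'unix' > "$sample"

e=$(span_expr unix "$sample")
echo "$e"         # prints 3-1
echo "$(($e))"    # prints 2, the value bc would compute from the same expression
rm -f "$sample"
```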


Posted 2012-06-27T02:14:15.613

Reputation: 156